Annotating gene function by combining expression data with a modular gene network


BIOINFORMATICS Vol. 23 ISMB/ECCB 2007, pages i468-i478
doi: /bioinformatics/btm173

Motoki Shiga, Ichigaku Takigawa and Hiroshi Mamitsuka*
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan

ABSTRACT
Motivation: A promising and reliable approach to annotating gene function is to cluster genes using not only gene expression data but also literature-derived information, especially gene networks.
Results: We present a systematic method for gene clustering that combines these two very different types of data, focusing in particular on network modularity, a global feature of gene networks. Our method is based on learning a probabilistic model, which we call a hidden modular random field, in which the relation between hidden variables directly represents a given gene network. Our learning algorithm, which minimizes an energy function that takes the network modularity into account, is time-efficient in practice even though it uses a global network property. We evaluated our method using a metabolic network and microarray expression data, varying the microarray datasets, the parameters of our model and the gold standard clusters. Experimental results showed that our method outperformed four competing methods, including k-means and existing graph partitioning methods, with statistical significance in all cases. Further detailed analysis showed that our method could group a set of genes into a cluster corresponding to the folate metabolic pathway, while the other methods could not. From these results, we can say that our method is highly effective for gene clustering and annotating gene function.
Contact: shiga@kuicr.kyoto-u.ac.jp

1 INTRODUCTION
Recent progress in genome sciences has led to the development of DNA microarray technology, which allows the expression of thousands of genes to be monitored simultaneously.
Microarray technology, which shows which genes are active in a given cell or tissue, promises the ability to determine the function of each gene. A currently popular approach to annotating gene function from gene expression data is to cluster genes by expression values, based on the assumption that genes with similar expression patterns can be grouped into a cluster with the same gene function. Existing approaches include typical clustering methods such as hierarchical clustering (Eisen et al., 1998), k-means (Tavazoie et al., 1999) and self-organizing maps (Tamayo et al., 1999). However, microarray expression data is inevitably noisy, which makes the clustering results of these methods unstable (Kerr and Churchill, 2001; Zhang and Zhao, 2000). A possible solution is to generate many array replicates, which is however demanding because of the high experimental cost of microarray experiments. A more promising direction in current bioinformatics research is to combine microarray expression data with existing knowledge of gene annotation derived from literature. Typical examples include model-based clustering that incorporates GO (Gene Ontology) annotation as priors of model parameters (Pan, 2006), k-means clustering using GO annotation dependent distances (Huang and Pan, 2006) and hierarchical clustering based on a distance defined by both gene expression and the shortest path on a metabolic (chemical reaction) pathway (Hanisch et al., 2002). This type of combination is of interest, since the dynamic behavior of genes observed in microarray data can be integrated with literature-derived biological data, which is static information. Specifically, the dynamic information of microarray expression changes the static gene clusters so that they fit the experimental condition of the given microarray data.

*To whom correspondence should be addressed.
However, existing approaches are not methodologically sophisticated enough in combining the two types of data, i.e. real-valued expression data and literature-derived data, especially gene networks. In addition, current approaches focus on rather local information of gene networks, such as neighboring genes, whereas incorporating global information of gene networks might find more appropriate gene clusters. In light of the above, we present a new method for clustering genes using both microarray expression data, i.e. the dynamic information of a cell, and a given gene network, focusing on network modularity, a global property of biological networks (Ravasz et al., 2002). Our method is based on learning a probabilistic model, which we call a hidden modular random field, that is well suited to combining the two different types of data. Our model has observable variables for microarray expression data and hidden variables for gene annotation (cluster labels). Concretely, each observable variable depends on one hidden variable, and each hidden variable can depend on one or more other hidden variables, as defined by a given biological network. Learning the parameters of our model is based on the EM (Expectation-Maximization) algorithm, which minimizes an objective function that takes into account the modularity of the given gene network. Briefly, our method uses not only the simple gene relations obtained directly from literature but also more complex information derived from the entire network structure. We further stress that, even though it focuses on a global property of the given network, our proposed algorithm for estimating the probability parameters of our model is time-efficient.
© 2007 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Using a metabolic network and a wide variety of microarray datasets, we validated the performance of our method by comparing it with other methods, and we analyzed the results it produced. Clusters of genes vary depending on the condition under which the genes are placed, meaning that true cluster labels are unknown. However, to evaluate the performance of our method, we generated two different sets of gene clusters as standard data. Assuming that a good clustering result will be close to them if many microarray observations are used, we compared our method with four different clustering methods: k-means with microarray expression data and three graph partitioning methods using the structure of the metabolic network. We found that our clustering method outperformed all competing methods, with statistical significance in all cases. We further examined the clustering result, focusing on the genes in folate biosynthesis, to confirm the significance of our method. These results showed that our method of combining microarray expression data with the modularity of a gene network is highly effective for clustering genes and annotating gene functions.

2 METHOD: PROBABILISTIC MODEL
Clustering assigns a gene function (or, more generally, a cluster label) to each gene. Starting with a hidden random field, we explain our model for clustering genes by combining microarray expression data with a global feature of gene networks. Each node in a gene network is labeled by a gene, and an undirected edge connecting two nodes represents some relation between the two genes labeling those nodes.

2.1 Hidden random field
A hidden random field is a probabilistic model with two types of variables: observable and hidden variables. Figure 1a shows a model in which an observable variable depends on only one hidden variable and a hidden variable depends on no other variables.
This model corresponds to that of a typical clustering method like k-means, in which genes are clustered with microarray expression data only, assuming that the expression pattern of a gene is independent of that of any other gene. However, this assumption is not necessarily true. For example, the expression patterns of two genes located next to each other on a metabolic network are in most cases strongly correlated with each other (Kharchenko et al., 2005). Figure 1b shows a hidden random field in which hidden variables can depend on each other, which is more natural for clustering genes using both gene expression and biological networks. In particular, we use a given gene network to represent the relation between hidden variables.

Fig. 1. Hidden random fields with (a) independent hidden variables and with (b) relation between hidden variables.

Mathematically, these models can be formalized as follows: let $Z = (z_1, z_2, \ldots, z_N)$ be a set of hidden variables and $V(Z)$ be a potential function which defines the relation between genes. The probability distribution over $Z$ is given as the Gibbs distribution:

$$P(Z) = \frac{1}{C(\beta)} \exp\{\beta V(Z)\}, \tag{1}$$

where $C(\beta) = \sum_{Z} \exp\{\beta V(Z)\}$ and $\beta$ is a parameter (weight) for $V(Z)$. We can further define the joint probability of a set of observable variables $X = (x_1, x_2, \ldots, x_N)$ and $Z$ as follows:

$$P(X, Z) = P(X \mid Z)\,P(Z) = \prod_{n=1}^{N} p(x_n \mid z_n)\,P(Z). \tag{2}$$

In practice, $N$ is the number of genes. Given a microarray dataset, the size of $x_n$ is the number of observations (experiments), and each value in $x_n$ is an observed expression value. $K$ is the number of clusters. A hidden variable $z_n$ takes a cluster label from 1 to $K$. The conditional probability $p(x_n \mid z_n)$ is assumed to be a normal distribution or the von Mises-Fisher distribution (Mardia and Jupp, 2000).
The probability of $X$ is further given as follows:

$$P(X) = \sum_{Z} P(X, Z).$$

The model of Figure 1b has a dependency relation between hidden variables in the form of a network, with nodes corresponding to genes and edges connecting nodes. The structure information of this network can be described in $P(Z)$ of Equation (2).

2.2 Hidden Markov random field (HMaF)
A typical hidden random field is a hidden Markov random field, which we call HMaF in this article.(1) We explain this model briefly to show the significance of the dependency structure of hidden variables. In HMaF, the hidden variable for a node depends only on the hidden variables for its neighboring (closest) nodes, satisfying the Markov property. The potential function in this model is defined as follows:

$$V_M(Z) = -\sum_{n=1}^{N} \frac{1}{|S_n|} \sum_{i \in S_n} \delta(z_n \neq z_i), \tag{3}$$

(1) The usual abbreviation of a hidden Markov random field is HMRF. If we followed this abbreviation, our model, which we call a hidden modular random field, would also be HMRF. So in this article, we use HMaF for a hidden Markov random field and reserve a different abbreviation for our model.

where $S_n$ is the set of neighboring nodes of $n$ and $\delta(\cdot)$ is defined as follows:

$$\delta(z_i \neq z_j) = \begin{cases} 1 & \text{when } z_i \neq z_j, \\ 0 & \text{otherwise.} \end{cases}$$

As seen from Equation (3), the value of $V_M(Z)$ increases as neighboring hidden variables take the same value. Using this potential function, the probability distribution over $Z$ in HMaF can be defined as follows:

$$P(Z) = \frac{1}{C_M(\beta)} \exp\{\beta V_M(Z)\}, \tag{4}$$

where $C_M(\beta) = \sum_{Z} \exp\{\beta V_M(Z)\}$.

A major application of HMaF is image restoration (or image modeling), the problem of estimating true pixel labels from an observed image that is usually very noisy (Fjortoft et al., 2003; Zhang et al., 2001). We show the effectiveness of using the network information of hidden variables through an example of image restoration. Figure 2 shows a schematic picture of applying HMaF to image restoration, following Zhang et al. (2001), in which observable and hidden variables are used for (b) noisy images and (c) clear true pixels, respectively. Each of (b) and (c) has 2500 (= 50 × 50) pixels, and a pixel can take one of three labels as its true value. As shown in Figure 2a, a typical network structure for the relation of hidden variables in image restoration is a grid in which a node is connected only to its four neighboring (left, right, up and down) nodes, keeping the Markov property. Figure 3a shows the result obtained by applying HMaF to the data shown in Figure 2b, while Figure 3b shows the result of applying k-means (which corresponds to Figure 1a) to the same data. As these figures show, HMaF clearly removes the noise in Figure 2b while k-means cannot, indicating that the network structure assumption in HMaF works effectively for image restoration.

Fig. 2. (a) HMaF for image restoration and (b) observable noisy data and (c) the true image.

Fig. 3. Image restoration by (a) HMaF and by (b) k-means.
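As a rough illustration of the Markov potential of Equation (3) on the four-neighbor grid used for image restoration, the following minimal sketch may help. It assumes the sign convention under which the potential grows as neighbors agree; the function names `grid_neighbors` and `markov_potential` are hypothetical and not taken from the paper.

```python
def grid_neighbors(rows, cols):
    """Map each pixel index (row-major) to its left/right/up/down neighbors."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[r * cols + c] = [rr * cols + cc for rr, cc in cand
                                  if 0 <= rr < rows and 0 <= cc < cols]
    return nbrs


def markov_potential(labels, nbrs):
    """Negated sum over nodes of the fraction of neighbors with a different label."""
    total = 0.0
    for n, s_n in nbrs.items():
        if s_n:  # nodes without neighbors contribute nothing
            total += sum(labels[n] != labels[i] for i in s_n) / len(s_n)
    return -total


nbrs = grid_neighbors(2, 2)
smooth = markov_potential([0, 0, 0, 0], nbrs)  # all four pixels agree
noisy = markov_potential([0, 0, 0, 1], nbrs)   # one pixel disagrees
```

On this 2 × 2 grid the smooth labeling attains the maximum potential of zero, while the labeling with one deviating pixel scores lower, which is the behavior the Gibbs distribution of Equation (4) rewards.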
2.3 Our model: hidden modular random field for gene clustering
As shown in the previous section, the network structure of hidden variables is an important assumption for estimating the values of hidden variables accurately. The hidden structure in image restoration is a grid in which the number of edges from a node, i.e. the degree of a node, is basically constant. In contrast, the degree of a node in a gene network varies according to the scale-free nature of such networks (Jeong et al., 2000; Ravasz et al., 2002), suggesting that the grid shape is inappropriate for a gene network and that the Markov property would be insufficient. It has already been reported that genes having the same function (cluster) tend to be gathered together on a gene network, especially on metabolic networks (Ravasz et al., 2002). This feature is called modularity, a global feature of gene networks. The modularity of a network is defined as a quantity which becomes larger as the number of edges inside clusters increases and the number of edges between different clusters decreases (Guimera and Nunes Amaral, 2005; Guimera et al., 2004; Newman and Girvan, 2004). Modularity can be mathematically defined as follows:

$$R(Z) = \sum_{k=1}^{K} \left\{ \frac{l_k(Z)}{L} - \left( \frac{d_k(Z)}{2L} \right)^{2} \right\}, \tag{5}$$

where $L$ is the total number of edges in the given network, $l_k(Z)$ is the number of edges in cluster $k$ and $d_k(Z)$ is the total sum of degrees in cluster $k$. The first term on the right-hand side of Equation (5) increases with the number of edges inside a cluster, playing the same role as Equation (3), which becomes larger as the same cluster is assigned to neighboring genes on a gene network. The second term, on the other hand, accounts for the edges not only inside a cluster but also crossing between different clusters. Overall, the right-hand side becomes larger as the number of edges inside clusters increases without an increase in the number of edges crossing different clusters.
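The modularity of Equation (5) can be computed directly from an edge list and a label assignment. The sketch below is only an illustration under simple assumptions: nodes are integers, each undirected edge is listed once, and the function name `modularity` is hypothetical.

```python
def modularity(edges, labels, num_clusters):
    """R(Z) of Equation (5): within-cluster edge fraction minus squared degree share."""
    total_edges = len(edges)
    inside = [0] * num_clusters   # l_k: edges with both ends in cluster k
    degree = [0] * num_clusters   # d_k: sum of node degrees in cluster k
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            inside[labels[u]] += 1
    return sum(inside[k] / total_edges - (degree[k] / (2 * total_edges)) ** 2
               for k in range(num_clusters))


# Two triangles joined by a single bridging edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = modularity(edges, [0, 0, 0, 1, 1, 1], 2)
one_cluster = modularity(edges, [0, 0, 0, 0, 0, 0], 2)
```

Here each triangle has three internal edges and a degree sum of seven out of L = 7 edges, so splitting by triangle gives R = 2(3/7 - 1/4) = 5/14, whereas putting every node in one cluster gives R = 0.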
Equation (5) can therefore reflect modularity, a global feature of gene networks. We thus propose a new random field that uses $V_D(Z)$ $(= N\,R(Z))$ as a potential function:

$$P(Z) = \frac{1}{C_D(\beta)} \exp\{\beta V_D(Z)\}, \tag{6}$$

where $C_D(\beta) = \sum_{Z} \exp\{\beta V_D(Z)\}$. We call this hidden random field a hidden modular random field.

3 METHOD: LEARNING ALGORITHM FOR OUR MODEL

3.1 EM algorithm
The EM algorithm is a local optimization method which is widely used for estimating parameters of many probabilistic

models with hidden variables (Dempster et al., 1977). According to the EM algorithm, we have to maximize the following so-called Q function:

$$Q(\theta \mid \theta^{(t)}) = \sum_{Z} P(Z \mid X, \theta^{(t)}) \log P(X, Z \mid \theta) = \log P(X, Z \mid \theta), \tag{7}$$

where $\theta^{(t)}$ is the set of current parameters, $\theta$ is the set of parameters to be estimated and a $\delta$-function is used for $P(Z \mid X, \theta^{(t)})$. This Q function also applies to many other probabilistic models with hidden variables, such as k-means (Kearns et al., 2001). We assume that the probability distribution of an observable variable follows the von Mises-Fisher distribution (Mardia and Jupp, 2000), since this distribution is used for clustering high-dimensional data such as text documents (Zhong and Ghosh, 2005):

$$p(x_n \mid z_n = k) = \frac{1}{C_v} \exp\left( \frac{x_n^{T} \mu_k}{\sqrt{x_n^{T} x_n}} \right), \tag{8}$$

where $\mu_k$ is the center of cluster $k$ and must satisfy $\sqrt{\mu_k^{T} \mu_k} = 1$, and $C_v$ is a normalization constant. From Equations (2) and (6)-(8), negating and rescaling the Q function gives the objective function $J$, which should be minimized:

$$J := -(1 - \omega) \sum_{k=1}^{K} \sum_{n \in U_k} \frac{x_n^{T} \mu_k}{\sqrt{x_n^{T} x_n}} - \omega V_D(Z), \tag{9}$$

where $\omega$ satisfies $\beta = \omega / (1 - \omega)$ and $0 \le \omega \le 1$, and $U_k$ is the set of indices of hidden variables taking cluster $k$. We note that $\omega$ is a weight parameter which is not trained from data, and so $C_D(\beta)$ in Equation (6) becomes a constant when we derive Equation (9). The EM algorithm suggests a procedure to minimize $J$ by optimizing the cluster centers $\mu_k$ $(k = 1, 2, \ldots, K)$ and the hidden variables $Z$ separately. More concretely, the algorithm alternates the following two steps: (1) we optimize the cluster centers using fixed hidden variables and (2) we optimize the hidden variables using fixed cluster centers.
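The objective of Equation (9) and the two alternating steps can be sketched in a few lines. This is a simplified illustration rather than the authors' implementation: it assumes non-zero expression vectors that are scaled to unit length, uses the unit-length mean direction as the center update, and updates labels by a plain greedy sweep that recomputes $J$ in full for every candidate label. All function names are hypothetical.

```python
import math


def unit(v):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def modularity(edges, labels, k_num):
    """R(Z) of Equation (5) for an undirected edge list."""
    inside, degree = [0] * k_num, [0] * k_num
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            inside[labels[u]] += 1
    n_edges = len(edges)
    return sum(inside[k] / n_edges - (degree[k] / (2 * n_edges)) ** 2
               for k in range(k_num))


def objective(X, labels, centers, edges, omega):
    """J of Equation (9): negated expression fit plus negated weighted V_D."""
    fit = sum(sum(a * b for a, b in zip(unit(x), centers[k]))
              for x, k in zip(X, labels))
    v_d = len(X) * modularity(edges, labels, len(centers))
    return -(1 - omega) * fit - omega * v_d


def update_centers(X, labels, centers):
    """Step 1: unit-length mean direction of each non-empty cluster."""
    new = []
    for k, old in enumerate(centers):
        members = [unit(x) for x, z in zip(X, labels) if z == k]
        if not members:
            new.append(old)  # keep the previous center for an empty cluster
            continue
        mean = [sum(col) / len(members) for col in zip(*members)]
        new.append(unit(mean) if any(mean) else old)
    return new


def update_labels(X, labels, centers, edges, omega):
    """Step 2: greedy sweep giving each node the label that minimizes J."""
    labels = list(labels)
    for n in range(len(X)):
        best_k, best_j = labels[n], objective(X, labels, centers, edges, omega)
        for k in range(len(centers)):
            labels[n] = k
            j = objective(X, labels, centers, edges, omega)
            if j < best_j:
                best_k, best_j = k, j
        labels[n] = best_k
    return labels


def fit_clusters(X, labels, centers, edges, omega, max_iter=50):
    """Alternate the two steps until the labels stop changing."""
    history = [objective(X, labels, centers, edges, omega)]
    for _ in range(max_iter):
        centers = update_centers(X, labels, centers)
        new_labels = update_labels(X, labels, centers, edges, omega)
        history.append(objective(X, new_labels, centers, edges, omega))
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers, history
```

Because each step can only keep or lower $J$ when the initial centers have unit length, the recorded `history` is non-increasing up to rounding. Recomputing $J$ from scratch for every candidate label is wasteful and is used here only to keep the sketch short.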
For the first step, when the hidden variables $Z$ are fixed, the objective function $J$ is minimized by the following $\mu_k$:

$$\mu_k = \frac{y_k}{\sqrt{y_k^{T} y_k}}, \quad \text{where} \quad y_k = \frac{1}{|U_k|} \sum_{n \in U_k} x_n.$$

The second step is to optimize $J$ when we fix the above cluster centers. We note that the potential function $V(Z)$ is not a sum over the individual hidden variables $z_n$ $(n = 1, \ldots, N)$, since hidden variables can depend on each other. We therefore have to find the optimal combination of hidden values, but this problem is NP-hard. We thus use the ICM (iterated conditional modes) method (Besag, 1986) to find an approximate solution.

3.2 ICM method
The basic procedure of the ICM method repeats the following two steps until $J$ converges: (1) we first randomly permute the $N$ hidden variables; (2) in the permuted order, we update the value of each chosen hidden variable so that $J$ is minimized. The ICM method is a local optimization method, so we can repeat the above procedure many times with different initial values to find the best solution. We emphasize that the ICM method for our model is time-efficient, although our model deals with a global feature of the given network. The reason is that we do not have to compute the function $J$ every time we perform the above two steps. Instead, we can compute the difference between the current $J$ and the new $J$. More concretely, after we choose a hidden variable, say $z_n$, and examine a new hidden (cluster) value, say $j$, the difference between the new $J$ with label $j$ and the current $J$ is given by

$$\Delta J^{(n,j)}(Z) = -(1 - \omega)\,\frac{x_n^{T} \mu_j - x_n^{T} \mu_i}{\sqrt{x_n^{T} x_n}} - \omega\,\Delta V_D^{(n,j)}(Z),$$

where $i$ is the current value of hidden variable $z_n$ and

$$\Delta V_D^{(n,j)}(Z) := N \left\{ \frac{1}{L} \left( l^{(n,j)}(Z) - l^{(n,i)}(Z) \right) + \frac{d^{(n)}(Z)}{2L^{2}} \left( d_i(Z) - d_j(Z) - d^{(n)}(Z) \right) \right\},$$

where $l^{(n,i)}(Z)$ is the number of edges from node $n$ to the nodes in cluster $i$ and $d^{(n)}(Z)$ is the degree of node $n$; the factor $N$ comes from $V_D(Z) = N\,R(Z)$. From the above equations, we note that $\Delta J^{(n,j)}(Z)$ can be computed using only the local (neighboring) nodes of node $n$ together with the cluster-level degree sums, although it reflects the global structure of the given biological network. Thus, the practical computation time of our model is comparable to that of HMaF, which uses only the local information of a given network.

Fig. 4. Our entire algorithm for estimating parameters of our model. Let $O_n$ be a set of neighbors of node $n$.

Finally, Figure 4 shows our entire algorithm for estimating probabilistic parameters by minimizing $J$. Line 1 is the input of

initial values, which in our experiments are obtained by repeatedly assigning random cluster labels to the nodes of the given gene network and selecting the set of cluster labels with the highest modularity, computed by Equation (5). Lines 3-6 are the ICM method, to which, as shown in line 5, we add one constraint: the new cluster label is selected only from the labels of neighboring nodes. This device incorporates the local information of the given network into the label updates and speeds up the algorithm. Lines 2-9 are the entire EM algorithm.

4 EXPERIMENTAL RESULTS

4.1 Data
We used metabolic pathways as a gene network and focused on Saccharomyces cerevisiae for both the metabolic gene network and the microarray expression data. We describe the details of our data below.

Metabolic network
We generated a metabolic network from KEGG (Kyoto Encyclopedia of Genes and Genomes) (Kanehisa et al., 2006) by assigning an enzyme gene to each node and connecting by an edge the nodes of any two genes that neighbor each other on the KEGG metabolic pathway. We removed nodes with no edges and isolated subnetworks with no more than 15 nodes. The generated network had 636 nodes (genes) and 3104 edges. We note again that this metabolic network is an undirected gene network, meaning that each node corresponds to a gene.

Microarray expression
From the GEO (Gene Expression Omnibus) database (Edgar et al., 2002), we chose five datasets of S. cerevisiae under the condition that each dataset has more than 50 experiments (observations), because a larger number of observations is more reliable when genes with similar expression patterns are clustered into the same group. We further added a dataset (Hughes et al., 2000) which has often been used in microarray analysis (Huang and Pan, 2006; Pan, 2006; Wu et al., 2002; Zhou et al., 2002), since it has 300 observations and only a small number of missing values.
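The filtering rule stated above for the metabolic network (dropping nodes without edges and isolated subnetworks of at most 15 nodes) amounts to a connected-component pass over the edge list. The following is a minimal sketch under assumptions of our own: edges are pairs of sortable gene identifiers, self-loops are ignored, and the name `filter_small_components` is hypothetical.

```python
from collections import defaultdict, deque


def filter_small_components(edges, max_small=15):
    """Keep edges whose connected component has more than max_small nodes."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # assumption: self-loops carry no neighbor information
            adj[u].add(v)
            adj[v].add(u)
    keep, seen = set(), set()
    for start in list(adj):
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:  # breadth-first search over one component
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        if len(comp) > max_small:
            keep |= comp
    # Nodes without edges never appear in the edge list, so they are dropped implicitly.
    return sorted({tuple(sorted((u, v))) for u, v in edges
                   if u != v and u in keep and v in keep})


toy = [("a", "b"), ("b", "c"), ("x", "y")]
kept = filter_small_components(toy, max_small=2)
```

With the toy threshold of two nodes, only the three-gene chain survives; the default threshold of 15 mirrors the cutoff quoted in the text.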
Table 1 shows the details of the above six datasets. Missing values in these datasets were interpolated using the 10-nearest least square method of Troyanskaya et al. (2001).

Table 1. Microarray expression data summary
Name        Number of observations   Number of missing values   Reference
Brem                                                            (Brem et al., 2002)
Gasch                                                           (Gasch et al., 2000)
Hughes                                                          (Hughes et al., 2000)
Spellman                                                        (Spellman et al., 1998)
Storey                                                          (Storey et al., 2005)
Yvert                                                           (Yvert et al., 2003)

Standard gene cluster data
Gene clusters vary depending on the condition under which the genes are placed. However, to evaluate the performance of our method for clustering genes, we generated two different sets of standard gene clusters (or true cluster labels) from KEGG and GO (Ashburner et al., 2000), which we call KEGG clusters and GO clusters, respectively.

(1) KEGG clusters: The KEGG pathway maps are classified into six major categories, including metabolism. We used 10 sub-categories under the metabolism category as KEGG clusters. Table 2 shows the list of KEGG clusters and the number of genes in each cluster. We note that a gene can be in more than one cluster.

Table 2. Ten standard gene clusters from KEGG
Cluster name                                   Number of genes
Amino acid metabolism                          208
Carbohydrate metabolism                        226
Metabolism of cofactors and vitamins           83
Energy metabolism                              56
Glycan biosynthesis and metabolism             66
Lipid metabolism                               72
Nucleotide metabolism                          110
Metabolism of other amino acids                42
Biosynthesis of secondary metabolites          13
Xenobiotics biodegradation and metabolism      22

(2) GO clusters: GO is an ontology of gene functions, which are classified into three categories: molecular function, biological process and cellular component. Out of these three major categories, we used the function labels in biological process only.

Table 3. Number of gene clusters from GO when changing the minimum size of a cluster
Cut-off value   Number of clusters
In GO, the relations between gene functions form a so-called DAG (directed acyclic graph), i.e. a directed graph with no cycles. Each node of the DAG corresponds to a gene function (cluster label) to which the corresponding genes are assigned. Starting from the root of the DAG, we moved from each parent to its children, checking the number of genes assigned to each node. Naturally, the number of genes decreases as we move toward descendants. We fixed a cutoff value (the minimum size of a cluster), and if the number of genes at a node was less than the cutoff value, we went back to its parent and stopped there. If the number of genes at the parent was larger than 300, we did not use that node as a cluster because it is too large to be useful for annotating gene function. Table 3 shows the number of clusters obtained by the above procedure for different cutoff values. We selected the cutoff value of 75 to obtain 19 clusters as

GO clusters, because we wanted a number slightly different from 10, the number of KEGG clusters. Figure 5 shows all nodes that appeared during the above procedure for obtaining the 19 clusters, which correspond to the circles with thick lines in the figure. The number in each circle (node) in Figure 5 is the number of corresponding genes, and the name attached to each node is the corresponding cluster name. As with KEGG clusters, a gene can be in more than one cluster.

Fig. 5. Standard gene clusters (nodes with thick lines) from GO. Node labels: biological process (root, 636 genes); cell organization and biogenesis; cellular biosynthesis; cellular catabolism; lipid metabolism; carboxylic acid metabolism; nitrogen compound biosynthesis; amino acid biosynthesis; amine biosynthesis; cellular carbohydrate metabolism; cellular lipid metabolism; amino acid and derivative metabolism; alcohol metabolism; nucleobase, nucleoside, nucleotide and nucleic acid metabolism; energy derivation by oxidation of organic compounds; protein biosynthesis; cellular macromolecule metabolism; RNA metabolism; protein modification; biopolymer modification.

4.2 Performance evaluation: criterion
In information theory, mutual information is defined as a quantity that measures the amount of information shared between two random variables. This criterion can be applied to check the overlap between our clustering result and the standard clusters, because the mutual information between two sets of cluster labels becomes larger as one set of clusters becomes more consistent with the other. In general, mutual information is normalized, since its range depends on the size of the given sets of clusters (Strehl and Ghosh, 2003). We note that normalized mutual information (NMI) has been widely used in many applications to measure the performance of clustering methods (Zhong and Ghosh, 2003, 2005).
Let $G\,(= (G_1, \ldots, G_K))$ be the cluster set of our result, where $G_i$ is the set of genes in cluster $i$. Let $G'\,(= (G'_1, \ldots, G'_{K'}))$ be a standard cluster set, where $G'_i$ is the set of genes in standard (true) cluster $i$. NMI is defined as follows:

$$\mathrm{NMI} := \frac{\mathrm{MI}(G, G')}{\sqrt{H(G)}\,\sqrt{H(G')}},$$

where

$$\mathrm{MI}(X, Y) := H(X) + H(Y) - H(X, Y),$$
$$H(X) := -\sum_{X} P(X) \log P(X),$$
$$H(X, Y) := -\sum_{X} \sum_{Y} P(X, Y) \log P(X, Y).$$

4.3 Performance evaluation: results

Parameter setting and competing methods
We checked three different numbers of clusters, $K$ = 10, 15 and 20, since the two sets of standard clusters have 10 and 19 clusters. As shown in Equation (9), the objective function to be minimized has the parameter $\omega$, which varies between zero and one. If we set $\omega$ to zero, $J$ has the first term only, which depends on microarray expression alone. In this case, our model and its learning algorithm are almost the same as clustering genes by k-means with microarray expression data, and so hereafter we refer to $\omega = 0$ as k-means. On the other hand, if we set $\omega$ to one, $J$ has the second term alone, meaning that genes are clustered by the given biological network only. This is similar to graph partitioning, in which nodes are clustered using the connectivity of a given graph. We therefore varied the value of $\omega$ from zero to one in steps of 0.1. We then mainly examined the performance of our method at three values: zero, one and the $\omega$ that gives the highest performance, which we denote $\omega_0$, $\omega_1$ and $\omega_{\max}$, respectively. Our method was further compared with two well-known graph partitioning methods (Karypis and Kumar, 1998a, b), both of which use only the structure of the given metabolic network, meaning that their NMIs do not depend on $\omega$ or on the microarray dataset. This is also true of $\omega_1$.

NMI
We first show the results when we used KEGG clusters. Figure 6 shows the NMI of our method when we used Hughes as microarray expression data.
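The NMI criterion of Section 4.2 can be illustrated with a short sketch. Unlike the standard clusters in the paper, where a gene may belong to more than one cluster, this example assumes each gene carries exactly one label in each labeling; the names `entropy` and `nmi` are hypothetical.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (natural log) of a partition given its block sizes."""
    return -sum(c / total * math.log(c / total) for c in counts)


def nmi(labels_a, labels_b):
    """MI(A, B) / (sqrt(H(A)) * sqrt(H(B))) for two hard labelings."""
    n = len(labels_a)
    h_a = entropy(Counter(labels_a).values(), n)
    h_b = entropy(Counter(labels_b).values(), n)
    h_ab = entropy(Counter(zip(labels_a, labels_b)).values(), n)
    if h_a == 0 or h_b == 0:
        return 0.0  # a single-cluster labeling shares no information
    return (h_a + h_b - h_ab) / (math.sqrt(h_a) * math.sqrt(h_b))


same = nmi([0, 0, 1, 1, 2], [5, 5, 7, 7, 9])   # identical up to renaming
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])    # statistically independent
```

Two labelings that agree up to a renaming of clusters reach the maximum of one, and two independent labelings score zero, which is why the measure is convenient for comparing a clustering result against standard clusters with different label names.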
This figure shows that the NMI of $\omega_{\max}$ was higher than those of $\omega_0$ (k-means) and $\omega_1$ in all three cases of $K$. This indicates that the clustering result obtained by using both the modularity of the metabolic network and microarray expression data is more consistent with KEGG clusters than the result obtained from either type of data alone. We can further see that the NMI values for $0 < \omega < 1$ were higher than those of the two graph partitioning methods, which are shown as dashed lines in Figure 6, indicating that our clustering result

was more consistent with KEGG clusters than those of the two popular graph partitioning methods. These findings also held for the other microarray datasets. Table 4 summarizes the NMI values for all six microarray datasets used in our experiments. Each $\omega_{\max}$ value is shown in the corresponding parentheses. From this table, we can see that the NMI of $\omega_{\max}$ was the best among the five compared methods in all 18 cases.

Fig. 6. NMI changing with $\omega$ (weight) when we used Hughes, KEGG clusters and $K$ = (a) 10, (b) 15 and (c) 20.

Table 4. NMI results summary when we used KEGG clusters
Data        $\omega_0$   $\omega_1$   $\omega_{\max}$
(a) K = 10
Brem                                  (0.2)
Gasch                                 (0.2)
Hughes                                (0.3)
Spellman                              (0.4)
Storey                                (0.5)
Yvert                                 (0.2)
(b) K = 15
Brem                                  (0.2)
Gasch                                 (0.3)
Hughes                                (0.2)
Spellman                              (0.2)
Storey                                (0.2)
Yvert                                 (0.3)
(c) K = 20
Brem                                  (0.4)
Gasch                                 (0.4)
Hughes                                (0.4)
Spellman                              (0.3)
Storey                                (0.2)
Yvert                                 (0.2)

We then show the results obtained by using GO clusters as standard data. Figure 7 shows the NMI of our method when we used Hughes, for different values of $K$ and $\omega$. The NMI values of the two graph partitioning methods are also shown in this figure as dashed lines. This figure again confirms the advantage of our method over the other four methods. The NMI values in this figure ranged roughly from 0.15 to 0.2, which is smaller than those for KEGG clusters, probably because the number of GO clusters is 19, which is larger than the number of KEGG clusters, i.e. 10. Table 5 summarizes the NMI values obtained for all six microarray datasets. From this table, we can see that the NMI of $\omega_{\max}$ was the highest among the five competing methods in all 18 cases, fully in line with the results for KEGG clusters. From Tables 4 and 5, we can see that $\omega_{\max}$ is 0.2 in 21 of the 36 cases, meaning that the clustering performance at $\omega = 0.2$ is almost always close to that of $\omega_{\max}$ (the true maximum). Thus, hereafter we fix $\omega$ at 0.2, which we denote $\omega_{0.2}$. The above results show the clear advantage of $\omega_{\max}$ (and $\omega_{0.2}$) over k-means ($\omega_0$) and the two graph partitioning methods, but the difference in NMI between $\omega_{\max}$ and $\omega_1$ was rather slight. We therefore examined the statistical significance of the difference in NMI between $\omega_{0.2}$ and $\omega_1$. Table 6 shows the P-values of the pairwise t-test (Kreyszig, 1970) between $\omega_{0.2}$ and $\omega_1$ over all six microarray datasets. This table indicates that the NMI of $\omega_{0.2}$ was higher than that of $\omega_1$ at a significance level of roughly 99.5% or higher in all six cases. Overall, we can say that our method of combining microarray expression data with the network modularity achieved clearly higher performance in gene clustering than the four compared methods, each of which uses only one of the two data types.

4.4 Annotating gene function
Once a cluster label is assigned to each gene, we can compute the P-value between each gene function in an ontology database like GO and the cluster containing the gene in question. We can then rank the gene functions according to their P-values, and the top function can be assigned to each gene of the cluster. This manner of function annotation would be helpful for a gene whose function is only putative. To compute the P-values and rank them, we can use the software GO Term Finder (Boyle et al., 2004), which takes the set of genes in a cluster as input and outputs a list of gene functions ranked according to P-values. Table 7 shows an example of function annotation, i.e. a list of gene functions with P-values for a cluster which was obtained when the microarray dataset was Hughes and $K$ = 20. We show this example, since this cluster has the

8 Clustering genes with expression data and network modularity a 0.2 b c 0.2 NMI Fig. 7. NMI changing with! (weight) when we used Hughes, GO clusters and K¼ (a)10, (b)15 and (c) 20. Table 5. NMI results summary when we used GO clusters Table 6. P-values of pairwise t-test between! 0.2 and! 1 Data! 0! 1! max (a) K¼10 Brem (0.2) Gasch (0.2) Hughes (0.2) Spellman (0.2) Storey (0.1) Yvert (0.1) (b) K¼15 Brem (0.2) Gasch (0.2) Hughes (0.1) Spellman (0.2) Storey (0.2) Yvert (0.1) (c) K¼20 Brem (0.2) Gasch (0.1) Hughes (0.2) Spellman (0.3) Storey (0.2) Yvert (0.2) highest P-value, 1.62E-37, among all 20 clusters obtained under the above condition. The top of this list is protein modification. In fact, this cluster contained 45 genes including, say ALG11/YNL048W (mannosyltransferase) and ALG6/ YOR002W (glucosyltransferase) which add a sugar unit to a protein amino acid, meaning that their function is protein modification. 4.5 Analysis on clustering result Figure 8 shows the three metabolic gene networks obtained by! 0,! 0:2 and! 1 using Hughes under K ¼ 20. Each node corresponds to a gene, and the edge between two nodes indicates that the genes of these two nodes are neighboring enzyme genes on the KEGG metabolic pathway map. Number of clusters (K) KEGG clusters GO clusters E E E E E E 4 Table 7. An example of function annotation: top five functions for an obtained cluster having, say ALG11/YNL48W (mannosyltransferase) GO gene function Protein modification Biopolymer modification Protein metabolic process Cellular protein metabolic process Macromolecule biosynthetic process P-value 1.62E E E E E 29 These networks have the same structure but nodes were differently colored. Each color indicates a cluster, meaning that genes labeled with the same color fall into the same cluster. 
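As a rough illustration of the ranking step described in Section 4.4, the sketch below scores each candidate function for one cluster with a one-sided hypergeometric tail probability and sorts the functions by P-value. The choice of test is an assumption made for illustration: the text only states that GO Term Finder is used, and this sketch does not reproduce that tool or any multiple-testing correction. The gene names, the annotation dictionary and the function names `enrichment_pvalue` and `rank_functions` are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_background, n_annotated, n_cluster, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap)."""
    upper = min(n_annotated, n_cluster)
    favorable = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_cluster - i)
        for i in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_background, n_cluster)


def rank_functions(cluster, annotations, background):
    """Return (P-value, function) pairs for one cluster, most significant first."""
    background = set(background)
    cluster = set(cluster) & background
    scored = []
    for function, genes in annotations.items():
        annotated = set(genes) & background
        overlap = len(annotated & cluster)
        if overlap == 0:
            continue  # a function with no member in the cluster is not a candidate
        p = enrichment_pvalue(len(background), len(annotated), len(cluster), overlap)
        scored.append((p, function))
    return sorted(scored)


# Hypothetical toy data: 10 background genes, one cluster of 3 genes.
background = {f"g{i}" for i in range(10)}
cluster = {"g0", "g1", "g2"}
annotations = {
    "function A": {"g0", "g1", "g2"},
    "function B": {"g0", "g5", "g6", "g7", "g8"},
}

ranked = rank_functions(cluster, annotations, background)
top_pvalue, top_function = ranked[0]  # the function that would be assigned to the cluster
```

In this toy case "function A" covers exactly the three cluster genes and therefore ranks first; with real data the background would be all annotated genes in the study and the P-values would normally be corrected for the number of functions tested.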
From this figure, we can easily see that nodes with the same color were widely distributed over the entire network in (a), while nodes with the same color were gathered together in (c) because nodes were clustered by the network structure only. We then focused on the square at the top-left part in each of the three networks in Figure 8. Figure 9 shows the enlargement of each of these squares. In Figure 9, two colors, red and green, were merged in (b), while more than two colors were merged in (a) and the two colors were clearly separated in (c). As shown in the previous section, (b), i.e. ω_0.2, was closest to the standard clusters among ω_0, ω_0.2 and ω_1.

Fig. 8. Clustered genes on metabolic networks when we used Hughes, K = 20 and (a) ω_0, (b) ω_0.2 and (c) ω_1.

Fig. 9. Enlargement of the square located at the top-left part in Figure 8: (a) ω_0, (b) ω_0.2 and (c) ω_1.

We then checked the orange-colored genes in (b) by using the KEGG database and found that they correspond to those categorized in metabolism of cofactors and vitamins, more precisely those in the folate biosynthesis pathway. Figure 10 shows this pathway, in which, e.g. dihydroneopterin (located almost at the center) is generated from GTP, catalyzed by FOL2/YGR267C (GTP-cyclohydrolase I) and PHO8/YDR481C (repressible alkaline phosphatase). These two genes were colored orange in Figure 9b but in neither (a) nor (c), meaning that they were wrongly clustered under ω_0 and ω_1. The same holds for other orange-colored genes in (b), such as RAD54/YGL163C (DNA-dependent ATPase) and HIR2/YDR038C (histone transcription regulator), which are also in the folate biosynthesis pathway but not colored orange in (a) and (c).

These four genes were all colored dark blue in (a). A possible reason for this is that, in Hughes, they are presumably not co-expressed with other orange-colored genes such as ENA1/YDR040C (ATPase) and RAD3/YER171W (DNA helicase), though the four genes themselves might be co-expressed with each other. On the other hand, a reason why the above four genes were clustered into the green group in (c) is that, since only the network structure is considered in (c), MTD1/YKR080W (methylenetetrahydrofolate dehydrogenase), a major hub colored green (located slightly to the right of the center), strongly affected the other genes linked to it.
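The network-only behavior seen in (c) can be made concrete with the modularity score of Newman and Girvan (2004), which rewards labelings that keep linked genes in the same cluster. The sketch below is a minimal illustration on a hypothetical six-node toy graph; it is not the energy function of the model described in this paper, whose exact form is given in an earlier section. The function name `modularity` and the node names are made up for the example.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman-Girvan modularity Q of a labeling of an undirected, unweighted graph.

    Q = sum over clusters c of  l_c / m - (d_c / (2 m))^2,
    where m is the number of edges, l_c the number of edges inside cluster c
    and d_c the sum of the degrees of the nodes in cluster c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)       # l_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical toy network: two triangles joined by a single edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}  # one cluster per triangle
mixed = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}  # labels ignore the links
single = {node: 0 for node in split}                      # everything in one cluster

q_split = modularity(edges, split)
q_mixed = modularity(edges, mixed)
q_single = modularity(edges, single)
print(round(q_split, 4))  # 0.3571
```

A labeling that follows the link structure (`split`) scores higher than one that cuts across it (`mixed`), which is why a criterion based only on the network tends to pull the neighbors of a hub into the hub's cluster.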
In fact, as shown in Figure 9, YGR267C, YDR481C and YGL163C are directly connected to YKR080W. Interestingly, YDR038C is not linked to YKR080W but to all three of the above genes, implying that YDR038C was colored green by the three genes (YGR267C, YDR481C and YGL163C) which were already colored green by YKR080W. In contrast, a possible reason why these four genes were well clustered in (b) might be that YDR038C was first colored orange by, e.g. MET7/YOR241W (folylpolyglutamate synthetase) and/or DFR1/YOR236W (dihydrofolate reductase), and then the other three were colored orange by their link to YDR038C or by their tight co-expression with YDR038C. Thus this result and analysis also confirmed the advantage of our method in balancing the two different types of data.

Fig. 10. Folate biosynthesis pathway.

5 CONCLUDING REMARKS

We have presented a systematic method for gene clustering by combining expression data with the modularity of gene networks, based on learning of a probabilistic model. Experimental results showed that our method successfully integrated the two totally different types of data for clustering genes to accurately annotate gene function. Interesting future work would be to use one or more other types of gene networks that are not necessarily curated: KEGG is a curated database, so the KEGG network might be closely related to the gold-standard gene clusters. It would also be interesting to consider a probabilistic model in which more than one cluster label can be probabilistically assigned to each gene.

ACKNOWLEDGEMENTS

This work is supported in part by Bioinformatics Education Program "Education and Research Organization for Genome Information Science" and Kyoto University 21st Century COE Program "Knowledge Information Infrastructure for Genome Science" with support from MEXT, Japan.

Conflict of Interest: none declared.

REFERENCES

Ashburner,M., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25.
Besag,J. (1986) On the statistical analysis of dirty pictures. J.R. Statist. Soc. B, 48.
Boyle,E.I., et al. (2004) GO::TermFinder open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics, 20.
Brem,R.B., et al. (2002) Genetic dissection of transcriptional regulation in budding yeast. Science, 296.
Dempster,A.P., et al. (1977) Maximum likelihood from incomplete data via the EM algorithm. J.R. Statist. Soc. B, 39.
Edgar,R., et al. (2002) Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res., 30.
Eisen,M.B., et al. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 95.
Fjortoft,R., et al. (2003) Unsupervised classification of radar images using hidden Markov chains and hidden Markov random fields. IEEE Trans. Geosci. Remote Sens., 41.
Gasch,A.P., et al. (2000) Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell, 11.
Guimera,R. and Nunes Amaral,L.A. (2005) Functional cartography of complex metabolic networks. Nature, 433.
Guimera,R., et al. (2004) Modularity from fluctuations in random graphs and complex networks. Phys. Rev. E, 70.
Hanisch,D., et al. (2002) Co-clustering of biological networks and gene expression data. Bioinformatics, 18 (Suppl.), S145-S154.
Huang,D. and Pan,W. (2006) Incorporating biological knowledge into distance-based clustering analysis of microarray gene expression data. Bioinformatics, 22.
Hughes,T.R., et al. (2000) Functional discovery via a compendium of expression profiles. Cell, 102.
Jeong,H., et al. (2000) The large-scale organization of metabolic networks. Nature, 407.
Kanehisa,M., et al. (2006) From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res., 34, D354-D357.
Karypis,G. and Kumar,V. (1998a) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput., 20.
Karypis,G. and Kumar,V. (1998b) Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput., 48.
Kearns,M., et al. (2001) An information-theoretic analysis of hard and soft assignment methods for clustering. In 13th Annual Conference on Uncertainty in Artificial Intelligence (UAI2001).
Kerr,M.K. and Churchill,G.A. (2001) Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments. Proc. Natl. Acad. Sci. USA, 98.
Kharchenko,P., et al. (2005) Expression dynamics of a cellular metabolic network. Mol. Syst. Biol., E1-E6.
Kreyszig,E. (1970) Introductory Mathematical Statistics. John Wiley & Sons.
Mardia,K.V. and Jupp,P.E. (2000) Directional Statistics. 2nd edn. John Wiley & Sons.
Newman,M.E.J. and Girvan,M. (2004) Finding and evaluating community structure in networks. Phys. Rev. E, 69.
Pan,W. (2006) Incorporating gene functions as priors in model-based clustering of microarray gene expression data. Bioinformatics, 22.
Ravasz,E., et al. (2002) Hierarchical organization of modularity in metabolic networks. Science, 297.
Spellman,P.T., et al. (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, 9.
Storey,J.D., et al. (2005) Multiple locus linkage analysis of genomewide expression in yeast. PLoS Biol., 3, e267.
Strehl,A. and Ghosh,J. (2003) Relationship-based clustering and visualization for high-dimensional data mining. INFORMS J. Comput., 15.
Tamayo,P., et al. (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. USA, 96.
Tavazoie,S., et al. (1999) Systematic determination of genetic network architecture. Nat. Genet., 22.
Troyanskaya,O., et al. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics, 17.
Wu,L.F., et al. (2002) Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters. Nat. Genet., 31.
Yvert,G., et al. (2003) Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat. Genet., 35.
Zhang,K. and Zhao,H. (2000) Assessing reliability of gene clusters from gene expression data. Funct. Integr. Genomics, 1.
Zhang,Y., et al. (2001) Segmentation of brain MR images through a hidden Markov random field model and the Expectation-Maximization algorithm. IEEE Trans. Med. Imaging, 20.
Zhong,S. and Ghosh,J. (2003) A unified framework for model-based clustering. J. Mach. Learn. Res., 4.
Zhong,S. and Ghosh,J. (2005) Generative model-based document clustering: a comparative study. Knowl. Inf. Syst., 8.
Zhou,X., et al. (2002) Transitive functional annotation by shortest-path analysis of gene expression data. Proc. Natl. Acad. Sci. USA, 99.


Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem

Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem University of Groningen Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

Measuring TF-DNA interactions

Measuring TF-DNA interactions Measuring TF-DNA interactions How is Biological Complexity Achieved? Mediated by Transcription Factors (TFs) 2 Regulation of Gene Expression by Transcription Factors TF trans-acting factors TF TF TF TF

More information

Protein function prediction via analysis of interactomes

Protein function prediction via analysis of interactomes Protein function prediction via analysis of interactomes Elena Nabieva Mona Singh Department of Computer Science & Lewis-Sigler Institute for Integrative Genomics January 22, 2008 1 Introduction Genome

More information

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA XIUFENG WAN xw6@cs.msstate.edu Department of Computer Science Box 9637 JOHN A. BOYLE jab@ra.msstate.edu Department of Biochemistry and Molecular Biology

More information

Prerequisites Properties of allosteric enzymes. Basic mechanisms involving regulation of metabolic pathways.

Prerequisites Properties of allosteric enzymes. Basic mechanisms involving regulation of metabolic pathways. Case 16 Allosteric Regulation of ATCase Focus concept An enzyme involved in nucleotide synthesis is subject to regulation by a variety of combinations of nucleotides. Prerequisites Properties of allosteric

More information

ATLAS of Biochemistry

ATLAS of Biochemistry ATLAS of Biochemistry USER GUIDE http://lcsb-databases.epfl.ch/atlas/ CONTENT 1 2 3 GET STARTED Create your user account NAVIGATE Curated KEGG reactions ATLAS reactions Pathways Maps USE IT! Fill a gap

More information

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH Hoang Trang 1, Tran Hoang Loc 1 1 Ho Chi Minh City University of Technology-VNU HCM, Ho Chi

More information

Improving Gene Functional Analysis in Ethylene-induced Leaf Abscission using GO and ProteInOn

Improving Gene Functional Analysis in Ethylene-induced Leaf Abscission using GO and ProteInOn Improving Gene Functional Analysis in Ethylene-induced Leaf Abscission using GO and ProteInOn Sara Domingos 1, Cátia Pesquita 2, Francisco M. Couto 2, Luis F. Goulao 3, Cristina Oliveira 1 1 Instituto

More information

Supporting Information

Supporting Information Supporting Information Weghorn and Lässig 10.1073/pnas.1210887110 SI Text Null Distributions of Nucleosome Affinity and of Regulatory Site Content. Our inference of selection is based on a comparison of

More information

Lecture 5: November 19, Minimizing the maximum intracluster distance

Lecture 5: November 19, Minimizing the maximum intracluster distance Analysis of DNA Chips and Gene Networks Spring Semester, 2009 Lecture 5: November 19, 2009 Lecturer: Ron Shamir Scribe: Renana Meller 5.1 Minimizing the maximum intracluster distance 5.1.1 Introduction

More information

Hub Gene Selection Methods for the Reconstruction of Transcription Networks

Hub Gene Selection Methods for the Reconstruction of Transcription Networks for the Reconstruction of Transcription Networks José Miguel Hernández-Lobato (1) and Tjeerd. M. H. Dijkstra (2) (1) Computer Science Department, Universidad Autónoma de Madrid, Spain (2) Institute for

More information

A Multiobjective GO based Approach to Protein Complex Detection

A Multiobjective GO based Approach to Protein Complex Detection Available online at www.sciencedirect.com Procedia Technology 4 (2012 ) 555 560 C3IT-2012 A Multiobjective GO based Approach to Protein Complex Detection Sumanta Ray a, Moumita De b, Anirban Mukhopadhyay

More information

Protein Complex Identification by Supervised Graph Clustering

Protein Complex Identification by Supervised Graph Clustering Protein Complex Identification by Supervised Graph Clustering Yanjun Qi 1, Fernanda Balem 2, Christos Faloutsos 1, Judith Klein- Seetharaman 1,2, Ziv Bar-Joseph 1 1 School of Computer Science, Carnegie

More information

A Study of Network-based Kernel Methods on Protein-Protein Interaction for Protein Functions Prediction

A Study of Network-based Kernel Methods on Protein-Protein Interaction for Protein Functions Prediction The Third International Symposium on Optimization and Systems Biology (OSB 09) Zhangjiajie, China, September 20 22, 2009 Copyright 2009 ORSC & APORC, pp. 25 32 A Study of Network-based Kernel Methods on

More information

Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes. - Supplementary Information -

Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes. - Supplementary Information - Dynamic optimisation identifies optimal programs for pathway regulation in prokaryotes - Supplementary Information - Martin Bartl a, Martin Kötzing a,b, Stefan Schuster c, Pu Li a, Christoph Kaleta b a

More information

Metabolic networks: Activity detection and Inference

Metabolic networks: Activity detection and Inference 1 Metabolic networks: Activity detection and Inference Jean-Philippe.Vert@mines.org Ecole des Mines de Paris Computational Biology group Advanced microarray analysis course, Elsinore, Denmark, May 21th,

More information

Robust Community Detection Methods with Resolution Parameter for Complex Detection in Protein Protein Interaction Networks

Robust Community Detection Methods with Resolution Parameter for Complex Detection in Protein Protein Interaction Networks Robust Community Detection Methods with Resolution Parameter for Complex Detection in Protein Protein Interaction Networks Twan van Laarhoven and Elena Marchiori Institute for Computing and Information

More information

Interaction Network Topologies

Interaction Network Topologies Proceedings of International Joint Conference on Neural Networks, Montreal, Canada, July 31 - August 4, 2005 Inferrng Protein-Protein Interactions Using Interaction Network Topologies Alberto Paccanarot*,

More information

Lecture 10: May 19, High-Throughput technologies for measuring proteinprotein

Lecture 10: May 19, High-Throughput technologies for measuring proteinprotein Analysis of Gene Expression Data Spring Semester, 2005 Lecture 10: May 19, 2005 Lecturer: Roded Sharan Scribe: Daniela Raijman and Igor Ulitsky 10.1 Protein Interaction Networks In the past we have discussed

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,

More information

V19 Metabolic Networks - Overview

V19 Metabolic Networks - Overview V19 Metabolic Networks - Overview There exist different levels of computational methods for describing metabolic networks: - stoichiometry/kinetics of classical biochemical pathways (glycolysis, TCA cycle,...

More information

Evidence for dynamically organized modularity in the yeast protein-protein interaction network

Evidence for dynamically organized modularity in the yeast protein-protein interaction network Evidence for dynamically organized modularity in the yeast protein-protein interaction network Sari Bombino Helsinki 27.3.2007 UNIVERSITY OF HELSINKI Department of Computer Science Seminar on Computational

More information

Iterative Laplacian Score for Feature Selection

Iterative Laplacian Score for Feature Selection Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,

More information

V14 extreme pathways

V14 extreme pathways V14 extreme pathways A torch is directed at an open door and shines into a dark room... What area is lighted? Instead of marking all lighted points individually, it would be sufficient to characterize

More information

Cell biology traditionally identifies proteins based on their individual actions as catalysts, signaling

Cell biology traditionally identifies proteins based on their individual actions as catalysts, signaling Lethality and centrality in protein networks Cell biology traditionally identifies proteins based on their individual actions as catalysts, signaling molecules, or building blocks of cells and microorganisms.

More information

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference CS 229 Project Report (TR# MSB2010) Submitted 12/10/2010 hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference Muhammad Shoaib Sehgal Computer Science

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, etworks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Protein Structure Prediction Using Multiple Artificial Neural Network Classifier *

Protein Structure Prediction Using Multiple Artificial Neural Network Classifier * Protein Structure Prediction Using Multiple Artificial Neural Network Classifier * Hemashree Bordoloi and Kandarpa Kumar Sarma Abstract. Protein secondary structure prediction is the method of extracting

More information

A Max-Flow Based Approach to the. Identification of Protein Complexes Using Protein Interaction and Microarray Data

A Max-Flow Based Approach to the. Identification of Protein Complexes Using Protein Interaction and Microarray Data A Max-Flow Based Approach to the 1 Identification of Protein Complexes Using Protein Interaction and Microarray Data Jianxing Feng, Rui Jiang, and Tao Jiang Abstract The emergence of high-throughput technologies

More information

Module Based Neural Networks for Modeling Gene Regulatory Networks

Module Based Neural Networks for Modeling Gene Regulatory Networks Module Based Neural Networks for Modeling Gene Regulatory Networks Paresh Chandra Barman, Std 1 ID: 20044523 Term Project: BiS732 Bio-Network Department of BioSystems, Korea Advanced Institute of Science

More information

GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data

GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data 1 Gene Networks Definition: A gene network is a set of molecular components, such as genes and proteins, and interactions between

More information

Pattern Recognition Letters

Pattern Recognition Letters Pattern Recognition Letters 31 (2010) 2133 2137 Contents lists available at ScienceDirect Pattern Recognition Letters journal homepage: www.elsevier.com/locate/patrec Building gene networks with time-delayed

More information

Computational approaches for functional genomics

Computational approaches for functional genomics Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding

More information

Gene Ontology. Shifra Ben-Dor. Weizmann Institute of Science

Gene Ontology. Shifra Ben-Dor. Weizmann Institute of Science Gene Ontology Shifra Ben-Dor Weizmann Institute of Science Outline of Session What is GO (Gene Ontology)? What tools do we use to work with it? Combination of GO with other analyses What is Ontology? 1700s

More information

Combining Vector-Space and Word-based Aspect Models for Passage Retrieval

Combining Vector-Space and Word-based Aspect Models for Passage Retrieval Combining Vector-Space and Word-based Aspect Models for Passage Retrieval Raymond Wan Vo Ngoc Anh Ichigaku Takigawa Hiroshi Mamitsuka Bioinformatics Center, Institute for Chemical Research, Kyoto University,

More information

Clustering of Pathogenic Genes in Human Co-regulatory Network. Michael Colavita Mentor: Soheil Feizi Fifth Annual MIT PRIMES Conference May 17, 2015

Clustering of Pathogenic Genes in Human Co-regulatory Network. Michael Colavita Mentor: Soheil Feizi Fifth Annual MIT PRIMES Conference May 17, 2015 Clustering of Pathogenic Genes in Human Co-regulatory Network Michael Colavita Mentor: Soheil Feizi Fifth Annual MIT PRIMES Conference May 17, 2015 Topics Background Genetic Background Regulatory Networks

More information

Slide 1 / Describe the setup of Stanley Miller s experiment and the results. What was the significance of his results?

Slide 1 / Describe the setup of Stanley Miller s experiment and the results. What was the significance of his results? Slide 1 / 57 1 Describe the setup of Stanley Miller s experiment and the results. What was the significance of his results? Slide 2 / 57 2 Explain how dehydration synthesis and hydrolysis are related.

More information

Lecture Notes for Fall Network Modeling. Ernest Fraenkel

Lecture Notes for Fall Network Modeling. Ernest Fraenkel Lecture Notes for 20.320 Fall 2012 Network Modeling Ernest Fraenkel In this lecture we will explore ways in which network models can help us to understand better biological data. We will explore how networks

More information

IDENTIFYING BIOLOGICAL PATHWAYS VIA PHASE DECOMPOSITION AND PROFILE EXTRACTION

IDENTIFYING BIOLOGICAL PATHWAYS VIA PHASE DECOMPOSITION AND PROFILE EXTRACTION IDENTIFYING BIOLOGICAL PATHWAYS VIA PHASE DECOMPOSITION AND PROFILE EXTRACTION 2691 Yi Zhang and Zhidong Deng * Department of Computer Science, Tsinghua University Beijing, 100084, China * Email: michael@tsinghua.edu.cn

More information