Building gene networks with time-delayed regulations

Iti Chaturvedi (a,*), Jagath C. Rajapakse (a,b,c)

(a) Bioinformatics Research Center, School of Computer Engineering, Nanyang Technological University, Singapore
(b) Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
(c) Singapore-MIT Alliance, Singapore
* Corresponding author. E-mail address: iti_c@hotmail.com (I. Chaturvedi).

Pattern Recognition Letters 31 (2010). Available online 6 March 2010. © 2010 Elsevier B.V. All rights reserved.

Keywords: Dynamic Bayesian networks; Gene regulatory networks; Viterbi algorithm; Skip-chain model; Genetic algorithms

Abstract

We propose a method to build gene regulatory networks (GRN) capable of representing time-delayed regulations. The gene expression data are represented in two types of graphical models: a linear model using a dynamic Bayesian network (DBN) and a skip model using a hidden Markov model. The linear model is designed to find short delays, and the skip model long delays. The algorithm was tested on time-series data of the yeast cell cycle and validated against protein-protein interaction data. The proposed method fits expression profiles better than the classical higher-order DBN and found core genes that are crucial in cell-cycle regulation.

1. Introduction

Gene expressions, when collected over a sufficiently large number of time points, can be used to derive a gene regulatory network (GRN). GRN represent causal interactions of genes and gene products in biological systems and provide a basis for signal transduction in biological pathways. Regulatory events among genes do not necessarily happen at the same time scale, and several time-delayed interactions are known to exist in biological systems (Wagner and Stolovitzky, 2008). Since signal transduction is transient, the study of its dynamics is essential.

The existing methods of deriving GRN from gene expression time-series can be broadly classified into three categories: networks built by using (a) boolean rules (Li et al., 2007); (b) differential equations (Liu et al., 2009); and (c) stochastic modeling (Gebert et al., 2008). Bayesian networks (BN) were introduced for building gene regulatory networks in the stochastic framework (Friedman et al., 2000). Boolean networks are not causal and are built upon mutual information among nodes; BN are causal and therefore more biologically plausible and accurate than boolean networks. Ordinary differential equations (ODE) can model complex regulatory dynamics, but BN can assist in building such models by finding the underlying structure.

Pathways have a natural representation in BN: genes sit at the nodes of the network, and the edges represent causal interactions among them. The causal dependencies are expressed as conditional probabilities, which infer cause-and-effect relationships among the genes in the network. However, BN are acyclic and cannot track time-delayed, feedback, and self-regulatory events. Dynamic Bayesian networks (DBN) can model the temporal dynamics, where the parents from the previous time instant are assumed to regulate the genes (Friedman et al., 1998). This first-order assumption allows feedbacks but still deprives DBN of the ability to represent time-delayed interactions. When extended to higher order, a DBN is capable of representing delayed interactions.
Mutual information has been used to determine the best time delay (Zhengzheng and Dan, 2006). However, these generative models become computationally intractable at very high orders. We therefore propose a skip model that can handle long delays in regulatory interactions. The skip model is represented by two types of features: (a) linear features modeling short delays; and (b) skip features modeling long delays. In our model, skip features are modeled by a hidden Markov model (HMM) in which the log-likelihood of the network decomposes into a sum of conditional probabilities between consecutive pairs of genes, so the maximum likelihood estimate of a regulation can be found by the Viterbi algorithm (VA).

Our approach consists of two stages: (a) identification of time-delayed interaction features and computation of Viterbi scores; and (b) prediction of the optimal GRN by a genetic algorithm (GA), whose fitness function includes the Viterbi scores of the skip model. We demonstrate the method on a long time-series of yeast cell-cycle data and find core genes known to have regulatory effects on the cell cycle with differing time delays. Earlier, we used skip models to find time-delayed interactions in Mycobacterium tuberculosis (Chaturvedi and Rajapakse, 2009). In this paper, we detail our formulation of skip

models and describe an implementation using a GA. Experiments were performed on larger sets of genes and a higher number of time points obtained on cell-cycle regulation. Further, we validate our results by employing protein-protein interaction (PPI) data: a validation against the BioGrid PPI database (Breitkreutz et al., 2008) for higher-order interactions shows that the method is more effective than existing techniques.

2. Methods

Consider a set of n genes G = {g_i : i = 1, 2, ..., n} and time-series of gene expressions gathered over T time points for all the genes. Let the gene expression data be x = {x_{i,t}}_{n×T}, in which the row vector x_i = (x_{i,t} : t = 1, 2, ..., T) is the expression time-series of gene g_i. Suppose that gene expressions are discretized into a set C of d levels: C = {1, 2, ..., d}. Let the set of parent genes regulating gene g_i be denoted a_i, and let q_i be the number of states that the parent set a_i can take.

2.1. Bayesian networks

The Bayesian network (BN) decomposes the likelihood of gene expressions into a product of conditional probabilities by assuming independence of non-descendant genes, given their parents:

    p(x) = \prod_{i=1}^{n} p(x_i | a_i, \theta_i)    (1)

where x = (x_1, x_2, ..., x_n), p(x_i | a_i, \theta_i) is the conditional probability of gene expression x_i given its parents a_i, and \theta_i denotes the parameters of the conditional probabilities. Given the set of conditional distributions with parameters \theta = {\theta_i : i = 1, 2, ..., n} and a network structure S, the likelihood can be written as

    p(x) = \int p(x | S, \theta) \, p(\theta | S) \, d\theta    (2)

Let \theta_{ijk} = p(x_{i,t} = k | a_i = j), and let N_{ijk} be the number of instances of this event in the training data. Using the property of decomposability (Friedman et al., 1998),

    p(x) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \prod_{k=1}^{d} \theta_{ijk}^{N_{ijk}}    (3)

The model parameters \theta are given by the maximum likelihood estimates

    \hat{\theta}_{ijk} = \frac{N_{ijk}}{\sum_{k'=1}^{d} N_{ijk'}}    (4)

and the log-likelihood of the data is then

    \log p(x) = \sum_{i=1}^{n} \sum_{j=1}^{q_i} \sum_{k=1}^{d} N_{ijk} \log \frac{N_{ijk}}{\sum_{k'=1}^{d} N_{ijk'}}    (5)

This likelihood approximation is known to be good when a large number of data points is available (Friedman et al., 1998).

2.2. Dynamic Bayesian networks (DBN)

The acyclicity of BN does not allow self- and feedback-regulation of genes, which are essential characteristics of GRN. Dynamic Bayesian networks (DBN) overcome this by modeling the regulatory network from one time point to the next. A first-order DBN is defined by a transition network of interactions between a pair of structures (S_t, S_{t+1}) corresponding to time instances t and t + 1: at time t + 1, the parents of the genes are those specified at time t. The gene regulations are obtained by unrolling the transition network over time and assuming first-order stationary behaviour. From Eq. (3), the likelihood of the data is given by

    P(x) = \prod_{t=1}^{T-1} \prod_{i=1}^{n} \prod_{j=1}^{q_i} \prod_{k=1}^{d} \theta_{ijk}^{N^{(t,t+1)}_{ijk}}    (6)

where N^{(t,t+1)}_{ijk} is the number of instances where x_{i,t+1} = k while a_{i,t} = j. A first-order DBN has two layers and therefore 2n nodes.

2.3. Hidden Markov models (HMM)

The classical DBN is unable to capture complex time-dependencies and is therefore extended to an o-order (o >= 2) Markov chain, which predicts the expression levels of a set of genes from the expressions of up to o previous time points using frequency statistics. Such higher-order dynamic Bayesian networks (HDBN) have been proposed to study time-delayed interactions. However, as the order o increases, predicting the delays becomes infeasible because of the difficulty of estimating the growing number of parameters.
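To make the count-based score concrete, the following is a minimal sketch (not the authors' implementation) of how the log-likelihood of Eqs. (5)-(6) can be evaluated for a first-order DBN on a discretized expression matrix; the function and variable names are our own.

```python
import numpy as np
from collections import defaultdict

def dbn_log_likelihood(x, parents):
    """Count-based log-likelihood of a first-order DBN (cf. Eqs. (5)-(6)).

    x       : (n_genes, T) integer array of discretized expression levels
    parents : dict mapping gene index i -> list of parent gene indices a_i
    """
    n, T = x.shape
    loglik = 0.0
    for i in range(n):
        # counts[j][k] is N_ijk: parents of gene i were in joint state j
        # at time t while gene i was at level k at time t + 1.
        counts = defaultdict(lambda: defaultdict(int))
        for t in range(T - 1):
            j = tuple(x[p, t] for p in parents.get(i, []))
            counts[j][x[i, t + 1]] += 1
        for j, row in counts.items():
            total = sum(row.values())
            for k, njk in row.items():
                # ML estimate theta_ijk = N_ijk / sum_k' N_ijk' (Eq. (4))
                loglik += njk * np.log(njk / total)
    return loglik
```

With x discretized into levels {0, ..., d-1} and, for instance, parents = {1: [0]}, the score measures how well the level of gene 0 at time t predicts the level of gene 1 at time t + 1.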
Therefore, we resort to a first-order hidden Markov model (HMM) to determine delayed interactions. It determines the probability of expression of a gene g_j at time point t, given that g_i was observed at s_t, where s_t < t - 1, within a section of the time-series of length t - s_t. Let a sequence of hidden states from time point s_t to t be denoted by y_{s_t:t} = (y_{s_t}, y_{s_t+1}, ..., y_t), where y_{t'} denotes the gene expressed at time point t' on the path. Correspondingly, we have the observed data x_{s_t:t} = (x_{i',s_t}, x_{i',s_t+1}, ..., x_{i',t}), where x_{i',t'} = k is the discretized gene expression state; the states x_{i',t'} in {-1, 0, 1} represent down-, un-, and up-regulated genes. Given the microarray data, maximum likelihood estimation can be used to estimate the state transition and emission probabilities, defined as follows (Cappé et al., 2005):

    a_{l,m} = \frac{M_{l,m}}{\sum_{m'=1}^{n} M_{l,m'}}, \quad \forall y_{t'} = g_l, \; y_{t'+1} = g_m \in G    (7)

    b_l(k) = \frac{M^k_l}{\sum_{k'=1}^{d} M^{k'}_l}, \quad \forall k \in C, \; t' \in \{s_t, s_t+1, ..., t\}, \; g_l \in G    (8)

where M_{l,m} denotes the number of occurrences with x_{l,t'} = x_{m,t'+1} = 1 over t' in {s_t, s_t + 1, ..., t}, and M^k_l denotes the number of occurrences where gene g_l is at discrete state level k over the same window.

2.4. Viterbi algorithm

When the expression time-series are modeled with an HMM, the maximum a posteriori (MAP) estimate can be used to find the time-delayed interaction of a pair of genes. The path begins and ends at the known states of the genes: say, y_{s_t} = g_i and y_t = g_j. We assume that t - s_t is not very large and that the feature vectors are conditionally independent. For a sequence over a set of genes, the most probable path is given by the MAP estimate:

    \arg\max_{y_{s_t:t}} p(y_{s_t:t} | x) = \arg\max_{y_{s_t:t}} p(x | y_{s_t:t}) \, p(y_{s_t:t})    (9)

The Viterbi algorithm (VA) is a dynamic programming procedure that determines the best path incrementally. Let \delta_m(t') be the probability of the most probable path ending at gene g_m with observation x_{m,t'} at time t'. Then the best path at the next iteration is found as

    \delta_m(t'+1) = b_m(k) \max_l \delta_l(t') \, a_{l,m}    (10)
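A minimal log-space sketch of this recursion (again, not the authors' code) is given below; it assumes the transition matrix a and emission matrix b of Eqs. (7)-(8) have already been estimated and that observed states have been remapped from {-1, 0, 1} to column indices 0..d-1.

```python
import numpy as np

def viterbi(a, b, obs, start, end):
    """Most probable hidden-gene path (Eq. (10)), computed in log-space.

    a     : (n, n) transition matrix a[l, m] (Eq. (7))
    b     : (n, d) emission matrix b[l, k]   (Eq. (8))
    obs   : observed discrete levels (remapped to 0..d-1) for t' = s_t..t
    start : gene index pinned at the start of the path (y_{s_t} = g_i)
    end   : gene index pinned at the end of the path   (y_t = g_j)
    """
    n, L = a.shape[0], len(obs)
    logA, logB = np.log(a + 1e-12), np.log(b + 1e-12)
    delta = np.full((L, n), -np.inf)   # delta[t', m] as in Eq. (10)
    psi = np.zeros((L, n), dtype=int)  # backpointers
    delta[0, start] = logB[start, obs[0]]      # path is pinned at g_i
    for t in range(1, L):
        for m in range(n):
            scores = delta[t - 1] + logA[:, m]
            psi[t, m] = int(np.argmax(scores))
            delta[t, m] = logB[m, obs[t]] + scores[psi[t, m]]
    path = [end]                               # backtrace from pinned g_j
    for t in range(L - 1, 0, -1):
        path.append(psi[t, path[-1]])
    return delta[L - 1, end], path[::-1]
```

Dividing the returned log-probability by the path length t - s_t yields the normalized skip-edge score of Eq. (11) in the next section.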

We can divide the path probability by the length of the path to obtain a first-order probability as a goodness of fit of the path. Hence, the skip-edge score is the normalized MAP interaction:

    h(x_i, a_i, s_t, t) = \frac{1}{t - s_t} \log \max_{y_{s_t:t}} p(y_{s_t:t} | x)    (11)

where the parent set a_i has only one gene, at time point s_t. Finally, for any pair of genes, the best time-delayed interaction is the one of highest probability:

    \hat{h}(x_i, a_i, s_t, t) = \max_{(s_t, t) : t - s_t > o} h(x_i, a_i, s_t, t)    (12)

where g_j in a_i and o is the predefined linear order. The corresponding delay is accepted if \hat{h}(x_i, a_i, s_t, t) exceeds a predetermined threshold.

2.5. Linear and skip features

In order to handle both short- and long-delay interactions, we model gene regulations with a linear model and a skip model. The skip-chain model is illustrated in Fig. 1, where linear features are drawn as dashed lines and skip features as solid bold lines. The two types of features enter the likelihood of a gene expression x_i as a weighted sum of linear and skip-edge scores. For gene g_i:

    \log p(x_i | a_i, \theta_i) \propto \lambda f(x_i, a_{i(t-o:t)}, t) + (1 - \lambda) h(x_i, a_i, s_t, t)    (13)

where f(x_i, a_{i(t-o:t)}, t) and h(x_i, a_i, s_t, t) are the linear- and skip-feature functions, and \lambda is a weight determined heuristically. Linear-chain feature functions f represent local dependencies consistent with an o-order Markov assumption on gene expressions. Skip-chain feature functions h exploit dependencies between genes that are arbitrarily distant in time, at instances s_t and t respectively (Galley, 2006); they can model a variable-length Markov chain of order up to T - 1. We use a DBN to implement the linear-chain model and a first-order HMM to implement the skip-chain model.

The optimal delays can be found by using a GA to optimize the likelihood. For an o-order HDBN there are o^{|a_i|} structural possibilities for each gene, where |a_i| is the cardinality of the parent set, so the search space, and hence the complexity of finding the delays, is very large. Skip models, in contrast, use the VA to find the optimal delay and its associated probability; the complexity of the VA is only quadratic in the length of the delay (Cappé et al., 2005), much smaller than that of a GA search.

3. Implementation using a genetic algorithm

A genetic algorithm (GA) is used to find the optimal network structure. The connectivity structure is given by the connection matrix {c_{i,j}}_{n×n}, where c_{i,j} denotes the delay with which gene g_i regulates gene g_j. The network is initialized using mutual information (Zhengzheng and Dan, 2006), and the order of genes is randomized during initialization for each individual. The cost function for finding the delays is the linear combination of linear and skip features given in Eq. (13), evaluated as sketched below. The Bayesian score of the graph at low orders is calculated using Eq. (5). To account for longer delays, for any two genes g_i and g_j with c_{i,j} > 1, we choose the highest Viterbi score among all possible interaction features \hat{h}(x_i, a_i, s_t, t); if the highest score is below a threshold, no skip-edge is added. The GA optimizes the structure search by keeping a population of candidate connectivity structures, and a random interpolation weight \lambda < 1 is appended to each individual.
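As an illustration, the combined cost of Eq. (13) for one GA individual might be evaluated as in the following minimal sketch. The helper functions linear_score and best_skip_score (standing in for Eq. (5) and Eq. (12)) and all other names are our own assumptions, not the authors' implementation.

```python
def individual_cost(x, C, lam, linear_score, best_skip_score, threshold):
    """Weighted linear + skip cost of Eq. (13) for one GA individual.

    x    : (n, T) discretized expression matrix
    C    : (n, n) delay matrix; C[i, j] = delay of g_i regulating g_j (0 = no edge)
    lam  : interpolation weight lambda, 0 <= lam < 1
    """
    n = C.shape[0]
    total = 0.0
    for j in range(n):
        linear_parents = [i for i in range(n) if C[i, j] == 1]
        skip_parents = [i for i in range(n) if C[i, j] > 1]
        # Linear-chain term f(.): count-based DBN score, cf. Eq. (5)
        total += lam * linear_score(x, j, linear_parents)
        for i in skip_parents:
            # Skip-chain term h(.): best normalized Viterbi score, cf. Eq. (12)
            h = best_skip_score(x, i, j, C[i, j])
            if h > threshold:  # keep only sufficiently strong skip-edges
                total += (1.0 - lam) * h
    return total
```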
The GA then finds the best structure, with the highest posterior probability, over different combinations of the linear score, the skip score, and \lambda. Crossover and mutation introduce changes in the structure: a crossover swaps several rows and the weights between two parents, possibly resulting in lower-energy structures, while a mutation alters a single cell. We run the GA for Q generations, or stop early if the change in score is less than q_1 for 20 consecutive generations. As low-lying structures can easily dominate the others, leading to premature convergence of the search, a minimum similarity threshold of p_s > 0.7 is maintained in each generation. The parameters of the GA were found heuristically for the maximum likelihood of the structure.

4. Experiments and results

We evaluated our method on time-series gene expressions of yeast cell-cycle data obtained from Chou et al. (1998) (17 time points) and Spellman et al. (1998) (24 time points, cdc-15 cell-cycle arrest). The yeast cell-division cycle consists of four main phases: genome duplication (S phase) and nuclear division (M phase), separated by two gap phases (G1 and G2); S-G2-M-G1-S thus forms a cycle of cell duplication. The expression values were normalized and discretized into 1 for up-regulation, 0 for un-regulation, and -1 for down-regulation, using an approach described earlier (Shmulevich and Zhang, 2002).

Fig. 1. State transition diagram illustrating six time points and four genes in a DBN. The states {-1, 0, 1} represent down-, un-, and up-regulated genes. The dashed edges are linear edges of order o = 1, 2 found by linear features; the solid edge represents a skip-edge over four time points.
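As a toy illustration of this discretization step, the sketch below assumes a simple symmetric threshold tau on normalized values; the paper itself uses the optimization-based normalization of Shmulevich and Zhang (2002), which this does not reproduce.

```python
import numpy as np

def discretize(x, tau=0.5):
    """Map normalized expression values to {-1, 0, 1}.

    A value above +tau is called up-regulated (1), below -tau
    down-regulated (-1), and un-regulated (0) otherwise. This simple
    symmetric threshold stands in for the optimization-based scheme
    of Shmulevich and Zhang (2002) used in the paper.
    """
    return np.where(x > tau, 1, np.where(x < -tau, -1, 0))
```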

We use the Chou dataset on nine genes to demonstrate the control of the sequential activation of cyclins and other cell-cycle regulators (Zhengzheng and Dan, 2006). A small subset of nine genes was used because validation against protein-protein interaction data is practically impossible for large datasets. Similarly, the Spellman data was used on subsets of genes in different phases.

A GA is used to find the optimal structure. Simulations were run with an HDBN of up to order four and a skip model with a maximum skip-edge length of 10 time points. We plotted the histogram of mutual information for each pair of genes at different time delays; the peak of the histogram was taken as the threshold for the presence of regulatory connections. The parameters of the GA and the skip-edge weight were determined empirically for the best structure, as given by the likelihood. Simulations were run at different numbers of individuals (N = 200/300/400) and generations (Q = 300/400/500) for both the HDBN and the skip-chain model. The GA stops when the maximum number of generations is reached, or when the score difference stays below a set value for 20 consecutive generations. Edges with probability over 0.7 across 20 runs of the GA are kept in the final network; means and standard deviations were reported. We observe that the optimal \lambda found by the GA is larger for small networks, where the probability of a skip-edge is low, and smaller for large networks, where skip interactions become more probable owing to longer cascades of genes.

As seen from Table 1, the HDBN of order four and the skip-chain of order 1:10 have the highest likelihood on all datasets, confirming that the network fits the expression data well. The HDBN shows a peak of interactions at delays 1 and 4, indicating that most interactions are first-order or instantaneous and that fourth order may be insufficient to capture all higher-order interactions. To validate our results further, we examine cascades of genes in the GRN, which correspond to interactions in the PPI network. On a subset of 19 S-phase genes for which interactions are available in BioGrid, our model clearly yields more true positives than the DBN or HDBN. True positives are relatively low overall because not all PPI are present in the database; they nevertheless serve as an indicator for comparing models. We also see that the method is robust to an increase in the number of genes, as most predicted interactions tend to show delays; larger networks, such as S (36) with 52 interactions, had several delays of eight time points.

We also investigated hub nodes with a high degree of connectivity, which usually represent important nodes in causal networks. Table 2 lists the top 10 hubs of the networks derived by the different methods for the 19 genes in S phase, together with the corresponding hubs in the BioGrid target network. The top core genes produced by all methods appear to be the same, and the core genes produced by the DBN and our method were quite similar. Further comparison of the top 10 hubs predicted by the DBN, the HDBN, and the skip-chain model (Table 2) against the Saccharomyces Genome Database (SGD) showed that while the DBN had hubs involved in instantaneous events such as initialization and silencing, the time-delayed hubs in the HDBN were mostly regulatory or feedback associated. For example,
KIP1 (mitotic spindle assembly), MET6 (methionine synthesis), and MSB1 (suppressor of budding) emerge in the DBN.

Table 1. First- and higher-order regulations predicted by the DBN (d), HDBN (h), and skip-chain (d:h) models built on the datasets of Chou et al. (nine genes) and Spellman et al. (cycle S-G2-M). The table reports, for each dataset, the number of genes, the number of connections at different delays, the total number of edges, the true positives against protein-protein interaction data, and the likelihood, for models d (1), h (4), and d:h (1:10) over datasets Chou (9), S (19), S (36), G2 (33), and M (60). [The numeric entries of this table did not survive transcription.]

Table 2. Top 10 hubs obtained for the 19 genes in the S phase of the yeast cell cycle, ranked in order of their connectivity. The hubs obtained from BioGrid are given for comparison.

Model (o)     Genes ranked by connectivity
BioGrid       HHF1, HTA1, HHT1, HTB2, HTA2, HTB1, HHT2, HHF2, ADA2, CIN8
d (1)         HTB1, HHF2, HTA1, HHT2, HHT1, MET6, HTB2, KIP1, MSB1, TOF2
h (4)         TOF2, HTB1, HHT2, HHF2, HHF1, HTA1, ADA2
d:h (1:10)    HTB1, HHF2, KAR9, HHT1, MET6, HHT2, HTA2, DFG5, HTB2, TOF2
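Hub lists such as those in Table 2 can be read off a predicted network by ranking genes by their degree. A minimal sketch, assuming the delay matrix {c_{i,j}} of Section 3 (all names here are ours):

```python
import numpy as np

def top_hubs(C, gene_names, k=10):
    """Rank genes by connectivity in the predicted network.

    C : (n, n) connection matrix with C[i, j] > 0 iff g_i regulates g_j
        (entries hold the predicted delay, as in Section 3).
    """
    A = (C > 0).astype(int)
    degree = A.sum(axis=0) + A.sum(axis=1)   # in-degree + out-degree
    order = np.argsort(degree)[::-1]
    return [(gene_names[i], int(degree[i])) for i in order[:k]]
```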

In contrast, TOF2 (stimulates phosphatase activity) and ADA2 (a coactivator) are seen in the HDBN. Our skip-chain model showed a combination of both types of regulation, such as KAR9 (positioning and orientation of the spindle) and DFG5 (a membrane protein for cell-wall formation in buds). As seen in Table 2, some hubs, such as HHF1, HTB1, HTA2, and HHT1, or their homologs, are conserved across all the models; these are histones required to initiate duplication through chromatin assembly and chromosome function. ADA2 and CIN8 (spindle assembly) were false positives when compared with BioGrid.

5. Discussion and conclusion

Pathways are often triggered by transcription factors, which in turn express genes and produce proteins; the regulatory interactions in molecular pathways can therefore be given by GRN. Gene regulation generally involves dynamic feedback loops, cascaded interactions, intermediary factors, and other underlying biological mechanisms, resulting in different time delays in regulatory interactions. We considered higher-order DBN (HDBN) for representing delays in regulation; however, when large delays are involved, implementation of HDBN becomes intractable. Therefore, we proposed a skip-chain HDBN with two components: a linear model to represent short delays and a skip model to represent long delays. These two components may represent the actions of activators and inhibitors involved in regulatory interactions. Our method was evaluated against earlier approaches and fits the gene expression data better when the GRN is built. To provide a more biologically meaningful validation, we also compared against protein-protein interaction data; that validation likewise showed the superiority of our technique over other methods, although, because of the incompleteness of PPI data sources, such comparisons result in large numbers of false positives.

Skip-chain models address the difficulties of an HDBN by easily incorporating long time-delayed regulations. The skip model is a first-order HMM and captures long-distance dependencies in the input time-course gene expressions. This inference technique lowers computational time without loss of accuracy compared to an HDBN. The forward Viterbi path through the trellis determines the best long-distance time delay and therefore automatically finds the best higher-order interactions between genes. The method gives more accurate and biologically plausible networks than DBN implementations, as short and long time delays are inherent in biological interactions.

References

Breitkreutz, B.-J., Stark, C., Reguly, T., Boucher, L., Breitkreutz, A., Livstone, M., Oughtred, R., Lackner, D.H., Bähler, J., Wood, V., Dolinski, K., Tyers, M., 2008. The BioGRID interaction database: 2008 update. Nucl. Acids Res. 36 (Suppl. 1), D637-D640.
Cappé, O., Moulines, E., Rydén, T., 2005. Inference in Hidden Markov Models. Springer.
Chaturvedi, I., Rajapakse, J., 2009. Detecting robust time-delayed regulation in Mycobacterium tuberculosis. BMC Genomics 10 (Suppl. 3), S28.
Chou, R., Campbell, M., Winzeler, E., Steinmetz, L., Conway, A., Wodicka, L., Wolfsberg, T., et al., 1998. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol. Cell 2.
Friedman, N., Linial, M., Nachman, I., Pe'er, D., 2000. Using Bayesian networks to analyze expression data. J. Comput. Biol. 7 (3-4).
Friedman, N., Murphy, K., Russell, S., 1998. Learning the structure of dynamic probabilistic networks. In: Proc. 14th Annual Conf. on Uncertainty in Artificial Intelligence (UAI-98).
Galley, M., 2006. A skip-chain conditional random field for ranking meeting utterances by importance. In: Proc. 2006 Conf. on Empirical Methods in Natural Language Processing (EMNLP 2006).
Gebert, J., Motameny, S., Faigle, U., Forst, C.V., Schrader, R., 2008. Identifying genes of gene regulatory networks using formal concept analysis. J. Comput. Biol. 15 (2).
Li, P., Zhang, C., Perkins, E.J., Gong, P., Deng, Y., 2007. Comparison of probabilistic boolean network and dynamic bayesian network approaches for inferring gene regulatory networks. BMC Bioinform. 8, S13-S20.
Liu, B., Thiagarajan, P., Hsu, D., 2009. Probabilistic approximations of signaling pathway dynamics. In: Computational Methods in Systems Biology.
Shmulevich, I., Zhang, W., 2002. Binary analysis and optimization-based normalization of gene expression data. Bioinformatics 18 (4).
Spellman, P.T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O., Botstein, D., Futcher, B., 1998. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9 (12).
Wagner, J., Stolovitzky, G., 2008. Stability and time-delay modeling of negative feedback loops. Proc. IEEE 96 (8).
Zhengzheng, X., Dan, W., 2006. Modeling multiple time units delayed gene regulatory network using dynamic Bayesian network. In: 6th IEEE Internat. Conf. on Data Mining Workshops (ICDM Workshops 2006).
