Non-linear dimensionality reduction analysis of the apoptosis signalling network. College of Science and Engineering School of Informatics


Non-linear dimensionality reduction analysis of the apoptosis signalling network

Sergii Ivakhno

Thesis
College of Science and Engineering, School of Informatics
Masters of Science
August 2007
The University of Edinburgh

Acknowledgments

I would like to express my deepest gratitude to my supervisor, Dr. Douglas Armstrong, for his support, guidance, and encouragement throughout my graduate research. My thanks also go to Dr. Douglas Lauffenburger at the Massachusetts Institute of Technology, Biological Engineering Division, where I completed a sizable portion of this thesis work. My parents, Sergii and Irina, receive my deepest gratitude and love for their dedication and the many years of support during my current and previous studies that provided the foundation for this work. I also thank Dr. John Albeck and Dr. Kevin Janes for useful discussions and comments on the thesis manuscript. Finally, I would like to acknowledge the support of several funding bodies that enabled my four-month research visit to MIT: a European Molecular Biology Organization (EMBO) short-term fellowship and a Scottish International Education Trust (SIET) travel grant. An Erasmus Mundus fellowship from the European Commission supported my whole period of study and research in the EuMI master program.

Abstract

Systems-wide modelling and analysis of signalling networks is essential for understanding complex cellular behaviours, such as the biphasic responses to different combinations of cytokines and growth factors. For example, tumour necrosis factor (TNF) can act as a proapoptotic or prosurvival factor depending on its concentration, the current state of the signalling network and the presence of other cytokines. To understand combinatorial regulation in such systems, new computational approaches are required that can take into account non-linear interactions in signalling networks and provide tools for clustering, visualization and predictive modelling. I extended and applied an unsupervised nonlinear dimensionality reduction approach, Isomap, to find clusters of similar treatment conditions of the apoptosis signalling network in human epithelial cancer cells treated with different combinations of TNF, epidermal growth factor (EGF) and insulin. For the analysis of the apoptosis signalling network I used the Cytokine compendium dataset, in which the activity and concentration of 19 intracellular signalling molecules were measured to characterise the apoptotic response to TNF, EGF and insulin. By projecting the original 19-dimensional

space of intracellular signals into a low-dimensional space, Isomap was able to reconstruct clusters corresponding to different cytokine treatments that were identified with graph-based clustering. In comparison, Principal Component Analysis (PCA) and Partial Least Squares Discriminant Analysis (PLS-DA) were unable to find biologically meaningful clusters. It is shown that by using Isomap components for supervised classification with k-nearest neighbour (k-nn) and quadratic discriminant analysis (QDA), apoptosis intensity can be predicted for different combinations of TNF, EGF and insulin. Prediction accuracy was highest when early activation time points in the apoptosis signalling network were used to predict apoptosis rates at later time points. To summarize, in this thesis I developed and applied an extended Isomap approach for the analysis of cell signalling networks. Potential biological applications of this method include characterization, visualization and clustering of different treatment conditions (e.g. low and high doses of TNF) in terms of the changes in intracellular signalling that they induce.

The material presented in this thesis has also been published in the following two journal articles and presented at an international conference.

Ivakhno S, Armstrong JD. Non-linear dimensionality reduction of signalling networks. BMC Syst Biol Jun 8;1:27 [full content is given in the appendix].

Ivakhno S.S. From functional genomics to systems biology. FEBS J May;274(10): Epub 2007 Apr 19.

Ivakhno S.S., Lauffenburger D. A., Armstrong J. D. Non-linear dimensionality reduction of the apoptosis signalling network activated by TNF, EGF and Insulin. Poster at the 3rd EMBL Biennial Symposium: From Functional Genomics to Systems Biology 2006, Heidelberg, Germany

Declaration

I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

Contents

Acknowledgments
Abstract
Declaration
Chapter 1 Introduction and the structure of the thesis
Chapter 2 Literature Survey
  Types of biological networks
  Machine learning methods used for the analysis of biological networks
  Gene networks
  Enhanced gene networks and regulatory networks
  Signalling networks: Biological perspective
  Signalling networks: Machine learning approaches
  Systems Biology methodology used for the analysis of biological networks
  2.4 Dimensionality reduction techniques
Chapter 3 Dimensionality reduction analysis of apoptosis signalling network
  Apoptosis signalling network and the Cytokine data compendium
  Main aims of the thesis research
  Cytokine compendium dataset
  Data representation and transformation
  PCA and PLS-DA could not find low dimensional embedding of apoptosis signalling networks
Chapter 4 Feature selection and supervised analysis of Cytokine compendium
  Features of cytokine clusters found by Isomap
  Interpretation of components recovered by Isomap
  Supervised comparison of PCA and Isomap
Chapter 5 Discussion and conclusions
  Extended Isomap can be used for visualization, predictive and descriptive modelling of apoptosis signalling network
  Factors contributing to Isomap algorithm performance
  Conclusions
Chapter 6 Methods and Algorithms
  6.1 Cytokine compendium data normalisation and transformation
  Extended Isomap approach
  Original Isomap and determination of k nearest neighbours
  Graph-based clustering in the Isomap components
  Interpretation of Isomap projections by neural networks
  Choice of class labels
  Classifiers comparison
  PLS Discriminant Analysis
  Choice of Principal components (PC) using cross validation
  Guide to the software
Bibliography
Appendices
Appendix A
Appendix B Article

Chapter 1 Introduction and the structure of the thesis

It is now recognized that the impact of biology in the 21st century will be similar in scope to the contributions of physics in the 20th century: from the creation of artificial bacterial systems to produce alternative sources of energy to cheap sequencing and ready interpretation of each person's genome. Such dramatic progress in molecular biosciences in the past 50 years (after the discovery of DNA structure) is attributable to technological advances such as the creation of new methods for sequencing genomes and other methods for studying cells at the molecular level. The advent of the genomic and post-genomic era and the generation of large datasets have led to the convergence of Biology with Information Technology, Computer Science, Statistics and mathematical modelling, and to the emergence of the new disciplines of Bioinformatics and Systems Biology. The execution of body functions comes from coordinated actions of thousands of

networks of genes and proteins, ever varied and changing in response to environmental stimuli. Changes in such networks can lead to devastating diseases such as cancer, where through mutations and other mechanisms the normal function of part of the network changes, leading to abnormal cell growth and a malignant phenotype. Almost all diseases can be viewed as malfunctioning of biological networks, but our organisms have evolved multiple mechanisms to combat such abnormal changes, e.g. the immune system. As a result of advancements in large-scale high-throughput experiments, a large number of data sets are available in proteomics, genomics and transcriptomics for data analysis and interpretation. By means of statistical genomics, individual variation in susceptibility to common diseases can be investigated, whereas Systems Biology modelling allows the design of better drugs for cancer treatment [1, 2]. Construction and understanding of biological networks represent the first steps toward modelling cellular activities in time and space. Network biology provides a simple conceptual representation of biological data in terms of interactions between various biological entities and has the useful property of representing complex biological relations, which are otherwise indiscernible in large-scale data sets, in a lucid and interpretable way. In this framework, the grand challenge of network biology is to combine different biological data to create a predictive model of cellular behaviour. This leads to the realm of statistics, machine learning and data modelling. Machine learning approaches play a significant role in network biology by allowing quick automatic reconstruction of biological networks

from high-throughput datasets and follow-up data analysis. Broadly, the topic of my thesis is the application of machine learning methods to the analysis of biological networks. In particular, the thesis is devoted to the application and modification of a non-linear dimensionality reduction technique to data on the response of several of the cell's biological networks (TNF and cytokine signalling) to treatment with combinations of various compounds, with the aim of understanding how cells respond to them in terms of the signalling network's state rather than the observable phenotype of the cell. This might have a practical application, particularly for the development of cancer drugs that target various proteins in signalling networks that become abnormal.

The thesis is structured as follows. In Chapter 2 I start with the description of various biological networks and a survey of the current literature on Network Biology, which places my work in the broad context of current research directions in the field. This is followed by a closer description of signalling networks and of the particular network of TNF signalling that was investigated in this work. Chapter 3 begins with the description of the dataset and the pre-processing and normalisation steps. This is followed by a general description of the non-linear dimensionality reduction approach that was used in this work. The main aim of the thesis was to investigate whether nonlinear dimensionality reduction methods can give greater biological insights and understanding of biological processes than simpler linear dimensionality reduction methods such as Principal Component Analysis (PCA). Consequently, Chapter 4 includes a more detailed illustration of results and evaluates the modified Isomap algorithm in terms of its biological discoveries, with already established biological knowledge playing the role of the benchmark. While Chapter 4 deals more with biological interpretations, Chapter 5 places them in the broader context of the application of dimensionality reduction techniques to biological networks and discusses their usefulness in general. The major theme of this thesis is gaining new biological knowledge through the use of advanced machine learning methods. Therefore, the technical details of the algorithms and the general design of the machine learning experiments (statistical analysis, cross-validation) are summarized at the end of the thesis in Chapter 6 to preserve the flow of biologically centred analysis through introduction, results and discussion (Chapters 1-5). Various subsections in the thesis are introduced to make navigation easier.
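As an illustration of the kind of non-linear dimensionality reduction referred to in this chapter, the following minimal Python sketch implements the original Isomap procedure: a k-nearest-neighbour graph, shortest-path (geodesic) distances, and classical multidimensional scaling. It is not the extended Isomap developed in this thesis. The function name `isomap`, its parameters and the synthetic spiral data are assumptions made purely for illustration, and only NumPy is used.

```python
import numpy as np


def isomap(X, n_neighbors=4, n_components=1):
    """Basic Isomap: k-NN graph -> geodesic distances -> classical MDS."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    # Indices of the k nearest neighbours of each sample (self excluded).
    masked = D + np.diag(np.full(n, np.inf))
    nearest = np.argsort(masked, axis=1)[:, :n_neighbors]

    # Symmetric neighbourhood graph; missing edges have infinite length.
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        G[i, nearest[i]] = D[i, nearest[i]]
        G[nearest[i], i] = D[i, nearest[i]]

    # Floyd-Warshall shortest paths approximate geodesic distances.
    for k in range(n):
        G = np.minimum(G, G[:, [k]] + G[[k], :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")

    # Classical MDS: double-centre the squared distances and eigendecompose.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


# Hypothetical toy data: 100 points along a two-dimensional spiral.
t = np.linspace(0.5, 3 * np.pi, 100)
X = np.column_stack([t * np.cos(t), t * np.sin(t)])
# Position of each point along the curve (cumulative chord length).
steps = np.linalg.norm(np.diff(X, axis=0), axis=1)
arc_length = np.concatenate([[0.0], np.cumsum(steps)])

embedding = isomap(X, n_neighbors=4, n_components=1)
corr = abs(np.corrcoef(embedding[:, 0], arc_length)[0, 1])
print(f"|correlation| between Isomap component and arc length: {corr:.3f}")
```

On this toy spiral the single Isomap component follows the position along the curve, which is the property that motivates comparing Isomap with linear methods such as PCA. The Cytokine compendium data are not used in this sketch.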

Chapter 2 Literature Survey

2.1 Types of biological networks

There exists a great variety of biological networks that can capture different levels of cellular complexity. These networks differ from each other with respect to the type of high-throughput technology used to generate the data (e.g. DNA microarrays, the yeast two-hybrid system or LC-MS/MS-based proteomics), the approach to network reconstruction (automatically generated or manually created) and the additional type of information that the networks can provide (e.g. dynamic flow of information through edges, as in ODE models of signalling networks). Below, different types of biological networks are listed and their aforementioned properties are briefly discussed.

Protein-protein interaction networks. These networks represent either direct or indirect (as part of a protein complex) physical interactions between proteins. In most cases protein interaction networks are static and show only a small subset of the true biological interactions, although by integrating various data sources it was recently possible to reconstruct the dynamics of protein complex formation during the cell cycle [3]. Protein-protein interaction networks can be created directly from the results of high-throughput experiments, manually from literature curation or using computational approaches such as text mining.

Genetic networks. In genetic networks protein interactions are defined indirectly at the gene level, when a mutation in one gene affects the phenotypic expressivity of other genes. Genetic networks can be constructed from synthetic lethal screens, where simultaneous mutations in two genes lead to a lethal phenotype. Like protein-protein interaction networks, genetic networks usually represent static interactions. However, information flow can be readily traced through the network when epistasis analysis is used, in which case the temporal order of protein interactions can be established [4]. Genetic networks can be reconstructed manually (for small networks) or using clustering and machine learning algorithms.

Gene (expression) networks. Gene networks use DNA microarray data in combination with other techniques to find general relations at the gene level. The basic idea in gene network reconstruction is that of coexpression - if two genes show similar expression profiles, they are supposed to follow the same regulatory regime and are therefore connected. In the case of more elaborate perturbation experiments (e.g. RNAi), direct causal

interactions between genes can be inferred. Gene networks can be both dynamic and static and are usually reconstructed using various machine learning approaches.

Transcription (regulatory) networks. These networks explicitly represent gene regulation at the transcriptional level by using information on DNA-protein interactions. Transcription networks are always directed and can be cyclic if two transcription factors regulate the expression of each other. They can be reconstructed either manually from the literature or automatically by using high-throughput technologies for detecting regulatory proteins that bind to promoter regions (ChIP technology) [5].

Signalling networks. Signalling networks represent information flow in signal transduction pathways. In signalling networks the influences between regulatory molecules (such as protein kinases or phospholipids) are usually directed and dynamic, so that information flow can be determined in a temporal manner. Most signalling networks have been reconstructed by direct analysis of the literature [6], although recently developed high-throughput techniques allow automatic determination of signalling network structure using machine learning techniques.

Metabolic networks. Metabolic networks represent biochemical reaction pathways within the cell, where the nodes designate enzymes and/or metabolites and the edges represent metabolic reactions. Metabolic networks are usually reconstructed manually from databases such as KEGG or

from the biological literature. These networks are directed and show the dynamic events of metabolite conversion in a biochemical pathway.

Developmental networks. Developmental networks represent the complex and hierarchical levels of gene regulatory circuits during development [7]. They are highly similar to transcription networks in that most regulatory influences are captured at the level of DNA-protein interactions [8]. However, these networks have the additional complexity of explicitly representing spatiotemporal regulation. Since many levels of developmental processes (i.e. spatial and temporal) are represented, these networks are created manually.

2.2 Machine learning methods used for the analysis of biological networks

It is not the purpose of this review to provide a comprehensive literature analysis of machine learning methods used for the generation and analysis of biological networks. Rather, my aim in the next section is to provide a brief overview of computational analysis approaches to gene and signalling networks. Although the main work of this thesis is devoted to the non-linear dimensionality reduction analysis of signalling networks, the material on gene networks is introduced first. Owing to the technological ease of generating gene expression data, the computational analysis of these networks is much more established and provides already developed tools that can easily be adapted for the analysis of signalling networks.

2.2.1 Gene networks

Reverse engineering (automatic reconstruction) of gene networks has become an active area of research since the pioneering work of Friedman et al. [9] on reconstructing the network of the Saccharomyces cerevisiae cell cycle using a Bayesian network approach. Many different computational techniques have been applied to the problem of gene network reconstruction, such as differential equations, Boolean networks and probabilistic graphical modelling approaches. An excellent early review of this topic can be found in De Jong [10]. The simplest way to reconstruct a gene network is to use a coexpression network approach based on similarity measures such as correlation. In this case two genes become linked in the network by an edge if they have similar expression profiles. Correlation networks are easy to interpret and can be accurately estimated even if the number of genes is much larger than the number of samples. One of the most successful applications of correlation networks is found in Stuart et al. [11]; they built a graph from coexpression microarray data across multiple organisms (humans, flies, worms and yeast) and found that many coexpression relationships are conserved over evolution. The largest limitation of coexpression networks is that they cannot distinguish between direct and indirect dependencies; for instance, they would not be able to test whether the regulation between two genes is mediated by some third gene. To find such dependencies, first-order conditional independence models must be applied, in which genes can be regulated through one or a few intermediates. Recently Basso et al. [12] built such a model based on conditional mutual information as

a coexpression measure. The resulting method, called ARACNe, was successfully applied to reverse engineering of a gene network from expression profiles of human B cells related to germinal centres. The authors used a wide range of microarray data sources, resulting in 336 distinct B cell phenotypes that allowed them to achieve a high dynamic range of gene expression values and therefore to build a correlation network with high statistical confidence. Furthermore, by considering in detail the subnetwork around the proto-oncogene MYC they were able to experimentally verify several proposed gene interactions, showing that new results can be obtained by judicious data integration schemes in combination with appropriate machine learning algorithms.

Enhanced gene networks and regulatory networks

The accuracy of the reconstructed gene network can be improved by integrating microarray data with other sources, which can be achieved by appropriate modification of Bayesian networks or the use of other probabilistic graphical models. Such data integration can also increase the accuracy of reconstructed protein-protein interaction networks and transcriptional networks. Below I consider data integration with respect to gene and transcriptional networks, with protein-protein interaction networks reviewed in later sections. A common type of additional information that can be incorporated into gene networks is the sequence data for promoter elements. This integration is based on the assumption that coexpressed genes should share common

regulatory regions in their promoter sequence, which can be considered as a promoter module consisting of a set of active motifs. The first approach to data integration based on probabilistic relational models (PRMs) was proposed by Segal et al. [13]. PRMs differ from Bayesian networks in two major respects. First, these graphical models incorporate the notion of objects and relations between them, allowing one to explicitly integrate different data entities into the network: gene expression values, transcription factor binding site sequences and the modular composition of promoter regions. Second, they impose constraints on the kind of interactions that are possible between different objects; for instance, gene expression values depend only on the composition of promoter regions. The model itself works as follows: first, genes are partitioned into clusters by applying a probabilistic clustering algorithm, followed by identification of common motifs in the promoter regions of coclustered genes using a discriminative model for motif detection [14]. Then genes are reassigned to the clusters based on the number of motifs, in such a way that two genes that share similar motifs in their promoter regions tend to fall into the same cluster. Since PRMs are fully probabilistic models, the Expectation Maximisation algorithm can be applied to find the set of motifs that regulate a particular gene. Therefore, the major goal of applying PRMs is to improve detection of motifs and motif clusters in promoter regions rather than to learn the structure of gene networks. The probabilistic model gave a two-fold improvement in motif detection over standard clustering approaches. Unfortunately, the identity of the transcription factors that bind to promoter regions remains unknown with this approach,

although by incorporating additional ChIP-chip data it becomes possible to infer the structure of transcription networks. The approach proposed by Segal has been extended in a number of different ways. Beer and Tavazoie used Bayesian networks to model the probabilistic dependence of motif modules on gene expression values and incorporated additional constraints on the distribution of motifs in the promoter region [15]. They defined a common promoter module for a set of genes such that motifs in the module are assembled by a set of AND, NOT and OR rules, and placed additional constraints on motif length, orientation and position relative to the transcription start site (e.g. a promoter module combined through the AND rule that incorporates particular motifs at a certain distance from the start site). It should be noted that a gene network reconstructed by this approach is necessarily directed and represents the influences of a promoter module on the set of genes that it regulates. The additional constraints significantly improved detection of promoter modules, and Beer and Tavazoie reported that with this approach gene expression profiles can be reproduced with 73 percent accuracy. Furthermore, by conducting a similar analysis for Caenorhabditis elegans they showed that their method of assembling genes into clusters based on the organisation of the promoter module can be successfully applied to the detection of regulatory rules in higher eukaryotes. Apart from the Bayesian network approach, a number of regression tree methods have also been applied to motif detection and the reconstruction of regulatory (transcription) networks [16-18]. In the model proposed by Phuong et al. for learning the regulatory network from yeast cell cycle

data, motifs act as predictor variables and are assigned to the internal nodes of decision trees, whereas leaf nodes designate gene expression values. During the process of regression tree learning the promoter module is assigned to each gene dynamically, based on the tree splits for motifs.

Signalling networks: Biological perspective

Having considered current methodology for the reconstruction of gene networks, I will now concentrate specifically on the application of machine learning methods to signalling networks, starting with a description of signalling networks themselves from a biological perspective. Briefly, signalling networks execute the signal transduction process, which refers to any process by which a cell converts one kind of signal or stimulus into another. This most often involves ordered sequences of biochemical reactions inside the cell that are carried out by enzymes and linked through second messengers, resulting in what is thought of as a second messenger pathway. In many signal transduction processes, the number of proteins and other molecules participating in these events increases as the process emanates from the initial stimulus, resulting in a signal cascade in which a relatively small stimulus often elicits a large response. As signalling networks are the sensing and execution devices of the cell, their investigation merits particular attention. Multicellular organisms use a broad repertoire of extracellular signalling molecules for communication between and within cells. The specific binding of an extracellular signalling molecule (ligand) to its receptor on a target cell triggers a

specific response. This consists of a series of mutually activating or inhibitory molecular events, called a signal transduction pathway (or signalling pathway). Growth factors are important signalling molecules comprising a large group of secreted proteins. Figure 1 shows a graphical depiction of the general signalling pathway, and below I briefly describe the series of events comprising signal transduction that is depicted in the figure. A specific growth factor (1) binds with high specificity to a cell surface receptor protein (2). This activates intracellular signal transduction proteins (3) and initiates a cascade of activations of responsive proteins (often by phosphorylation) that act as second messengers (4). Hormones are small signalling molecules (5) that arrive via the bloodstream. They enter the cell either by diffusion or by binding to a cell surface receptor (6). Some hormones bind to an intranuclear receptor (7). Activated transcription factors (8) together with cofactors initiate transcription (9). Prior to transcription, an elaborate system of DNA damage recognition and repair mechanisms (10) checks DNA integrity (cell cycle control, 11). Cell division, as represented in the figure (or another biological process), proceeds if faults in DNA structure have been repaired; if not, the cell is sacrificed by apoptosis (cell death, 12).

Signalling networks: Machine learning approaches

A number of machine learning approaches have been applied to modelling biological networks. They can be broadly classified into the following categories:

Structure-based learning (e.g. Bayesian networks)

Figure 1: Schematic outline of the signalling pathway. Various steps of the signal transduction process are explained in the main text. Taken from reference [22]

Predictive modelling (regression and classification algorithms)

Dimensionality reduction and unsupervised learning (including feature selection)

In this and the next few subsections I will briefly describe each approach in turn before focusing on the main theme of the thesis. Both probabilistic graphical modelling and other machine learning algorithms have been used for the structure-based analysis of signalling networks. The differences between reverse engineering approaches for gene and signalling networks are attributable to both biological and technological factors. As has been accurately pointed out by Gaudet et al. [19], the higher complexity of protein chemistry in comparison to DNA and RNA chemistry makes it impossible to develop a single high-throughput technique for studying signalling pathways. As a consequence, a combination of various specific assays is required to delineate information flow in signal transduction, which makes data acquisition necessarily slow. Therefore, to date the reconstruction of signalling networks has served a rather different purpose than in the case of gene networks - whereas in gene networks the main goal is the discovery of new gene relationships and network structure, in signalling networks the objective is not only accurate reconstruction of network structure, but also the prediction of the network's dynamic response to various perturbations (a regression problem). Another noteworthy difference is the size of the reconstructed networks: owing to data scarcity, the size of signalling networks reported in the literature has not exceeded 30 proteins.

Both Bayesian networks and causal networks have been used for the reconstruction of signal transduction pathways. Using a Bayesian network approach to signalling network inference, causal networks (Bayesian networks with directed edges allowed) were used to automatically reconstruct a map of human primary T cell signalling based on multiparameter flow cytometry data [20]. Human primary naive CD4+ T cells were exposed to a series of stimulatory cues and inhibitory interventions (e.g. G06976, an inhibitor of PKC isozymes), followed by flow cytometry measurements of 11 phosphorylated proteins and phospholipids. Unexpectedly, excellent performance was reported - of the 17 arcs in the predicted network, 15 were expected, all 17 were either expected or already reported in the literature, and only 3 were missed. The reconstructed network also dismissed indirect connections that were already explained by other network arcs, thereby producing a robust network structure. The authors reported three features that distinguish this dataset from the majority of currently available biological datasets and that contributed to the excellent performance of the causal network approach. First, simultaneous measurement of multiple protein states in individual cells eliminated population-averaging effects that could otherwise have obscured interesting correlations. Second, because the measurements were made on single cells, thousands of data points were collected in each experiment. This feature constitutes a tremendous asset for Bayesian network modelling, because the large number of observations allowed accurate assessment of the underlying probabilistic relationships, and therefore extraction of complex relationships from noisy data. Third, interventional assays with protein inhibitors/stimulators made it possible to perform causal inference and reconstruct a directed graph, which would have been impossible in other experimental settings. Brent et al. compared this use of inhibitors in delineating signalling network structure to epistasis analysis of biochemical and vesicle transport systems in classical genetics [21]. It is noteworthy that removing any of these features, through simulation or additional experiments, significantly decreased the accuracy of the reconstructed network, with population averaging (simulated Western blotting) unexpectedly showing the worst performance. Finally, using RNAi, the authors confirmed several poorly established regulatory phosphorylations reconstructed by the model. Unfortunately, multiparameter flow cytometry at present cannot detect temporal phosphorylation patterns, resulting in a few missed feedback loops that could potentially have been found by the dynamic Bayesian network (DBN) technique.

The Bayesian network approach allows reconstruction of signalling network structure and general regression analysis; however, not being a true regression algorithm, it lacks many features available in statistical regression models, such as selection of the most predictive protein signals in the network (the feature selection problem). As was mentioned previously, the goal in signalling network analysis is often to find the proteins/molecular signals that best determine the network response to external cues. An example of such an approach is the analysis of combinatorial interactions of several cytokines/growth factors that determine the apoptosis/survival cellular response, or a similar analysis of other signalling systems exhibiting biphasic activity. To address such

a problem, a new experimental approach must be developed to sample the activities of molecular signals in the network effectively. It should make it possible to perform treatments with varying combinations of cytokines/growth factors without losing interpretability, measure the activity of different modules within the signalling network, and provide quantitative information about the output signals (e.g. apoptosis). Furthermore, the technique should have robust normalisation routines and be amenable to statistical analysis and/or mechanistic modelling. All these requirements have recently been addressed by the study of Gaudet et al., whose main dataset was used for algorithm development and testing in this thesis.

2.3 Systems Biology methodology used for the analysis of biological networks

Before proceeding with the description of the approach taken by Gaudet et al. (Chapter 3), the general principles of regression-based machine learning applications to signalling networks should be outlined in greater depth. The typical approach for studying cellular signalling in such experiments is based on the cues-signals-responses paradigm and involves several steps in experimental design and computational modelling (Figure 2A) [19]. First, one or several signal transduction cascades are chosen depending on the questions being addressed. For example, the apoptosis signalling network can be selected to study different rates of apoptosis in cancer cells.

Since even a small number of signalling cascades may include hundreds of proteins and other signalling molecules, this step also involves selecting a smaller subset of signalling proteins (signals) that are believed to be the most relevant for the regulation of the signalling network, based on background biological knowledge and the availability of appropriate high-throughput technology. As a final step in this design phase, specific perturbations (cues) are chosen to induce changes in the information flow through the signalling network, and a number of specific cellular responses are assayed to analyze the output of the network. For example, in the apoptosis signalling network, different combinations of cytokines and growth factors can act as cues, and assays measuring apoptosis intensity can act as responses.

In the second step, the activity and concentration of signalling molecules, as well as the corresponding cellular responses, are measured experimentally across different cues/treatment conditions. Possible experimental approaches include western blotting, high-throughput kinase activity assays and protein microarrays [22]. Assays for quantification of cellular responses vary depending on the specific application and may include measurements of cell migration, overall cell integrity or secretion of specific ligands.

The third and final step involves data analysis aimed at building predictive and descriptive models of signalling networks. For instance, principal component analysis has been used to find how different cues and treatment conditions are positioned in the low-dimensional subspace of intracellular signals [23]. Alternatively, information on the activity and amount of

intracellular signalling molecules was used to build a regression model for predicting cellular responses and for selecting the subset of signals most informative for classification (feature selection) [24]. It should be noted that the three steps of the systems biology methodology outlined above can be extended in various ways; for example, the rise in activity of signalling molecules in response to different cues can be measured at many time points.

Figure 2: Features of the Cytokine Compendium and the cues-signals-responses paradigm. A. Cues-signals-responses paradigm for the design and execution of systems biology experiments for the analysis of signalling networks. Cells are exposed to perturbations (cues), and molecular signals and cellular responses are assayed, followed by application of multivariate statistical and machine learning data analysis techniques. B. Schematic representation of the apoptosis signalling network induced by TNF, EGF and insulin. On the diagram, arrows indicate the type of interaction: activation (green), inhibition (red) and slow process (blue). The measured proteins (molecular signals) are highlighted in yellow. Red circles, triangles, and rectangles indicate kinase assay, antibody array, and western blotting measurements, respectively. The decomposition of the network into different pathways is shown by different colours (reproduced with modifications with permission from [5]). C. Distinction between one-to-one and many-to-one mapping functions between signals and responses. In the former case, a cellular response is measured for each time point of a particular treatment condition (cue); in the latter, one cellular response is measured after several time points of the treatment condition.
[Figure 2 diagram. Legend abbreviations: C, cleaved; P, pro; S, phospho-S; Y, phospho-Y. Panel C labels: one-to-one mapping, many-to-one mapping.]

2.4 Dimensionality reduction techniques

The main purpose of dimensionality reduction techniques in machine learning and statistics is to reduce the number of variables and dimensions in order to extract new information (feature extraction), improve the interpretability of the data and enhance exploratory data analysis. Dimensionality reduction can be used on its own or in combination with other methods. Broadly, dimensionality reduction algorithms are subdivided into linear and non-linear methods. Linear methods assume that the data come from a linear subspace, whereas non-linear methods allow for the presence of a non-linear manifold.

The classical techniques for linear dimensionality reduction, such as principal component analysis (PCA) and multidimensional scaling (MDS), are simple to implement, efficiently computable, and guaranteed to discover the true structure of data lying on or near a linear subspace of the high-dimensional input space (13). PCA finds a low-dimensional embedding of the data points that best preserves their variance as measured in the high-dimensional input space. Classical MDS finds an embedding that preserves the interpoint distances, and is equivalent to PCA when those distances are Euclidean. However, many data sets contain essential non-linear structures that are invisible to PCA and MDS, for instance rotated face images of the same individual, which have three intrinsic degrees of freedom [26]. A number of non-linear dimensionality reduction methods, such as Isomap [26], have been developed to address these challenges (a detailed description of the Isomap approach to dimensionality reduction is provided in the methods section of the thesis).
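As a minimal illustration of the distinction drawn above, the following Python sketch compares classical MDS on Euclidean distances with an Isomap-style embedding on a toy curve. It is a sketch under stated assumptions and not the implementation used in this thesis: the function names are hypothetical, the data are a synthetic 270-degree arc rather than the Cytokine compendium, shortest paths are found with Floyd-Warshall for brevity, and only the leading MDS coordinate is computed, by power iteration instead of a full eigendecomposition.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def geodesic_distances(dist, k):
    """Approximate geodesic distances: k-nn graph plus shortest paths."""
    n = len(dist)
    geo = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        geo[i][i] = 0.0
        # Position 0 of the sorted list is the point itself (distance 0).
        nearest = sorted(range(n), key=lambda j: dist[i][j])[1:k + 1]
        for j in nearest:
            geo[i][j] = geo[j][i] = dist[i][j]  # keep the graph symmetric
    # Floyd-Warshall all-pairs shortest paths (simple, O(n^3)).
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if geo[i][m] + geo[m][j] < geo[i][j]:
                    geo[i][j] = geo[i][m] + geo[m][j]
    return geo


def classical_mds_1d(dist, iterations=200):
    """Leading classical-MDS coordinate of a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J * D^2 * J, with J = I - (1/n) * ones.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # fixed start keeps the demo deterministic
    for _ in range(iterations):  # power iteration for the leading eigenvector
        w = [sum(b_ij * v_j for b_ij, v_j in zip(row, v)) for row in b]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v_i * sum(b_ij * v_j for b_ij, v_j in zip(row, v))
                     for v_i, row in zip(v, b))
    return [math.sqrt(max(eigenvalue, 0.0)) * x for x in v]


def is_monotonic(values):
    """True if the sequence is strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


# Toy data (an assumption for illustration): 30 points on a 270-degree arc.
n_points = 30
angles = [1.5 * math.pi * i / (n_points - 1) for i in range(n_points)]
arc = [(math.cos(a), math.sin(a)) for a in angles]

euclid = pairwise_distances(arc)
mds_coords = classical_mds_1d(euclid)                               # linear
isomap_coords = classical_mds_1d(geodesic_distances(euclid, k=2))   # non-linear

print("classical MDS preserves order along the arc:", is_monotonic(mds_coords))
print("Isomap-style embedding preserves order along the arc:", is_monotonic(isomap_coords))
```

On this toy arc the geodesic embedding should recover the ordering of the points along the curve, whereas the Euclidean embedding cannot, because no linear projection of an arc spanning more than 180 degrees is monotone. The neighbourhood size k=2 is chosen only so that the neighbourhood graph follows the curve; on real data k would need to be tuned.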

Chapter 3
Dimensionality reduction analysis of apoptosis signalling network

3.1 Apoptosis signalling network and the Cytokine data compendium

In light of the general introduction to signalling networks and the systems biology approach to studying them given above, I will now discuss the specific signalling network that was analyzed in this thesis. In this study I investigated the relationship between cues, signals and responses using the apoptosis signalling network in human adenocarcinoma cells. To analyze the apoptosis network I considered a previously published protein signalling dataset known as the

Cytokine compendium, first described by Gaudet et al. [19], for which quantitative western blotting, high-throughput protein kinase assays and protein microarrays were used to investigate the combinatorial effect of tumour necrosis factor (TNF), epidermal growth factor (EGF) and insulin on apoptosis of human adenocarcinoma cells. What follows is a brief description of the experimental design and the major questions addressed in that study.

To investigate how TNF-related signalling networks coordinate cellular responses, Gaudet et al. established a multi-input system in which HT-29 colon cancer cells were stimulated with combinations of TNF, EGF, and insulin [19]. Since the components of the intracellular protein network downstream of these cytokine inputs are understood in reasonable detail, they chose 19 key molecular signals (receptor, kinase, caspase, and adaptor proteins within the signalling networks) for activity measurements, thereby addressing the issue of sufficient coverage of signalling nodes. Multiplex kinase assays, antibody microarrays and quantitative immunoblots were used to permit reliable identification of protein activity [22]. With nine distinct pairwise combinations of TNF, EGF, and insulin, they sampled each molecular signal in triplicate at 13 time points between 0 and 24 hours to compile 7980 distinct molecular signal measurements from the shared intracellular network. The whole dataset was called the Compendium of Signals and Responses Triggered by Prodeath and Prosurvival Cytokines (hereafter referred to as the compendium). To test whether apoptosis could be connected to the measured signals in the network, the cell-death phenotype was assayed for each combinatorial cytokine stimulus using four distinct apoptosis assays. Together,

these output measurements constituted an apoptotic signature that characterized early (phosphatidylserine exposure), middle (caspase substrate cleavage and membrane permeability), and late (nuclear fragmentation) responses of apoptosis.

With this cytokine compendium, Gaudet et al. investigated how individual molecular signals in combination could predict the apoptotic response. This can be viewed as a typical multivariate regression problem in which independent/predictor variables (molecular signals) are regressed against dependent variables (apoptotic outputs). Feature selection methods can be applied in this framework to determine which subset of the original features can, in combination, best predict apoptotic responses. Since it was unclear which features of a dynamically sampled molecular signal could be responsible for the apoptotic outcome (e.g. kinase activity at various time points or just maximal activity), Janes et al. defined an additional panel of time-dependent signalling metrics, such as the peak of kinase activity or the total area under the temporal profile (i.e. they created new variables). In total they obtained 660 independent variables/molecular signals and 12 dependent variables/apoptotic outputs. Having temporal measurements resulted in high redundancy in the dataset and a requirement to apply feature selection techniques.

The statistical approach that they chose was partial least-squares (PLS) regression [23]. It differs from multivariate regression in that the latter uses a subset selection approach for pooling predictive variables that depends on the order in which they are presented (greedy selection). PLS regression, on the other hand, is more suitable for cases where a large number of correlated variables should be combined into a small set of linear

combinations (dimensions). In this respect it is similar to PCA, but PLS regression also seeks directions that have high correlation with the response (and not merely high variance). By applying PLS regression, Janes et al. were able to project the original data set (660 dimensions) onto three principal components (a three-dimensional set of feature combinations). The resulting apoptosis model was able to predict the output of four apoptosis metrics with 94 percent accuracy on the cross-validation dataset.

Further analysis of this model yielded interesting discoveries and observations [24]. For instance, the model was able to predict apoptosis outputs for new growth factor treatments that were not available in the original dataset with 90 percent accuracy. Furthermore, the first two principal components, which define the optimal two-dimensional slice through the signalling data, represented opposing apoptosis and survival signalling axes (the apoptosis basis set). The first principal component was oriented toward stress and apoptotic pathways, whereas the second appeared to constitute a global survival signal. These two-dimensional signalling axes revealed, for instance, that the same molecule (e.g. IKK) can convey either pro- or antiapoptotic messages depending on the timing and mechanism of activation. It is noteworthy that the derived metrics were essential for constructing predictive PLS regression models. PLS regression treats each time point independently, and the derived metrics were therefore the primary means by which time dependences were encoded [25].

However, it may not always be possible to apply a fully supervised dimensionality reduction approach to study how changes in the activity of a signalling network influence cellular responses. For instance, Janes et al. [24]

used measurements of network activity across multiple early time points to predict apoptosis outcome at later time points, whereas in some applications it may be desirable to relate signals to responses at identical time points [24]. In the context of the variable and conflicting cellular responses to the various TNF, EGF, and insulin treatment conditions noted above, it would be useful to understand how different treatments are positioned within the resulting space of intracellular molecular signals.

3.2 Main aims of the thesis research

In this thesis I assessed the applicability of unsupervised dimensionality reduction techniques to the analysis of apoptosis signalling networks. The key aim was to infer connections between external cues, intracellular molecular signals and the corresponding cellular responses. More specifically, using the Cytokine signalling data compendium, I aimed to answer the following questions:
1. Can different treatment combinations of ligands be positioned into separate clusters of cues in the low-dimensional signalling space?
2. What are the characteristics of these clusters in the low-dimensional space (unsupervised learning)?
3. Can the low-dimensional embedding be intuitively explained in terms of the original dimensions of molecular signals?
4. Can the low-dimensional representation of signalling networks be used

for predictive (supervised) modelling of cellular responses, e.g. apoptosis intensity?
5. Are there any differences in performance between linear and non-linear dimensionality reduction techniques (e.g. PCA vs. Isomap) in both supervised and unsupervised learning?

To address these questions I applied a non-linear dimensionality reduction approach, Isomap [26]. Isomap and a similar technique, locally linear embedding (LLE) [26, 27], have already been successfully applied as dimensionality reduction approaches for gene networks [28-30] and many other problems in cognitive science and computer vision. In many cases these algorithms have been found superior to PCA and multidimensional scaling (MDS) at finding low-dimensional submanifolds. For instance, a modified LLE algorithm, Local Context Finder (LCF), enabled successful reconstruction of a low-dimensional representation of the pathogen-induced gene network in Arabidopsis [31]. To apply Isomap specifically in the context of signalling networks, I extended it with graph-based clustering (question 2) and neural networks (question 3) (Figure 3). Finally, I used the low-dimensional embedding found by Isomap to build a classifier for prediction of apoptosis intensity (question 4). In my application, Isomap generates a low-dimensional projection of signalling networks represented by the activity of molecular signals, in which groups of different cues/treatment conditions can be easily identified and visualized (henceforth I refer to such a projection as the low-dimensional embedding of the signalling network). Consequently, the main contribution of this thesis is the analysis of

signalling networks in the new unsupervised learning context using non-linear dimensionality reduction approaches.

Figure 3: Schematic representation of the extended Isomap approach. The extended Isomap approach involves three main steps: the Isomap algorithm to construct a low-dimensional embedding of the apoptosis signalling network (I), graph-based clustering to find clusters in the Isomap space (II), and a neural network ensemble to find a meaningful interpretation of the Isomap projection (III).
Diagram contents (Extended Isomap):
I. Isomap: estimate the Euclidean distance matrix; construct the k-nn graph and find the optimal value for k; estimate geodesic distances using Dijkstra's algorithm; construct the Isomap d-dimensional embedding (MDS).
II. Graph-based clustering with Isomap embedding: estimate Euclidean distances for the Isomap low-dimensional map; construct the k-nn graph from the Euclidean distance matrix (sparsification); partition the k-nn graph into separate clusters (multilevel k-way partitioning for irregular graphs).
III. Interpretation of Isomap projections with neural networks: build an ensemble of neural networks (NN) and perform regression from the original dimensions into the Isomap projections; construct test and training sets with the split ¼ and use parametric bootstrap to select the five best NNs; use sensitivity analysis for feature selection and choose the five top features.

3.3 Cytokine compendium dataset

The main details of the Cytokine compendium dataset [19] that are relevant for the present study are as follows. HT-29 epithelial cancer cells were treated with 10 combinations of saturating or subsaturating concentrations of TNF, EGF and insulin (0, 0.2, 5, 100 ng/ml TNF and 0, 1, 100 ng/ml EGF or 0, 1, 5, 500 ng/ml insulin respectively), which collectively represent all the cues used in the study. 19 molecular signals were chosen to characterize changes in


More information

Cytokine-Induced Signaling Networks Prioritize Dynamic Range over Signal Strength

Cytokine-Induced Signaling Networks Prioritize Dynamic Range over Signal Strength Theory Cytokine-Induced Signaling Networks Prioritize Dynamic Range over Signal Strength Kevin A. Janes, 1,2,3,4 H. Christian Reinhardt, 1,3 and Michael B. Yaffe 1, * 1 Koch Institute for Integrative Cancer

More information

An Empirical Comparison of Dimensionality Reduction Methods for Classifying Gene and Protein Expression Datasets

An Empirical Comparison of Dimensionality Reduction Methods for Classifying Gene and Protein Expression Datasets An Empirical Comparison of Dimensionality Reduction Methods for Classifying Gene and Protein Expression Datasets George Lee 1, Carlos Rodriguez 2, and Anant Madabhushi 1 1 Rutgers, The State University

More information

Clustering and Network

Clustering and Network Clustering and Network Jing-Dong Jackie Han jdhan@picb.ac.cn http://www.picb.ac.cn/~jdhan Copy Right: Jing-Dong Jackie Han What is clustering? A way of grouping together data samples that are similar in

More information

AP Curriculum Framework with Learning Objectives

AP Curriculum Framework with Learning Objectives Big Ideas Big Idea 1: The process of evolution drives the diversity and unity of life. AP Curriculum Framework with Learning Objectives Understanding 1.A: Change in the genetic makeup of a population over

More information

Apoptosis: Death Comes for the Cell

Apoptosis: Death Comes for the Cell Apoptosis: Death Comes for the Cell Joe W. Ramos joeramos@hawaii.edu From Ingmar Bergman s The Seventh Seal 1 2 Mutations in proteins that regulate cell proliferation, survival and death can contribute

More information

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus L3.1: Circuits: Introduction to Transcription Networks Cellular Design Principles Prof. Jenna Rickus In this lecture Cognitive problem of the Cell Introduce transcription networks Key processing network

More information

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms

More information

Model of Mitogen Activated Protein Kinases for Cell Survival/Death and its Equivalent Bio-circuit

Model of Mitogen Activated Protein Kinases for Cell Survival/Death and its Equivalent Bio-circuit Current Research Journal of Biological Sciences 2(1): 59-71, 2010 ISSN: 2041-0778 Maxwell Scientific Organization, 2009 Submitted Date: September 17, 2009 Accepted Date: October 16, 2009 Published Date:

More information

16 The Cell Cycle. Chapter Outline The Eukaryotic Cell Cycle Regulators of Cell Cycle Progression The Events of M Phase Meiosis and Fertilization

16 The Cell Cycle. Chapter Outline The Eukaryotic Cell Cycle Regulators of Cell Cycle Progression The Events of M Phase Meiosis and Fertilization The Cell Cycle 16 The Cell Cycle Chapter Outline The Eukaryotic Cell Cycle Regulators of Cell Cycle Progression The Events of M Phase Meiosis and Fertilization Introduction Self-reproduction is perhaps

More information

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai Network Biology: Understanding the cell s functional organization Albert-László Barabási Zoltán N. Oltvai Outline: Evolutionary origin of scale-free networks Motifs, modules and hierarchical networks Network

More information

Activation of a receptor. Assembly of the complex

Activation of a receptor. Assembly of the complex Activation of a receptor ligand inactive, monomeric active, dimeric When activated by growth factor binding, the growth factor receptor tyrosine kinase phosphorylates the neighboring receptor. Assembly

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction

More information

Course plan Academic Year Qualification MSc on Bioinformatics for Health Sciences. Subject name: Computational Systems Biology Code: 30180

Course plan Academic Year Qualification MSc on Bioinformatics for Health Sciences. Subject name: Computational Systems Biology Code: 30180 Course plan 201-201 Academic Year Qualification MSc on Bioinformatics for Health Sciences 1. Description of the subject Subject name: Code: 30180 Total credits: 5 Workload: 125 hours Year: 1st Term: 3

More information

Lecture Notes for Fall Network Modeling. Ernest Fraenkel

Lecture Notes for Fall Network Modeling. Ernest Fraenkel Lecture Notes for 20.320 Fall 2012 Network Modeling Ernest Fraenkel In this lecture we will explore ways in which network models can help us to understand better biological data. We will explore how networks

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA

More information

ADAM FAMILY. ephrin A INTERAZIONE. Eph ADESIONE? PROTEOLISI ENDOCITOSI B A RISULTATO REPULSIONE. reverse. forward

ADAM FAMILY. ephrin A INTERAZIONE. Eph ADESIONE? PROTEOLISI ENDOCITOSI B A RISULTATO REPULSIONE. reverse. forward ADAM FAMILY - a family of membrane-anchored metalloproteases that are known as A Disintegrin And Metalloprotease proteins and are key components in protein ectodomain shedding Eph A INTERAZIONE B ephrin

More information

Grouping of correlated feature vectors using treelets

Grouping of correlated feature vectors using treelets Grouping of correlated feature vectors using treelets Jing Xiang Department of Machine Learning Carnegie Mellon University Pittsburgh, PA 15213 jingx@cs.cmu.edu Abstract In many applications, features

More information

Computational methods for predicting protein-protein interactions

Computational methods for predicting protein-protein interactions Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational

More information

Models of transcriptional regulation

Models of transcriptional regulation Models of transcriptional regulation We have already discussed four simple mechanisms of transcriptional regulation, nuclear exclusion nuclear concentration modification of bound activator redirection

More information

A A A A B B1

A A A A B B1 LEARNING OBJECTIVES FOR EACH BIG IDEA WITH ASSOCIATED SCIENCE PRACTICES AND ESSENTIAL KNOWLEDGE Learning Objectives will be the target for AP Biology exam questions Learning Objectives Sci Prac Es Knowl

More information

Analysis and Simulation of Biological Systems

Analysis and Simulation of Biological Systems Analysis and Simulation of Biological Systems Dr. Carlo Cosentino School of Computer and Biomedical Engineering Department of Experimental and Clinical Medicine Università degli Studi Magna Graecia Catanzaro,

More information

A. Incorrect! The Cell Cycle contains 4 distinct phases: (1) G 1, (2) S Phase, (3) G 2 and (4) M Phase.

A. Incorrect! The Cell Cycle contains 4 distinct phases: (1) G 1, (2) S Phase, (3) G 2 and (4) M Phase. Molecular Cell Biology - Problem Drill 21: Cell Cycle and Cell Death Question No. 1 of 10 1. Which of the following statements about the cell cycle is correct? Question #1 (A) The Cell Cycle contains 3

More information

Computational Structural Bioinformatics

Computational Structural Bioinformatics Computational Structural Bioinformatics ECS129 Instructor: Patrice Koehl http://koehllab.genomecenter.ucdavis.edu/teaching/ecs129 koehl@cs.ucdavis.edu Learning curve Math / CS Biology/ Chemistry Pre-requisite

More information

Campbell Biology AP Edition 11 th Edition, 2018

Campbell Biology AP Edition 11 th Edition, 2018 A Correlation and Narrative Summary of Campbell Biology AP Edition 11 th Edition, 2018 To the AP Biology Curriculum Framework AP is a trademark registered and/or owned by the College Board, which was not

More information

Biological networks CS449 BIOINFORMATICS

Biological networks CS449 BIOINFORMATICS CS449 BIOINFORMATICS Biological networks Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better

More information

A New Method to Build Gene Regulation Network Based on Fuzzy Hierarchical Clustering Methods

A New Method to Build Gene Regulation Network Based on Fuzzy Hierarchical Clustering Methods International Academic Institute for Science and Technology International Academic Journal of Science and Engineering Vol. 3, No. 6, 2016, pp. 169-176. ISSN 2454-3896 International Academic Journal of

More information

Exhaustive search. CS 466 Saurabh Sinha

Exhaustive search. CS 466 Saurabh Sinha Exhaustive search CS 466 Saurabh Sinha Agenda Two different problems Restriction mapping Motif finding Common theme: exhaustive search of solution space Reading: Chapter 4. Restriction Mapping Restriction

More information

Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees

Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees Tomasz Maszczyk and W lodzis law Duch Department of Informatics, Nicolaus Copernicus University Grudzi adzka 5, 87-100 Toruń, Poland

More information

Big Idea 1: The process of evolution drives the diversity and unity of life.

Big Idea 1: The process of evolution drives the diversity and unity of life. Big Idea 1: The process of evolution drives the diversity and unity of life. understanding 1.A: Change in the genetic makeup of a population over time is evolution. 1.A.1: Natural selection is a major

More information

Supplementary Materials

Supplementary Materials Electronic Supplementary Material (ESI) for Integrative Biology. This journal is The Royal Society of Chemistry 2015 Predicting genetic interactions from Boolean models of biological networks Supplementary

More information

Quantitative analysis of intracellular communication and signaling errors in signaling networks

Quantitative analysis of intracellular communication and signaling errors in signaling networks Quantitative analysis of intracellular communication and signaling errors in signaling networks Habibi et al. Habibi et al. BMC Systems Biology 014, 8:89 Habibi et al. BMC Systems Biology 014, 8:89 METHODOLOGY

More information

Generating executable models from signaling network connectivity and semi-quantitative proteomic measurements

Generating executable models from signaling network connectivity and semi-quantitative proteomic measurements Generating executable models from signaling network connectivity and semi-quantitative proteomic measurements Derek Ruths School of Computer Science, McGill University, Quebec, Montreal Canada Email: derek.ruths@cs.mcgill.ca

More information

J. Hasenauer a J. Heinrich b M. Doszczak c P. Scheurich c D. Weiskopf b F. Allgöwer a

J. Hasenauer a J. Heinrich b M. Doszczak c P. Scheurich c D. Weiskopf b F. Allgöwer a J. Hasenauer a J. Heinrich b M. Doszczak c P. Scheurich c D. Weiskopf b F. Allgöwer a Visualization methods and support vector machines as tools for determining markers in models of heterogeneous populations:

More information

Jeremy Chang Identifying protein protein interactions with statistical coupling analysis

Jeremy Chang Identifying protein protein interactions with statistical coupling analysis Jeremy Chang Identifying protein protein interactions with statistical coupling analysis Abstract: We used an algorithm known as statistical coupling analysis (SCA) 1 to create a set of features for building

More information

Inferring Transcriptional Regulatory Networks from High-throughput Data

Inferring Transcriptional Regulatory Networks from High-throughput Data Inferring Transcriptional Regulatory Networks from High-throughput Data Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20

More information

Supplementary Figures

Supplementary Figures Supplementary Figures a x 1 2 1.5 1 a 1 = 1.5 a 1 = 1.0 0.5 a 1 = 1.0 b 0 0 20 40 60 80 100 120 140 160 2 t 1.5 x 2 1 0.5 a 1 = 1.0 a 1 = 1.5 a 1 = 1.0 0 0 20 40 60 80 100 120 140 160 t Supplementary Figure

More information

Biology Teach Yourself Series Topic 2: Cells

Biology Teach Yourself Series Topic 2: Cells Biology Teach Yourself Series Topic 2: Cells A: Level 14, 474 Flinders Street Melbourne VIC 3000 T: 1300 134 518 W: tssm.com.au E: info@tssm.com.au TSSM 2013 Page 1 of 14 Contents Cells... 3 Prokaryotic

More information

GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data

GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data 1 Gene Networks Definition: A gene network is a set of molecular components, such as genes and proteins, and interactions between

More information

Apoptosis EXTRA REQUIREMETS

Apoptosis EXTRA REQUIREMETS Apoptosis Introduction (SLIDE 1) Organisms develop from a single cell, and in doing so an anatomy has to be created. This process not only involves the creation of new cells, but also the removal of cells

More information

Cell-Cell Communication in Development

Cell-Cell Communication in Development Biology 4361 - Developmental Biology Cell-Cell Communication in Development October 2, 2007 Cell-Cell Communication - Topics Induction and competence Paracrine factors inducer molecules Signal transduction

More information

International Journal of Scientific Research and Reviews

International Journal of Scientific Research and Reviews Jain Shruti et al., IJSRR 08, 7(), 5-63 Research article Available online www.ijsrr.org ISSN: 79 0543 International Journal of Scientific Research and Reviews Computational Model Using Anderson Darling

More information

ON/OFF and Beyond - A Boolean Model of Apoptosis

ON/OFF and Beyond - A Boolean Model of Apoptosis Rebekka Schlatter 1 *, Kathrin Schmich 2, Ima Avalos Vizcarra 1,3, Peter Scheurich 4, Thomas Sauter 5, Christoph Borner 6, Michael Ederer 1,7, Irmgard Merfort 2, Oliver Sawodny 1 1 Institute for System

More information

Gene Network Science Diagrammatic Cell Language and Visual Cell

Gene Network Science Diagrammatic Cell Language and Visual Cell Gene Network Science Diagrammatic Cell Language and Visual Cell Mr. Tan Chee Meng Scientific Programmer, System Biology Group, Bioinformatics Institute Overview Introduction Why? Challenges Diagrammatic

More information

Next Generation Computational Chemistry Tools to Predict Toxicity of CWAs

Next Generation Computational Chemistry Tools to Predict Toxicity of CWAs Next Generation Computational Chemistry Tools to Predict Toxicity of CWAs William (Bill) Welsh welshwj@umdnj.edu Prospective Funding by DTRA/JSTO-CBD CBIS Conference 1 A State-wide, Regional and National

More information

Citation. As Published Publisher. Version

Citation. As Published Publisher. Version Training Signaling Pathway Maps to Biochemical Data with Constrained Fuzzy Logic: Quantitative Analysis of Liver Cell Responses to Inflammatory Stimuli The MIT Faculty has made this article openly available.

More information

Introduction. Gene expression is the combined process of :

Introduction. Gene expression is the combined process of : 1 To know and explain: Regulation of Bacterial Gene Expression Constitutive ( house keeping) vs. Controllable genes OPERON structure and its role in gene regulation Regulation of Eukaryotic Gene Expression

More information

Models and Languages for Computational Systems Biology Lecture 1

Models and Languages for Computational Systems Biology Lecture 1 Models and Languages for Computational Systems Biology Lecture 1 Jane Hillston. LFCS and CSBE, University of Edinburgh 13th January 2011 Outline Introduction Motivation Measurement, Observation and Induction

More information

Correlation Networks

Correlation Networks QuickTime decompressor and a are needed to see this picture. Correlation Networks Analysis of Biological Networks April 24, 2010 Correlation Networks - Analysis of Biological Networks 1 Review We have

More information

Lecture 8: Temporal programs and the global structure of transcription networks. Chap 5 of Alon. 5.1 Introduction

Lecture 8: Temporal programs and the global structure of transcription networks. Chap 5 of Alon. 5.1 Introduction Lecture 8: Temporal programs and the global structure of transcription networks Chap 5 of Alon 5. Introduction We will see in this chapter that sensory transcription networks are largely made of just four

More information

From Petri Nets to Differential Equations An Integrative Approach for Biochemical Network Analysis

From Petri Nets to Differential Equations An Integrative Approach for Biochemical Network Analysis From Petri Nets to Differential Equations An Integrative Approach for Biochemical Network Analysis David Gilbert drg@brc.dcs.gla.ac.uk Bioinformatics Research Centre, University of Glasgow and Monika Heiner

More information

MAPK kinase kinase regulation of SAPK/JNK pathways

MAPK kinase kinase regulation of SAPK/JNK pathways MAPK kinase kinase regulation of SAPK/JNK pathways Lisa Stalheim and Gary L. Johnson Abstract SAPK/JNK members of the MAPK family are regulated by at least fourteen known MAPK kinase kinases (MKKKs). In

More information

OECD QSAR Toolbox v.3.3. Predicting skin sensitisation potential of a chemical using skin sensitization data extracted from ECHA CHEM database

OECD QSAR Toolbox v.3.3. Predicting skin sensitisation potential of a chemical using skin sensitization data extracted from ECHA CHEM database OECD QSAR Toolbox v.3.3 Predicting skin sensitisation potential of a chemical using skin sensitization data extracted from ECHA CHEM database Outlook Background The exercise Workflow Save prediction 23.02.2015

More information

Geert Geeven. April 14, 2010

Geert Geeven. April 14, 2010 iction of Gene Regulatory Interactions NDNS+ Workshop April 14, 2010 Today s talk - Outline Outline Biological Background Construction of Predictors The main aim of my project is to better understand the

More information

Supplementary online material

Supplementary online material Supplementary online material A probabilistic functional network of yeast genes Insuk Lee, Shailesh V. Date, Alex T. Adai & Edward M. Marcotte DATA SETS Saccharomyces cerevisiae genome This study is based

More information

Answer Key. Cell Growth and Division

Answer Key. Cell Growth and Division Cell Growth and Division Answer Key SECTION 1. THE CELL CYCLE Cell Cycle: (1) Gap1 (G 1): cells grow, carry out normal functions, and copy their organelles. (2) Synthesis (S): cells replicate DNA. (3)

More information

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Bioinformatics. Dept. of Computational Biology & Bioinformatics Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS

More information