ActiveNetworks Cross-Condition Analysis of Functional Genomic Data

Similar documents
Predicting Protein Functions and Domain Interactions from Protein Interactions

Pathway Association Analysis Trey Ideker UCSD

Automatic Reconstruction of the Building Blocks of Molecular Interaction Networks

Inferring Transcriptional Regulatory Networks from Gene Expression Data II

Systems biology and biological networks

Networks & pathways. Hedi Peterson MTAT Bioinformatics

Comparative Network Analysis

Evidence for dynamically organized modularity in the yeast protein-protein interaction network

Bioinformatics. Transcriptome

Computational Genomics. Reconstructing dynamic regulatory networks in multiple species

Written Exam 15 December Course name: Introduction to Systems Biology Course no

CONJOINT 541. Translating a Transcriptome at Specific Times and Places. David Morris. Department of Biochemistry

Introduction to Microarray Data Analysis and Gene Networks lecture 8. Alvis Brazma European Bioinformatics Institute

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007

The geneticist s questions. Deleting yeast genes. Functional genomics. From Wikipedia, the free encyclopedia

A general co-expression network-based approach to gene expression analysis: comparison and applications

Integration of functional genomics data

Cytoscape An open-source software platform for the exploration of molecular interaction networks

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

MULE: An Efficient Algorithm for Detecting Frequent Subgraphs in Biological Networks

The geneticist s questions

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Inferring Transcriptional Regulatory Networks from High-throughput Data

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

Genome Evolution Greg Lang, Department of Biological Sciences

Graph Alignment and Biological Networks

arxiv: v1 [q-bio.mn] 7 Nov 2018

Discovering modules in expression profiles using a network

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr

Gene Regulatory Networks II Computa.onal Genomics Seyoung Kim

Lecture 10: May 19, High-Throughput technologies for measuring proteinprotein

Introduction to Bioinformatics

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai

Bioinformatics: Network Analysis

Biological Networks. Gavin Conant 163B ASRC

2 GENE FUNCTIONAL SIMILARITY. 2.1 Semantic values of GO terms

What is Systems Biology

Self Similar (Scale Free, Power Law) Networks (I)

Introduction to clustering methods for gene expression data analysis

Extend Your Metabolomics Insight!

Chapter 15 Active Reading Guide Regulation of Gene Expression

Bioinformatics and Computerscience

Lecture Notes for Fall Network Modeling. Ernest Fraenkel

Network alignment and querying

Redescription Mining and Applications in Bioinformatics

Clustering and Network

Rule learning for gene expression data

Types of biological networks. I. Intra-cellurar networks

Differential Modeling for Cancer Microarray Data

Introduction to clustering methods for gene expression data analysis

Network Biology-part II

BMD645. Integration of Omics

FCModeler: Dynamic Graph Display and Fuzzy Modeling of Regulatory and Metabolic Maps

Identifying Signaling Pathways

Protein-protein interaction networks Prof. Peter Csermely

Significant Pattern Mining

Measuring TF-DNA interactions

Supplementary online material

Computational Systems Biology

Regulation of Gene Expression

Cross Discipline Analysis made possible with Data Pipelining. J.R. Tozer SciTegic

Introduction to Systems Biology. James Gomes KSBS, IITD

Dynamic modular architecture of protein-protein interaction networks beyond the dichotomy of date and party hubs

Supplementary Figure 3

Whole-genome analysis of GCN4 binding in S.cerevisiae

Network Motifs of Pathogenic Genes in Human Regulatory Network

FUNDAMENTALS of SYSTEMS BIOLOGY From Synthetic Circuits to Whole-cell Models

Discovering Binding Motif Pairs from Interacting Protein Groups

GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data

EXTRACTING GLOBAL STRUCTURE FROM GENE EXPRESSION PROFILES

networks in molecular biology Wolfgang Huber

Using graphs to relate expression data and protein-protein interaction data

BicNET: Efficient Biclustering of Biological Networks to Unravel Non-Trivial Modules

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.

Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results

CSCE555 Bioinformatics. Protein Function Annotation

Latent Variable models for GWAs

Hub Gene Selection Methods for the Reconstruction of Transcription Networks

Computational Analysis of Transcriptional Programs: Function and Evolution

Lecture 3: A basic statistical concept

Linking the Signaling Cascades and Dynamic Regulatory Networks Controlling Stress Responses

Ph.D. thesis. Study of proline accumulation and transcriptional regulation of genes involved in this process in Arabidopsis thaliana

How much non-coding DNA do eukaryotes require?

What is Systems Biology?

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

Phylogenetic Analysis of Molecular Interaction Networks 1

Proteomics Systems Biology

Cell biology traditionally identifies proteins based on their individual actions as catalysts, signaling

Bioinformatics I. CPBS 7711 October 29, 2015 Protein interaction networks. Debra Goldberg

Overview. Overview. Social networks. What is a network? 10/29/14. Bioinformatics I. Networks are everywhere! Introduction to Networks

Network motifs in the transcriptional regulation network (of Escherichia coli):

A Multiobjective GO based Approach to Protein Complex Detection

Networks. Can (John) Bruce Keck Founda7on Biotechnology Lab Bioinforma7cs Resource

Comparison of Protein-Protein Interaction Confidence Assignment Schemes

Keywords: systems biology, microarrays, gene expression, clustering

Computational Network Biology Biostatistics & Medical Informatics 826 Fall 2018

Comparative RNA-seq analysis of transcriptome dynamics during petal development in Rosa chinensis

Lecture 8: Temporal programs and the global structure of transcription networks. Chap 5 of Alon. 5.1 Introduction

Analysis of Biological Networks: Network Integration

Computational methods for predicting protein-protein interactions

Transcription:

ActiveNetworks Cross-Condition Analysis of Functional Genomic Data T. M. Murali April 18, 2006

Motivation: Manual Systems Biology Biologists want to study a favourite stress, e.g., oxidative stress or desiccation tolerance. Measure gene expression, apply clustering algorithms, and find genes whose expression level change in response to the stress.

Motivation: Manual Systems Biology Biologists want to study a favourite stress, e.g., oxidative stress or desiccation tolerance. Measure gene expression, apply clustering algorithms, and find genes whose expression level change in response to the stress. Trace genes by hand through databases of protein-protein interactions, gene regulatory networks, metabolic pathways, PubMed searches to build networks activated in response to the stress.

Motivation: Manual Systems Biology Redescription R5 Heat Shock, 30 min -1 T 2 vs T 1-5 AND NOT T 2 vs T 1-1 -7-5 -3-1 Heat shock, 30 min riboswitch ligands? SAM Thiamine HMG-CoA ERG13 PHO3 transport SAM1 fumarate ARO4 reductases FRDS ARO1 OSM1 SAH1 osmotic growth protein YFR055W CYS4 NADPH TEF4 saccharopine -KG SIR3 -ketoglutarate MET30 F-box; protein ubiquitination 0.71 CLN2 CDC34 GAS1 GAS3 Redescription R5 Gene List ARO4, ASN1, CLN2, GAS3, HEM13, HIS1, IMD4, -7-5 -3-1 PHO3, RPL-7A, 7B, 13A, 17B, 27B, 40B, RPS-0B, T 9B, 10A, 16B, 22B, 26B, SAH1, SAM1, SUN4, TEF4, 2 vs. T 1 TPO2, URA7, UTR2, YHB1, YBR238C, YER156C, YFR055W, YOR309C cell wall organization glutamate YHB1 oxidative stress response HIS4 HIS1 SER3 serine biosynthesis TPO2 polyamine transport URA7 phospholipid synthesis URA8 binding SIP18 LYS14 YBL085W lysine LYS12 YBR238C acetoacetate acetyl-coa S0B S9B acetyl-coa ribosomal L17B genes L7B S26B S22B S10A S16B A B C L40B L7A L27B L13A

Motivation: Manual Systems Biology Redescription R5 Heat Shock, 30 min -1 T 2 vs T 1-5 AND NOT T 2 vs T 1-1 -7-5 -3-1 Heat shock, 30 min riboswitch ligands? SAM Thiamine HMG-CoA ERG13 PHO3 transport SAM1 fumarate ARO4 reductases FRDS ARO1 OSM1 SAH1 osmotic growth protein YFR055W CYS4 NADPH TEF4 saccharopine -KG SIR3 -ketoglutarate MET30 F-box; protein ubiquitination 0.71 CLN2 CDC34 GAS1 GAS3 Redescription R5 Gene List ARO4, ASN1, CLN2, GAS3, HEM13, HIS1, IMD4, -7-5 -3-1 PHO3, RPL-7A, 7B, 13A, 17B, 27B, 40B, RPS-0B, T 9B, 10A, 16B, 22B, 26B, SAH1, SAM1, SUN4, TEF4, 2 vs. T 1 TPO2, URA7, UTR2, YHB1, YBR238C, YER156C, YFR055W, YOR309C cell wall organization glutamate YHB1 oxidative stress response HIS4 HIS1 SER3 serine biosynthesis TPO2 polyamine transport URA7 phospholipid synthesis URA8 binding SIP18 LYS14 YBL085W lysine LYS12 YBR238C acetoacetate acetyl-coa S0B S9B acetyl-coa ribosomal L17B genes L7B S26B S22B S10A S16B A B C L40B L7A L27B L13A Can we automate this process?

Requirements for Automation Wiring diagram of the cell: protein-protein interactions, metabolic pathways, transcriptional regulatory networks,... Measurement of molecular profiles (gene expression, protein expression, metabolite levels) under different conditions or cell states. Algorithms for combining these types of information.

High-throughput Biology Provides Wiring Diagram Large amounts of information on different types of cellular interactions are now available. Protein-protein interactions: genome-scale yeast 2-hybrid experiments, in-vivo pulldowns of protein complexes. Transcriptional regulatory networks: ChIP-on-chip experiments yield protein-dna binding data. Metabolic networks: databases culled from the literature (KEGG). Techniques that extract interactions automatically from abstracts.

S. cerevisiea Wiring Diagram Physical network 15,429 protein-protein interactions from the Database of Interacting Proteins (DIP). 5869 protein-dna interactions (Lee et al., Science, 2002). 6,306 metabolic interactions (proteins operate on at least common metabolite) based on KEGG. Genetic network 4,125 synthetically lethal/sick interactions (Tong et al., Science, 2004). 687 synthetically lethal interactions (MIPS). Overall network has 32,416 (27,604 physical and 4,812 genetic) interactions between 5601 proteins (Kelley and Ideker, Nature Biotech., 2005).

Challenges in Utilising the Wiring Diagram Networks are large; they contain tens of thousands of interactions. High-throughput experiments contain many errors. Networks are incomplete; experiments are expensive and have biases.

Challenges in Utilising the Wiring Diagram Networks are large; they contain tens of thousands of interactions. High-throughput experiments contain many errors. Networks are incomplete; experiments are expensive and have biases. A biologist wants to explore and analyse system of interest. How do we zoom into the appropriate parts of the wiring diagram?

ActiveNetworks ActiveNetwork: network of interactions activated in response to a stress or in a particular condition. 1. Overlay molecular profile for a particular stress on wiring diagram to obtain ActiveNetwork for that stress. 2. Combine computed ActiveNetworks for each stress to find 2.1 ActiveNetwork common to multiple stresses. 2.2 ActiveNetwork unique to a particular stress or group of stresses.

ActiveNetworks ActiveNetwork: network of interactions activated in response to a stress or in a particular condition. 1. Overlay molecular profile for a particular stress on wiring diagram to obtain ActiveNetwork for that stress. 2. Combine computed ActiveNetworks for each stress to find 2.1 ActiveNetwork common to multiple stresses. 2.2 ActiveNetwork unique to a particular stress or group of stresses.

The ActiveNetworks pipeline Experiments Heat shock Cold shock Carbon starvation Desiccation Protein protein interaction networks Transcriptional regulatory networks Universal network ActiveNetwork computation ActiveNetwork mining Metabolic pathways ActiveNetworks for each condition Condition specific ActiveNetworks Hypotheses Cross condition ActiveNetworks

Overlaying Gene Expression Data Weight of an interaction is the Pearson correlation between the expression profiles of the interacting genes. Weight activity level of the interaction.

Overlaying Gene Expression Data Weight of an interaction is the Pearson correlation between the expression profiles of the interacting genes. Weight activity level of the interaction. Discard interactions based on a threshold. Unsatisfactory since we test each interaction individually.

Overlaying Gene Expression Data Weight of an interaction is the Pearson correlation between the expression profiles of the interacting genes. Weight activity level of the interaction. Discard interactions based on a threshold. Unsatisfactory since we test each interaction individually. We find the most highly active subnetwork.

Defining Highly-Active Subnetworks How do we measure the activity/weight of a subnetwork?

Defining Highly-Active Subnetworks How do we measure the activity/weight of a subnetwork? Sum or average of edge weights?

Defining Highly-Active Subnetworks How do we measure the activity/weight of a subnetwork? Sum or average of edge weights? The density of a network with n nodes is the total weight of the edges divided by n. Problem: Compute the subnetwork with highest density. 0.3 0.1 0.1 0.4 0.8 0.5 0.5 0.9 0.5 0.7

Computing Most Dense Subnetwork O(n 3 ) time network flow-based approach gives optimal result (Gallo, Grigoriadis, Tarjan, SIAM J. Comp, 1989). Can also be solved by linear programming.

Computing Most Dense Subnetwork Greedy algorithm: Weight of a node total weight of incident edges. Repeatedly delete nodes with the smallest weight. Keep track of density of remaining network. Return the most dense subnetwork. 0.3 0.1 0.1 0.4 0.8 0.5 0.5 0.9 0.5 0.7

Computing Most Dense Subnetwork Greedy algorithm: Weight of a node total weight of incident edges. Repeatedly delete nodes with the smallest weight. Keep track of density of remaining network. Return the most dense subnetwork. 0.3 0.1 0.1 0.4 0.8 0.5 0.5 0.9 0.5 0.7

Computing Most Dense Subnetwork Greedy algorithm: Weight of a node total weight of incident edges. Repeatedly delete nodes with the smallest weight. Keep track of density of remaining network. Return the most dense subnetwork. Computed subnetwork is at least half as dense as the most dense subnetwork (Charikar, Proc. APPROX, 2000). 0.3 0.1 0.1 0.4 0.8 0.5 0.5 0.9 0.5 0.7

Computing Multiple Dense Subnetworks Repeat 1. Apply greedy algorithm to compute most dense subnetwork. 2. Remove edges of computed subnetwork from the network. Until remaining network has density less than the original network. Output is a sequence of decreasingly dense subnetworks that can share nodes but not edges.

Advantages of Dense Subnetworks Uses no parameters. Avoid inclusion of interactions that appear active due to noise. Relatively weakly correlated interactions can reinforce each other.

Example of an ActiveNetwork

Example of an ActiveNetwork

Further Analysis of an ActiveNetwork Visualise the network (Graphviz package) and the gene expression profiles. Measure functional enrichment. Use hypergeometric distribution to calculate the significance of functions enriched in an ActiveNetwork. Use Bonferroni correction to adjust for testing multiple hypotheses.

AA Starvation: Conventional Analysis

AA Starvation: ActiveNetwork PRS5 CKA2 PRS3 ELP2 ELP3 IKI1 PMT4 CPA2 ADE6 PRS4 TDH3 KTI12 PPT1 GUA1 GLN4 PUS1 TKL1 TDH1 PHD1 SIT4 GLN1 PHO11 GPM1 GLT1 MSN4 CPA1 ADE4 IME1 HSP104 PRS1 TDH2 MSN2 CTT1 COX5B GCN4 ADH5 ADH3 GID8 YKL044W YLR164W SDH1 RIP1 COR1 ADH1 YAP6 MGA1 UGA2 CIN5 SDH2 QCR10 NRG1 LSC2 LSC1 CYB2 PDC1 YNL092W URA8 PYK2 PYC1 ALT1 PDC5

ActiveNetworks for Multiple Stresses ActiveNetwork: network of interactions activated in response to a stress or in a particular condition. 1. Overlay molecular profile for a particular stress on universal network to obtain ActiveNetwork for that stress. 2. Combine computed ActiveNetworks for each stress to find 2.1 ActiveNetwork common to multiple stresses. 2.2 ActiveNetwork unique to a particular stress or group of stresses.

ActiveNetworks for Multiple Stresses ActiveNetwork: network of interactions activated in response to a stress or in a particular condition. 1. Overlay molecular profile for a particular stress on universal network to obtain ActiveNetwork for that stress. 2. Combine computed ActiveNetworks for each stress to find 2.1 ActiveNetwork common to multiple stresses. 2.2 ActiveNetwork unique to a particular stress or group of stresses.

Comparative ActiveNetwork Analysis Experiments Heat shock Cold shock Carbon starvation Protein protein interaction networks Transcriptional regulatory networks Universal network ActiveNetwork computation ActiveNetworks for each condition

Comparative ActiveNetwork Analysis Experiments Heat shock Cold shock Carbon starvation Desiccation Protein protein interaction networks Transcriptional regulatory networks Universal network ActiveNetwork computation ActiveNetworks for each condition Richard and Malcolm want to compare desiccation ActiveNetwork with other ActiveNetworks to find similarities and differences. Can we automate this process?

Cross-Condition ActiveNetworks Experiments Heat shock Cold shock Carbon starvation Desiccation Protein protein interaction networks Transcriptional regulatory networks Universal network ActiveNetwork computation ActiveNetwork mining ActiveNetworks for each condition Condition specific ActiveNetworks Cross condition ActiveNetworks A conserved ActiveNetwork is a set of conditions and a set of interactions, such that each interaction appears in the ActiveNetwork for each condition.

Computing Conserved ActiveNetworks Interactions ActiveNetworks Construct a 0-1 interaction-by-condition matrix.

Computing Conserved ActiveNetworks Interactions ActiveNetworks Construct a 0-1 interaction-by-condition matrix. A conserved ActiveNetwork is,

Computing Conserved ActiveNetworks Interactions ActiveNetworks Construct a 0-1 interaction-by-condition matrix. A conserved ActiveNetwork is a set of conditions,

Computing Conserved ActiveNetworks Interactions ActiveNetworks Construct a 0-1 interaction-by-condition matrix. A conserved ActiveNetwork is a set of conditions and a set of interactions,

Computing Conserved ActiveNetworks Interactions ActiveNetworks Construct a 0-1 interaction-by-condition matrix. A conserved ActiveNetwork is a set of conditions and a set of interactions, such that each interaction appears in the ActiveNetwork for each condition.

Computing Conserved ActiveNetworks Interactions ActiveNetworks Construct a 0-1 interaction-by-condition matrix. A conserved ActiveNetwork is a set of conditions and a set of interactions, such that each interaction appears in the ActiveNetwork for each condition. We can compute a conserved ActiveNetwork using techniques for finding itemsets or biclusters.

Computing Conserved ActiveNetworks Interactions ActiveNetworks A large submatrix of 1 s is a frequent itemset. Such a submatrix is a special case of a bicluster in gene expression data.

Computing Conserved ActiveNetworks Interactions ActiveNetworks A large submatrix of 1 s is a frequent itemset. Such a submatrix is a special case of a bicluster in gene expression data. We use the apriori algorithm for finding all maximal (closed) itemsets (Agrawal and Srikant 1995) and the xmotif algorithm for finding large biclusters (Murali and Kasif, 2003).

Example of a Cross-Condition ActiveNetwork COX12 RIP1 COR1 HAP4 CYT1 CYB2 DLD1 QCR6 QCR2 COX7 QCR8 Common to Alternative carbon sources. DTT treatment and Growth in YPD culture.

Example of a Cross-Condition ActiveNetwork Common to Alternative carbon sources. DTT treatment and Growth in YPD culture.

Comparative Systems Biology ActiveNetworks provide an integrated view of multi-modal universal networks and measurements of molecular profiles. Compute single stimulus ActiveNetworks using dense subgraphs. Compare and contrast ActiveNetworks for different stimuli using frequent itemsets. Automatic extraction of network modules and legos from large scale data. Promises system-level insights from comparisons between different conditions, disease states, or species.

Closing the Loop Protein protein interaction networks Transcriptional regulatory networks Metabolic pathways Experiments Heat shock Cold shock Carbon starvation Desiccation Universal network ActiveNetwork computation ActiveNetwork mining ActiveNetworks for each condition Condition specific ActiveNetworks Hypotheses Cross condition ActiveNetworks

Project Members Greg Grothaus Deept Kumar Maulik Shukla Graham Jack Corban Rivera Richard Helm Malcolm Potts Naren Ramakrishnan

Future Research: Modelling and Algorithmic Improvements Integrate other types of data: metabolic measurements, protein expression. Explicitly incorporate expression level of a gene.

Future Research: Applications ActiveNetworks in cancer: integrate gene expression data and protein interaction networks. Compare oxidative stress networks across kingdom boundaries (yeast, Arabidopsis thaliana, malaria parasite, P. sojae). Cross-stress networks in Arabidopsis thaliana. Redox signalling in various plant species.

Related Research Discovering regulatory and signalling circuits in molecular interaction networks, Ideker et al. ISMB 2002. Physical network models and multi-source data integration, Yeang and Jakkola, RECOMB 2003. Discovering molecular pathways from protein interaction and gene expression data, Segal, Wang, and Koller, ISMB 2003. Computational discovery of gene modules and regulatory networks, Bar-Joseph et al., Nature Biotechnology, November 2003. Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data, Tanay et al., PNAS, March 2004. Evidence for dynamically organized modularity in the yeast protein-protein interaction network, Han et al., Nature, July 2004. Genomic analysis of regulatory network dynamics reveals large toplogical changes, Luscombe et al., Nature, October 2004.