Chapter 8: The Topology of Biological Networks
8.1 Introduction & Survey of Network Topology
Prof. Yechiam Yemini (YY), Computer Science Department, Columbia University

Overview
A gallery of networks
Small-world network models
Network motifs
A Gallery of Networks

Introduction: Network Abstractions
Node: biological object; edge: interaction between nodes
Regulatory networks: node = gene; directed edge = regulatory interaction
Metabolic networks: node = metabolite; edge = reaction
Protein networks: node = protein; edge = interaction
Node = module; edge = interaction
Node = complex; edge = sharing a protein
Node = residue; edge = folding neighbors

What Can Network Abstractions Teach Us About Biological Systems?
Yeast Co-expression
Node = gene; edge = co-expression

The E. coli Regulatory Network
Node = TF; edge = regulatory interaction
http://www.biomedcentral.com/1471-2105/5/199
Hierarchical structure of the E. coli transcriptional regulatory network. A: the original, unorganized network. B: the hierarchical regulation structure. The global regulators found in this work are shown in red. The yellow nodes are operons in the longest regulatory pathway, which is related to flagellar motility.
Source: Hong-Wu Ma, Jan Buer, and An-Ping Zeng, "Hierarchical structure and modules in the Escherichia coli transcriptional regulatory network revealed by a new top-down approach."
Another View of the E. coli Regulatory Network
Sergei Maslov; quoted from U. Alon
http://www.cmth.bnl.gov/~maslov/rockefeller_2002_networks.ppt

Yeast Regulation
Node = TF; edge = regulatory interaction
http://www.biochemj.org/bj/381/0001/bj3810001.htm
Gong-Hong Wei, De-Pei Liu and Chih-Chuan Liang, "Charting gene regulatory networks: strategies, challenges and perspectives," Biochem. J. 381 (2004).
The colour scheme depicts functional category: orange, mitotic cell cycle; pink, budding and filament formation; green, amino acid metabolism; yellow, nitrogen and sulphur utilization; blue, C-compound and carbohydrate utilization; red, TFs; grey, unspecific or several functional categories.
Yeast Regulatory Network (Sergei Maslov)
http://www.cmth.bnl.gov/~maslov/rockefeller_2002_networks.ppt

Regulatory Network of Homo sapiens (Sergei Maslov)
http://www.cmth.bnl.gov/~maslov/rockefeller_2002_networks.ppt
E. coli Metabolic Network
Ravasz et al., Science 297, 2002
Node = metabolite (colored by type); edge = reaction

Yeast P2P (Protein-Protein) Interaction Network
Node = protein; edge = interaction
http://www.imb-jena.de/tsb/yeast.html
http://www.macdevcenter.com/pub/a/mac/2004/08/20/bioinformatics.html
The Yeast PX Domain P2P Network
Node = domain; edge = interaction
http://itgmv1.fzk.de/www/itg/uetz/domains/px/html/px map.html

Yeast P2P Domain Interaction Network
Node = domain; edge = interaction
http://www.utoronto.ca/boonelab/proteomics.htm
(A) Yeast SH3 domain protein-protein network; proteins are colored according to their k-core value (6-core = black, 5-core = cyan, 4-core = blue, 3-core = red, 2-core = green, 1-core = yellow). A k-core is the largest subnet in which every protein has at least k interactions within that subnet. By definition, lower core numbers encompass all higher core numbers (e.g. the 4-core subgraph includes the 4-, 5- and 6-cores). The 6-core subgraph is highlighted in red and depicted in (B).
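The k-core coloring in the caption can be reproduced by repeatedly peeling off the least-connected protein. The sketch below assumes an undirected edge list; the function names and the toy network are illustrative, not taken from the cited study.

```python
from collections import defaultdict


def core_numbers(edges):
    """Core number of every node in an undirected graph given as (u, v) pairs.

    At each step, peel off a node of minimum remaining degree; a node's core
    number is the largest such minimum degree seen up to its removal.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def k_core(edges, k):
    """Nodes of the k-core: each keeps at least k neighbours inside it."""
    return {node for node, c in core_numbers(edges).items() if c >= k}


# Hypothetical toy network: a 4-clique (a, b, c, d) with a tail d-e-f.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("d", "e"), ("e", "f")]
print(core_numbers(edges))
print(k_core(edges, 2))
```

In the toy network the clique proteins get core number 3 and the tail proteins get 1, so the 2-core is just the clique. This mirrors the nesting property in the caption: every node of a higher core also belongs to all lower cores.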
A Network of Protein Complexes
http://www.genomenewsnetwork.org/articles/01_02/yeast_proteins_image1.shtml
The protein complex network. Links connect complexes sharing at least one protein. Cellular roles are colour coded: red, cell cycle; dark green, signalling; dark blue, transcription, DNA maintenance, chromatin structure; pink, protein and RNA transport; orange, RNA metabolism; light green, protein synthesis and turnover; brown, cell polarity and structure; violet, intermediate and energy metabolism; light blue, membrane biogenesis and traffic. The lower panel is an example of a complex (yeast TAP-C212) linked to two other complexes (yeast TAP-C77 and TAP-C110) by shared components. It illustrates the connection between the protein and complex levels of organization. Red lines indicate physical interactions as listed in YPD (ref. 22).

What Can Network Abstractions Teach?
Network abstractions hide biological specifics. Can they still be used to infer biologically significant knowledge?
Global statistical organizational principles:
Are all nodes equal?
How did such networks evolve?
Is there an underlying modular structure?
Are these networks resilient to random failures?
Local organizational principles:
Are there network motifs?
Are there conserved network homologies?
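The rule "links connect complexes sharing at least one protein" can be sketched with an inverted index from proteins to complexes. The complex names come from the caption, but their protein memberships (P1 to P5) are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def complex_network(complexes):
    """Link two complexes whenever they share at least one protein.

    complexes: dict mapping a complex name to the set of its member proteins.
    Returns a set of (complex_a, complex_b) pairs with complex_a < complex_b.
    """
    # Inverted index protein -> complexes avoids comparing every pair of complexes.
    by_protein = defaultdict(set)
    for name, members in complexes.items():
        for protein in members:
            by_protein[protein].add(name)
    links = set()
    for names in by_protein.values():
        links.update(combinations(sorted(names), 2))
    return links


# Hypothetical memberships; only the complex names come from the figure.
complexes = {
    "TAP-C212": {"P1", "P2", "P3"},
    "TAP-C77": {"P3", "P4"},
    "TAP-C110": {"P2", "P5"},
}
print(sorted(complex_network(complexes)))
# [('TAP-C110', 'TAP-C212'), ('TAP-C212', 'TAP-C77')]
```

As in the lower panel, TAP-C212 is linked to both other complexes, while TAP-C77 and TAP-C110 share no component and stay unlinked.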
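Statistical Topology Features

Random Networks (Erdős-Rényi, 1959)
G(n,p): a graph on n nodes in which each of the m = n(n-1)/2 possible edges is present independently with probability p (toss a coin with probability p for each edge).
Average degree: $d = p(n-1) \approx pn$
Degree distribution: a node has degree k with probability $P(k) = \binom{n-1}{k} p^k (1-p)^{n-1-k} \approx \frac{d^k}{k!} e^{-d}$ (Poisson, for large n)
G(n,p(n)) has a property F if $P(G(n,p(n)) \in F) \to 1$ as $n \to \infty$.
Main result: many properties F have threshold behavior. There exists p*(n) such that if $p(n)/p^*(n) \to c > 1$ then $P(G(n,p(n)) \in F) \to 1$, and if $p(n)/p^*(n) \to c < 1$ then $P(G(n,p(n)) \in F) \to 0$.
Example: F = connectivity, $p^*(n) = \ln(n)/n$.
A giant component emerges earlier, once $d = pn > 1$; as p(n) increases towards p*(n), the giant component absorbs the remaining nodes until the graph becomes connected.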
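The G(n,p) definition and the connectivity threshold $p^*(n) = \ln(n)/n$ can be checked with a small simulation. This is a minimal sketch; the choice n = 1000 and the multipliers c = 0.5 and c = 2 are arbitrary illustration values.

```python
import math
import random


def gnp(n, p, seed=None):
    """Erdős-Rényi G(n, p): toss a p-biased coin for each of the n(n-1)/2 pairs."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def largest_component_size(adj):
    """Number of nodes in the largest connected component (iterative DFS)."""
    seen, best = set(), 0
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


n = 1000
for c in (0.5, 2.0):  # p = c * p*(n), with p*(n) = ln(n)/n
    p = c * math.log(n) / n
    g = gnp(n, p, seed=1)
    mean_degree = sum(len(nbrs) for nbrs in g) / n
    print(f"c={c}: mean degree {mean_degree:.2f} (p(n-1) = {p * (n - 1):.2f}), "
          f"largest component {largest_component_size(g)} of {n}")
```

Below the threshold (c = 0.5, d ≈ 3.5) a giant component already holds most nodes, but some remain isolated; above it (c = 2) the graph is connected with high probability.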
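Topology Measures
Degree distribution p(k)
Clustering: for a node with k neighbours and q links among them, $C = \frac{2q}{k(k-1)}$, the fraction of the possible clique among the neighbours that is filled
Path length L: average shortest-path distance between pairs of nodes
For ER random graphs: Poisson degree distribution $p(k) \approx \frac{d^k}{k!} e^{-d}$; $C = p$; $L \sim \ln N$ (more precisely $\ln N / \ln d$)

ER Does Not Model Many Real-World Networks
Watts-Strogatz (98): many real networks have (A) a high degree of clustering (cliquishness) and (B) a short average path length (small-world separation).
C: clustering coefficient; C_rand: clustering coefficient of a random graph with the same size and average degree; L: average path length; N: number of nodes.

Network      C           C_rand    L           N
WWW          0.1078      0.00023   3.1         153127
Internet     0.18-0.3    0.001     3.7-3.76    3015-6209
Actor        0.79        0.00027   3.65        225226
Coauthors    0.43        0.00018   5.9         52909
Metabolic    0.32        0.026     2.9         282
Foodweb      0.22        0.06      2.43        134
C. elegans   0.28        0.05      2.65        282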
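The clustering formula $C = 2q/(k(k-1))$ translates directly into code. This sketch assumes an adjacency dict of sets and adopts one common convention (nodes with fewer than two neighbours count as C = 0); other conventions exclude such nodes from the average instead.

```python
from itertools import combinations


def local_clustering(adj, node):
    """C = 2q / (k(k-1)): q links among the k neighbours of node."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: undefined C counts as 0
    q = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * q / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, node) for node in adj) / len(adj)


# Toy graph: triangle a-b-c plus a pendant node d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(local_clustering(adj, "a"))  # 1 of 3 possible neighbour links is filled
print(average_clustering(adj))
```

Node a has k = 3 neighbours and q = 1 link among them, so C = 1/3; b and c sit in a filled triangle (C = 1) and d has C = 0, giving an average of 7/12.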
Watts-Strogatz Small-World Networks
Start with a deterministic k-regular ring lattice.
Rewire each edge with probability p; as p → 1 the graph converges to a random network.
[Figure panels, from regular ring to random network as p increases: L=100, d=49.51, C=0.67; L=14, d=11.1, C=0.63; L=5, d=4.46, C=0.01]

Many Real Networks Have a Power-Law Degree Distribution
Albert, Jeong, Barabási 99: $P(k) \sim k^{-\gamma}$
A: actors, γ = 2.3; B: WWW, γ = 2.67; C: power grid, γ = 4
[Figure: log-log plots of frequency vs. degree]
Faloutsos & Faloutsos 99, Internet graph: AS graph, γ = 2.1; routers, γ = 2.48
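The ring-then-rewire construction can be sketched as follows. This is one reasonable reading of the procedure, assuming an even k, n much larger than k, and that a rewired edge keeps its first endpoint and picks a new second endpoint uniformly among nodes it is not yet linked to.

```python
import random


def ring_lattice(n, k):
    """k-regular ring: each node linked to its k/2 nearest neighbours on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj


def watts_strogatz(n, k, p, seed=None):
    """Rewire each lattice edge (i, i+j) with probability p to a random new endpoint."""
    rng = random.Random(seed)
    adj = ring_lattice(n, k)
    for j in range(1, k // 2 + 1):  # go around the ring once per lattice distance
        for i in range(n):
            v = (i + j) % n
            if v not in adj[i] or rng.random() >= p:
                continue
            # New endpoint: no self-loop and no duplicate edge.
            candidates = [w for w in range(n) if w != i and w not in adj[i]]
            if candidates:
                w = rng.choice(candidates)
                adj[i].discard(v)
                adj[v].discard(i)
                adj[i].add(w)
                adj[w].add(i)
    return adj


lattice = watts_strogatz(100, 4, 0.0, seed=0)      # p = 0: the regular ring
small_world = watts_strogatz(100, 4, 0.1, seed=0)  # a few long-range shortcuts
randomish = watts_strogatz(100, 4, 1.0, seed=0)    # close to a random graph
```

Each rewiring removes one edge and adds one, so the number of edges stays n·k/2 for every p. Combining these graphs with a clustering and path-length routine reproduces the trend in the figure: a few shortcuts cut L sharply while C stays high.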
Scale-Free vs. Random
Random (exponential) network: Poisson degree distribution
Scale-free network: power-law degree distribution

Scale-Free Network Examples (Barabási 01)
Scale-Free Networks
Preferential attachment: a new node connects to node i with probability $\Pi(k_i) = \frac{k_i}{\sum_j k_j}$, where $k_i$ is the degree of i.
This results in a power-law degree distribution, $p(k) \sim k^{-\gamma}$, with γ = 3 for the evolution rule above: $P(k) \sim k^{-3}$.
Contrast with ER: $p(k) \approx \frac{d^k}{k!} e^{-d}$.
An extensive theory of SF nets has been developed:
γ = 2: hub-and-spoke topology
2 < γ < 3: small number of hubs
γ > 3: network is dispersed
Topology measures: $L \sim \ln \ln N$ for 2 < γ < 3; C(k) is constant (independent of k)
Scale-free = power-law degree distribution

Global Topology Features
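The attachment rule $\Pi(k_i) = k_i / \sum_j k_j$ has a convenient implementation: keep a list in which node i appears once per edge endpoint, i.e. $k_i$ times, and pick uniformly from it. The seed clique and the parameter m (links per new node) are assumptions of this sketch, not details from the slide.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=None):
    """Grow a graph node by node; each newcomer links to m distinct existing
    nodes, picking node i with probability k_i / sum_j k_j."""
    rng = random.Random(seed)
    # Assumed seed graph: a clique on m + 1 nodes, so every degree is >= m.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Node i appears k_i times in this list, so a uniform pick from it is
    # a degree-proportional pick.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # redraw duplicates to get m distinct targets
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def degree_histogram(edges):
    """Map degree k -> number of nodes with degree k."""
    degree = Counter(node for edge in edges for node in edge)
    return Counter(degree.values())


hist = degree_histogram(preferential_attachment(5000, 2, seed=0))
for k in (2, 4, 8, 16, 32):
    print(k, hist.get(k, 0))
```

Plotting the histogram on log-log axes should give a roughly straight line with slope near -3, in contrast with the sharply peaked Poisson histogram of an ER graph of the same average degree.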
Characterizing Metabolic Networks

Metabolic Nets Have a Power-Law Degree Distribution
Clustering
Metabolic networks: E. Ravasz et al., Science, 2002
Protein networks

The p53 Tumor Suppressor Network
Vogelstein et al., Nature, 2000
Topology of the Protein Network: Adapted Evolution Model
$P(k) \sim (k + k_0)^{-\gamma} \exp\left(-\frac{k + k_0}{k_\tau}\right)$
H. Jeong, S.P. Mason, A.-L. Barabási & Z.N. Oltvai, Nature, 2001

Robustness: SF Nets Are Robust with Respect to Failures
Albert, Jeong, Barabási, Nature 406, 378 (2000)
SF nets maintain their connectivity and topological features when randomly chosen nodes fail.
[Figure: relative size S of the largest connected component vs. fraction f of removed nodes under random node failure; f_c marks the critical fraction]
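How About Targeted Attacks?
SF networks are sensitive to attacks on hubs: removing the most highly connected nodes breaks the network apart once f exceeds a critical fraction f_c.
[Figure: S vs. fraction f of removed nodes under attack, with critical fraction f_c]
Implications: disease analysis; drug design

Robustness of the Yeast Protein Network
H. Jeong et al., Nature, 2001
Highly connected proteins are more likely to be essential (their deletion is lethal).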
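The failure and attack curves in the two figures can be simulated by deleting nodes in random order or in order of decreasing degree and tracking S, the largest component's share of the original nodes. In this sketch the attack order uses degrees from the intact network rather than recomputing them after each removal; that is a simplifying assumption.

```python
import random


def largest_fraction(adj, removed):
    """S: size of the largest connected component left after deleting the
    nodes in `removed`, as a fraction of the original number of nodes."""
    seen = set(removed)  # removed nodes are treated as already visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best / len(adj)


def removal_order(adj, mode, seed=None):
    """'failure': random order; 'attack': highest degree first (degrees
    taken from the intact network, a simplifying assumption)."""
    nodes = list(adj)
    if mode == "failure":
        random.Random(seed).shuffle(nodes)
    elif mode == "attack":
        nodes.sort(key=lambda node: len(adj[node]), reverse=True)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return nodes


def robustness_curve(adj, fractions, mode, seed=None):
    """S as a function of the fraction f of removed nodes."""
    order = removal_order(adj, mode, seed)
    return [largest_fraction(adj, order[: int(f * len(adj))]) for f in fractions]


# Hub-and-spoke toy network: hub 0 linked to spokes 1..20.
star = {0: set(range(1, 21)), **{i: {0} for i in range(1, 21)}}
print(robustness_curve(star, [0.0, 0.05], "attack"))  # hub removed at f = 0.05
print(robustness_curve(star, [0.0, 0.05], "failure", seed=2))
```

The star is the extreme case of a hub-dominated network: a single targeted removal shatters it, while a random removal most likely hits a spoke and barely changes S. Running both modes on a scale-free graph is expected to show the same qualitative gap as the figures.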
Robustness of the Yeast Protein Network
Node failure: red, lethal; green, robust; yellow, unknown

Biological Networks
Chung, Dewey, Lu, Galas, Journal of Computational Biology (2003)

Network                        Approx. exponent γ
Non-biological:
Internet                       2.1 (in), 2.5 (out)
Citations                      3
Actors                         2.3
Power grid                     4
Phone calls                    2.1-2.3
Biological:
Yeast protein-protein net      1.5, 1.6, 1.7, 2.5
E. coli metabolic net          1.7, 2.2
Yeast gene expression net      1.4-1.7
Gene functional interactions   1.6

Evolution through duplication can explain γ < 2.
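To make the duplication idea concrete, here is a generic duplication-divergence growth sketch. It is one common variant of the idea, not necessarily the exact model of the papers cited here; the parameter names delta (probability that an inherited link is lost) and alpha (probability of linking the copy to its original) are hypothetical.

```python
import random


def duplication_divergence(n, delta, alpha=0.0, seed=None):
    """Grow a protein-interaction graph to n nodes by gene duplication.

    Each step duplicates a uniformly chosen protein; the copy inherits each of
    the original's links with probability 1 - delta (divergence) and is linked
    to the original with probability alpha.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed: two interacting proteins
    while len(adj) < n:
        original = rng.randrange(len(adj))
        copy = len(adj)
        adj[copy] = set()
        for nbr in adj[original]:
            if rng.random() >= delta:  # link survives divergence
                adj[copy].add(nbr)
                adj[nbr].add(copy)
        if rng.random() < alpha:
            adj[copy].add(original)
            adj[original].add(copy)
    return adj


g = duplication_divergence(2000, delta=0.6, alpha=0.1, seed=0)
isolated = sum(1 for nbrs in g.values() if not nbrs)
print(f"{isolated} isolated proteins out of {len(g)}")
```

Copies that lose all their links become isolated nodes; some formulations discard them before measuring the degree distribution. Fitting γ to the remaining degrees for different delta values is one way to explore which settings yield exponents below 2.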
Gene Duplication Networks
Pastor-Satorras, Smith & R. V. Solé, "Evolving protein interaction networks through gene duplication," Santa Fe Institute Working Paper 02-02-008, 2002
Scale-free + small world
[Figure: log-log plot of P(k) vs. k]

Is the Metabolic Network of E. coli Small?
Fell, D. A. & Wagner, A. (2000) Nat. Biotechnol. 18.
Wagner, A. & Fell, D. A. (2001) Proc. R. Soc. London Ser. B 268.
Ma, H.-W. & Zeng, A.-P. (2003) Bioinformatics 19, 270-277.
Masanori Arita, PNAS 101 (6): 1543-1547.
Is the Metabolic Network Small?
Masanori Arita, PNAS 101 (6): 1543-1547
Considered a more detailed structural model, focusing on carbon metabolism.
Filled bars: the direction of reactions is considered, average path length AL = 8.4
Open bars: all reactions are considered reversible, AL = 8.0
AL ≈ 8, much larger than that of a random graph.

Local Topology Features: Network Motifs (Shen-Orr, Milo, Mangan, Alon 02)
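The two bar sets differ only in whether reactions are treated as directed or reversible. A plain breadth-first-search version of the average path length with that switch might look like this; note that Arita's analysis traces carbon atoms through reactions, which this graph-distance sketch does not attempt.

```python
from collections import deque


def average_path_length(arcs, reversible=False):
    """Mean BFS distance over ordered pairs (u, v), u != v, v reachable from u.

    arcs: directed (u, v) pairs, e.g. substrate -> product of a reaction.
    reversible=True adds the reverse of every arc (all reactions reversible).
    """
    adj = {}
    for u, v in arcs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())
        if reversible:
            adj[v].add(u)
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


arcs = [("A", "B"), ("C", "B")]  # hypothetical: two substrates feeding one product
print(average_path_length(arcs))                   # 1.0
print(average_path_length(arcs, reversible=True))
```

Only reachable pairs are averaged, so making reactions reversible can change AL in either direction: it shortens some routes but also makes new, longer routes count. In the toy example it adds the two-step pairs A-C and C-A, raising AL from 1 to 4/3.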
Discovering Network Motifs
Work on the adjacency-matrix representation.
Enumerate all possible two- or three-node configurations; e.g. there are 13 possible connected 3-node directed subgraphs.
[Figure: the 13 three-node subgraphs]
Look for patterns that occur significantly more often in the real network than in equivalent randomized networks (e.g. networks in which every node keeps its in- and out-degree).

E. coli Regulatory Network
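Here is a minimal sketch of the comparison for a single pattern, the feed-forward loop (X→Y, Y→Z, X→Z). Randomized networks are built by swapping the targets of arc pairs, which keeps every node's in- and out-degree. The swap count, the number of random networks, and the z-score summary are illustrative choices, and the input is assumed to be a simple directed graph without duplicate arcs.

```python
import random
import statistics


def count_ffl(arcs):
    """Count feed-forward loops: X->Y, Y->Z and X->Z with X, Y, Z distinct."""
    out = {}
    for u, v in arcs:
        out.setdefault(u, set()).add(v)
    count = 0
    for x, x_targets in out.items():
        for y in x_targets:
            if y == x:
                continue
            for z in out.get(y, ()):
                if z not in (x, y) and z in x_targets:
                    count += 1
    return count


def randomize(arcs, rng, swaps_per_arc=10):
    """Degree-preserving randomization: swap targets of two arcs
    (a->b, c->d become a->d, c->b), rejecting self-loops and duplicates."""
    arcs = list(arcs)
    present = set(arcs)
    for _ in range(swaps_per_arc * len(arcs)):
        i, j = rng.randrange(len(arcs)), rng.randrange(len(arcs))
        (a, b), (c, d) = arcs[i], arcs[j]
        if len({a, b, c, d}) < 4 or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        arcs[i], arcs[j] = (a, d), (c, b)
    return arcs


def ffl_z_score(arcs, n_random=100, seed=0):
    """Real FFL count, mean count in randomized networks, and z-score."""
    rng = random.Random(seed)
    real = count_ffl(arcs)
    counts = [count_ffl(randomize(arcs, rng)) for _ in range(n_random)]
    mean, sd = statistics.mean(counts), statistics.pstdev(counts)
    z = (real - mean) / sd if sd > 0 else float("nan")
    return real, mean, z


# Hypothetical network: five planted FFLs chained into a ring.
arcs = ([(f"X{i}", f"Y{i}") for i in range(5)] + [(f"Y{i}", f"Z{i}") for i in range(5)]
        + [(f"X{i}", f"Z{i}") for i in range(5)] + [(f"Z{i}", f"X{(i + 1) % 5}") for i in range(5)])
print(ffl_z_score(arcs))
```

A large positive z-score flags the FFL as a candidate motif. Extending the idea to all 13 three-node patterns means counting each isomorphism class instead of a single hard-coded pattern.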
E. coli Regulatory Motifs

Yeast Regulatory Network Motifs
Lee et al., Science, 2002
[Figure: motifs by functional category: cell cycle, developmental, biosynthesis, DNA/RNA/protein, environment, metabolism]
Yeast Regulatory Network Motifs (cont.)
Lee et al., Science, 2002

Motifs of the Yeast Protein Network
S. Wuchty, Z. Oltvai & A.-L. Barabási, Nature Genetics, 2003
Motifs in Protein Networks

Modular Structure of the Protein Domain Net
http://www.utoronto.ca/boonelab/proteomics.htm
(Same yeast SH3 domain network and k-core coloring as in the earlier P2P domain interaction network slide.)
Network Homology Through Path Alignment
Sharan et al., RECOMB 2004

Network Homology: Path & Cluster Alignment
[Figure: aligned networks of yeast, worm and fly]
Modular Structure of Conserved Clusters
Sharan et al., PNAS 2005

Yeast Regulatory Network
Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA & Gerstein M (2004) Genomic analysis of regulatory network dynamics reveals large topological changes. Nature 431: 308-312.
Comprehensive Dataset Available
A very complex network of transcription factors and target genes: 3,420 genes, 142 TFs, 7,074 regulatory interactions.
Simplify using graph-theoretic statistics: global topological measures and local network motifs.

Global Topology Measures
Connectivity:
Ingress (incoming) degree: 2.1, i.e. each gene is regulated by ~2 TFs
Egress (outgoing) degree: 49.8, i.e. each TF targets ~50 genes
Degree distribution: power law (scale-free)
Clustering coefficient: 0.11 (low local density)
Example: a node with 4 neighbours has 4·3/2 = 6 possible links among them; if 1 link exists, its clustering coefficient = 1/6 ≈ 0.17.
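The two connectivity figures follow from the dataset sizes: every regulatory interaction has exactly one regulating TF and one target gene, so the interaction count is both the total in-degree and the total out-degree. The sketch assumes the averages are taken over all 3,420 genes and all 142 TFs, which is consistent with the numbers on the slide.

```python
def mean_degrees(interactions, genes, tfs):
    """Average in-degree per gene and out-degree per TF."""
    mean_in = interactions / genes  # average number of TFs regulating a gene
    mean_out = interactions / tfs   # average number of targets per TF
    return mean_in, mean_out


mean_in, mean_out = mean_degrees(7074, 3420, 142)
print(f"ingress {mean_in:.1f}, egress {mean_out:.1f}")  # ingress 2.1, egress 49.8
```

The factor of about 24 between the two averages is simply the ratio of genes to TFs, 3,420 / 142.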
Partition the Network Into Activity Subnets
Active subnet computation:
1. Start with the genes active in a given condition.
2. Add the TFs that regulate them.
3. Repeatedly add the TFs that regulate any node of the current subnet until no new TF is added (closure).

Cellular activity   No. of genes
Cell cycle          437
Sporulation         876
Diauxic shift       1,876
DNA damage          1,715
Stress response     1,385

Activity Subnets
Multi-stage activities: cell cycle, sporulation
Binary-state activities: diauxic shift, DNA damage, stress response
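The closure in step 3 is a backward reachability search from the active genes. In this sketch the regulatory map, its gene and TF names, and the choice to keep only arcs that point into subnet nodes are all assumptions for illustration.

```python
def active_subnet(regulators, active_genes):
    """Active genes plus every TF that regulates them directly or through a
    chain of TFs, together with the regulatory arcs among these nodes.

    regulators: dict mapping each gene or TF to the set of TFs regulating it.
    """
    nodes = set(active_genes)
    frontier = list(nodes)
    while frontier:  # stops once no new TF is added
        gene = frontier.pop()
        for tf in regulators.get(gene, ()):
            if tf not in nodes:
                nodes.add(tf)
                frontier.append(tf)
    arcs = {(tf, gene) for gene in nodes for tf in regulators.get(gene, ())}
    return nodes, arcs


# Hypothetical toy regulatory map (all names are made up).
regulators = {"g1": {"TF1"}, "g2": {"TF2"}, "TF1": {"TF3"}, "g3": {"TF1", "TF4"}}
nodes, arcs = active_subnet(regulators, {"g1"})
print(sorted(nodes))  # ['TF1', 'TF3', 'g1']
print(sorted(arcs))   # [('TF1', 'g1'), ('TF3', 'TF1')]
```

TF3 enters only through the closure step, because it regulates TF1 rather than g1 itself. The arc TF1→g3 is left out because g3 is not active in this condition.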
Do Global Topology Features Vary by Activity?
Literature: network topologies are perceived to be invariant (scale-free, small-world and clustered) across different biological networks and genomes.
Random expectation: sample sub-networks of different sizes from the complete network and calculate the topological measures: incoming degree, outgoing degree, path length and clustering coefficient.
[Figure: each topological measure vs. random network size]

Outgoing degree
Binary conditions: greater connectivity
Multi-stage conditions: lower connectivity
Multi-stage: controlled ticking over of genes at different stages
Binary: quick, large-scale turnover of genes
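One way to build the random expectation is to draw random node subsets of a given size, keep the arcs among them, and compute a measure on each sample. Uniform node sampling with induced arcs is an assumption of this sketch; the study may have sampled its sub-networks differently. Node names are assumed to be strings so they can be sorted for reproducible sampling.

```python
import random
import statistics


def induced_arcs(arcs, nodes):
    """Regulatory arcs whose regulator and target both lie in `nodes`."""
    return [(tf, target) for tf, target in arcs if tf in nodes and target in nodes]


def mean_out_degree(arcs):
    """Average number of targets per regulator that has at least one arc."""
    regulators = {tf for tf, _ in arcs}
    return len(arcs) / len(regulators) if regulators else 0.0


def random_expectation(arcs, size, measure, trials=100, seed=0):
    """Mean and spread of `measure` over random node subsets of a given size."""
    rng = random.Random(seed)
    nodes = sorted({node for arc in arcs for node in arc})
    values = [measure(induced_arcs(arcs, set(rng.sample(nodes, size))))
              for _ in range(trials)]
    return statistics.mean(values), statistics.pstdev(values)
```

For a condition-specific subnet of, say, 437 genes, one would compare its measured outgoing degree with `random_expectation(all_arcs, 437, mean_out_degree)`, where `all_arcs` stands for the complete regulatory network. A value far outside the sampled spread suggests the difference reflects the condition rather than sub-network size.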
Incoming degree
Binary conditions: smaller connectivity; less complex TF combinations
Multi-stage conditions: larger connectivity; more complex TF combinations

Path length
Binary conditions: shorter path length; faster, direct action
Multi-stage conditions: longer path length; slower, indirect action
Clustering coefficient
Binary conditions: smaller coefficients; less TF-TF inter-regulation
Multi-stage conditions: larger coefficients; more TF-TF inter-regulation

Do Local Topology Features Vary by Activity?
Literature: motif usage is well conserved in regulatory networks across different organisms [Alon].
Random expectation: sample sub-nets of different sizes and count motif occurrences.
[Figure: occurrence of single-input motifs, multiple-input motifs and feed-forward loops vs. random network size]
Network Motif Statistics
Share of motif occurrences in each condition (each column sums to 100%):

Motif                      Cell cycle  Sporulation  Diauxic shift  DNA damage  Stress response
SIM (single-input motif)   32.0%       38.9%        57.4%          55.7%       59.1%
MIM (multiple-input motif) 23.7%       16.6%        23.6%          27.3%       20.2%
FFL (feed-forward loop)    44.3%       44.5%        19.0%          17.0%       20.7%

Summary
Multi-stage conditions: fewer target genes; longer path lengths; more inter-regulation between TFs
Binary conditions: more target genes; shorter path lengths; less inter-regulation between TFs
Final Notes

Challenges & Opportunities
Improved understanding of network evolution
Similarity, conservation
Selection (fitness), motifs
Modularity
Network-sequence relationships
Network-structure (folding) relationships
Applications: drug design