Systems Biology Workshop Systems biology and biological networks Center for Biological Sequence Analysis
Networks in electronics Radio kindly provided by Lazebnik, Cancer Cell, 2002 Systems Biology Workshop, May 14th-15th, 2009
Understanding systems
Example genes: YER001W, YBR088C, YOL007C, YPL127C, YNR009W, YDR224C, YDL003W, YBL003C, YDR097C, YBR089W, YBR054W, YMR215W, YBR071W, YBL002W, YNL283C, YGR152C
Parts list: sequencing, gene knock-out, microarrays, etc.
Interactions: genetic interactions, protein-protein interactions, protein-DNA interactions, subcellular localization
Dynamics: microarrays, proteomics, metabolomics
Model generation
Radio kindly provided by Lazebnik, Cancer Cell, 2002. Cell model from Andersen et al., Molecular Systems Biology 4:178.
Networks
Co-authorship on scientific articles Co-authorships at the Max Planck Institute http://www.jeffkennedyassociates.com:16080/connections/concept/image.html
Networks in Molecular Biology
Protein-protein interactions; protein-DNA interactions; genetic interactions; metabolic reactions; co-expression interactions; text mining interactions; association networks; etc.
Barabasi & Oltvai, Nature Reviews, 2004
Types of networks
Network Theory
Graphs
A graph $G = (V, E)$ is a set of vertices $V$ and edges $E$.
A subgraph $G'$ of $G$ is induced by some $V' \subseteq V$ and $E' \subseteq E$.
Graph properties: connectivity (node degree, paths); cyclic vs. acyclic; directed vs. undirected.
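Sparse vs. dense
For $G(V, E)$, let $|V| = n$ and $|E| = m$ be the number of vertices and edges.
A graph is sparse if $m \sim n$ and dense if $m \sim n^2$.
A graph is complete when every pair of vertices is connected: $m = n(n-1)/2$ for an undirected graph without self-loops, which grows as $n^2$.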
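
The sparse/dense distinction can be made concrete by comparing m with the largest number of edges the graph could hold. This is a minimal sketch, assuming an undirected graph without self-loops or repeated edges; the function name `edge_density` is hypothetical and does not come from the slides.

```python
def edge_density(n, m):
    """Fraction of possible undirected edges that are present: m / (n*(n-1)/2)."""
    if n < 2:
        raise ValueError("need at least two vertices")
    max_edges = n * (n - 1) // 2  # number of distinct vertex pairs
    return m / max_edges

# A graph with 1000 vertices and 1000 edges (m ~ n) is sparse ...
sparse = edge_density(1000, 1000)
# ... while 10 vertices with all 45 possible edges form a complete graph.
complete = edge_density(10, 45)
```

A density close to 0 corresponds to the sparse case and a density of 1 to a complete graph.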
Paths
A path is a sequence $\{x_1, x_2, \ldots, x_n\}$ such that $(x_1, x_2), (x_2, x_3), \ldots, (x_{n-1}, x_n)$ are edges of the graph.
A closed path ($x_n = x_1$) on a graph is called a graph cycle or circuit.
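
The path definition translates directly into a check over consecutive vertex pairs. The sketch below assumes an undirected graph given as a list of edges; `is_path`, `is_cycle` and the toy edge list are illustrative names, not part of the original slides.

```python
def is_path(edges, xs):
    """True if every consecutive pair (x_i, x_{i+1}) is an edge (undirected)."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset((a, b)) in edge_set for a, b in zip(xs, xs[1:]))

def is_cycle(edges, xs):
    """A closed path: a path with at least one edge whose last vertex equals its first."""
    return len(xs) > 1 and xs[0] == xs[-1] and is_path(edges, xs)

# Toy graph: a triangle A-B-C with one extra vertex D attached to C.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
```

For example, `is_path(edges, ["A", "B", "C", "D"])` holds, and `["A", "B", "C", "A"]` is additionally a cycle.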
Protein network representations
Gene A regulates gene B: regulatory interactions (protein-DNA)
Gene A binds gene B: functional complex; B is a substrate of A (protein-protein)
Gene A reaction product is a substrate for gene B: metabolic pathways
Degree or connectivity
Random versus scale-free networks
$P(k)$ is the probability of each degree $k$, i.e. the fraction of nodes having that degree.
For random networks, $P(k)$ is approximately normally distributed (strictly, Poisson-like around the mean degree).
For real networks the distribution is often a power law: $P(k) \sim k^{-\gamma}$.
Such networks are said to be scale-free.
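
To illustrate how P(k) is obtained from a network, the following sketch counts the degree of every node in an undirected edge list and converts the counts to fractions. The function name and the toy network are assumptions made for illustration; nodes that appear in no edge (degree 0) are not represented in an edge list and are therefore ignored here.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: P(k)}, the fraction of nodes having degree k (undirected graph)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.values())  # how many nodes have each degree
    n = len(degree)
    return {k: c / n for k, c in sorted(counts.items())}

# Toy network: one hub connected to four other nodes, plus one extra edge a-b.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]
p = degree_distribution(edges)
```

Plotting such a distribution on log-log axes is the usual way to judge whether it resembles a power law.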
Clustering coefficient
The density of the network surrounding node $I$, characterized as the number of triangles through $I$. Related to network modularity.
$$C_I = \frac{n_I}{\binom{k}{2}} = \frac{2 n_I}{k(k-1)}$$
$k$: number of neighbors of $I$; $n_I$: number of edges between node $I$'s neighbors.
Example: the center node has 8 (grey) neighbors and there are 4 edges between the neighbors, so $C = 2 \cdot 4 / (8 \cdot (8-1)) = 8/56 = 1/7$.
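
The worked example can be checked with a few lines of code. This sketch assumes the network is stored as a dictionary mapping each node to the set of its neighbors; which of the 8 neighbors are linked to each other is not recoverable from the slide, so the four neighbor-neighbor edges below are an arbitrary choice.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """C_I = 2 * n_I / (k * (k - 1)); adj maps each node to the set of its neighbors."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed here; C is undefined for k < 2
    # n_I: number of edges between pairs of neighbors of the node
    n_i = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * n_i / (k * (k - 1))

def add_edge(adj, u, v):
    adj[u].add(v)
    adj[v].add(u)

# Center node 0 with 8 neighbors (1..8) and 4 edges among the neighbors.
adj = {i: set() for i in range(9)}
for i in range(1, 9):
    add_edge(adj, 0, i)
for u, v in [(1, 2), (3, 4), (5, 6), (7, 8)]:  # assumed placement of the 4 edges
    add_edge(adj, u, v)

c = clustering_coefficient(adj, 0)  # 2*4 / (8*7) = 1/7
```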
Hierarchical Networks
Detecting hierarchical organization
Knock-out lethality and connectivity
[Figure, two panels. Left: degree distribution P(k) versus degree k on log-log axes, with fitted line y = 1.2x^-1.91. Right: % essential genes versus degree k.]
The Swedish sex web Target the hubs to have an efficient safe sex education campaign
Terrorists form networks http://www.state.gov/secretary/former/powell/photos/2003/17266.htm
Protein-Protein Interactions
Characterization of physical interactions
Obligation: obligate (protomers only found/function together); non-obligate (protomers can exist/function alone).
Lifetime of interaction: permanent (complexes, often obligate); strong transient (require trigger, e.g. G proteins); weak transient (dynamic equilibrium).
obligate, permanent non-obligate, strong transient
Protein-protein interaction data is accumulating
Experimental detection of PPI Two of the most commonly used techniques Yeast-2-hybrid (Y2H) Mass spec pull-down
Transcription factor An activating transcription factor: 1. Binds to DNA using a DNA-binding domain (DBD) 2. Recruits the transcriptional machinery using a transcriptional activation domain (AD)
Yeast Two-Hybrid Method
Y2H assays interactions in vivo.
It uses the property that transcription factors generally have separable transcriptional activation (AD) and DNA-binding (DBD) domains.
A functional transcription factor can be created if a separately expressed AD can be made to interact with a DBD.
A protein bait B is fused to a DBD and screened against a library of protein preys, each fused to an AD.
Causier, Mass Spectrometry Reviews, 2004
Issues with Y2H
Strengths: high sensitivity (transient and permanent PPIs); takes place in vivo; independent of endogenous expression.
Weaknesses, false positive interactions: auto-activation; "sticky" prey; detects possible interactions that may not take place under real physiological conditions; may identify indirect interactions (A-C-B).
Weaknesses, false negative interactions: similar studies often reveal very different sets of interacting proteins (i.e. false negatives); may miss PPIs that require other factors to be present (e.g. ligands, proteins, PTMs).
Affinity Purification
Mass spectrometry Aebersold & Mann, Nature, 2003
Noise in data
Y2H: Uetz et al. (6144 preys x 5345 baits), 692 interactions; Ito et al. (~6200 preys x ~6200 baits), 841 interactions. Overlap: 141 interactions (551 found only by Uetz et al., 700 only by Ito et al.).
Mass spec: Gavin et al. (1167 baits), 3225 interactions among 1440 proteins; Ho et al. (725 baits), 3617 interactions among 1578 proteins. Overlap: 198 interactions (Venn diagram counts 3007 and 3419 for the non-shared parts).
False positive rate in protein-protein interaction data based on curated complexes: 30-50 %.
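
The size of the Y2H overlap is easier to judge as a fraction. The sketch below uses only the three Y2H counts quoted on the slide (692, 841 and 141 shared) and assumes, from the order in which they appear, that 692 belongs to Uetz et al. and 841 to Ito et al.; the function name is hypothetical.

```python
def overlap_fractions(total_a, total_b, shared):
    """Fraction of each data set recovered by the other, plus the Jaccard index."""
    union = total_a + total_b - shared
    return shared / total_a, shared / total_b, shared / union

# Y2H counts quoted on the slide: 692 and 841 interactions, 141 in common.
frac_uetz, frac_ito, jaccard = overlap_fractions(692, 841, 141)
print(f"{frac_uetz:.1%} of Uetz, {frac_ito:.1%} of Ito, Jaccard {jaccard:.3f}")
```

Roughly a fifth or less of each screen is confirmed by the other, which is the point the slide makes about noise.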
Scoring protein-protein interactions
Topology-based scoring of interactions
Yeast two-hybrid (example nodes A, B, C, D): high confidence (1 unshared interaction partner); low confidence (4 unshared interaction partners).
Complex pull-downs: low confidence (rarely purified together); high confidence (often purified together).
de Lichtenberg et al., Science, 2005
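
The idea of counting unshared interaction partners can be sketched as follows. This is a simplified illustration only, not the scoring scheme of de Lichtenberg et al.; the toy network and the function name `unshared_partners` are hypothetical.

```python
def unshared_partners(adj, a, b):
    """Count partners (other than a and b themselves) that interact with only one of a, b."""
    partners_a = adj[a] - {b}
    partners_b = adj[b] - {a}
    return len(partners_a ^ partners_b)  # symmetric difference

# Toy network: A, B and C form a triangle; D is linked to B and to two further proteins.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E", "F"},
    "E": {"D"},
    "F": {"D"},
}
```

Here the pair A-B has 1 unshared partner (D) and would be trusted more than the pair B-D, which has 4 (A, C, E, F).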
Benchmarking interaction-scores
Reducing the error rate in PPI data
Benchmark by measuring the overlap between the curated MIPS complexes and different PPI data sets derived from high-throughput screens.
[Figure axes: overlap with curated MIPS complexes; estimated error rate.]
de Lichtenberg et al., Science, 2005
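
A benchmark of this kind can be sketched as the fraction of retained interactions that fall inside a curated complex, evaluated at different score cutoffs. The code below is an assumption-laden illustration: the scores, the cutoff and the two toy complexes are invented, and a real benchmark would typically restrict the comparison to proteins that are covered by the curated set.

```python
def benchmark(scored_pairs, complexes, cutoff):
    """Fraction of pairs scoring >= cutoff whose two proteins share a curated complex."""
    selected = [(a, b) for a, b, score in scored_pairs if score >= cutoff]
    if not selected:
        return None  # nothing passes the cutoff
    hits = sum(1 for a, b in selected if any(a in c and b in c for c in complexes))
    return hits / len(selected)

# Hypothetical curated complexes and scored interactions.
complexes = [{"A", "B", "C"}, {"D", "E"}]
scored_pairs = [("A", "B", 0.9), ("D", "E", 0.8), ("A", "D", 0.4), ("C", "E", 0.2)]
```

In this toy data, raising the cutoff from 0.0 to 0.5 raises the overlap from 0.5 to 1.0, at the cost of discarding half of the interactions.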
Filtering by subcellular localization de Lichtenberg et al., Science, 2005
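
One simple way to apply such a filter is to keep only interactions whose two proteins are annotated to at least one common compartment. The sketch below assumes this rule and drops pairs with a missing annotation; both choices, as well as the protein names and annotations, are illustrative assumptions rather than the procedure of the cited paper.

```python
def filter_by_localization(interactions, localization):
    """Keep pairs whose proteins share at least one annotated compartment."""
    kept = []
    for a, b in interactions:
        shared = localization.get(a, set()) & localization.get(b, set())
        if shared:
            kept.append((a, b))
    return kept

# Hypothetical annotations; P4 has no annotation at all.
localization = {
    "P1": {"nucleus"},
    "P2": {"nucleus", "cytoplasm"},
    "P3": {"mitochondrion"},
}
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]
kept = filter_by_localization(interactions, localization)
```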
Guilty by association
Friends are similar to you
"Tell me who you associate with, and I will tell you who you are" (Danish proverb)
Interacting proteins often share biological role
Ribosomal biogenesis: in yeast, many steps of ribosome biogenesis are relatively well described. In mammals, by contrast, only a few genes have been identified.
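
Guilt by association can be sketched as a majority vote: an uncharacterized protein is assigned the role most common among its annotated interaction partners. The function name, the protein identifiers and the annotations below are hypothetical, and ties are resolved arbitrarily in this minimal version.

```python
from collections import Counter

def predict_role(adj, annotation, protein):
    """Majority vote over the annotated interaction partners of `protein`."""
    votes = Counter(annotation[p] for p in adj.get(protein, ()) if p in annotation)
    if not votes:
        return None  # no annotated partners, so no prediction
    return votes.most_common(1)[0][0]

# Hypothetical protein X with three annotated interaction partners.
adj = {"X": {"R1", "R2", "T1"}}
annotation = {
    "R1": "ribosome biogenesis",
    "R2": "ribosome biogenesis",
    "T1": "transport",
}
role = predict_role(adj, annotation, "X")
```

Because two of the three partners of X are annotated with ribosome biogenesis, that role is the one predicted for X.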
Summary
Models in systems biology can be constructed by integrating data.
Such models are often depicted as biological networks.
The networks have properties and characteristics: nodes, edges, node degree, clustering coefficient.
Protein-protein interactions can be experimentally determined: Y2H and mass spec.
High-throughput data is noisy and should be cleaned.
Interacting proteins are often involved in the same process.