..OMICS.. Today, people talk about: Genomics Variomics Transcriptomics Proteomics Interactomics Regulomics Metabolomics

...OMICS

..OMICS.. Today, people talk about: Genomics Variomics Transcriptomics Proteomics Interactomics Regulomics Metabolomics Many more at: http://www.genomicglossaries.com/content/omes.asp

OMICS AND HIGH-THROUGHPUT METHODS

WHICH EXPERIMENTAL METHODS ARE SUITABLE FOR:
- Genomics? NGS sequencing, DNA microarrays
- Variomics? NGS sequencing, DNA microarrays
- Transcriptomics? NGS sequencing, DNA microarrays
- Proteomics? 2D gels, protein arrays, mass spectrometry, isotope tags
- Protein-protein interactomics? Yeast two-hybrid, affinity purification + mass spectrometry
- Protein-DNA interactomics? Chromatin immunoprecipitation on chip (ChIP-chip), ChIP-Seq
- Metabolomics? Mass spectrometry, NMR

WHAT ARE THE PROS AND CONS OF THE ALTERNATIVE METHODS? TO WHAT EXTENT DO THE DIFFERENT METHODS AGREE?

INTERACTOMICS: FINDING THE INTERACTING PROTEINS Yeast two hybrids

CHARACTERIZATION OF PHYSICAL INTERACTIONS
Obligation:
- obligate (protomers only found/function together)
- non-obligate (protomers can exist/function alone)
Time of interaction:
- permanent (complexes, often obligate)
- strong transient (require trigger, e.g. G proteins)
- weak transient (dynamic equilibrium)

EXAMPLES: GPCR obligate, permanent non-obligate, strong transient ol

APPROACHES BY INTERACTION TYPE
Physical interactions:
- Yeast two-hybrid screens
- Affinity purification (mass spec)
Other measures of association:
- Genetic interactions (double deletion mutants)
- Genomic context (STRING)

YEAST TWO-HYBRID METHOD
Y2H assays interactions in vivo. It uses the property that transcription factors generally have separable transcriptional activation (AD) and DNA-binding (DBD) domains. A functional transcription factor can be reconstituted if a separately expressed AD is made to interact with a DBD. A bait protein B is fused to a DBD and screened against a library of prey proteins, each fused to an AD. When bait and prey interact, the AD is brought to the DBD and a reporter gene is transcribed.

YEAST TWO-HYBRID METHOD Ito et al., Trends Biotechnol. 19, S23 (2001)

The Protein Interaction Network of Yeast Uetz et al, Nature 2000

The Protein Interaction Network of Drosophila Giot et al, Science 2003

ISSUES WITH Y2H
Strengths:
- High sensitivity (transient and permanent PPIs)
- Takes place in vivo
- Independent of endogenous expression
Weaknesses (false positive interactions):
- Auto-activation
- Sticky prey
- Detects possible interactions that may not take place under real physiological conditions
- May identify indirect interactions (A-C-B)
Weaknesses (false negative interactions):
- Similar studies often reveal very different sets of interacting proteins (i.e. false negatives)
- May miss PPIs that require other factors to be present (e.g. ligands, proteins, PTMs)

DIFFERENT Y2H EXPERIMENTS GIVE DIFFERENT RESULTS. Deane et al, Mol Cell Proteomics 1:349 (2002)

DIFFERENT Y2H EXPERIMENTS GIVE DIFFERENT RESULTS. A Venn diagram illustrates the overlap between the datasets in YEAST-DIP. Each oval represents a high throughput Y2H study, and the overlaps between the Y2H studies are given at the intersections. The number in parentheses represents those interactions that have been determined by small scale methods (see "Experimental Procedures" for more details). Thus, the numbers within parentheses represent the INT set. Notice the small overlap among the datasets. Deane et al, Mol Cell Proteomics 1:349 (2002)

Y2H FOR MEMBRANE PROTEINS Fields, FEBS Journal 272, 5391 (2005)

EXERCISE: Y2H
Draw the corresponding graph

INTERACTOMICS: FINDING THE INTERACTING PROTEINS Mass spectrometry

PROTEIN INTERACTIONS BY IMMUNO-PRECIPITATION FOLLOWED BY MASS SPECTROMETRY
Start with affinity purification of a single epitope-tagged protein. This enriched sample typically has a low enough complexity to be fractionated on a standard polyacrylamide gel. Individual bands can be excised from the gel and identified with mass spectrometry.
Pier Luigi Martelli - Systems and in Silico Biology - 2014-2015

PROTEIN INTERACTIONS BY IMMUNO-PRECIPITATION FOLLOWED BY MASS SPECTROMETRY Kumar & Snyder, Nature 415, 123 (2002)

TANDEM AFFINITY PURIFICATION LA Huber Nature Reviews Molecular Cell Biology 4, 74-80 (2003)

AFFINITY PURIFICATION
Strengths:
- High specificity
- Well suited for detecting permanent or strong transient interactions (complexes)
- Detects real, physiologically relevant PPIs
Weaknesses:
- Less suited for detecting weaker transient interactions (low sensitivity)
- May miss complexes not present under the given experimental conditions (low sensitivity)
- May identify indirect interactions (A-C-B)

[Figure: panels labelled Y2H and MS] Franzot & Carugo, J Struct Funct Biol 4, 245 (2004)

Caveat: different experiments give different results Titz et al, Exp Review Proteomics, 2004

DIFFERENT SOURCES OF INFORMATION HAVE TO BE CROSS-CHECKED TO LOWER THE ERROR RATE
The fraction of interactions in which both partners have the same protein localization. Here, only proteins clearly assigned to a single category are considered. Von Mering, Nature, 2002

INTERACTOMICS: FINDING THE INTERACTING PROTEINS Genomic methods

Rost et al., Cellular and Molecular Life Sciences, 2003, 60:2637-2650 http://cubic.bioc.columbia.edu/papers/2003_rev_func/paper.html

REGULOMICS: FINDING THE TRANSCRIPTION NETWORK
ChIP-chip: Chromatin ImmunoPrecipitation on chip
ChIP-Seq: Chromatin ImmunoPrecipitation coupled to NGS

CHIP-CHIP MEASUREMENT OF PROTEIN- DNA INTERACTIONS Simon et al., Cell 2001

CHIP-CHIP MEASUREMENT OF PROTEIN- DNA INTERACTIONS Lee et al., Science 2002

CHIP-SEQ MEASUREMENT OF PROTEIN-DNA INTERACTIONS Szalkowski, A.M., and Schmid, C.D. (2010). Rapid innovation in ChIP-seq peak-calling algorithms is outdistancing benchmarking efforts. Briefings in Bioinformatics.

MAPPING TRANSCRIPTION FACTOR BINDING SITES Harbison C., Gordon B., et al. Nature 2004

PROMOTER ARCHITECTURES Harbison C., Gordon B., et al. Nature 2004

TRANSCRIPTION NETWORKS Babu et al., Curr. Opin. Struct. Biol. 14, 283 (2004)

REGULATION OF TRANSCRIPTION FACTORS Lee et al., Science 298, 799 (2002)

METABOLOMICS: FINDING THE CORRELATIONS AMONG METABOLITES
Chromatography-mass spectrometry, NMR

Separation methods
- Gas chromatography (GC), especially when interfaced with mass spectrometry (GC-MS), is one of the most widely used and powerful methods. It offers very high chromatographic resolution, but requires chemical derivatization for many biomolecules: only volatile chemicals can be analysed without derivatization. (Some modern instruments allow '2D' chromatography, using a short polar column after the main analytical column, which increases the resolution still further.) Some large and polar metabolites cannot be analysed by GC.
- High performance liquid chromatography (HPLC). Compared to GC, HPLC has lower chromatographic resolution, but it does have the advantage that a much wider range of analytes can potentially be measured.
- Capillary electrophoresis (CE). CE has a higher theoretical separation efficiency than HPLC, and is suitable for use with a wider range of metabolite classes than is GC. As for all electrophoretic techniques, it is most appropriate for charged analytes.
Wikipedia: Metabolomics

Detection methods
- Mass spectrometry (MS) is used to identify and to quantify metabolites after separation by GC, HPLC (LC-MS), or CE. GC-MS is the most 'natural' combination of the three, and was the first to be developed. In addition, mass spectral fingerprint libraries exist or can be developed that allow identification of a metabolite according to its fragmentation pattern. MS is both sensitive (although, particularly for HPLC-MS, sensitivity is more of an issue as it is affected by the charge on the metabolite, and can be subject to ion suppression artifacts) and can be very specific. There are also a number of studies which use MS as a stand-alone technology: the sample is infused directly into the mass spectrometer with no prior separation, and the MS serves to both separate and to detect metabolites.
- Nuclear magnetic resonance (NMR) spectroscopy. NMR is the only detection technique which does not rely on separation of the analytes, and the sample can thus be recovered for further analyses. All kinds of small molecule metabolites can be measured simultaneously: in this sense, NMR is close to being a universal detector. The main advantages of NMR are high analytical reproducibility and simplicity of sample preparation. Practically, however, it is relatively insensitive compared to mass spectrometry-based techniques.
Wikipedia: Metabolomics

GC-MS Weckwerth, Annu. Rev. Plant Biol. 54, 669 (2003)

FINDING CORRELATION BETWEEN THE METABOLITE CONTENT Weckwerth, Annu. Rev. Plant Biol. 54, 669 (2003)

METABOLITE NETWORK Weckwerth, Annu. Rev. Plant Biol. 54, 669 (2003)

IS THE CORRELATION EVALUATION SUFFICIENT?

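COLLECT THE MEASURES FOR VARIABLES X AND Y
Paired measurements: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Covariance: $\mathrm{cov}(X,Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
Linear regression: $Y = a_{XY} X + b$, with $a_{XY} = \frac{\mathrm{cov}(X,Y)}{\sigma_X^2}$
If the correlation is significant enough, X and Y are linked: X - Y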

COLLECT THE MEASURES FOR VARIABLES Y AND Z
Paired measurements: $(y_1, z_1), (y_2, z_2), \ldots, (y_n, z_n)$
Covariance: $\mathrm{cov}(Y,Z) = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})(z_i - \bar{z})$
Linear regression: $Z = a_{YZ} Y + c$, with $a_{YZ} = \frac{\mathrm{cov}(Y,Z)}{\sigma_Y^2}$
If the correlation is significant enough, Y and Z are linked: Y - Z

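WHAT ABOUT X AND Z?
In general: $Z = a_{YZ} Y + a_{XZ} X + c'$
If we suppose that Z depends on X ONLY indirectly, via Y, then $a_{XZ} = 0$:
$Z = a_{YZ} Y + c' = a_{YZ} (a_{XY} X + b) + c' = a_{YZ} a_{XY} X + d = \tilde{a}_{XZ} X + d$
So Z and X have a regression with coefficient $\tilde{a}_{XZ} = a_{XY} a_{YZ}$:
$\frac{\mathrm{cov}(X,Z)}{\sigma_X^2} = \frac{\mathrm{cov}(X,Y)}{\sigma_X^2} \cdot \frac{\mathrm{cov}(Y,Z)}{\sigma_Y^2}$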

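WHAT ABOUT X AND Z?
$\mathrm{cov}(X,Z) = \frac{\mathrm{cov}(X,Y)\,\mathrm{cov}(Y,Z)}{\sigma_Y^2}$
$\frac{\mathrm{cov}(X,Z)}{\sigma_X \sigma_Z} = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y} \cdot \frac{\mathrm{cov}(Y,Z)}{\sigma_Y \sigma_Z}$
$\rho_{X,Z} = \rho_{X,Y}\,\rho_{Y,Z}$
X and Z have a correlation index equal to the product of the correlation indexes, even if there is no direct relation between them. Correlation is not sufficient to establish a direct relation between the variables.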

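EXAMPLE
$\sigma_X^2 = 4$, $\sigma_Y^2 = 3$, $\mathrm{cov}(X,Y) = 2$
$\sigma_Y^2 = 3$, $\sigma_Z^2 = 6$, $\mathrm{cov}(Z,Y) = 1.5$
If X and Z are not directly dependent: $\mathrm{cov}(X,Z) = \frac{\mathrm{cov}(X,Y)\,\mathrm{cov}(Y,Z)}{\sigma_Y^2} = \frac{2 \cdot 1.5}{3} = 1$
So the overall covariance matrix is:
$\mathrm{COV} = \begin{pmatrix} 4 & 2 & 1 \\ 2 & 3 & 1.5 \\ 1 & 1.5 & 6 \end{pmatrix}$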
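The numbers of this example can be checked with a few lines of Python. This is only an illustrative sketch: the variable names and the helper `correlation` are assumptions introduced here, not part of the original slides.

```python
import math

# Variances and covariances taken from the example above.
var_x, var_y, var_z = 4.0, 3.0, 6.0
cov_xy, cov_yz = 2.0, 1.5

# If Z depends on X only through Y: cov(X,Z) = cov(X,Y) * cov(Y,Z) / var(Y).
cov_xz = cov_xy * cov_yz / var_y


def correlation(cov, var_a, var_b):
    """Pearson correlation from a covariance and the two variances (hypothetical helper)."""
    return cov / math.sqrt(var_a * var_b)


rho_xy = correlation(cov_xy, var_x, var_y)
rho_yz = correlation(cov_yz, var_y, var_z)
rho_xz = correlation(cov_xz, var_x, var_z)

print(cov_xz)                   # covariance induced between X and Z
print(rho_xz, rho_xy * rho_yz)  # the two values coincide
```

The indirect link alone produces a non-zero correlation between X and Z, equal to the product of the two direct correlations.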

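EXAMPLE
$\mathrm{COV} = \begin{pmatrix} 4 & 2 & 1 \\ 2 & 3 & 1.5 \\ 1 & 1.5 & 6 \end{pmatrix}$
Gaussian model: $P(\mathbf{x} \mid \boldsymbol{\mu}, \mathrm{COV}) = \frac{1}{(2\pi)^{3/2}\,|\mathrm{COV}|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T\, \mathrm{COV}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$
The inverse of the covariance matrix (called the precision matrix, K) is involved.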

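EXAMPLE
$K = \mathrm{COV}^{-1} = \begin{pmatrix} 4 & 2 & 1 \\ 2 & 3 & 1.5 \\ 1 & 1.5 & 6 \end{pmatrix}^{-1} = \begin{pmatrix} 0.375 & -0.250 & 0 \\ -0.250 & 0.548 & -0.095 \\ 0 & -0.095 & 0.190 \end{pmatrix}$
The element of the precision matrix corresponding to the pair of variables that are not directly related (X and Z) vanishes!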

DIRECT LINKS ARE RECOVERED FROM THE PRECISION MATRIX
Given a set of samples described by the variables $X_1, X_2, X_3, X_4, \ldots, X_N$:
- Compute the covariance matrix
- Compute the precision matrix (K) as the inverse of the covariance matrix
- The partial correlation index between a pair of variables $X_i$, $X_j$, with $i \neq j$, is $\tilde{\rho}(X_i, X_j) = -\frac{K_{ij}}{\sqrt{K_{ii} K_{jj}}}$
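The procedure can be sketched in plain Python on the 3x3 covariance matrix of the example. The functions `invert` and `partial_correlations` are hypothetical helpers written for this illustration (a small Gauss-Jordan inversion is used so that no external library is needed); in practice a numerical library would normally do the inversion.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [value / scale for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Partial correlation matrix: -K_ij / sqrt(K_ii * K_jj) for i != j, 1 on the diagonal."""
    k = invert(cov)
    n = len(k)
    return [[1.0 if i == j else -k[i][j] / math.sqrt(k[i][i] * k[j][j])
             for j in range(n)] for i in range(n)]


COV = [[4.0, 2.0, 1.0],
       [2.0, 3.0, 1.5],
       [1.0, 1.5, 6.0]]

K = invert(COV)                 # precision matrix
P = partial_correlations(COV)   # partial correlations

for row in K:
    print([round(value, 3) for value in row])
print(round(P[0][2], 6))        # partial correlation between X and Z
```

The X-Z entry of the precision matrix, and therefore the X-Z partial correlation, is zero up to rounding, while the X-Y and Y-Z entries are not.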

PARTIAL CORRELATION COEFFICIENT
Formally, the partial correlation between X and Y given a set of n controlling variables $\mathbf{Z} = \{Z_1, Z_2, \ldots, Z_n\}$, written $\rho_{XY \cdot \mathbf{Z}}$, is the correlation between the residuals $R_X$ and $R_Y$ resulting from the linear regression of X with $\mathbf{Z}$ and of Y with $\mathbf{Z}$, respectively.

APPLICATION IN METABOLOMICS ANALYSIS
In our new approach we propose the application of a Gaussian graphical model (GGM), an undirected probabilistic graphical model estimating the conditional dependence between variables. GGMs are based on partial correlation coefficients, that is pairwise Pearson correlation coefficients conditioned against the correlation with all other metabolites. We first demonstrate the general validity of the method and its advantages over regular correlation networks with computer-simulated reaction systems. Then we estimate a GGM on data from a large human population cohort, covering 1020 fasting blood serum samples with 151 quantified metabolites. The GGM is much sparser than the correlation network, shows a modular structure with respect to metabolite classes, and is stable to the choice of samples in the data set. On the example of human fatty acid metabolism, we demonstrate for the first time that high partial correlation coefficients generally correspond to known metabolic reactions. This feature is evaluated both manually by investigating specific pairs of high-scoring metabolites, and then systematically on a literature-curated model of fatty acid synthesis and degradation. Our method detects many known reactions along with possibly novel pathway interactions, representing candidates for further experimental examination.
Krumsiek et al. BMC Systems Biology 2011, 5:21 http://www.biomedcentral.com/1752-0509/5/21

APPLICATION IN PROTEIN STRUCTURE PREDICTION Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, et al. (2011). PLoS ONE 6(12): e28766. doi:10.1371/journal.pone.0028766

APPLICATION IN PROTEIN STRUCTURE PREDICTION See also PSICOV: Jones D, Buchan DWA, Cozzetto D, Pontil M, Bioinformatics 28:184-190 (2012)

How to Describe a System As a Whole? Networks - The Language of Complex Systems

Air Transportation Network

The World Wide Web

Fragment of a Social Network (Melburn, 2004) Friendship among 450 people in Canberra

Biological Networks A. Intra-Cellular Networks Protein interaction networks Metabolic Networks Signaling Networks Gene Regulatory Networks Composite networks Networks of Modules, Functional Networks Disease networks B. Inter-Cellular Networks Neural Networks C. Organ and Tissue Networks D. Ecological Networks E. Evolution Network

The Protein Interaction Network of Yeast Yeast two hybrid Uetz et al., Nature 2000

Metabolic Networks Source: ExPASy

Gene Regulation Networks Abdollahi A et al., PNAS 2007

Networks derived from networks Goh,..,Barabasi (2007) PNAS 104:8685

[Figure (L-A Barabasi). Labels: GENOME, PROTEOME, METABOLISM; miRNA regulation?, protein-gene interactions, protein-protein interactions, biochemical reactions (citrate cycle)]

What is a Network?
A network is a mathematical structure composed of points connected by lines.
Network theory <-> Graph theory
Network <-> Graph
Nodes <-> Vertices (points)
Links <-> Edges (lines)
A network can be built for any functional system. System vs. parts = network vs. nodes

The 7 bridges of Königsberg
The question is whether it is possible to find a route that crosses each bridge exactly once.

The representation of Euler
The shape of a graph may be distorted in any way without changing the graph itself, so long as the links between nodes are unchanged. It does not matter whether the links are straight or curved, or whether one node is to the left or right of another. In 1736 Leonhard Euler formulated the Königsberg problem in abstract terms: 1) by eliminating all features except the landmasses and the bridges connecting them; 2) by replacing each landmass with a dot (vertex) and each bridge with a line (edge).

The solution depends on the node degree
In a continuous path crossing each edge exactly once, each visited node requires an edge for entering and a different edge for exiting (except for the start and the end nodes). A path crossing each edge exactly once is called an Eulerian path. In a connected graph it is possible IF AND ONLY IF there are exactly two or zero nodes of odd degree. Since the graph corresponding to Königsberg has four nodes of odd degree (3, 5, 3, 3), it cannot have an Eulerian path.
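The degree criterion can be checked mechanically. The sketch below encodes the Königsberg multigraph with the hypothetical labels A (the landmass of degree 5), B, C and D, and assumes that the graph passed in is connected; the function names are illustrative.

```python
from collections import Counter


def odd_degree_nodes(edges):
    """Return the sorted list of nodes with odd degree in an undirected multigraph."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def has_eulerian_path(edges):
    """Degree test for an Eulerian path; assumes the graph is connected."""
    return len(odd_degree_nodes(edges)) in (0, 2)


# Seven bridges: two A-B, two A-C, and one each for A-D, B-D, C-D.
KOENIGSBERG = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
               ("A", "D"), ("B", "D"), ("C", "D")]

print(odd_degree_nodes(KOENIGSBERG))   # all four landmasses have odd degree
print(has_eulerian_path(KOENIGSBERG))  # False
```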

The solution depends on the node degree
If there are two nodes of odd degree, those must be the starting and ending points of an Eulerian path.
[Figure: example graph with labels 1-6, Start and End]

Hamiltonian paths
Find a path visiting each node exactly once. Conditions of existence for Hamiltonian paths are not simple.

Hamiltonian paths

Graph nomenclature
Graphs can be simple or multigraphs, depending on whether the interaction between two neighboring nodes is unique or can be multiple, respectively. A node may or may not have self-loops.

Graph nomenclature
Networks can be undirected or directed, depending on whether the interaction between two neighboring nodes proceeds in both directions or in only one of them, respectively.
The specificity of network nodes and links can be quantitatively characterized by weights (vertex-weighted and edge-weighted graphs).
[Figure: example graphs with nodes 1-6 and numeric vertex and edge weights]

Graph nomenclature
A network can be connected (consisting of a single component) or disconnected (consisting of several disjoint components).
Networks having no cycles are termed trees. The more cycles the network has, the more complex it is.
[Figure: connected vs. disconnected graphs; trees vs. cyclic graphs]

Graph nomenclature Paths Stars Cycles Complete Graphs

Large graphs = Networks

Statistical features of networks Vertex degree distribution (the degree of a vertex is the number of vertices connected with it via an edge)

Statistical features of networks
Clustering coefficient: the average proportion of neighbours of a vertex that are themselves neighbours.
Example for one node:
- Neighbours (N): 4
- Possible connections among the neighbours: N x (N-1)/2 = 6
- Actual connections among the neighbours: 2
- Clustering for the node = 2/6
Clustering coefficient of the network: average over all the nodes

Statistical features of networks
Clustering coefficient: the average proportion of neighbours of a vertex that are themselves neighbours.
[Figure: four example graphs with C=0, C=0, C=0 and C=1]
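A minimal sketch of the clustering computation follows. The adjacency dictionary `ADJ` is a hypothetical graph built to reproduce the 2/6 case (a node with 4 neighbours and 2 links among them); assigning C = 0 to nodes with fewer than two neighbours is a convention assumed here, and node labels must be comparable (integers in this example).

```python
def local_clustering(adj, node):
    """Fraction of the possible links among the neighbours of `node` that exist."""
    neighbours = adj[node]
    n = len(neighbours)
    if n < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbours
    links = sum(1 for u in neighbours for v in neighbours
                if u < v and v in adj[u])
    return links / (n * (n - 1) / 2)


def average_clustering(adj):
    """Clustering coefficient of the network: mean of the local values."""
    return sum(local_clustering(adj, node) for node in adj) / len(adj)


# Undirected graph as a dictionary of neighbour sets (hypothetical example).
ADJ = {
    0: {1, 2, 3, 4},
    1: {0, 2},
    2: {0, 1},
    3: {0, 4},
    4: {0, 3},
}

print(local_clustering(ADJ, 0))   # 2 links out of 6 possible
print(average_clustering(ADJ))
```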

Statistical features of networks
Given a pair of nodes, compute the shortest path between them.
- Average shortest distance between two vertices
- Diameter: maximal shortest distance
How many degrees of separation are there between two random people in the world, when friendship networks are considered?
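For an unweighted graph, both quantities can be obtained with a breadth-first search from every node. The sketch below is illustrative: the function names and the small path graph are assumptions, and only pairs of distinct, mutually reachable nodes are counted (the graph must contain at least one edge).

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest distances (number of edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_statistics(adj):
    """Return (average shortest distance, diameter) over reachable pairs u != v."""
    total, pairs, diameter = 0, 0, 0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return total / pairs, diameter


# Hypothetical example: a path graph 0 - 1 - 2 - 3.
PATH_GRAPH = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}

average, diameter = path_statistics(PATH_GRAPH)
print(average, diameter)
```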

Wuchty, Ravasz, Barabasi. In: Complex Systems in Biomedicine, Kluwer 2003

How to compute the shortest path between home and work?
Edge-weighted graph
An exhaustive search can be too time-consuming.

Dijkstra's algorithm
Initialization:
- Fix the distance between Home and Home equal to 0
- Compute the distance between Home and its neighbours
- Set the distance between Home and its non-neighbours equal to infinity
[Figure legend: fixed nodes, non-fixed nodes]

Dijkstra's algorithm
Iteration (1): search for the node with the minimum distance among the non-fixed nodes and fix its distance, recording the incoming direction.
[Figure legend: fixed nodes, non-fixed nodes]

Dijkstra's algorithm
Iteration (2): update the distances of the non-fixed nodes, starting from the fixed distances.
[Figure legend: fixed nodes, non-fixed nodes]

Dijkstra's algorithm
Iteration:
- Fix the non-fixed node with the minimum distance
- Update the distances of the non-fixed nodes, starting from the fixed distances
A distance is replaced only when the updated distance is lower than the previous one.
[Figure legend: fixed nodes, non-fixed nodes]

Dijkstra's algorithm
Iteration (repeated until all nodes are fixed):
- Fix the non-fixed node with the minimum distance
- Update the distances of the non-fixed nodes, starting from the fixed distances

Dijkstra's algorithm
Conclusion:
- The label of each node represents the minimal distance from the starting node
- The minimal path can be reconstructed with a back-tracing procedure
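The steps above can be sketched with a priority queue. The graph `GRAPH`, its node names and its weights are hypothetical (they are not the ones drawn on the slides), edge weights are assumed to be non-negative, and `backtrace` assumes that the target is reachable from the start.

```python
import heapq


def dijkstra(graph, start):
    """Return (distance, previous) dictionaries for all nodes of `graph`."""
    dist = {node: float("inf") for node in graph}  # non-neighbours start at infinity
    dist[start] = 0
    previous = {}       # incoming direction recorded for back-tracing
    fixed = set()
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)   # non-fixed node with minimum distance
        if u in fixed:
            continue                 # stale heap entry
        fixed.add(u)
        for v, weight in graph[u].items():
            if v not in fixed and d + weight < dist[v]:
                dist[v] = d + weight  # update only if the new distance is lower
                previous[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, previous


def backtrace(previous, start, target):
    """Rebuild the minimal path by following the recorded incoming directions."""
    path = [target]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]


# Hypothetical undirected, edge-weighted graph.
GRAPH = {
    "Home": {"A": 4, "B": 1},
    "A": {"Home": 4, "B": 2, "Work": 5},
    "B": {"Home": 1, "A": 2, "C": 7},
    "C": {"B": 7, "Work": 1},
    "Work": {"A": 5, "C": 1},
}

distances, previous = dijkstra(GRAPH, "Home")
print(distances["Work"])                    # 8
print(backtrace(previous, "Home", "Work"))  # ['Home', 'B', 'A', 'Work']
```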

Statistical features of networks
- Vertex degree (k) distribution
- Clustering coefficient (C)
- Average shortest distance between two vertices (L)
- Diameter: maximal shortest distance

Two reference models for networks
- Regular network (lattice): regular connections
- Random network (Erdős-Rényi, 1959): each edge is randomly set with probability p

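Two reference models for networks
Comparing networks with the same total number of nodes (N) and edges (E), with s = E/N
Property | Regular network | Random network
Degree (k) distribution | all nodes have the same degree | Poisson distribution, $P(k) = e^{-\langle k \rangle} \frac{\langle k \rangle^k}{k!}$ (exponential decay)
Average shortest path | ≈ N (high) | ≈ log(N) (low)
Average clustering | 1.5 (s-1)/(2s-1) (high) | 2s/N (low)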
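A random network of this kind can be generated in a few lines, and its degree distribution compared with the Poisson formula. This is a sketch under stated assumptions: `random_graph` and `poisson` are illustrative names, and the sizes used in the usage lines are arbitrary.

```python
import math
import random
from itertools import combinations


def random_graph(n, p, seed=None):
    """Random graph: each of the n(n-1)/2 possible edges is set with probability p."""
    rng = random.Random(seed)
    adj = {node: set() for node in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def poisson(k, mean):
    """Poisson probability P(k) = exp(-mean) * mean**k / k!."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Usage: compare the observed mean degree with the expected value p * (n - 1).
graph = random_graph(200, 0.05, seed=0)
mean_degree = sum(len(nb) for nb in graph.values()) / len(graph)
print(mean_degree, 0.05 * 199)
print(poisson(10, 0.05 * 199))  # Poisson estimate of the fraction of nodes with k = 10
```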

Some examples for real networks
Network | Size | Vertex degree | Shortest path | Shortest path in fitted random graph | Clustering | Clustering in random graph
Film actors | 225,226 | 61 | 3.65 | 2.99 | 0.79 | 0.00027
MEDLINE coauthorship | 1,520,251 | 18.1 | 4.6 | 4.91 | 0.43 | 1.8 x 10^-4
E. coli substrate graph | 282 | 7.35 | 2.9 | 3.04 | 0.32 | 0.026
C. elegans neuron network | 282 | 14 | 2.65 | 2.25 | 0.28 | 0.05
Real networks are not regular (low shortest path).
Real networks are not random (high clustering).

Adding randomness in a regular network Random changes in edges OR Addition of random links

Adding randomness in a regular network (rewiring)
Networks with high clustering (like regular ones) and low path length (like random ones) can be obtained: SMALL WORLD NETWORKS (Watts and Strogatz, 1998)

Small World Networks
A small number of random shortcuts can decrease the path length while still maintaining a high clustering: this model explains the six degrees of separation in the human friendship network.
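The rewiring idea can be sketched as follows. This is a simplified variant written for illustration, not necessarily the exact published procedure: `ring_lattice` and `rewire` are hypothetical names, each original edge is rewired with probability p by moving one of its ends to a randomly chosen node, and self-loops and duplicate edges are avoided.

```python
import random


def ring_lattice(n, k):
    """Regular network: each node is linked to its k nearest neighbours on a ring (k even, k < n)."""
    adj = {node: set() for node in range(n)}
    for node in range(n):
        for step in range(1, k // 2 + 1):
            other = (node + step) % n
            adj[node].add(other)
            adj[other].add(node)
    return adj


def rewire(adj, p, seed=None):
    """Return a copy of `adj` in which each edge is rewired with probability p."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    new = {node: set(nb) for node, nb in adj.items()}
    edges = sorted({(min(u, v), max(u, v)) for u in adj for v in adj[u]})
    for u, v in edges:
        if rng.random() < p:
            # Candidate new partners for u: no self-loop, no duplicate edge.
            candidates = [w for w in nodes if w != u and w not in new[u]]
            if candidates:
                w = rng.choice(candidates)
                new[u].discard(v)
                new[v].discard(u)
                new[u].add(w)
                new[w].add(u)
    return new


lattice = ring_lattice(20, 4)
small_world = rewire(lattice, 0.1, seed=0)
print(sum(len(nb) for nb in small_world.values()) // 2)  # the number of edges is unchanged
```

Because every rewiring removes one edge and adds one new edge, the comparison with the regular lattice is made at a fixed number of nodes and edges.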

What about the degree distribution in real networks?
Both random and small world models predict an approximate Poisson distribution: most of the values are near the mean, with an exponential decay when k gets higher: $P(k) \sim e^{-k}$, for large k.

What about the degree distribution in real networks?
In 1999, modelling the WWW (pages: nodes; links: edges), Barabasi and Albert discovered a slower than exponential decay: $P(k) \sim k^{-a}$ with 2 < a < 3, for large k

Scale-free networks
Networks that are characterized by a power-law degree distribution are highly non-uniform: most of the nodes have only a few links. A few nodes with a very large number of links, which are often called hubs, hold these nodes together. Networks with a power-law degree distribution are called scale-free.
It is the same distribution as that of wealth following Pareto's 20-80 law: few people (20%) possess most of the wealth (80%), most of the people (80%) possess the rest (20%).

Hubs
Attacks on hubs can rapidly destroy the network

Three non-biological scale-free networks
Note the log-log scale: a power law with exponent a gives a LINEAR PLOT.
$P(k) = A k^{-a}$, so $\log P(k) = \log A - a \log k$
Albert and Barabasi, Science, 1999
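The linear relation on the log-log scale suggests the simplest way to estimate the exponent: a least-squares line through the points (log k, log P(k)). The sketch below uses synthetic, noise-free data with an assumed exponent of 2.5 and prefactor 2; `fit_power_law` is a hypothetical helper, and such a fit is only illustrative when applied to noisy real data.

```python
import math


def fit_power_law(ks, pks):
    """Least-squares line on the log-log scale; returns (exponent a, prefactor A)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(p) for p in pks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # log P(k) = log A - a log k, so a = -slope and A = exp(intercept).
    return -slope, math.exp(intercept)


# Synthetic data following P(k) = 2 * k^(-2.5) exactly (assumed values).
degrees = list(range(1, 51))
probabilities = [2.0 * k ** -2.5 for k in degrees]

exponent, prefactor = fit_power_law(degrees, probabilities)
print(round(exponent, 6), round(prefactor, 6))
```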

Wuchty, Ravasz, Barabasi. In: Complex Systems in Biomedicine, Kluwer 2003

Attack Tolerance
Complex systems maintain their basic functions even under errors and failures (cell mutations; Internet router breakdowns).
[Figure: node failure] Albert and Barabasi, Rev Mod Phys, 2002

Attack Tolerance
Path length is robust: for a power-law exponent below 3, removing nodes at random does not break the network into islands. Scale-free networks are very resistant to random attacks, but attacks targeting key nodes are more dangerous.
[Figure: four panels comparing targeted attack and random attack] Albert and Barabasi, Rev Mod Phys, 2002

How can a scale-free network emerge? Network growth models: start with one vertex.

How can a scale-free network emerge?
Network growth models: a new vertex attaches to existing vertices by preferential attachment, i.e. it tends to choose its partner with a probability that grows with the partner's degree.
In economics this is called the Matthew effect: the rich get richer. This explains Pareto's distribution of wealth.

How can a scale-free network emerge? Network growth models: hubs emerge (in the WWW: new pages tend to link to existing, well linked pages)
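The growth process can be sketched as follows. This is a minimal illustration under stated assumptions: each new vertex brings exactly one link, the target is chosen with probability proportional to its current degree, and the very first link necessarily goes to the single starting vertex; `preferential_attachment` is a hypothetical name.

```python
import random


def preferential_attachment(n, seed=None):
    """Grow a network to n vertices; return (degree dictionary, edge list)."""
    rng = random.Random(seed)
    degree = {0: 0}   # start with one vertex
    edges = []
    endpoints = []    # each vertex appears once per incident edge
    for new in range(1, n):
        # Picking uniformly from `endpoints` selects a vertex with
        # probability proportional to its degree.
        target = rng.choice(endpoints) if endpoints else 0
        edges.append((new, target))
        degree[new] = 1
        degree[target] += 1
        endpoints.extend((new, target))
    return degree, edges


degree, edges = preferential_attachment(1000, seed=0)
print(max(degree.values()))                         # degree of the largest hub
print(sum(1 for d in degree.values() if d == 1))    # number of vertices with a single link
```

Most vertices keep only one or a few links, while the oldest vertices tend to accumulate many of them.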

Metabolic pathways are scale-free Hubs are pyruvate, coenzyme A.

Protein interaction networks are scale-free Uetz et al., Nature 2000

Protein interaction networks are scale-free Albert R, J Cell Sci, 2005

Protein interaction networks are scale-free
Degree is in some measure related to the phenotypic effect upon gene knock-out.
Red: lethal; Green: non-lethal; Yellow: unknown
Uetz et al., Nature 2000

Are central proteins essential?
- Proteins with fewer than 6 neighbours: 21% are essential (lethality upon knock-out)
- Proteins with more than 15 neighbours: 62% are essential (lethality upon knock-out)

Caveat: different experiments give different results Titz et al, Exp Review Proteomics, 2004

How can a scale-free interaction network emerge?
Gene duplication (and differentiation): a duplicated gene gives rise to a protein that interacts with the same proteins as the original protein (and then specializes its functions).

Wuchty, Ravasz, Barabasi. In: Complex Systems in Biomedicine, Kluwer 2003

Transcription networks Babu et al., Curr. Opin. Struct. Biol. 14, 283 (2004)

Transcription networks
The incoming connectivity is the number of transcription factors regulating a target gene, which quantifies the combinatorial effect of gene regulation. The fraction of target genes with a given incoming connectivity decreases exponentially. Most target genes are regulated by similar numbers of factors (93% of genes are regulated by 1-4 factors in yeast). Babu et al., Curr. Opin. Struct. Biol. 14, 283 (2004)

Transcription networks
The outgoing connectivity is the number of target genes regulated by each transcription factor. It is distributed according to a power law. This is indicative of a hub-containing network structure, in which a select few transcription factors participate in the regulation of a disproportionately large number of target genes. These hubs can be viewed as global regulators, as opposed to the remaining transcription factors that can be considered fine tuners. In the transcriptional network in yeast, regulatory hubs have a propensity to be lethal if removed. Babu et al., Curr. Opin. Struct. Biol. 14, 283 (2004)

These mechanisms alone cannot explain the evolution of network motifs and the scale-free topology. Babu et al., Curr. Opin. Struct. Biol. 14, 283 (2004)

Caveat on the use of the scale-free theory The same noisy data can be fitted in different ways. The cumulative distribution $F(x) = \int_x^{\infty} f(z)\,dz$, with $f(x) = C x^{-(\alpha+1)}$, has to be used: it is more discriminative. Keller, BioEssays 2005
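The cumulative distribution mentioned on this slide can be estimated from a list of observed degrees without any binning. A minimal sketch follows, assuming the "fraction of values greater than or equal to x" convention; the function name `ccdf` is chosen here for illustration.

```python
from collections import Counter

def ccdf(values):
    """Empirical complementary cumulative distribution.

    Maps each distinct value x to the fraction of observations >= x.
    """
    counts = Counter(values)
    n = len(values)
    remaining = n          # observations that are >= the current x
    result = {}
    for x in sorted(counts):
        result[x] = remaining / n
        remaining -= counts[x]
    return result

if __name__ == "__main__":
    degrees = [1, 1, 1, 2, 2, 4]
    for x, fraction in ccdf(degrees).items():
        print(x, fraction)
```

Because each point of the cumulative curve sums over the whole tail, it is less sensitive to noise in the sparsely populated high-degree bins than a histogram of the raw frequencies.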

Caveat on the use of the scale-free theory A sub-net of a non-scale-free network can show scale-free behaviour. Finding scale-free behaviour does NOT imply growth by the preferential attachment mechanism. Keller, BioEssays 2005

Hierarchical networks Standard scale-free models have low clustering: a modular hierarchical model accounts for high clustering, short average path length and scale-freeness. Wuchty, Ravasz, Barabasi. In: Complex Systems in Biomedicine, Kluwer 2003
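Clustering can be measured per node and then averaged over nodes of equal degree, which is one way to compare a network against the low clustering of standard scale-free models. The sketch below assumes an undirected graph stored as a dict of neighbour sets; the function names are illustrative.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention used here for nodes with fewer than 2 neighbours
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

def clustering_by_degree(adj):
    """Mean local clustering coefficient for each degree value k."""
    by_degree = {}
    for node in adj:
        by_degree.setdefault(len(adj[node]), []).append(
            local_clustering(adj, node))
    return {k: sum(cs) / len(cs) for k, cs in sorted(by_degree.items())}

if __name__ == "__main__":
    # Toy graph: a triangle (1, 2, 3) with a pendant node 4 attached to 3.
    adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    print(clustering_by_degree(adj))
```

In hierarchical models the mean clustering is expected to fall as the degree grows, so a decreasing trend in `clustering_by_degree` is the pattern to look for.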

Hierarchical Modularity Metabolic Networks Protein Networks E. Ravasz et al., Science, 2002

Hierarchical structures in directed networks: master regulators (nodes with zero in-degree), workhorses (nodes with zero out-degree), and middle managers (nodes with nonzero in- and out-degree). Yan & Gerstein, PNAS 2010
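The three classes follow directly from the in- and out-degree of each node. A minimal sketch, assuming the network is given as a list of (regulator, target) pairs; the function name `classify_nodes` is illustrative, and a self-regulating node is treated here as having both an incoming and an outgoing link.

```python
def classify_nodes(edges):
    """Label every node of a directed edge list by its hierarchy level."""
    nodes, has_in, has_out = set(), set(), set()
    for src, dst in edges:
        nodes.update((src, dst))
        has_out.add(src)
        has_in.add(dst)
    levels = {}
    for node in nodes:
        if node not in has_in:
            levels[node] = "master regulator"   # zero in-degree
        elif node not in has_out:
            levels[node] = "workhorse"          # zero out-degree
        else:
            levels[node] = "middle manager"     # both nonzero
    return levels

if __name__ == "__main__":
    # Hypothetical toy network.
    toy = [("A", "B"), ("B", "C"), ("A", "C")]
    for node, level in sorted(classify_nodes(toy).items()):
        print(node, level)
```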

Yan & Gerstein, PNAS 2010

Sc: yeast; Hs: human; Rr: rat; Mm: mouse; Ec: E. coli; Mt: Mycobacterium tuberculosis; Ph: phosphorylation; Mo: modification; Tr: transcriptional regulation. Bhardwaj, Yan, Gerstein, PNAS 2010

Bhardwaj, Yan, Gerstein, PNAS, 2010

Motifs Sub-graphs that occur more often than expected by chance. Example: 209 bi-fan motifs are found in the E. coli regulatory network.
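A bi-fan consists of two regulators that both control the same two targets. The sketch below counts such patterns in a directed edge list. It makes two assumptions that should be stated: it counts every occurrence of the four required edges, whether or not additional edges exist among the four nodes, and it only counts, so deciding whether the motif is over-represented would still require comparing against randomized networks. The function name `count_bifans` is illustrative.

```python
from itertools import combinations
from math import comb

def count_bifans(edges):
    """Count pairs of sources that share pairs of targets."""
    targets = {}
    for src, dst in edges:
        if src != dst:                       # ignore self-regulation
            targets.setdefault(src, set()).add(dst)
    total = 0
    for a, b in combinations(targets, 2):
        # Targets regulated by both a and b, excluding a and b themselves.
        shared = (targets[a] & targets[b]) - {a, b}
        total += comb(len(shared), 2)        # each pair of shared targets
    return total

if __name__ == "__main__":
    # Hypothetical toy network with a single bi-fan.
    toy = [("X1", "Y1"), ("X1", "Y2"), ("X2", "Y1"), ("X2", "Y2")]
    print(count_bifans(toy))
```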

Measures of centrality Degree centrality is defined as the number of links incident upon a node. Betweenness is the ratio of the number of shortest paths passing through a given vertex to the total number of shortest paths. Closeness is defined as the mean shortest-path length between a vertex v and all other vertices reachable from it.
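The three measures can be computed with breadth-first search on an unweighted, undirected graph. This is a sketch rather than a reference implementation: betweenness is computed with Brandes' accumulation scheme (a standard technique, not something taken from the slides), closeness follows the definition above (mean shortest-path length, so smaller values mean more central), and all function names are illustrative.

```python
from collections import deque

def degree_centrality(adj):
    return {v: len(adj[v]) for v in adj}

def mean_shortest_path(adj):
    """Mean distance from each vertex to all vertices reachable from it."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        others = [d for v, d in dist.items() if v != s]
        result[s] = sum(others) / len(others) if others else float("inf")
    return result

def betweenness(adj):
    """Sum over vertex pairs of the fraction of shortest paths through v."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):       # farthest vertices first
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was visited from both endpoints.
    return {v: b / 2 for v, b in score.items()}

if __name__ == "__main__":
    # Toy path graph A - B - C.
    adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
    print(degree_centrality(adj))
    print(mean_shortest_path(adj))
    print(betweenness(adj))
```

In the toy path graph, B has the highest degree, the smallest mean shortest-path length and the only nonzero betweenness, since it lies on the single shortest path between A and C.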

Measures of centrality (figure: example network with nodes labelled A, B and C) Which is the node with the highest degree centrality? Which is the node with the highest closeness? Which is the node with the highest betweenness?

Del Rio et al., BMC Systems Biology 2009, 3:102

Community structure: subsets of vertices within which vertex-vertex connections are dense, but between which connections are less dense. Girvan and Newman, PNAS, 2002

Detecting communities Betweenness can also be computed for edges: the ratio of the number of shortest paths passing through a given edge to the total number of shortest paths. Edges with high betweenness are bottlenecks of the communication through the network.
GIRVAN NEWMAN ALGORITHM
1. Calculate the betweenness for all edges in the network.
2. Remove the edge with the highest betweenness.
3. Recalculate betweennesses for all edges affected by the removal.
4. Repeat from step 2 until no edges remain.
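The four steps can be sketched in plain Python as follows. Several choices here are assumptions made for brevity: the graph is undirected, unweighted and free of self-loops, edge betweenness is computed with Brandes' accumulation scheme, and it is recomputed for the whole graph after every removal instead of only for the affected edges. The function names are illustrative.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every edge (keys are frozensets)."""
    score = {}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1 + delta[w])
                edge = frozenset((u, w))
                score[edge] = score.get(edge, 0.0) + share
                delta[u] += share
    return {e: b / 2 for e, b in score.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in part:
                part.add(u)
                stack.extend(adj[u] - part)
        seen |= part
        parts.append(part)
    return parts

def girvan_newman(adj):
    """Yield the partition each time an edge removal splits a component."""
    adj = {u: set(vs) for u, vs in adj.items()}    # work on a copy
    n_parts = len(components(adj))
    while any(adj.values()):                       # step 4: until no edges
        scores = edge_betweenness(adj)             # steps 1 and 3
        u, v = max(scores, key=scores.get)         # step 2
        adj[u].discard(v)
        adj[v].discard(u)
        parts = components(adj)
        if len(parts) > n_parts:
            n_parts = len(parts)
            yield parts

if __name__ == "__main__":
    # Two triangles joined by the bridge 3 - 4.
    adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
           4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
    print(next(girvan_newman(adj)))
```

In the toy graph every shortest path between the two triangles crosses the bridge, so it has the highest betweenness and is removed first, leaving the two triangles as communities.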

Girvan and Newman, PNAS, 2002

Community clustering of protein-protein interaction networks Dunn et al, BMC Bioinformatics, 2005

Science 298, 2002

Geometric structure for networks Geometric random networks Higham, et al, Bioinformatics, 2008
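A geometric random network can be generated by scattering points in space and linking every pair that lies closer than a chosen radius. The sketch below assumes points drawn uniformly in the unit cube and Euclidean distance; the function name and its parameters (`radius`, `dim`, `seed`) are illustrative and not taken from the cited paper.

```python
import random
from itertools import combinations
from math import dist

def geometric_random_graph(n, radius, dim=3, seed=0):
    """Return (points, adjacency) for a geometric random graph."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        # Link two nodes when their Euclidean distance is within the radius.
        if dist(points[i], points[j]) <= radius:
            adj[i].add(j)
            adj[j].add(i)
    return points, adj

if __name__ == "__main__":
    points, adj = geometric_random_graph(200, radius=0.2)
    n_edges = sum(len(nb) for nb in adj.values()) // 2
    print("edges:", n_edges)
```

The embedding problem is the inverse of this construction: given only the adjacency structure, find coordinates for the nodes such that linked nodes end up close together.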

Algorithm for embedding the graph in a metric space Higham, et al, Bioinformatics, 2008

Higham, et al, Bioinformatics, 2008 YHC: yeast; ER: random; ER-DD: random with the same degree distribution as YHC; GEO-3D: geometric in 3D; GEO-3D-10%: GEO-3D with 10% noise; SF: scale-free

Higham, et al, Bioinformatics, 2008

Barabasi and Oltvai (2004) Network Biology: understanding the cell's functional organization. Nature Reviews Genetics 5:101-113
Strogatz (2001) Exploring complex networks. Nature 410:268-276
Hayes (2000) Graph theory in practice. American Scientist 88:9-13/104-109
Mason and Verwoerd (2006) Graph theory and networks in Biology
Keller (2005) Revisiting scale-free networks. BioEssays 27(10):1060-1068