Bioinformatics I. Introduction to Networks. 10/29/14


Bioinformatics I
CPBS 7711, October 29, 2014
Protein interaction networks
Debra Goldberg, debra@colorado.edu

Overview
- Networks, protein interaction networks (PINs)
- Network models
- What can we learn from PINs
- Discovering protein complexes
- PIN evolution
- Final words

Introduction to Networks

What is a network?
- A collection of objects (nodes, vertices)
- Binary relationships (edges)
- May be directed
- Also called a graph

Networks are everywhere!

Social networks
- Nodes: people
- Edges: friendship
(Figure from www.liberality.org)

Sexual networks
- Nodes: people
- Edges: romantic and sexual relations

Transportation networks
- Nodes: locations
- Edges: roads

Power grids
- Nodes: power stations
- Edges: high-voltage transmission lines

Airline routes
- Nodes: airports
- Edges: flights

Internet
- Nodes: routers
- Edges: physical connections
(Figure: MBone)

World-Wide-Web
- Nodes: web documents
- Edges: hyperlinks

Quick activity
What kinds of biological networks are there or might there be?

Molecular biology
- Gene and protein networks

Metabolic networks
- Nodes: metabolites
- Edges: biochemical reaction (enzyme)

Signaling networks
- Nodes: molecules (e.g., proteins or neurotransmitters)
- Edges: activation or deactivation
(Figures for these two slides from web.indstate.edu and www.life.uiuc.edu)

Gene regulatory networks
- Nodes: genes or gene products
- Edges: regulation of expression
- Inferred from error-prone gene expression data
(Figure from Wyrick et al. 2002)

Disease networks
- Nodes: diseases
- Edges: common genes
(Figure from Goh et al., PNAS 2007. Labeled diseases: obesity, rheumatoid arthritis, HIV, SARS (progression of), hypertension, myocardial infarction, Alzheimer disease)

Disease gene networks
- Nodes: genes
- Edges: common diseases
(Figure from Goh et al., PNAS 2007)

Protein interaction networks
- Nodes: proteins
- Edges: observed interaction

Synthetic sick or lethal (SSL) networks
- Nodes: nonessential genes
- Edges: genes co-lethal
- X and Y both intact (wild type): cells live
- Only X or only Y suppressed: cells live
- Both X and Y suppressed: cells die or grow slowly
- Gene function, drug targets predicted
(Figure from Tong et al. 2001)

Other gene networks
- Homology edges: sometimes used to connect other network types across species
- Coexpression: transcribed at same times, conditions
- Gene knockout / knockdown: similar phenotype (defects) when suppressed
- Gene function predicted
(Figure from www.embl.de)

What they really look like
(Figure)

We need models!

Overview (outline slide repeated before the Network models section)

Traditional graph modeling
- Random: Erdős-Rényi (1960)
- Regular: lattice
(Figure from GD2002)

Network research renaissance
- Change in direction of network research: 1998
- Four factors:
  - Theoretical analysis coupled with empirical evidence
  - Networks are not static, they evolve over time
  - Dynamical systems modeling real-world behaviors
  - Computing power! Enables large system analysis
- Introduce small-world networks

Small-world networks
- Six degrees of separation
- 100 to 1000 friends each
- Six steps: 10^12 to 10^18
- But: we live in communities

Small-world measures
- Typical separation between two vertices: measured by characteristic path length (average distance)
- Cliquishness of a typical neighborhood: measured by clustering coefficient
(Figure: two example vertices v, one with C_v = 1.00 and one with C_v = 0.33)

Watts-Strogatz small-world model
(Figure)
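The two small-world measures above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the toy graph `adj` (a triangle 0-1-2 with a pendant vertex 3 attached to 0) and the function names are assumptions chosen so that the two clustering values match the C_v = 1.00 and C_v = 0.33 examples on the slide.

```python
from collections import deque

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2.0 * links / (k * (k - 1))

def characteristic_path_length(adj):
    """Average shortest-path distance over all connected ordered pairs (BFS)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Hypothetical toy graph: triangle 0-1-2 plus vertex 3 hanging off vertex 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}

c_hub = clustering_coefficient(adj, 0)     # one of three neighbor pairs linked
c_tri = clustering_coefficient(adj, 1)     # both neighbors linked to each other
avg_len = characteristic_path_length(adj)

# "Six steps" arithmetic from the slide: 100 or 1000 friends each, six hops,
# ignoring overlap between friend circles.
six_steps = (100 ** 6, 1000 ** 6)
```

The "six steps" estimate assumes no overlap between friend circles, which is exactly what the "we live in communities" caveat calls into question.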

Measures of the W-S model
- Path length drops faster than cliquishness
- Wide range of p has both small-world properties

Small-world measures of various graph types

Graph type          Cliquishness   Characteristic path length
Regular graph       High           Long
Random graph        Low            Short
Small-world graph   High           Short

(Network figures from Strogatz, Nature 2001)

Another network property: degree distribution P(k)
- The degree (notation: k) of a node is the number of its neighbors
- The degree distribution is a histogram showing the frequency of nodes having each degree

Degree distribution of E-R random networks
- Erdős-Rényi random graphs
- Binomial degree distribution, well approximated by a Poisson
(Plot: P(k) against degree k)

Degree distribution of many real-world networks
- Scale-free networks
- Degree distribution follows a power law: P(k = x) = α x^(-β)

Other degree distributions
(Figure from Amaral, Scala, et al., PNAS (2000))
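To make the contrast between the two degree distributions concrete, here is a small hedged sketch. It samples an Erdős-Rényi graph G(n, p), tabulates its empirical P(k), and provides a Poisson and a normalized power-law distribution to compare against. The values of `n`, `p`, `beta`, and `k_max`, and all function names, are illustrative assumptions rather than values from the lecture.

```python
import math
import random
from collections import Counter

def er_degree_distribution(n, p, seed=0):
    """Sample G(n, p) and return {degree k: fraction of nodes with degree k}."""
    rng = random.Random(seed)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:      # each possible edge appears independently
                degree[i] += 1
                degree[j] += 1
    counts = Counter(degree)
    return {k: c / n for k, c in sorted(counts.items())}

def poisson_pmf(k, lam):
    """Poisson approximation to the binomial degree distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def power_law_pmf(beta, k_max):
    """P(k = x) = alpha * x**(-beta) on 1..k_max, with alpha chosen to normalize."""
    alpha = 1.0 / sum(x ** -beta for x in range(1, k_max + 1))
    return {x: alpha * x ** -beta for x in range(1, k_max + 1)}

n, p = 1000, 0.01                      # assumed example parameters
P = er_degree_distribution(n, p)
mean_degree = sum(k * f for k, f in P.items())
scale_free = power_law_pmf(2.5, 100)   # assumed exponent and cutoff
```

In the sampled graph the degrees cluster around (n - 1)p, whereas the power law puts most of its mass on low degrees and keeps a heavy tail of hubs.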

Hierarchical networks
(Figure from Ravasz, et al., Science 2002)

Properties of hierarchical networks
1. Scale-free
2. Clustering coefficient independent of N
3. Scaling clustering coefficient (DGM)

C of 43 metabolic networks
- Independent of N
(Ravasz, et al., Science 2002)

Clustering coefficient scaling C(k)
- Metabolic networks
(Ravasz, et al., Science 2002)

Summary of network models
- Random: Poisson degree distribution
- Small world: high CC, short path lengths
- Scale-free: power law degree distribution
- Hierarchical: high CC, modular, power law degree distribution

Many real-world networks are small-world, scale-free
- World-wide-web
- Collaboration of film actors (Kevin Bacon)
- Mathematical collaborations (Erdős number)
- Power grid of US
- Syntactic networks of English
- Neuronal network of C. elegans
- Metabolic networks
- Protein-protein interaction networks

Overview (outline slide repeated before the section on what we can learn from PINs)

So what?
- There is information in a gene's position in the network
- We can use this to predict:
  - Relationships
    - Interactions
    - Regulatory relationships
  - Protein function
    - Process
    - Complex / molecular machine

Implications from topology

Edges indicate function
- Proteins that are connected by an edge in many types of biological networks are more likely to have a common function

Adjacent edges indicate 3rd
- In some biological networks, if gene A is connected both to genes B and C, then gene B is more likely to be connected to gene C

False positives, false negatives
- Can use topology to assess confidence if true edges and false edges have different network properties
- Assess how well each edge fits topology of true network
- Can also predict unknown relations

SSL hubs might be good cancer drug targets
(Figure labels: normal cell, alive; cancer cells w/ random mutations, dead. Tong et al., Science, 2004)

2-hop predictors for SSL
- SSL + SSL (S-S)
- Homology + SSL (H-S)
- Co-expressed + SSL (X-S)
- Physical interaction + SSL (P-S)
- 2 physical interactions (P-P)
Key:
- S: synthetic sickness or lethality (SSL)
- H: sequence homology
- X: correlated expression
- P: stable physical interaction
(Wong, et al., PNAS 2004. Figure shows a node pair v, w)

Multi-color motifs
Key:
- S: synthetic sickness or lethality
- H: sequence homology
- X: correlated expression
- P: stable physical interaction
- R: transcriptional regulation
(Zhang, et al., Journal of Biology 2005)

Protein complexes
- Tightly connected proteins may indicate a protein complex
- Beware of bias
(Figure from Girvan and Newman, PNAS 2002)

Lethality
- Hubs are more likely to be essential
(Jeong, et al., Nature 2001)

Protein abundance
- Abundant proteins are more likely to be represented in some types of experiments
- More likely to be essential
- Correlation between degree (hubs) and essentiality disappears or is reduced when corrected for protein abundance
(Bloom and Adami, BMC Evolutionary Biology 2003)

Degree anti-correlation
- Few edges directly between hubs
- Edges between hubs and low-degree genes are favored
(Figure panels: regulatory NW, PPI. Maslov and Sneppen, Science 2002)

Degree correlation
- Anti-correlation of degrees of interacting proteins disappears in un-biased data
(Plot: average degree K1 against degree k, essential vs. non-essential. Coulomb, et al., Proceedings of the Royal Society B 2005)

Methods: predicting function

Predicting protein function
- Homology
- Machine learning
- Graph-theoretic methods
  - Direct methods
  - Module-assisted methods
(Review: Sharan, Ulitsky, Shamir. Molecular Systems Biology, 2007)

Direct methods: neighborhood
- Majority method (Schwikowski, Uetz, et al., Nat Biotechnol 18, 2000)
- Neighborhood method: how does frequency affect assignment? (Hishigaki, Nakai, et al., Yeast 18, 2001)

Minimum cut (graph-theoretic) methods
- Vazquez, Flammini, et al. (2003): globally tries to minimize the number of protein interactions between different annotations
- Karaoz, Murali, et al. (2004): incorporates gene-expression data for better performance
- Nabieva, Jim, et al. (2005): reformulated as an integer linear programming problem

Functional flow
(Nabieva, Jim, et al., Bioinformatics 21 Suppl 1, 2005)

A Markov random field method
(Letovsky and Kasif, Bioinformatics 19 Suppl 1, 2003)
- Derive marginal probabilities given other proteins' putative assignments
- Statistically, neighbors often share a label
- Applies p(L | N, k) = p(k | L, N) p(L) / p(k | N) iteratively to propagate probabilities
  - L is a Boolean random variable that indicates whether or not a node has that label
  - N is the number of neighbors
  - k is the number of neighbors with that label

Module-assisted methods
- Spirin and Mirny, PNAS 2003
  - Find fully connected subgraphs (cliques), OR
  - Find subgraphs that maximize density: 2m/(n(n-1))
- Bader and Hogue, BMC Bioinformatics 2003
  - Weight vertices: neighborhood density, connectedness
  - Find connected communities with high weights
  - MCODE: Molecular COmplex DEtection
- Girvan and Newman, PNAS 2002
  - Betweenness centrality
  - Removes edges likely to go between communities

Confidence assessment, edge prediction
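As a rough illustration of two ideas on this page, the sketch below implements a simple neighbor majority vote for function prediction and the subgraph density 2m/(n(n-1)). It is only a toy version under stated assumptions: the protein names, labels, and function names are hypothetical, and the published majority method ranks candidate functions in more detail than this.

```python
from collections import Counter

def majority_vote(adj, labels, protein, top=1):
    """Predict functions for `protein` as the most common labels among its neighbors."""
    counts = Counter()
    for nbr in adj[protein]:
        for function in labels.get(nbr, ()):   # unannotated neighbors contribute nothing
            counts[function] += 1
    return [function for function, _ in counts.most_common(top)]

def density(n, m):
    """Density of a subgraph with n vertices and m edges: 2m / (n(n-1))."""
    return 2.0 * m / (n * (n - 1)) if n > 1 else 0.0

# Hypothetical toy data: P1 is unannotated and has three annotated neighbors.
adj = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1"},
    "P4": {"P1"},
}
labels = {
    "P2": {"transport"},
    "P3": {"transport"},
    "P4": {"signaling"},
}

predicted = majority_vote(adj, labels, "P1")
clique_density = density(4, 6)   # a 4-clique has all 6 possible edges
sparse_density = density(4, 3)   # half of the possible edges
```

A density of 1 corresponds to a clique, so the "maximize density" criterion relaxes the fully connected requirement to near-cliques.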

Confidence assessment
- Traditionally, biological networks determined individually
  - High confidence
  - Slow
- New methods look at entire organism
  - Lower confidence ( 50% false positives)
- Inferences made based on this data

Confidence assessment (Goldberg and Roth, PNAS 2003)
- Can use topology to assess confidence if true edges and false edges have different network properties
- Assess how well each edge fits topology of true network
- Can also predict unknown relations
- Use clustering coefficient, a local property
- Number of triangles = |N(v) ∩ N(w)|
- Normalization factor?
- N(x) = the neighborhood of node x
(Figure: nodes v and w with common neighbors x and y)

Mutual clustering coefficient (MCC)
- Jaccard index: |N(v) ∩ N(w)| / |N(v) ∪ N(w)|
- Meet / Min: |N(v) ∩ N(w)| / min(|N(v)|, |N(w)|)
- Geometric: |N(v) ∩ N(w)|^2 / (|N(v)| |N(w)|)
- Hypergeometric: a p-value

Prediction
- A v-w edge would have a high MCC

Questions
- Degree distribution?
- Clustering coefficient? 2, 5, 9
- Mutual clustering coefficient: 2 & 7 (use Meet/Min definition)
(Figure: example network with numbered nodes)
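The mutual clustering coefficient variants listed above translate directly into set operations. The sketch below is a hedged illustration: the neighborhoods `nv` and `nw` and the network size `total` are made-up values, and the hypergeometric version is written as the upper-tail probability of seeing at least the observed overlap by chance, which is one plausible reading of "a p-value" rather than a confirmed detail of the published method.

```python
from math import comb

def jaccard(nv, nw):
    """|N(v) ∩ N(w)| / |N(v) ∪ N(w)|"""
    union = nv | nw
    return len(nv & nw) / len(union) if union else 0.0

def meet_min(nv, nw):
    """|N(v) ∩ N(w)| / min(|N(v)|, |N(w)|)"""
    smallest = min(len(nv), len(nw))
    return len(nv & nw) / smallest if smallest else 0.0

def geometric(nv, nw):
    """|N(v) ∩ N(w)|^2 / (|N(v)| * |N(w)|)"""
    product = len(nv) * len(nw)
    return len(nv & nw) ** 2 / product if product else 0.0

def hypergeometric_p(nv, nw, total):
    """P(overlap >= observed) if |N(w)| neighbors were drawn at random from `total` nodes."""
    a, b, k = len(nv), len(nw), len(nv & nw)
    tail = sum(comb(a, i) * comb(total - a, b - i) for i in range(k, min(a, b) + 1))
    return tail / comb(total, b)

# Hypothetical neighborhoods of two proteins v and w in a 10-node network.
nv = {"a", "b", "c", "d"}
nw = {"c", "d", "e"}
total = 10

scores = {
    "jaccard": jaccard(nv, nw),
    "meet_min": meet_min(nv, nw),
    "geometric": geometric(nv, nw),
    "hypergeometric_p": hypergeometric_p(nv, nw, total),
}
```

For the first three measures a higher score suggests a more credible (or more likely missing) v-w edge; for the hypergeometric version a smaller p-value plays that role.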

Overview (outline slide repeated before the Discovering protein complexes section)

Protein complexes
- Groups of proteins that bind together to perform a specific task
- Examples:
  - Ribosomes
  - Proteasomes
  - Replication complexes: GINS complex, DNA polymerase
(Image from: Computation site for bioinformatics at Charité, Universitätsmedizin Berlin. Found at http://bioinf.charite.de/hergo/intro.htm)

Finding protein complexes
- Protein-protein interaction network
- Dense regions may be an indication of a protein complex
(Figure from Girvan and Newman, PNAS 2002. Image from Yeast Proteomics, Genome News Network, 1-18-02)

Looking for complexes
- One goal of studying the interaction network is to discover previously unknown protein complexes
- Methods:
  - Look for cliques or near cliques
  - Look for vertices with high clustering
  - Community structure: partitioning methods

Community structure
- Proteins in a community may be involved in a common process or function
(Figure from Girvan and Newman, PNAS 2002)

Finding the communities
- Hierarchical clustering
- Betweenness centrality
- Dense subgraphs
- Similar subgraphs
- Spectral clustering
- Party and date hubs

Hierarchical clustering (1)
- Using natural edge weights
- Gene co-expression
(e.g., Eisen MB, et al., PNAS 1998)

Hierarchical clustering (2)
- Topological overlap
- A measure of neighborhood similarity
- l_{i,j} is 1 if there is a direct link between i and j, 0 otherwise
(Ravasz, et al., Science 2002)
(Figure from www.medscape.com)

Hierarchical clustering (3)
- Adjacency vector
- Function cluster: Tong et al., Science 2004
- Find drug targets: Parsons et al., Nature Biotechnology 2004

Party and date hubs
- Protein interaction network
- Partition hubs by expression correlation of neighbors
(Han, et al., Nature 2004)

Network connectivity
- Scale-free networks are:
  - Robust to random failures
  - Vulnerable to attacks on hubs
- Removing hubs quickly disconnects a network and reduces the size of the largest component
(Albert, et al., Nature 2000)
- Removing date hubs shatters network into communities
  - Date hubs removed: many sub-networks
  - Party hubs removed: a single main component

Similar subgraphs
- Across species
- Interaction network and genome sequence
(e.g., Ogata, et al., Nucleic Acids Research 2000)

Betweenness centrality
- Consider the shortest path(s) between all pairs of nodes
- Betweenness centrality of an edge is a measure of how many shortest paths traverse this edge
- Edges between communities have higher centrality
(Girvan, et al., PNAS 2002)

Spectral clustering
- Compute adjacency matrix eigenvectors
- Each eigenvector defines a cluster: proteins with high magnitude contributions
(Figure panels: positive eigenvalue, negative eigenvalue. Bu, et al., Nucleic Acids Research 2003)

Overview (outline slide repeated before the PIN evolution section)
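Edge betweenness, as described above, can be computed with one breadth-first search per node. The following is a minimal sketch for unweighted, undirected graphs using the standard Brandes-style accumulation. The toy graph (two triangles joined by a single bridge) and all names are illustrative assumptions. It shows only the scoring step that a Girvan-Newman style procedure would repeat, not the full community-detection loop.

```python
from collections import deque

def edge_betweenness(adj):
    """Number of shortest paths crossing each edge, splitting ties fractionally."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}      # number of shortest s-v paths
        sigma[s] = 1
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in adj}    # dependency of s on each node
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((u, w))] += share
                delta[u] += share
    # Each unordered pair was counted from both endpoints.
    return {edge: score / 2.0 for edge, score in bet.items()}

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the bridge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

bet = edge_betweenness(adj)
bridge = max(bet, key=bet.get)   # the edge a Girvan-Newman step would remove first
```

In this toy graph every path between the two triangles has to cross the bridge, so it receives the highest score, and removing it separates the two communities.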

Questions
- How does the WWW evolve?
- How might protein interaction networks (PINs) evolve?
- How can we determine if our model is incorrect?

Model for scale-free networks
- Growth and preferential attachment
- New node has edge to existing node v with probability proportional to degree of v
- Biologically plausible?

Gene duplication gives functional diversity
- A primary mechanism for diversity
- After duplication, 2 routes to diversity:
  - Subfunctionalization: function loss yields complementary subsets of original functions
  - Neofunctionalization: de novo acquisition of functions
- Protein interactions are convenient proxy for functions

Gene duplication in a PIN
(Figure labels: edge loss, edge gain. Barabási and Oltvai, Nature Reviews Genetics (2004))

Another scale-free network model
- Duplication and divergence
- New nodes are copies of existing nodes
- Same neighbors, then some gain/loss
(Solé, Pastor-Satorras, et al. (2002))

Advantages of this model
- This model generates networks that are:
  - scale-free
  - highly clustered
- PINs are also scale-free, highly clustered
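The two growth models on this page can be simulated in a few lines. The sketch below is an illustration under simplifying assumptions, not the exact published models: each new node in the preferential-attachment version adds a single edge, and the duplication-divergence version uses two hypothetical parameters, `p_loss` (probability an inherited edge is dropped) and `p_gain` (probability of a new edge to any other node).

```python
import random

def preferential_attachment(n, seed=0):
    """Grow a graph; each new node links to one existing node chosen proportionally to degree."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    endpoints = [0, 1]                 # node v appears here deg(v) times
    for new in range(2, n):
        target = rng.choice(endpoints)
        adj[new] = {target}
        adj[target].add(new)
        endpoints += [new, target]
    return adj

def duplication_divergence(n, p_loss=0.5, p_gain=0.01, seed=0):
    """Grow a graph by copying a random node, then losing and gaining some edges."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    for new in range(2, n):
        parent = rng.choice(sorted(adj))
        # Duplication: inherit the parent's neighbors; divergence: drop some of them.
        neighbors = {v for v in sorted(adj[parent]) if rng.random() >= p_loss}
        # Divergence: occasionally gain an edge to any other existing node.
        neighbors |= {v for v in sorted(adj) if v not in neighbors and rng.random() < p_gain}
        adj[new] = neighbors
        for v in neighbors:
            adj[v].add(new)
    return adj

def is_simple_undirected(adj):
    """Check symmetry and absence of self-loops."""
    return all(u not in adj[u] and all(u in adj[v] for v in adj[u]) for u in adj)

ba = preferential_attachment(500)
dd = duplication_divergence(500)
ba_max_degree = max(len(nbrs) for nbrs in ba.values())
```

Both processes favor already well-connected nodes: in the duplication model a hub is more likely to be the neighbor of whichever node gets copied, which is how duplication can mimic preferential attachment.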

Question
- Paralogs: x & w or y & t
(Figure: network with nodes y, v, x, w, t)

Overview (outline slide repeated before the Final words section)

Final words
- Network analysis has become an essential tool for analyzing complex systems
- There is still much biologists can learn from scientists in other disciplines
- There is much other scientists can learn from us
- An exciting new direction