Comparative Network Analysis

Similar documents
Multiple Whole Genome Alignment

Inferring Models of cis-regulatory Modules using Information Theory

Identifying Signaling Pathways

Network alignment and querying

Inferring Models of cis-regulatory Modules using Information Theory

Bioinformatics: Network Analysis

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

BLAST. Varieties of BLAST

Graph Alignment and Biological Networks

Protein Complex Identification by Supervised Graph Clustering

Introduction to Bioinformatics

Network Alignment 858L

Evidence for dynamically organized modularity in the yeast protein-protein interaction network

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Protein Threading. BMI/CS 776 Colin Dewey Spring 2015

Networks. Can (John) Bruce Keck Founda7on Biotechnology Lab Bioinforma7cs Resource

Predicting Protein Functions and Domain Interactions from Protein Interactions

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

Computational Network Biology Biostatistics & Medical Informatics 826 Fall 2018

Bioinformatics. Transcriptome

Computational methods for predicting protein-protein interactions

Frequent Pattern Finding in Integrated Biological Networks

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai

Comparison of Protein-Protein Interaction Confidence Assignment Schemes

RODED SHARAN, 1 TREY IDEKER, 2 BRIAN KELLEY, 3 RON SHAMIR, 1 and RICHARD M. KARP 4 ABSTRACT

MULE: An Efficient Algorithm for Detecting Frequent Subgraphs in Biological Networks

Network motifs in the transcriptional regulation network (of Escherichia coli):

Protein-protein Interaction: Network Alignment

Systems biology and biological networks

Written Exam 15 December Course name: Introduction to Systems Biology Course no

Complex (Biological) Networks

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

Learning Sequence Motif Models Using Gibbs Sampling

Discovering modules in expression profiles using a network

Gene Ontology and Functional Enrichment. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

Protein function prediction via analysis of interactomes

Analysis of Biological Networks: Network Integration

An Efficient Algorithm for Protein-Protein Interaction Network Analysis to Discover Overlapping Functional Modules

Example of Function Prediction

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data

Comparative Genomics: Sequence, Structure, and Networks. Bonnie Berger MIT

Genomics and bioinformatics summary. Finding genes -- computer searches

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

The geneticist s questions. Deleting yeast genes. Functional genomics. From Wikipedia, the free encyclopedia

Computational Systems Biology

Complex (Biological) Networks

Discovering molecular pathways from protein interaction and ge

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Pathway Association Analysis Trey Ideker UCSD

Networks & pathways. Hedi Peterson MTAT Bioinformatics

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

Why Do Hubs in the Yeast Protein Interaction Network Tend To Be Essential: Reexamining the Connection between the Network Topology and Essentiality

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

Bayesian Hierarchical Classification. Seminar on Predicting Structured Data Jukka Kohonen

EECS730: Introduction to Bioinformatics

Proteomics Systems Biology

Introduction to clustering methods for gene expression data analysis

Multiple Sequence Alignment: HMMs and Other Approaches

Lecture 10: May 19, High-Throughput technologies for measuring proteinprotein

Gibbs Sampling Methods for Multiple Sequence Alignment

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007

Networks in systems biology

Comparative Bioinformatics Midterm II Fall 2004

Motivating the need for optimal sequence alignments...

Analysis of Biological Networks: Network Integration

Inferring Protein-Signaling Networks II

COMPARATIVE PATHWAY ANNOTATION WITH PROTEIN-DNA INTERACTION AND OPERON INFORMATION VIA GRAPH TREE DECOMPOSITION

Interaction Network Analysis

Basic Local Alignment Search Tool

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Detecting Conserved Interaction Patterns in Biological Networks

In post-genomic biology, the nature and scale of data. Algorithmic and analytical methods in network biology. Mehmet Koyutürk 1,2

Inferring Transcriptional Regulatory Networks from High-throughput Data

SUPPLEMENTARY METHODS

Communities Via Laplacian Matrices. Degree, Adjacency, and Laplacian Matrices Eigenvectors of Laplacian Matrices

Biological Networks. Gavin Conant 163B ASRC

Lecture Notes for Fall Network Modeling. Ernest Fraenkel

Algorithms and tools for the alignment of multiple protein networks

Analysis and visualization of protein-protein interactions. Olga Vitek Assistant Professor Statistics and Computer Science

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

FROM UNCERTAIN PROTEIN INTERACTION NETWORKS TO SIGNALING PATHWAYS THROUGH INTENSIVE COLOR CODING

A Max-Flow Based Approach to the. Identification of Protein Complexes Using Protein Interaction and Microarray Data

Lecture 8: Temporal programs and the global structure of transcription networks. Chap 5 of Alon. 5.1 Introduction

CSCI1950 Z Computa3onal Methods for Biology Lecture 24. Ben Raphael April 29, hgp://cs.brown.edu/courses/csci1950 z/ Network Mo3fs

Learning in Bayesian Networks

Towards Detecting Protein Complexes from Protein Interaction Data

Inferring Transcriptional Regulatory Networks from Gene Expression Data II

Inferring Protein-Signaling Networks

Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem

The geneticist s questions

Comparative Analysis of Molecular Interaction Networks

V 6 Network analysis

Measuring TF-DNA interactions

GOSAP: Gene Ontology Based Semantic Alignment of Biological Pathways

Introduction to clustering methods for gene expression data analysis

COMPARATIVE ANALYSIS OF BIOLOGICAL NETWORKS. A Thesis. Submitted to the Faculty. Purdue University. Mehmet Koyutürk. In Partial Fulfillment of the

BIOINFORMATICS. Improved Network-based Identification of Protein Orthologs. Nir Yosef a,, Roded Sharan a and William Stafford Noble b

Overview. Overview. Social networks. What is a network? 10/29/14. Bioinformatics I. Networks are everywhere! Introduction to Networks

Transcription:

Comparative Network Analysis BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Mark Craven, Colin Dewey, and Anthony Gitter

Protein-protein Interaction Networks Yeast protein interactions from yeast twohybrid experiments Largest cluster in network contains 78% of proteins (Jeong et al., 2001, Nature) Knock-out phenotype lethal non-lethal slow growth unknown

Overview Experimental techniques for determining networks Comparative network tasks

Experimental techniques Yeast two-hybrid system Protein-protein interactions Microarrays or RNA-Seq Expression patterns of mrnas Similar patterns imply involvement in same regulatory or signaling network Knock-out or perturbation studies Identify genes required for synthesis of certain molecules

Yeast two-hybrid system (Stephens & Banting, 2000, Traffic)

Microarrays genes (Eisen et al., 1998, PNAS)

Knock-out studies Yeast with one gene deleted Growth? Rich media His - media

Network problems Network inference Infer network structure Motif finding Identify common subgraph topologies Pathway or module detection Identify subgraphs of genes that perform the same function or active in same condition Network comparison, alignment, querying Conserved modules Identify modules that are shared in networks of multiple species

Network motifs Problem: Find subgraph topologies that are statistically more frequent than expected Brute force approach Count all topologies of subgraphs of size m Randomize graph (retain degree distribution) and count again Output topologies that are over/under represented Feed-forward loop: overrepresented in regulatory networks not very common

Network modules Modules: dense (highly-connected) subgraphs (e.g., large cliques or partially incomplete cliques) Problem: Identify the component modules of a network Difficulty: definition of module is not precise Hierarchical networks have modules at multiple scales At what scale to define modules?

Comparative network analysis Compare or integrate networks from different... Interaction detection methods Yeast 2-hybrid, mass spectrometry, etc. Conditions Heat, media, other stresses Time points Development, cell cycle, stimulus response Species

Comparative tasks Integration Combine networks derived from different methods (e.g. experimental data types) Alignment Identify nodes, edges, modules common to two networks (e.g., from different species) Database query Identify subnetworks similar to query in database of networks

Conserved modules Identify modules in multiple species that have conserved topology Typical approach: Use sequence alignment to identify homologous proteins and establish correspondence between networks Using correspondence, output subsets of nodes with similar topology

Conserved interactions orthologs (nodes) interaction A X B Y human yeast interologs (edges) Network comparison between species also requires sequence comparison (typically) Protein sets compared to identify orthologs Common technique: highest scoring BLAST hits used for establishing correspondences

Conserved modules X A Z B Y yeast W C human D Conserved module: orthologous subnetwork with significantly similar edge presence/absence

Network alignment graph X Z Y W A,X B,Z A B C,Y D,W C D network alignment graph Analogous to pairwise sequence alignment

Conserved module detection (Sharan & Ideker, 2006)

Real module example Three species alignment (Sharan et al., 2005, 2006) Ras-mediated regulation of cell cycle Cell proliferation Protein may have more than one ortholog in another network

Basic alignment strategy Define scoring function on subnetworks High score conserved module Use BLAST to infer orthologous proteins Identify seeds around each protein: small conserved subnetworks centered around the protein Grow seeds by adding proteins that increase alignment score

Scoring functions via subnetwork modeling We wish to calculate the likelihood of a certain subnetwork U under different models Subnetwork model (Ms) Connectivity of U given by target graph H, each edge in H appearing in U with probability β (large) Null model (Mn) Each edge appears with probability according to random graph distribution (but with degree distribution fixed) (Sharan et al., 2005)

Noisy observations Typically weight edges in graph according to confidence in interaction (expressed as a probability) Let Tuv: event that proteins u, v interact Fuv: event that proteins u, v do not interact Ouv: observations of possible interactions between proteins u and v

Subnetwork model probability Assume (for explanatory purposes) that subnetwork model is a clique:

Null model probability Given values for puv: probability of edge (u,v) in random graph with same degrees How to get random graph if we don t know true degree distribution? Estimate them:

Likelihood ratio Score subnetwork with (log) ratio of likelihoods under the two models Note the decomposition into sum of scores for each edge

Seed construction Finding heavy induced subgraphs is NP-hard (Sharan et al., 2004) Heuristic: Find high-scoring subgraph seeds Grow seeds greedily Seed techniques: for each node v: Find heavy subgraph of size 4 including v Find highest-scoring length 4 path with v

Randomizing graphs For statistical tests, need to keep degree distribution the same Shuffle step: Choose two edges (a,b), (c,d) in the current graph Remove those edges Add edges (a,d), (c,b) A B A B C D C D

Predictions from alignments Conserved modules of proteins enriched for certain functions often indicate shared function of other proteins Use to predict function of unannotated proteins Sharan et al., 2005: annotated 4,645 proteins with estimated accuracy of 58-63% Predict missing interactions Sharan et al., 2005: 2,609 predicted interactions Test 60 in yeast, 40-52% accurate

Parallels to sequence analysis (Sharan & Ideker, 2006)