Graph Alignment and Biological Networks

Similar documents
Gene regulation: From biophysics to evolutionary genetics

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

Finding Motifs in Protein-Protein Interaction Networks

Network alignment and querying

Procedure to Create NCBI KOGS

Introduction to Bioinformatics

Network Alignment 858L

Computational Biology: Basics & Interesting Problems

networks in molecular biology Wolfgang Huber

Comparative Network Analysis

Bioinformatics Chapter 1. Introduction

Types of biological networks. I. Intra-cellurar networks

Biology. Biology. Slide 1 of 26. End Show. Copyright Pearson Prentice Hall

Biol478/ August

Comparing Genomes! Homologies and Families! Sequence Alignments!

Network motifs in the transcriptional regulation network (of Escherichia coli):

Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites

Fitness landscapes and seascapes

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007

Computational approaches for functional genomics

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

GLOBEX Bioinformatics (Summer 2015) Genetic networks and gene expression data

Bioinformatics: Network Analysis

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

BINF6201/8201. Molecular phylogenetic methods

12-5 Gene Regulation

Systems biology and biological networks

Inferring Protein-Signaling Networks

EVOLUTIONARY DISTANCES

Biological Networks. Gavin Conant 163B ASRC

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

GRAPH-THEORETICAL COMPARISON REVEALS STRUCTURAL DIVERGENCE OF HUMAN PROTEIN INTERACTION NETWORKS

Lecture 8: Temporal programs and the global structure of transcription networks. Chap 5 of Alon. 5.1 Introduction

Exploring Evolution & Bioinformatics

Emergence of gene regulatory networks under functional constraints

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Basic modeling approaches for biological systems. Mahesh Bule

Cell biology traditionally identifies proteins based on their individual actions as catalysts, signaling

Lecture 10: May 19, High-Throughput technologies for measuring proteinprotein

Casting Polymer Nets To Optimize Molecular Codes

Effects of Gap Open and Gap Extension Penalties

56:198:582 Biological Networks Lecture 8

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Lecture 4: Transcription networks basic concepts

Supporting Information

Sequence analysis and comparison

Evidence for dynamically organized modularity in the yeast protein-protein interaction network

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Pairwise & Multiple sequence alignments

Protein-protein Interaction: Network Alignment

Control of Gene Expression in Prokaryotes

Analysis of Biological Networks: Network Robustness and Evolution

Self Similar (Scale Free, Power Law) Networks (I)

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions?

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

7.32/7.81J/8.591J: Systems Biology. Fall Exam #1

Chapter 8: The Topology of Biological Networks. Overview

Quantifying sequence similarity

Introduction to Bioinformatics

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

How much non-coding DNA do eukaryotes require?

Integrative Protein Function Transfer using Factor Graphs and Heterogeneous Data Sources

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Physical network models and multi-source data integration

Introduction to protein alignments

ACTA PHYSICA DEBRECINA XLVI, 47 (2012) MODELLING GENE REGULATION WITH BOOLEAN NETWORKS. Abstract

Computational Structural Bioinformatics

CHAPTER : Prokaryotic Genetics

Introduction to Bioinformatics

Network Biology: Understanding the cell s functional organization. Albert-László Barabási Zoltán N. Oltvai

Boolean models of gene regulatory networks. Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

GATA family of transcription factors of vertebrates: phylogenetics and chromosomal synteny

UNIT 5. Protein Synthesis 11/22/16

Comparison of Protein-Protein Interaction Confidence Assignment Schemes

Advanced Algorithms and Models for Computational Biology

Multiple Sequence Alignment. Sequences

Drosophila melanogaster and D. simulans, two fruit fly species that are nearly

Chapter 15 Active Reading Guide Regulation of Gene Expression

Today s Lecture: HMMs

Prediction o f of cis -regulatory B inding Binding Sites and Regulons 11/ / /

Networks & pathways. Hedi Peterson MTAT Bioinformatics

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11

1. In most cases, genes code for and it is that

11/24/13. Science, then, and now. Computational Structural Bioinformatics. Learning curve. ECS129 Instructor: Patrice Koehl

Tools and Algorithms in Bioinformatics

Learning in Bayesian Networks

Biological Networks Analysis

Hereditary Hemochromatosis

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

FUNDAMENTALS of SYSTEMS BIOLOGY From Synthetic Circuits to Whole-cell Models

Lecture 4: Yeast as a model organism for functional and evolutionary genomics. Part II

ECS 253 / MAE 253 April 26, Intro to Biological Networks, Motifs, and Model selection/validation

Sequence Alignment (chapter 6)

BIOLOGICAL PHYSICS MAJOR OPTION QUESTION SHEET 1 - will be covered in 1st of 3 supervisions

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

SYSTEMS BIOLOGY 1: NETWORKS

Transcription:

Graph Alignment and Biological Networks Johannes Berg http://www.uni-koeln.de/ berg Institute for Theoretical Physics University of Cologne Germany p.1/12

Networks in molecular biology New large-scale experimental data in the form of networks: transcription networks protein interaction networks co-regulation networks signal transduction networks, metabolic networks, etc. p.2/12

Networks in molecular biology New large-scale experimental data in the form of networks: transcription networks transcription factors bind to regulatory DNA polymerase molecule begins transcription of the gene p.2/12

Networks in molecular biology New large-scale experimental data in the form of networks: transcription networks transcription factors bind to regulatory DNA polymerase molecule begins transcription of the gene p.2/12

Networks in molecular biology New large-scale experimental data in the form of networks: transcription networks transcription factors bind to regulatory DNA polymerase molecule begins transcription of the gene sea urchin Bolouri &Davidson (2001) p.2/12

Networks in molecular biology New large-scale experimental data in the form of networks: protein interaction networks proteins interact to form larger units protein aggregates may catalyze reactions etc. p.2/12

Networks in molecular biology New large-scale experimental data in the form of networks: protein interaction networks proteins interact to form larger units protein aggregates may catalyze reactions etc. protein interactions in yeast Uetz et al. (2000) p.2/12

Sequence alignment in molecular biology more than 100 organisms are fully sequenced genome sizes range from 3 10 7 to 7 10 11 basepairs p.3/12

Sequence alignment in molecular biology more than 100 organisms are fully sequenced genome sizes range from 3 10 7 to 7 10 11 basepairs Global alignment: search for related sequences across species evolutionary relationships hints at common functionality p.3/12

Sequence alignment in molecular biology more than 100 organisms are fully sequenced genome sizes range from 3 10 7 to 7 10 11 basepairs Motif search: search for short repeated subsequences binding sites in transcription control p.3/12

Sequence alignment in molecular biology Tools more than 100 organisms are fully sequenced genome sizes range from 3 10 7 to 7 10 11 basepairs statistical models are used infer non-random correlations against a background build score function from statistical models design efficient algorithms to maximize score evaluate statistical significance of a given score p.3/12

Sequence alignment in molecular biology Tools more than 100 organisms are fully sequenced genome sizes range from 3 10 7 to 7 10 11 basepairs statistical models are used infer non-random correlations against a background build score function from statistical models design efficient algorithms to maximize score evaluate statistical significance of a given score organism number of genes worm C. elegans 19 000 fruit fly drosophila 17 000 human homo sapiens 25 000 p.3/12

Graph alignment What can be learned from network data? Can we distinguish functional patterns from a random background? 1. Search for network motifs [Alon lab] patterns occurring repeatedly within a given network 2. Alignment of networks across species identify conserved regions pinpoint functional innovations p.4/12

Graph alignment What can be learned from network data? Can we distinguish functional patterns from a random background? 1. Search for network motifs [Alon lab] patterns occurring repeatedly within a given network 2. Alignment of networks across species identify conserved regions pinpoint functional innovations Tools scoring function based on statistical models heuristic algorithms: algorithmic complexity p.4/12

Graph alignment I: The search for network motifs patterns occurring repeatedly in the network building blocks of information processing [Alon lab] p.5/12

Graph alignment I: The search for network motifs patterns occurring repeatedly in the network building blocks of information processing [Alon lab] counting of identical patterns: Subgraph census alignment of topologically similar regions of a network allow for mismatches construct a scoring function comparing the aligned subgraphs to a background model p.5/12

Graph alignment I: The search for network motifs patterns occurring repeatedly in the network building blocks of information processing [Alon lab] counting of identical patterns: Subgraph census alignment of topologically similar regions of a network allow for mismatches construct a scoring function comparing the aligned subgraphs to a background model p.5/12

Graph alignment I: The search for network motifs patterns occurring repeatedly in the network building blocks of information processing [Alon lab] counting of identical patterns: Subgraph census alignment of topologically similar regions of a network allow for mismatches construct a scoring function comparing the aligned subgraphs to a background model α=3 Alignment α=2 α=1 p.5/12

Statistical properties of alignments α=3 Alignment α=2 α=1 i=1 i=2 consensus motif c = Σ c ij α α ij p.6/12

Statistical properties of alignments α=3 Alignment α=2 α=1 i=1 i=2 consensus motif c = Σ c ij α α ij consensus motif c = 1 p p α=1 cα number of internal links average correlation between two subgraphs fuzziness of motif p.6/12

Statistics of network motifs null model: ensemble of uncorrelated networks with the same connectivities as the data p.7/12

Statistics of network motifs null model: ensemble of uncorrelated networks with the same connectivities as the data model describing network motifs ensemble with enhanced number of links enhanced correlation of subgraphs divergent vs convergent evolution? p.7/12

Statistics of network motifs null model: ensemble of uncorrelated networks with the same connectivities as the data model describing network motifs ensemble with enhanced number of links enhanced correlation of subgraphs divergent vs convergent evolution? Log likelihood score S(c 1,...,c p ) = log ( Q(c 1,...,c p ) ) p α=1 P σ(c α ) p = (σ σ 0 ) L(c α ) µ 2p α=1 p α,β=1 M(c α,c β ) log Z p.7/12

Statistics of network motifs null model: ensemble of uncorrelated networks with the same connectivities as the data model describing network motifs ensemble with enhanced number of links enhanced correlation of subgraphs divergent vs convergent evolution? Log likelihood score S(c 1,...,c p ) = log ( Q(c 1,...,c p ) ) p α=1 P σ(c α ) p = (σ σ 0 ) L(c α ) µ 2p α=1 p α,β=1 M(c α,c β ) log Z Algorithm: Mapping onto a model from statistical mechanics (Potts model) p.7/12

Consensus motif of the E. coli transcription network µ = µ = 2.25 µ = 5 µ = 12 p.8/12

Consensus motif of the E. coli transcription network µ = µ = 2.25 µ = 5 µ = 12 1 0.8 c α 0.6 0.4 α c c β 0.2 0 0 0.2 0.4 <c α > 0.6 0.8 1 1 0.8 0.6 0.4 0.2 0 0 0.2 0.4 0.6 0.8 1 <c α c β > p.8/12

Graph alignment II: Comparing networks across species p.9/12

Graph alignment II: Comparing networks across species Alignment: Pairwise association of nodes across species p.9/12

Graph alignment II: Comparing networks across species Last common ancestor p.9/12

Graph alignment II: Comparing networks across species Evolutionary dynamics: Link attachment and deletion p.9/12

Graph alignment II: Comparing networks across species Evolutionary dynamics: Link attachment and deletion p.9/12

Graph alignment II: Comparing networks across species Representation of the alignment in a single network. Conserved links are shown in green. p.9/12

Scoring graph alignments across species null model P : Q-model ensemble of uncorrelated networks with the same connectivities as the data correlated networks (due to functional constraints or common ancestry) statistical assessment of orthologs: interplay between sequence similarity and network topology Scoring alignments log-likelihood score S = log(q/p) is used to search for conserved parts of the networks p.10/12

Application to Co-Expression networks alignment of H. sapiens and M. musculus p.11/12

Application to Co-Expression networks skeletal muscle proteins ribosomal proteins mitochondrial precursors myelin proteolipid protein alignment of H. sapiens and M. musculus p.11/12

Genomic systems biology and network analysis New concept and tools are needed to fully utilize high-throughput data functional design versus noise: statistical analysis evolutionary conservation indicates function Topological conservation versus sequence conservation genes may change functional role in network with small corresponding change in sequence the role of a gene in one species may be taken on by an entirely unrelated gene in another species References: J. Berg and M. Lässig, "Local graph alignment and motif search in biological networks, Proc. Natl. Acad. Sci. USA, 101 (41) 14689-14694 (2004) J. Berg, M. Lässig, and A. Wagner, Structure and Evolution of Protein Interaction Networks: A Statistical Model for Link Dynamics and Gene Duplications, BMC Evolutionary Biology 4:51 (2004) J. Berg, S. Willmann und M. Lässig, Adaptive evolution of transcription factor binding sites, BMC Evolutionary Biology 4(1):42 (2004) J. Berg and M. Lässig, "Correlated random networks", Phys. Rev. Lett. 89(22), 228701 (2002) p.12/12