Networks & pathways Hedi Peterson (peterson@quretec.com) MTAT.03.239 Bioinformatics 03.11.2010
Networks are graphs: nodes and edges
Edges: directed, undirected, weighted
Nodes: genes, proteins, metabolites, enzymes, organisms
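As a minimal sketch of the graph view above, the Python below stores a small network as an adjacency dictionary. The node names and weights are made up for illustration, and `add_edge` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

def add_edge(graph, source, target, weight=1.0, directed=True):
    """Add an edge to an adjacency dict of the form {node: {neighbour: weight}}."""
    graph[source][target] = weight
    graph[target]  # touch the key so the target exists as a node even without out-edges
    if not directed:
        graph[target][source] = weight

# Illustrative (made-up) edges: one directed regulatory edge, one undirected interaction.
network = defaultdict(dict)
add_edge(network, "TF_A", "gene_B", weight=0.9, directed=True)
add_edge(network, "protein_C", "protein_D", weight=0.5, directed=False)
```

A directed edge is stored once, an undirected edge is stored in both directions, and the weight is kept as the dictionary value.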
Why do we build networks? Motivation: most biological processes are not performed by a single, independent macromolecule
Slide from W. Huber
Pathways
- Do not exist in the cell!
- A human abstraction to help organize our understanding of biology.
- A description of the chronological ordering of protein/DNA/small-molecule interactions.
- Unlike proteins and genes, pathways represent processes that may not be clearly defined.
Slide from Doron Betel, MSKCC
From experiments to networks Source: http://www.nature.com/ng/journal/v37/n6s/full/ng1561.html Slide from https://extras.csc.fi/biosciences/courses/cytoscape
Example networks: signal transduction pathway Source: http://www.sigmaaldrich.com/img/assets/6460/egf_r.gif Slide from https://extras.csc.fi/biosciences/courses/cytoscape
Transcription Source: http://www.medscape.com/pi/editorial/conferences/2001/873/art-amc.fig2.jpg Slide from https://extras.csc.fi/biosciences/courses/cytoscape
Complexes Source: Science. 2005 Feb 4;307(5710) Slide from https://extras.csc.fi/biosciences/courses/cytoscape
Public data repositories
- Protein-protein interaction data: BIND, DIP, MINT, MIPS, IntAct
- Protein-DNA interaction data: BIND, Transfac
- Metabolic pathway data: BioCyc, KEGG, WIT
- Text-mining, coexpression: Pre-BIND, Tmm
Slide from https://extras.csc.fi/biosciences/courses/cytoscape
Slide from Doron Betel, MSKCC
Slide from W. Huber
Autoregulation
- Approximately 10% of yeast genes encoding regulators are autoregulated.
- Autoregulation is thought to provide several selective growth advantages:
  -- reduced response time to environmental stimuli
  -- decreased biosynthetic cost of regulation
  -- increased stability of gene expression
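In graph terms, an autoregulated gene is a node with a self-loop. A minimal sketch, assuming the network is given as a list of (regulator, target) pairs; the names `R1`, `G1` and so on are placeholders, not real yeast genes.

```python
def autoregulated(edges):
    """Return the regulators that bind their own promoter (self-loops)."""
    return sorted({src for src, dst in edges if src == dst})

# Hypothetical regulator -> target edges.
edges = [("R1", "R1"), ("R1", "G1"), ("R2", "G2"), ("R3", "R3")]
self_regulators = autoregulated(edges)
print(self_regulators)  # ['R1', 'R3']
```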
Multi-component loop
- Consists of a regulatory circuit whose closure involves two or more factors.
- Provides the capacity for feedback control and offers the potential to produce bistable systems that can switch between two alternative states.
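One way to look for such loops is to ask whether a regulator can reach itself again through at least one other factor. The sketch below does this with a depth-first search over a directed adjacency dictionary; the function name and the toy graph are illustrative assumptions.

```python
def on_multi_component_loop(graph, start):
    """True if `start` lies on a directed cycle that passes through at least one other node."""
    # Ignore a direct self-loop: that is autoregulation, not a multi-component loop.
    stack = [n for n in graph.get(start, ()) if n != start]
    seen = set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, ()):
            if nxt == start:
                return True
            if nxt not in seen:
                stack.append(nxt)
    return False

# Hypothetical network: A and B regulate each other, C only regulates itself.
toy = {"A": ["B"], "B": ["A", "C"], "C": ["C"]}
loop_members = [n for n in toy if on_multi_component_loop(toy, n)]
```

Here `loop_members` contains A and B but not C, because the only cycle through C is its own self-loop.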
Feedforward loop
- Contains a regulator that controls a second regulator, with the additional feature that both regulators bind a common target gene.
- The most common among yeast gene regulatory motifs (about 10% of genes that are bound in the genome-wide location data set).
- Very sensitive
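The definition above translates directly into a three-node pattern: X binds Y, and both X and Y bind Z. A minimal sketch that enumerates such triples in a directed adjacency dictionary (node names are placeholders):

```python
def feedforward_loops(graph):
    """Return (X, Y, Z) triples where X regulates Y and both X and Y bind target Z."""
    loops = []
    for x, x_targets in graph.items():
        for y in x_targets:
            if y == x:
                continue
            for z in graph.get(y, ()):
                if z != x and z != y and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)

# Hypothetical network: X -> Y, X -> Z, Y -> Z, plus an unrelated edge W -> Z.
toy = {"X": {"Y", "Z"}, "Y": {"Z"}, "Z": set(), "W": {"Z"}}
ffl = feedforward_loops(toy)
print(ffl)  # [('X', 'Y', 'Z')]
```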
Single-input motif
- A single regulator that binds a set of genes under a specific condition.
- Mostly useful in metabolic pathways.
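A rough sketch of how single-input motifs could be pulled out of an edge list. It assumes that a target belongs to such a motif only if it has exactly one regulator, and that a motif needs at least `min_targets` genes; both the criterion and the parameter are assumptions made for this example.

```python
from collections import defaultdict

def single_input_motifs(edges, min_targets=2):
    """Group targets that are bound by exactly one regulator, keyed by that regulator."""
    regulators_of = defaultdict(set)
    for regulator, target in edges:
        regulators_of[target].add(regulator)
    motifs = defaultdict(set)
    for target, regulators in regulators_of.items():
        if len(regulators) == 1:
            (regulator,) = regulators
            motifs[regulator].add(target)
    return {r: t for r, t in motifs.items() if len(t) >= min_targets}

# Hypothetical data: G3 has two regulators, so it is excluded from R1's motif.
edges = [("R1", "G1"), ("R1", "G2"), ("R1", "G3"), ("R2", "G3"), ("R2", "G4")]
sim = single_input_motifs(edges)
```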
Multi-input motif
- A set of regulators that bind together to a set of genes.
- Two different inputs would allow coordinated expression of a set of genes under two different conditions.
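Multi-input motifs can be sketched in the same spirit by grouping genes that share the same set of two or more regulators. Requiring an identical regulator set, and at least two genes per group, are simplifying assumptions for this example.

```python
from collections import defaultdict

def multi_input_motifs(edges):
    """Group targets by their full regulator set, keeping sets of two or more regulators."""
    regulators_of = defaultdict(set)
    for regulator, target in edges:
        regulators_of[target].add(regulator)
    groups = defaultdict(set)
    for target, regulators in regulators_of.items():
        if len(regulators) >= 2:
            groups[frozenset(regulators)].add(target)
    return {regs: targets for regs, targets in groups.items() if len(targets) >= 2}

# Hypothetical data: R1 and R2 together bind G1 and G2; R3 binds G3 alone.
edges = [("R1", "G1"), ("R2", "G1"), ("R1", "G2"), ("R2", "G2"), ("R3", "G3")]
mim = multi_input_motifs(edges)
```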
Regulatory chain
- Consists of three or more regulators in which one regulator binds the promoter for a second regulator, the second binds the promoter for a third regulator, and so forth.
- The best example is the cell cycle.
Slide from W. Huber
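A regulatory chain corresponds to a directed path through three or more regulators. The sketch below enumerates all simple paths of at least `min_length` nodes in a regulator-to-regulator graph. This brute-force enumeration is only practical for small networks, and the names are placeholders.

```python
def regulatory_chains(graph, min_length=3):
    """Enumerate simple directed paths that contain at least `min_length` regulators."""
    chains = []

    def extend(path):
        if len(path) >= min_length:
            chains.append(tuple(path))
        for nxt in graph.get(path[-1], ()):
            if nxt not in path:  # keep the path simple (no repeated regulators)
                extend(path + [nxt])

    for start in graph:
        extend([start])
    return sorted(chains)

# Hypothetical chain R1 -> R2 -> R3 -> R4.
toy = {"R1": ["R2"], "R2": ["R3"], "R3": ["R4"], "R4": []}
chains = regulatory_chains(toy)
```

For this toy graph the result holds three chains: R1-R2-R3, R1-R2-R3-R4 and R2-R3-R4.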
http://biit.cs.ut.ee/graphweb
Motivation
- Graph-based methods for mining functional and regulatory modules from heterogeneous data
- Biological knowledge represented as a network
- 1+1 > 2: combining different data sources brings out the most important connections
Main idea of GraphWeb
- Input is a graph (gene pairs connected by an edge).
- Gene/probe/protein names are converted to common ground (Ensembl).
- Modules are extracted by applying graph algorithms.
- Each module is described.
- Biologically significant annotations are used to score modules.
What to search for?
- Hubs in a regulatory network: these are usually transcription factors.
- Cliques: highly connected genes, most probably protein complex members.
- Connected components: genes participating in similar functions.
- MCL: finds genes that might share a function.
- Network neighbourhood: which genes have connections to my favourite ones?
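Three of the searches listed above (hubs, connected components and network neighbourhood) can be sketched in a few lines of plain Python. This is an illustration of the ideas, not GraphWeb's implementation; the function names, the `min_degree` threshold and the toy edges are assumptions.

```python
from collections import defaultdict, deque

def build_undirected(edges):
    """Build {node: set(neighbours)} from a list of gene pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def hubs(graph, min_degree):
    """Nodes with at least `min_degree` neighbours."""
    return sorted(n for n, nbrs in graph.items() if len(nbrs) >= min_degree)

def connected_components(graph):
    """Breadth-first search from every unvisited node."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        queue, component = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components

def neighbourhood(graph, seeds, distance=1):
    """All nodes within `distance` edges of any seed gene (seeds included)."""
    frontier, found = set(seeds), set(seeds)
    for _ in range(distance):
        frontier = {nbr for n in frontier for nbr in graph.get(n, ())} - found
        found |= frontier
    return found

# Made-up gene pairs.
g = build_undirected([("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")])
```

With these edges, `hubs(g, 3)` picks out A, there are two connected components, and the distance-1 neighbourhood of B is B together with A.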
Reactome
What is Reactome?
- An open, online, human-curated knowledgebase of biological pathways and reactions in humans.
- Pathways and reactions in other species are predicted based on protein orthologies.
- Data in Reactome is interconnected with other databases: PubMed, Gene Ontology, NCBI, Ensembl, UniProt, OMIM, etc.
- Authored and created by experts.
Slide by L.Stein
Reactome http://www.reactome.org
Page elements: menu bar, search box, reaction map, topic table
Slide by L.Stein
Kyoto Encyclopedia of Genes and Genomes (KEGG)
Integrates:
- current knowledge of molecular interaction networks
- information about genes and proteins
- information about chemical compounds and reactions
Cytoscape.org
Cytoscape is a freely available (open-source, Java-based) bioinformatics software platform for visualizing biological networks (e.g. molecular interaction networks) and analyzing networks with gene expression profiles and other state data.
Core features
- Customize network data display using visual styles
- Powerful graph layout tools
- Easily organize multiple networks
- Easily navigate large networks
- Filter the network
- Plugin API
Input/Output
- Protein-protein interactions from the BIND and TRANSFAC databases
- Gene functional annotations from the Gene Ontology (GO) and KEGG databases
- Biological models from Systems Biology Markup Language (SBML)
- cPath: Cancer Pathway database
- Proteomics Standards Initiative Molecular Interaction (PSI-MI) or Biopathway Exchange Language (BioPAX) formats
- Oracle Spatial Network data model
Additional features are available as plugins.
- jActiveModules: identify significant active subnetworks
- Expression Correlation Network: cluster expression data
- Agilent Literature Search: build networks by extracting interactions from scientific literature
- MCODE: finds clusters of highly interconnected regions in networks
- cPath: query, retrieve and visualize interactions from the MSKCC Cancer Pathway database
- BiNGO: determine which Gene Ontology (GO) categories are statistically over-represented in a set of genes
- Motif Finder: runs a Gibbs sampling motif detector on sequences for nodes in a Cytoscape network
- CytoTalk: interact with Cytoscape from Perl, Python, R, shell scripts, or C or C++ programs
www.hedipeterson.com/cytoscape.pdf
GraphWeb exercises
- Use the Rual network (the same as for Cytoscape) together with public protein-protein interaction (PPI) data (from a file on our server) and find gene pairs present in both networks by applying "Remove edges with less than N labels".
- Filter the PPI networks with the genes of interest Oct4, Sox2, Nanog and TP53 by using Network neighbourhood; keep the distance at 1.
- Look for modules (hubs, cliques, strongly connected components, etc.) in the data using different network algorithms and filter out different module sizes.
- Check the functional annotations of the resulting modules.
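The first exercise rests on a simple idea: label every edge with the datasets it occurs in, then drop edges that carry fewer than N labels. The sketch below shows this outside GraphWeb; the edge lists are made up (`geneX` is a placeholder), and the function names are hypothetical.

```python
from collections import defaultdict

def merge_networks(named_edge_lists):
    """Map each undirected gene pair to the set of dataset names it appears in."""
    labels = defaultdict(set)
    for name, edges in named_edge_lists.items():
        for a, b in edges:
            labels[frozenset((a, b))].add(name)  # frozenset ignores edge direction
    return labels

def keep_edges_with_min_labels(labels, n):
    """Keep only the edges supported by at least `n` datasets."""
    return {edge: names for edge, names in labels.items() if len(names) >= n}

# Made-up edges, not real interaction data.
datasets = {
    "rual": [("Oct4", "Sox2"), ("Sox2", "Nanog")],
    "ppi": [("Sox2", "Oct4"), ("TP53", "geneX")],
}
shared = keep_edges_with_min_labels(merge_networks(datasets), n=2)
```

With N = 2 and two input networks, only the Oct4-Sox2 pair survives, because it is the only edge present in both lists regardless of the order in which the genes are written.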