STRING: Protein association networks. Lars Juhl Jensen

Similar documents
Lecture Notes for Fall Network Modeling. Ernest Fraenkel

Information Extraction from Biomedical Text

Computational methods for predicting protein-protein interactions

Information Extraction from Biomedical Text. BMI/CS 776 Mark Craven

Computational approaches for functional genomics

Systems biology and biological networks

Sequence Alignment Techniques and Their Uses

Integration of functional genomics data

ATLAS of Biochemistry

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Computational Systems Biology

10/17/04. Today s Main Points

MTopGO: a tool for module identification in PPI Networks

Interaction Network Topologies

Protein-protein interaction networks Prof. Peter Csermely

Hands-On Nine The PAX6 Gene and Protein

Biological Networks. Gavin Conant 163B ASRC

CLRG Biocreative V

Comparative genomics: Overview & Tools + MUMmer algorithm

ChemDataExtractor: A toolkit for automated extraction of chemical information from the scientific literature

CSCE555 Bioinformatics. Protein Function Annotation

Bioinformatics Exercises

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

Initiation of DNA Replication Lecture 3! Linda Bloom! Office: ARB R3-165! phone: !

Types of biological networks. I. Intra-cellurar networks

Supplementary online material

The EcoCyc Database. January 25, de Nitrógeno, UNAM,Cuernavaca, A.P. 565-A, Morelos, 62100, Mexico;

Lecture 4: Yeast as a model organism for functional and evolutionary genomics. Part II

BIOINFORMATICS: An Introduction

Comparative Network Analysis

MOLECULAR BIOLOGY BIOL 021 SEMESTER 2 (2015) COURSE OUTLINE

BIOINFORMATICS LAB AP BIOLOGY

Identifying Signaling Pathways

Linear Classifiers IV

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Hidden Markov Models in Language Processing

Homology and Information Gathering and Domain Annotation for Proteins

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Predicting Protein Functions and Domain Interactions from Protein Interactions

Networks & pathways. Hedi Peterson MTAT Bioinformatics

Open a Word document to record answers to any italicized questions. You will the final document to me at

Basic Local Alignment Search Tool

I. Molecules & Cells. A. Unit One: The Nature of Science. B. Unit Two: The Chemistry of Life. C. Unit Three: The Biology of the Cell.

Regulation of mycotoxin production and kinome analysis in Fusarium graminearum

Systematic prediction of gene function in Arabidopsis thaliana using a probabilistic functional gene network

Cellular Function Prediction for Hypothetical Proteins Using High-Throughput Data

De Novo Peptide Sequencing

Metabolite Identification and Characterization by Mining Mass Spectrometry Data with SAS and Python

Special Topics in Computer Science

A framework of integrating gene relations from heterogeneous data sources: an experiment on Arabidopsis thaliana

Computational Biology From The Perspective Of A Physical Scientist

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Markov Random Field Models of Transient Interactions Between Protein Complexes in Yeast

2. Yeast two-hybrid system

Computational Genomics and Molecular Biology, Fall

Natural Language Processing Prof. Pawan Goyal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Mining coreference relations between formulas and text using Wikipedia

Genomics and bioinformatics summary. Finding genes -- computer searches

Motivating the need for optimal sequence alignments...

bi lecture 17 integration of genetic interaction data

STSE (Class - X) ANALYSIS

Context dependent visualization of protein function

CS626: NLP, Speech and the Web. Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 14: Parsing Algorithms 30 th August, 2012

The geneticist s questions

MACFP: Maximal Approximate Consecutive Frequent Pattern Mining under Edit Distance

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST

Network Alignment 858L

Exhaustive search. CS 466 Saurabh Sinha

Bio-Medical Text Mining with Machine Learning

PREDICTION OF HETERODIMERIC PROTEIN COMPLEXES FROM PROTEIN-PROTEIN INTERACTION NETWORKS USING DEEP LEARNING

arxiv: v1 [q-bio.mn] 5 Feb 2008

Three types of RNA polymerase in eukaryotic nuclei

Text mining and natural language analysis. Jefrey Lijffijt

Inferring disease-associated lncrnas using expression data and disease-associated protein coding genes

VCell Tutorial. Building a Rule-Based Model

Supplementary Information

I. Molecules and Cells: Cells are the structural and functional units of life; cellular processes are based on physical and chemical changes.

Protein Quantitation II: Multiple Reaction Monitoring. Kelly Ruggles New York University

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr

Supplemental Materials

Review Article From Experimental Approaches to Computational Techniques: A Review on the Prediction of Protein-Protein Interactions

Log-Linear Models, MEMMs, and CRFs

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

networks in molecular biology Wolfgang Huber

USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES

Multimodal Deep Learning for Predicting Survival from Breast Cancer

Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs

IMMERSIVE GRAPH-BASED VISUALIZATION AND EXPLORATION OF BIOLOGICAL DATA RELATIONSHIPS

Regulation and signaling. Overview. Control of gene expression. Cells need to regulate the amounts of different proteins they express, depending on

Evolutionary Rate Covariation of Domain Families

Basic modeling approaches for biological systems. Mahesh Bule

comparative genomics of high throughput data between species and evolution of function

Domain-based computational approaches to understand the molecular basis of diseases

Ligand screening system using fusion proteins of G protein coupled receptors with G protein α subunits

Biological Knowledge Discovery Through Mining Multiple Sources of High-Throughput Data

What Are The Core Concepts in Biochemistry? Part II. Abstract

Spring 2018 Biology 1 End-of-Course (EOC) Assessment Next Generation Sunshine State Standards (NGSSS) Form 1

The Drosophila Interactions Database: Integrating The Interactome And Transcriptome

Transport between cytosol and nucleus

AP BIOLOGY SUMMER ASSIGNMENT

Transcription:

STRING: Protein association networks Lars Juhl Jensen

interaction networks

association networks

guilt by association

protein networks

STRING

9.6 million proteins

common foundation

Exercise 1 Go to http://string-db.org/ Query for human insulin receptor (INSR) using the search by name functionality Make sure you are in evidence view" (check the buttons below the network) Why are there multiple lines connecting the same to two proteins?

curated knowledge

(what we know)

protein complexes

3D structures

pathways

metabolic pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

signaling pathways

very incomplete

experimental data

(what we measured)

physical interactions

Long Yeast two-hybrid Bait Prey 0.2 0.1 Nuclear Hydrophobic Protein-fragment complementation assay Transmembrane Disordered Cytosolic TAP tag Co-purified proteins Bait Prey Reconstituted enzyme Basic Abundant Bait Tandem affinity purification Jensen & Bork, Science, 2008

genetic interactions

Beyer et al., Nature Reviews Genetics, 2007

gene coexpression

microarrays

RNAseq

Exercise 2 (Continue from where exercise 1 ended) Which types of evidence support the interaction between INSR and IRS1? Click on the interaction to view the popup, which has buttons linking to full details Which types of experimental assays support the INSR IRS1 interaction?

predictions

(what we infer)

genomic context

evolution

gene fusion

Korbel et al., Nature Biotechnology, 2004

gene neighborhood

Korbel et al., Nature Biotechnology, 2004

phylogenetic profiles

Korbel et al., Nature Biotechnology, 2004

a real example

Cell Cellulosomes Cellulose

complications

many databases

different formats

different identifiers

variable quality

not comparable

not same species

hard work

parsers

mapping files

quality scores

affinity purification

von Mering et al., Nucleic Acids Research, 2005

phylogenetic profiles

score calibration

gold standard

von Mering et al., Nucleic Acids Research, 2005

implicit weighting by quality

common scale

homology-based transfer

orthologous groups

Franceschini et al., Nucleic Acids Research, 2013

missing most of the data

Exercise 3 (Continue from where exercise 2 ended) Change the network to the confidence view Change the confidence cutoff to 0.9; any changes in proteins or interactions shown? Turn off all but experiments; what changes? Increase the number of interactors shown to 50; how many proteins do you get? Why?

text mining

>10 km

too much to read

exponential growth

~40 seconds per paper

computer

as smart as a dog

teach it specific tricks

named entity recognition

comprehensive lexicon

cyclin dependent kinase 1

CDC2

orthographic variation

expansion rules

prefixes and suffixes

CDC2

hcdc2

flexible matching

spaces and hyphens

cyclin dependent kinase 1

cyclin-dependent kinase 1

black list

SDS

information extraction

co-mentioning

counting

within documents

within paragraphs

within sentences

scoring scheme

score calibration

NLP Natural Language Processing

part-of-speech tagging

what you learned in school pronoun pronoun verb preposition noun

semantic tagging

grammatical analysis

Gene and protein names Cue words for entity recognition Verbs for relation extraction [ nxexpr The expression of" [ nxgene the cytochrome genes" [ nxpg CYC1 and CYC7]]]" is controlled by" [ nxpg HAP1] Saric et al., Proceedings of ACL, 2004

type and direction

complex sentences

anaphoric references

it

summary

association networks

heterogeneous data

common identifiers

quality scores

protein networks

string-db.org Szklarczyk et al., Nucleic Acids Research, 2015

STITCH

chemical networks

stitch-db.org Kuhn et al., Nucleic Acids Research, 2014

COMPARTMENTS

subcellular localization

compartments.jensenlab.org Binder et al., Database, 2014

TISSUES

tissue expression

tissues.jensenlab.org Santos et al., PeerJ, 2015

DISEASES

disease associations

diseases.jensenlab.org Frankild et al., Methods, 2015

Exercise 4 Open http://tissues.jensenlab.org Look up tissue associations for insulin (INS) Open http://diseases.jensenlab.org Search for insulin receptor (INSR) What is the strongest associated disease? Inspect the underlying text-mining evidence