FuncNet a distributed platform for high-throughput protein function analysis. Andrew Clegg University College London. funcnet.eu

Similar documents
EBI web resources II: Ensembl and InterPro. Yanbin Yin Spring 2013

Ákos Tarcsay CHEMAXON SOLUTIONS

Cross Discipline Analysis made possible with Data Pipelining. J.R. Tozer SciTegic

EBI web resources II: Ensembl and InterPro

Introduction to Portal for ArcGIS. Hao LEE November 12, 2015

Integration of functional genomics data

Portal for ArcGIS: An Introduction

CS612 - Algorithms in Bioinformatics

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

Chemical Data Retrieval and Management

ToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database

Networks & pathways. Hedi Peterson MTAT Bioinformatics

Introduction to Portal for ArcGIS

Gene Ontology and overrepresentation analysis

Sequence Analysis and Databases 2: Sequences and Multiple Alignments

Reaxys Pipeline Pilot Components Installation and User Guide

WEB-BASED SPATIAL DECISION SUPPORT: TECHNICAL FOUNDATIONS AND APPLICATIONS

IUCLID Substance Data

Large Scale Evaluation of Chemical Structure Recognition 4 th Text Mining Symposium in Life Sciences October 10, Dr.

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007

Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics

Patent Searching using Bayesian Statistics

Reaxys Medicinal Chemistry Fact Sheet

A Model of GIS Interoperability Based on JavaRMI

Pipelining RDP Data to the Taxomatic Background Accomplishments vs objectives

Application integration: Providing coherent drug discovery solutions

Using Web Technologies for Integrative Drug Discovery

NRProF: Neural response based protein function prediction algorithm

Synteny Portal Documentation

Machine Learning in Action

Supplementary Figure 1. The workflow of the KinomeXplorer web interface.

3D Urban Information Models in making a smart city the i-scope project case study

Road Surface Condition Analysis from Web Camera Images and Weather data. Torgeir Vaa (SVV), Terje Moen (SINTEF), Junyong You (CMR), Jeremy Cook (CMR)

Homology and Information Gathering and Domain Annotation for Proteins

Farewell, PipelinePilot Migrating the Exquiron cheminformatics platform to KNIME and the ChemAxon technology

Rick Ebert & Joseph Mazzarella For the NED Team. Big Data Task Force NASA, Ames Research Center 2016 September 28-30

Leveraging ArcGIS Online Elevation and Hydrology Services. Steve Kopp, Jian Lange

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Towards Automatic Nanomanipulation at the Atomic Scale

PROTEIN STRUCTURE PREDICTION Bioinformatic Approach

FROM MOLECULAR FORMULAS TO MARKUSH STRUCTURES

Experiences and Directions in National Portals"

Homology. and. Information Gathering and Domain Annotation for Proteins

Leveraging Web GIS: An Introduction to the ArcGIS portal

Bentley Map Advancing GIS for the World s Infrastructure

SemaDrift: A Protégé Plugin for Measuring Semantic Drift in Ontologies

Innovation. The Push and Pull at ESRI. September Kevin Daugherty Cadastral/Land Records Industry Solutions Manager

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

Using the File Geodatabase API. Lance Shipman David Sousa

Portal for ArcGIS: An Introduction. Catherine Hynes and Derek Law

RMS/Coverage Graphs: A Qualitative Method for Comparing Three-Dimensional Protein Structure Predictions

UNIPD RESEARCH UNIT Progress Report Francesca FISSORE, Marco PIRAGNOLO, Francesco PIROTTI

ArcGIS 10.1 An Overview of the System

SUB-CELLULAR LOCALIZATION PREDICTION USING MACHINE LEARNING APPROACH

A Cotton Irrigator s Decision Support System Using National, Regional and Local Data

The Interactivity of a Virtual Museum at the Service of the Teaching of Applied Geology

Week 10: Homology Modelling (II) - HHpred

Compression Techniques for 3D SDI

June 19 Huntsville, Alabama 1

ArcGIS Enterprise: What s New. Philip Heede Shannon Kalisky Melanie Summers Sam Williamson

Spatial Analysis with Web GIS. Rachel Weeden

Introduction to Bioinformatics

Enabling ENVI. ArcGIS for Server

Managing bathymetric data in a hydrographic survey company

Genome Databases The CATH database

SpatialKit and SEXTANTE

Forecast solutions for the energy sector

ALGORITHMS FOR BIOLOGICAL NETWORK ALIGNMENT

Comparative genomics of gene families in relation with metabolic pathways for gene candidates highlighting

Gabriella Rustici EMBL-EBI

What Would John Snow Do (Today)? Part 1

Cloud-based WRF Downscaling Simulations at Scale using Community Reanalysis and Climate Datasets

Crime Analyst Extension. Christine Charles

Web GIS in Agriculture Land Use, Crop Management and Planning. Kevin Knapp Bryan Baker, Phd.

Oracle Spatial: Essentials

Domain-based computational approaches to understand the molecular basis of diseases

ATINER's Conference Paper Series PLA

Cheminformatics platform for drug discovery application

7.1 Sampling Error The Need for Sampling Distributions

Introducing a Bioinformatics Similarity Search Solution

Deep-dive into PyMISP MISP - Malware Information Sharing Platform & Threat Sharing

Large-Scale Genomic Surveys

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Environmental Response Management Application

Esri Training by Microcenter Prepare to Innovate. Microcenter Course Catalog

Esri Overview for Mentor Protégé Program:

Supplementary Materials for mplr-loc Web-server

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Overview of IslandPick pipeline and the generation of GI datasets

Management of Geological Information for Mining Sector Development and Investment Attraction Examples from Uganda and Tanzania

Changes in Esri GIS, practical ways to be ready for the future

GENERALIZATION IN THE NEW GENERATION OF GIS. Dan Lee ESRI, Inc. 380 New York Street Redlands, CA USA Fax:

The Schrödinger KNIME extensions

PSICQUIC. Paul Shannon. July 9, Introduction 1. 2 Quick Start: find interactions between Myc and Tp53 2

Incorporating ArcGIS Pro in your Curriculum

EMMA : ECDC Mapping and Multilayer Analysis A GIS enterprise solution to EU agency. Sharing experience and learning from the others

YYT-C3002 Application Programming in Engineering GIS I. Anas Altartouri Otaniemi

Reproducible Neural Data Analysis with Elephant, Neo and odml

Statistical Clustering of Vesicle Patterns Practical Aspects of the Analysis of Large Datasets with R

Transcription:

FuncNet a distributed platform for high-throughput protein function analysis Andrew Clegg University College London

Outline of talk Introduction and background Working with FuncNet APIs and extensions Further information

Aims of FuncNet Given one set of proteins which are known to share a particular biological function which of these other proteins also share that function?

Usage scenario Erich Nigg Max Planck Inst. proteomics study. 150 known spindle proteins > 600 unknown/novel proteins How can we help identify true spindle proteins? UCL, DTU and CNIO each contributed distinct prediction algorithms.

Accuracy Experimentalists using this approach achieved doubled hit rate for wet-lab assays.

Motivations Turning manual analysis into a pipeline... Accessibility (zero installation) Automation Throughput Robustness

Outline of talk Introduction and background Working with FuncNet APIs and extensions Further information

Inputs Reference proteins: known function Query proteins: unknown function

Outputs Reference proteins: known function Query proteins: unknown function predictor_1 predictor_2 predictor_3

Outputs Reference proteins: known function Query proteins: unknown function k Fisher score = 2 log e pi i=1

Prediction services CODA UCL (London) Genomic context domain fusion enginedb ITB (Bari) Similarity of GO annotations GECO UCL (London) Gene expression correlation hippi UCL (London) Protein interactions via domain homology ihop CNIO (Madrid) Text mining MEDLINE co-occurrence JACOP SIB (Lausanne) Subsequence homology PIPS Dundee University Protein interactions via Bayesian classifier SpindleP CBS (Lyngby) Neural network using sequence & structure

Overview of workflow CODA (CATH) CODA (Pfam) GECO hippi Query proteins FuncNet front-end server ihop Reference proteins ihop-conn JACOP SpindleP adaptor SpindleP enginedb PIPS Score aggregation Pairwise predictions

FuncNet web interface Upload protein lists from your PC Monitor progress of job Download results as CSV or XML http:///gwt-client/

Outline of talk Introduction and background Working with FuncNet APIs and extensions Further information

FuncNet SOAP API Workflow tools like Taverna, KNIME, Bioclipse, Pipeline Pilot, etc. etc. Open-source clients (Java and Perl so far) Your own scripts or programs Each predictor can be queried separately

FuncNet in EnVision2

Spindle protein prediction CODA (CATH) CODA (Pfam) GECO hippi Query proteins FuncNet front-end server ihop Reference proteins ihop-conn JACOP SpindleP adaptor SpindleP enginedb PIPS Integrating with SpindleP Score aggregation Pairwise predictions

Query set SpindleP neural network Reference set = SpindleP training set FuncNet front-end service Score integration

Outline of talk Introduction and background Working with FuncNet APIs and extensions Further information

Work in progress Evaluation precision & recall Load testing Other species currently human-only Score integration Bayesian classifier Annotation of results Visualization Wet-lab partners?

The FuncNet consortium University College London Christine Orengo Corin Yeats Jonathan Lees Ian Sillitoe Adam Reid University of Malaga Juan Ranea Ian Morilla SIB, Lausanne Marco Pagni Heinz Stockinger EBI, Hinxton Henning Hermjakob Florian Reisinger CNR-ITB, Bari Andreas Gisel CNIO, Madrid Alfonso Valencia José María Fernández González David González Pisano Ana Rojas CBS, Lyngby Søren Brunak Kristoffer Rapacki University of Dundee Geoff Barton Mark McDowall Michelle S. Scott Tom P. Walsh... and myself, Andrew Clegg