Gabriella Rustici EMBL-EBI


Towards a cellular phenotype ontology
Gabriella Rustici, EMBL-EBI (gabry@ebi.ac.uk)

Systems Microscopy NoE at a glance
The Systems Microscopy Network of Excellence (NoE, 2011-2015) is an FP7-funded project involving 15 research groups across Europe. The aim of this consortium is to develop tools and strategies to achieve a systems biology understanding of the living cell.
This is done by combining automated fluorescence microscopy, cell microarrays, RNAi screening, quantitative image analysis and data mining. The consortium captures data and builds models in four dimensions (three-dimensional space and time) and measures dynamic events in single living cells.
The focus is on two cellular processes that are highly relevant to human cancer: cell division and cell migration.

Systems Microscopy NoE at a glance: work packages (WP)
WP1 Systems Biology Analysis of Cell Division
WP2 Systems Biology Analysis of Cell Migration
WP3 Development of high-throughput imaging and screening platforms to enable production of systems biology data
WP4 Development of software for automated multi-dimensional quantitative extraction and analysis of live cell images
WP5 Development of Statistics and Bioinformatics Tools for Multidimensional Image-based Data
WP6 Development and Application of Modeling Methods for Systems Microscopy
WP7 Translational Applications and Outcome of Systems Microscopy
WP8 Development of Standards to Enable Systems Microscopy
WP9 Development and Application of a Public Database for Systems Microscopy

What is the current state of this field?
Systems microscopy has not yet achieved the degree of standardization of other omics approaches.
We need reporting standards to adequately capture experimental information.
We need a repository for quantitative data derived from images (and not a collection of raw image datasets!) that will:
1. provide access to data for the broader research community;
2. accelerate the development of analytical methods for this field; and
3. promote the integration of independent systems microscopy studies.

What metadata/data do we need to capture?
1. Study description, including general study information and specific screen information (including protocols)
2. siRNA library information
3. Study results, as a list of siRNAs with their associated phenotypes and the scores used to assign phenotypes to each siRNA
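The three kinds of information listed above could be held in a data model like the one below. This is a minimal Python sketch and not the schema of the actual repository. Every class and field name (`Study`, `Screen`, `SiRNA`, `SiRNAResult`, `phenotypes_for`) is hypothetical, and protocols are reduced to free text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Screen:
    """Screen-specific information; protocols are kept as free text here."""
    screen_id: str
    protocols: list[str] = field(default_factory=list)


@dataclass
class SiRNA:
    """One reagent from the siRNA library (hypothetical fields)."""
    sirna_id: str
    manufacturer: str
    gene_symbol: str
    ensembl_gene_id: str


@dataclass
class SiRNAResult:
    """A phenotype assigned to one siRNA, with the score used to assign it."""
    sirna_id: str
    phenotype: str
    score: float


@dataclass
class Study:
    """General study description plus its screens, library and results."""
    title: str
    description: str
    screens: list[Screen] = field(default_factory=list)
    library: list[SiRNA] = field(default_factory=list)
    results: list[SiRNAResult] = field(default_factory=list)

    def phenotypes_for(self, sirna_id: str) -> list[str]:
        """Return every phenotype recorded for one siRNA in this study."""
        return [r.phenotype for r in self.results if r.sirna_id == sirna_id]
```

Keeping results as a flat list of (siRNA, phenotype, score) records mirrors the "list of siRNAs, associated phenotypes and scores" wording. It also makes cross-study queries a matter of filtering records.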

Example of study results
GENE SYMBOL  ENSEMBL ID       siRNA ID  DEVIATION  CELL NUMBER  VALID PLATES  % INHIBITOR  TRANSCRIPTS HIT  TRANSCRIPT ID(S)     PHENOTYPE
ALAS1        ENSG00000023330  116885    0.750      155          4             0.50         8/8              ENST00000493402/ENS  mild inhibition of secretion
TOR1AIP1     ENSG00000143337  23193     0.750      102          3             0.67         7/8              ENST00000528443/ENS  mild inhibition of secretion
TRAPPC5      ENSG00000181029  229347    0.750      196          4             0.50         3/3              ENST00000452406/ENS  mild inhibition of secretion
KCTD10       ENSG00000110906  104812    0.750      87           4             0.50         3/3              ENST00000424763/ENS  mild inhibition of secretion
ZNF252       ENSG00000196922  233977    0.750      174          4             0.75         2/4              ENST00000426361/ENS  mild inhibition of secretion
ANXA5        ENSG00000164111  147066    0.751      272          4             0.50         4/4              ENST00000512232/ENS  mild inhibition of secretion
TRAM1L1      ENSG00000174599  149174    0.751      268          4             0.50         1/1              ENST00000310754      mild inhibition of secretion
RSAD2        ENSG00000134321  125833    0.751      196          3             0.67         1/2              ENST00000382040      mild inhibition of secretion
SMC2         ENSG00000136824  135817    0.751      271          4             0.50         5/5              ENST00000440179/ENS  mild inhibition of secretion
C16orf79     ENSG00000182685  215394    0.751      70           3             0.67         1/1              ENST00000328540      mild inhibition of secretion
OR2T1        ENSG00000175143  237324    0.751      172          4             0.50         1/1              ENST00000366474      mild inhibition of secretion
PCDHB8       ENSG00000120322  27850     0.751      343          4             0.50         1/1              ENST00000239444      mild inhibition of secretion
MYL6B        ENSG00000196465  11792     0.751      96           4             0.50         1/2              ENST00000207437      mild inhibition of secretion
POLG2        ENSG00000136480  120501    0.751      118          4             0.50         1/1              ENST00000322670      mild inhibition of secretion
ING1         ENSG00000153487  119240    0.751      178          4             0.50         3/5              ENST00000333219/ENS  mild inhibition of secretion
EMILIN3      ENSG00000183798  149047    0.752      297          4             0.75         1/1              ENST00000332312      mild inhibition of secretion
KCTD12       ENSG00000178695  129129    0.752      277          4             0.50         2/2              ENST00000317765/ENS  mild inhibition of secretion
GABARAPL1    ENSG00000139112  109698    0.752      287          4             0.50         1/2              ENST00000266458      mild inhibition of secretion
SQLE         ENSG00000104549  118680    0.752      99           4             0.50         2/4              ENST00000265896/ENS  mild inhibition of secretion
PITRM1       ENSG00000107959  22207     0.752      163          4             0.50         8/11             ENST00000380994/ENS  mild inhibition of secretion
LMF2         ENSG00000100258  125791    0.752      181          4             0.75         4/4              ENST00000474879/ENS  mild inhibition of secretion
CHD3         ENSG00000170004  216241    0.752      166          4             0.50         4/8              ENST00000358181/ENS  mild inhibition of secretion
BAT2L1       ENSG00000130723  226775    0.752      217          4             0.50         3/9              ENST00000320547/ENS  mild inhibition of secretion
SLFN13       ENSG00000154760  141355    0.752      168          3             0.67         5/6              ENST00000285013/ENS  mild inhibition of secretion
DLX6         ENSG00000006377  224974    0.753      293          4             0.50         3/3              ENST00000437638/ENS  mild inhibition of secretion
SHE          ENSG00000169291  228153    0.753      267          4             0.75         1/1              ENST00000304760      mild inhibition of secretion
PVALB        ENSG00000100362  12249     0.753      386          5             0.60         6/6              ENST00000417718/ENS  mild inhibition of secretion
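Suppose a table like the one above is exported as whitespace-aligned text. Only the phenotype column contains spaces, so each row can be parsed by splitting off the first nine columns and keeping the remainder as the phenotype. The sketch below assumes exactly that layout. `ResultRow` and `parse_row` are hypothetical names, and the field meanings simply follow the column headers. Truncated transcript lists such as `ENST00000493402/ENS` are kept as they are rather than completed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ResultRow:
    gene_symbol: str
    ensembl_id: str
    sirna_id: str
    deviation: float
    cell_number: int
    valid_plates: int
    pct_inhibitor: float
    transcripts_hit: int
    transcripts_total: int
    transcript_ids: list[str]
    phenotype: str


def parse_row(line: str) -> ResultRow:
    """Parse one whitespace-aligned data row (not the header) of the results table."""
    # The first nine columns never contain spaces; the phenotype can,
    # so stop splitting after nine fields and keep the rest intact.
    gene, ensg, sirna, dev, cells, plates, pct, hit, tids, phenotype = line.split(None, 9)
    # "TRANSCRIPTS HIT" is written as "hit/total", e.g. "8/11".
    hit_n, hit_total = (int(x) for x in hit.split("/"))
    return ResultRow(
        gene_symbol=gene,
        ensembl_id=ensg,
        sirna_id=sirna,
        deviation=float(dev),
        cell_number=int(cells),
        valid_plates=int(plates),
        pct_inhibitor=float(pct),
        transcripts_hit=hit_n,
        transcripts_total=hit_total,
        # IDs are "/"-separated; a truncated list yields a trailing fragment.
        transcript_ids=tids.split("/"),
        phenotype=phenotype.strip(),
    )
```

The header line would need to be skipped before calling `parse_row`.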

Search types currently supported
The current interface prototype supports five basic types of search:
1. for a gene, by gene symbol or Ensembl ID;
2. for a reagent (siRNA), by manufacturer or internal screen ID;
3. for a gene attribute, using Gene Ontology terms;
4. for a phenotype, or multiple phenotypes, within an individual screen and across screens; and
5. for a study, using keywords.
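Search types 1, 2 and 4 amount to looking results up by gene, by reagent and by phenotype. Below is a minimal in-memory sketch that assumes each result is a flat record. `Hit` and `PhenotypeIndex` are hypothetical names, and a real prototype would presumably sit on a database rather than Python dictionaries. GO-term and keyword searches (types 3 and 5) are left out.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hit:
    screen_id: str
    gene_symbol: str
    ensembl_id: str
    sirna_id: str
    phenotype: str


class PhenotypeIndex:
    """Hypothetical in-memory index covering search types 1, 2 and 4."""

    def __init__(self, hits: list[Hit]):
        self._by_gene = defaultdict(list)
        self._by_sirna = defaultdict(list)
        self._by_phenotype = defaultdict(list)
        for h in hits:
            # Type 1: a gene is findable by symbol or by Ensembl ID.
            self._by_gene[h.gene_symbol.upper()].append(h)
            self._by_gene[h.ensembl_id.upper()].append(h)
            # Type 2: a reagent is findable by its siRNA ID.
            self._by_sirna[h.sirna_id].append(h)
            # Type 4: phenotype lookup, case-insensitive.
            self._by_phenotype[h.phenotype.lower()].append(h)

    def gene(self, symbol_or_id: str) -> list[Hit]:
        return list(self._by_gene.get(symbol_or_id.upper(), []))

    def reagent(self, sirna_id: str) -> list[Hit]:
        return list(self._by_sirna.get(sirna_id, []))

    def phenotype(self, name: str, screen_id: Optional[str] = None) -> list[Hit]:
        """Hits with this phenotype across screens, or within one screen if given."""
        hits = self._by_phenotype.get(name.lower(), [])
        return [h for h in hits if screen_id is None or h.screen_id == screen_id]
```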

Gene summary view
Provides information on a gene and the phenotypes associated with silencing of the selected gene, across independent screens.

Reagent summary view
Provides information on an siRNA reagent and the phenotypes associated with it, across independent studies.

Phenotype summary view across screens
Provides a list of genes whose silencing with a specific reagent has given rise to a particular set of phenotypes, across screens.
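The gene and phenotype summary views can be read as two groupings of the same result records. The sketch assumes that results arrive as `(screen_id, gene_symbol, sirna_id, phenotype)` tuples, and the function names are hypothetical. A reagent summary view could group by `sirna_id` in the same way that `gene_summary` groups by gene.

```python
from collections import defaultdict


def gene_summary(hits, gene_symbol):
    """Gene summary view: the phenotypes seen in each screen for one silenced gene.

    `hits` is an iterable of (screen_id, gene_symbol, sirna_id, phenotype) tuples.
    """
    summary = defaultdict(set)
    for screen_id, gene, _sirna, phenotype in hits:
        if gene == gene_symbol:
            summary[screen_id].add(phenotype)
    return dict(summary)


def phenotype_summary(hits, wanted):
    """Phenotype summary view across screens.

    Return sorted (gene, siRNA) pairs whose phenotypes, pooled over all
    screens, include every phenotype in `wanted`.
    """
    pooled = defaultdict(set)
    for _screen, gene, sirna, phenotype in hits:
        pooled[(gene, sirna)].add(phenotype)
    wanted = set(wanted)
    return sorted(pair for pair, phenos in pooled.items() if wanted <= phenos)
```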

Challenges
Integrate data derived from independent studies and provide a meaningful representation of the experimental results. There are two levels of integration:
1. at the quantitative level, through the development of pipelines for data analysis; and
2. at the level of phenotypic descriptions, through the development of an ontology for cellular phenotypes.
An ontology would help to resolve naming ambiguities (e.g. "large nucleus" vs "large nuclei") as well as group related phenotypes together (e.g. mitotic phenotypes). This would facilitate the integration of independent datasets at the level of phenotypic description.
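The toy sketch below makes the two benefits concrete: it resolves naming ambiguities and groups related phenotypes. The terms, synonyms and is-a links are invented for illustration and are not taken from CPO or any released ontology. `normalize`, `ancestors` and `group` are hypothetical helpers.

```python
# Hypothetical toy ontology: term -> parent term (None marks the root).
PARENT = {
    "cellular phenotype": None,
    "nuclear phenotype": "cellular phenotype",
    "large nucleus": "nuclear phenotype",
    "mitotic phenotype": "cellular phenotype",
    "metaphase arrest": "mitotic phenotype",
    "prometaphase delay": "mitotic phenotype",
}

# Free-text labels used by different screens, mapped to one canonical term.
SYNONYMS = {
    "large nuclei": "large nucleus",
    "enlarged nucleus": "large nucleus",
    "cells stuck in metaphase": "metaphase arrest",
}


def normalize(label: str) -> str:
    """Resolve a free-text phenotype label to its canonical ontology term."""
    key = label.strip().lower()
    term = SYNONYMS.get(key, key)
    if term not in PARENT:
        raise KeyError(f"unmapped phenotype label: {label!r}")
    return term


def ancestors(term: str) -> list:
    """Walk is-a links from a term up to the root."""
    out = []
    parent = PARENT[term]
    while parent is not None:
        out.append(parent)
        parent = PARENT[parent]
    return out


def group(labels: list, group_term: str) -> list:
    """Keep the labels whose canonical term is, or descends from, `group_term`."""
    kept = []
    for label in labels:
        term = normalize(label)
        if term == group_term or group_term in ancestors(term):
            kept.append(label)
    return kept
```

For example, `group(["large nuclei", "cells stuck in metaphase"], "mitotic phenotype")` keeps only the second label. The first is normalized to "large nucleus", which sits under the nuclear branch.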

Can we integrate phenotypic descriptions?
Fuchs et al., 2010, Molecular Systems Biology 6: 370
Neumann, Walter et al., 2010, Nature 464: 721

Cellular Phenotype Ontology (CPO)
A pre-composed ontology based on terms from GO Biological Process (BP), GO Cellular Component (CC), GO extensions and PATO, split into structural (morphological) and physiological (process) abnormalities.
Example: from Mitochondrion (GO:0005739), the ontology derives Mitochondrion phenotype (CPO:XX0005739), together with the classes Mitochondrion normal phenotype, Mitochondrion abnormal phenotype, Abnormal mitochondrion morphology, Abnormal mitochondrion physiology and Absence of mitochondrion.
Hoehndorf R et al. Bioinformatics 2012;28:1783-1789
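The mitochondrion example follows a fixed pattern, and that regularity is what makes automatic pre-composition possible: the same set of phenotype classes can be generated from every GO class. The minimal sketch below reproduces the slide's example. `precompose` is a hypothetical helper. The `CPO:XX` prefix copies the slide's placeholder rather than a real identifier scheme, and the flat list of labels says nothing about how the classes are nested in CPO.

```python
def precompose(go_id: str, label: str) -> dict:
    """Generate CPO-style phenotype class labels for one GO class.

    Mirrors the mitochondrion example; "CPO:XX" is the slide's placeholder
    prefix, not a real CPO identifier scheme.
    """
    prefix, number = go_id.split(":")
    if prefix != "GO":
        raise ValueError(f"expected a GO identifier, got {go_id!r}")
    entity = label.lower()
    return {
        "source": f"{label} ({go_id})",
        "id": f"CPO:XX{number}",
        "terms": [
            f"{label} phenotype",
            f"{label} normal phenotype",
            f"{label} abnormal phenotype",
            f"Abnormal {entity} morphology",
            f"Abnormal {entity} physiology",
            f"Absence of {entity}",
        ],
    }
```

In this sketch every source class yields six new labels. Applied to every GO class, the number of generated classes therefore scales directly with the size of the source ontology.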

Cellular Phenotype Ontology (CPO)
Physiological abnormalities are split into single-occurrence and multiple-occurrence processes, and PATO is used to refine the qualities of each:
1. Single-occurrence processes: duration (increased or decreased) and participants (increased or decreased)
2. Multiple-occurrence processes: abnormal frequency (increased or decreased) and abnormal onset (increased or decreased)
Hoehndorf R et al. Bioinformatics 2012;28:1783-1789
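The split of physiological abnormalities can also be generated mechanically. In the sketch below, the quality names and directions come from the slide. The label format and the `process_abnormalities` helper are hypothetical, and the caller decides whether a process counts as single- or multiple-occurrence.

```python
from itertools import product

# PATO-refined qualities per process type, as listed on the slide.
QUALITIES = {
    "single": ["duration", "participants"],
    "multiple": ["frequency", "onset"],
}
DIRECTIONS = ["increased", "decreased"]


def process_abnormalities(process: str, occurrence: str) -> list:
    """List illustrative physiological-abnormality labels for one process.

    `occurrence` must be "single" or "multiple"; the label wording is made up.
    """
    if occurrence not in QUALITIES:
        raise ValueError("occurrence must be 'single' or 'multiple'")
    return [
        f"{process}: {direction} {quality}"
        for quality, direction in product(QUALITIES[occurrence], DIRECTIONS)
    ]
```

For instance, treating cell division as a multiple-occurrence process yields four labels, from "cell division: increased frequency" to "cell division: decreased onset".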

Automatic pre-composition
We built a beast: 140K classes, 220K with imports.
It has a good underlying theoretical model, but it needs extending for the Cellular Phenotype Db use case and has the potential for unrealistic classes.
With EL expressivity it should be scalable given enough computing power, but in practice it is a struggle to work with on a modest PC.

Entities, processes and qualities
Cellular components: Gene Ontology Cellular Component
Biological processes: Gene Ontology Biological Process
Cell types: Cell type ontology (CTO)
Qualities (abnormal size, temporal quality, shapes, absent): Phenotype and trait ontology (PATO)

Composing a phenotype description
Entity-Quality (EQ) pattern:
Entity: a bearer of some quality
Quality: some characteristic of the entity
Example 1. Phenotype: large nucleus. Entity: nucleus (GO_000xxxx). Quality: large (PATO_000xxxx).
Example 2. Phenotype: cells stuck in metaphase due to metaphase arrest. Entity: mitotic metaphase (GO_0000089). Quality: arrested (PATO_0000297).
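The EQ pattern maps directly onto a small record type. The sketch below encodes the two examples above. `Term`, `EQPhenotype` and `as_statement` are hypothetical names, and the identifiers that are elided on the slide (`GO_000xxxx`, `PATO_000xxxx`) are deliberately left elided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    curie: str   # ontology identifier as written on the slide, e.g. "GO_0000089"
    label: str


@dataclass(frozen=True)
class EQPhenotype:
    """Entity-Quality phenotype: the entity is the bearer of the quality."""
    description: str
    entity: Term    # bearer, e.g. a GO cellular component or process
    quality: Term   # characteristic, from PATO

    def as_statement(self) -> str:
        return (f"{self.entity.label} ({self.entity.curie}) has quality "
                f"{self.quality.label} ({self.quality.curie})")


# The "xxxx" parts are elided on the slide and left elided here.
LARGE_NUCLEUS = EQPhenotype(
    "Large nucleus",
    Term("GO_000xxxx", "nucleus"),
    Term("PATO_000xxxx", "large"),
)
METAPHASE_ARREST = EQPhenotype(
    "Cells stuck in metaphase due to metaphase arrest",
    Term("GO_0000089", "mitotic metaphase"),
    Term("PATO_0000297", "arrested"),
)
```

Because the records are frozen, they are hashable. That lets them serve directly as dictionary keys when annotations are later mapped to ontology classes.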

New strategy for ontology building
1. Build an annotation tool that draws on ontology terms.
2. Distribute the tool to consortium members for phenotype annotation.
3. Map the phenotype terms annotated using ontologies to CPO.
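The final mapping step could be as simple as a lookup from (entity, quality) pairs to CPO classes, with anything unmatched collected as a candidate for a new term. The minimal sketch below works under that assumption. `map_to_cpo` and the shape of the lookup table are hypothetical and not part of the annotation tool described here.

```python
def map_to_cpo(annotations, eq_to_cpo):
    """Split EQ annotations into CPO-mapped ones and candidates for new terms.

    annotations: iterable of (entity_id, quality_id) pairs from the annotation tool.
    eq_to_cpo:   dict mapping (entity_id, quality_id) to a CPO class label.
    """
    mapped, unmapped = {}, []
    for pair in annotations:
        if pair in eq_to_cpo:
            mapped[pair] = eq_to_cpo[pair]
        elif pair not in unmapped:
            # No CPO class yet: keep once as a candidate for ontology extension.
            unmapped.append(pair)
    return mapped, unmapped
```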

Phenotype annotation tool
(Screenshot panels: original phenotypic description; ontology-based annotations)

Mapping phenotypes from different biological scales
Cellular phenotypes come from single cultured mammalian cells, from mouse tissues and from human tissues.
Collect the terms used to annotate cellular phenotypes in the different domains, then map the resulting ontologies onto each other to enable correlative analysis.

Open questions
Sometimes phenotypes are not linked to biological processes because there is not enough information to make this association.
We mostly observe cell population phenotypes, but the technology is moving towards single-cell observations.
How do we deal with quantitative phenotypes (cell size, nucleus size, actin content, DNA content, ...)?
Existing ontologies might not be granular enough to describe what we want to describe.
How do we deal with temporal information?
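For quantitative phenotypes such as nucleus size, one possible workaround (an assumption here, not something the slide proposes) is to discretize each measurement against controls into qualities such as increased or decreased. The cost is losing the magnitude. The sketch below uses a z-score against control measurements. The `discretize` name and the 2.0 cutoff are arbitrary choices for illustration.

```python
from statistics import mean, stdev


def discretize(value: float, controls: list, threshold: float = 2.0) -> str:
    """Map a quantitative measurement to 'increased', 'decreased' or 'normal'.

    Uses a z-score against control measurements; the cutoff is illustrative.
    """
    if len(controls) < 2:
        raise ValueError("need at least two control measurements")
    mu, sd = mean(controls), stdev(controls)
    if sd == 0:
        raise ValueError("controls have zero variance")
    z = (value - mu) / sd
    if z >= threshold:
        return "increased"
    if z <= -threshold:
        return "decreased"
    return "normal"
```

The returned word could then be paired with a PATO quality in an EQ annotation, while the raw value and z-score are stored alongside it so that the quantitative information is not lost.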

Goal
Develop a data-driven, generic upper-level ontology for cellular phenotypes.
Provide an open-access tool for annotating phenotypes and capturing the necessary metadata.
Provide templates for new terms and for ontology extension.
Run a pilot study with Systems Microscopy and BioMedBridges scientists to see how they can use an EQ-based annotation tool.

Acknowledgements
EMBL-EBI: Catherine Kirsanova, Simon Jupp, James Malone, Alvis Brazma
EMBL-Heidelberg: Jean-Karim Heriche, Beate Neumann, Bernd Fischer, Wolfgang Huber, Jan Ellenberg, Christoph Moehl, Mayumi Isokane, Celine Revenu
CU: Robert Hoehndorf, George Gkoutos
Prototype URL: http://www.ebi.ac.uk/fg/sym