Towards a cellular phenotype ontology Gabriella Rustici EMBL-EBI gabry@ebi.ac.uk
Systems Microscopy NoE at a glance
The Systems Microscopy NoE (2011-2015) is an FP7-funded project involving 15 research groups across Europe.
The aim of the consortium is to develop tools and strategies to achieve a systems biology understanding of the living cell.
This is done by combining automated fluorescence microscopy, cell microarrays, RNAi screening, quantitative image analysis and data mining.
Capture data and build models in four dimensions (three-dimensional space plus time) and measure dynamic events in single living cells.
Focus on two cellular processes that are highly relevant to human cancer: cell division and cell migration.
Systems Microscopy NoE at a glance: work packages
WP1 Systems Biology Analysis of Cell Division
WP2 Systems Biology Analysis of Cell Migration
WP3 Development of high throughput imaging and screening platforms to enable production of systems biology data
WP4 Development of software for automated multi-dimensional quantitative extraction and analysis of live cell image
WP5 Development of Statistics and Bioinformatics Tools for Multidimensional Image-based Data
WP6 Development and Application of Modeling Methods for Systems Microscopy
WP7 Translational Applications and Outcome of Systems Microscopy
WP8 Development of Standards to Enable Systems Microscopy
WP9 Development and Application of a Public Database for Systems Microscopy
What is the current state of this field?
Systems microscopy has not achieved the degree of standardization of other omics approaches.
We need reporting standards to adequately capture experimental information.
We need a repository for quantitative data derived from images (and not a collection of raw image datasets!) that will:
1. provide access to data for the broader research community;
2. accelerate the development of analytical methods for this field; and
3. promote the integration of independent systems microscopy studies.
What metadata/data do we need to capture?
Study description: general study information and specific screen information, including protocols.
siRNA library information.
Study results: a list of siRNAs, the associated phenotypes, and the scores used to assign phenotypes to each siRNA.
Example of study results
GENE SYMBOL | ENSEMBL ID | siRNA ID | DEVIATION | CELL NUMBER | VALID PLATES | % INHIBITOR | TRANSCRIPTS HIT | TRANSCRIPT ID(S) | Phenotype
ALAS1 | ENSG00000023330 | 116885 | 0.750 | 155 | 4 | 0.50 | 8/8 | ENST00000493402/ENS | mild inhibition of secretion
TOR1AIP1 | ENSG00000143337 | 23193 | 0.750 | 102 | 3 | 0.67 | 7/8 | ENST00000528443/ENS | mild inhibition of secretion
TRAPPC5 | ENSG00000181029 | 229347 | 0.750 | 196 | 4 | 0.50 | 3/3 | ENST00000452406/ENS | mild inhibition of secretion
KCTD10 | ENSG00000110906 | 104812 | 0.750 | 87 | 4 | 0.50 | 3/3 | ENST00000424763/ENS | mild inhibition of secretion
ZNF252 | ENSG00000196922 | 233977 | 0.750 | 174 | 4 | 0.75 | 2/4 | ENST00000426361/ENS | mild inhibition of secretion
ANXA5 | ENSG00000164111 | 147066 | 0.751 | 272 | 4 | 0.50 | 4/4 | ENST00000512232/ENS | mild inhibition of secretion
TRAM1L1 | ENSG00000174599 | 149174 | 0.751 | 268 | 4 | 0.50 | 1/1 | ENST00000310754 | mild inhibition of secretion
RSAD2 | ENSG00000134321 | 125833 | 0.751 | 196 | 3 | 0.67 | 1/2 | ENST00000382040 | mild inhibition of secretion
SMC2 | ENSG00000136824 | 135817 | 0.751 | 271 | 4 | 0.50 | 5/5 | ENST00000440179/ENS | mild inhibition of secretion
C16orf79 | ENSG00000182685 | 215394 | 0.751 | 70 | 3 | 0.67 | 1/1 | ENST00000328540 | mild inhibition of secretion
OR2T1 | ENSG00000175143 | 237324 | 0.751 | 172 | 4 | 0.50 | 1/1 | ENST00000366474 | mild inhibition of secretion
PCDHB8 | ENSG00000120322 | 27850 | 0.751 | 343 | 4 | 0.50 | 1/1 | ENST00000239444 | mild inhibition of secretion
MYL6B | ENSG00000196465 | 11792 | 0.751 | 96 | 4 | 0.50 | 1/2 | ENST00000207437 | mild inhibition of secretion
POLG2 | ENSG00000136480 | 120501 | 0.751 | 118 | 4 | 0.50 | 1/1 | ENST00000322670 | mild inhibition of secretion
ING1 | ENSG00000153487 | 119240 | 0.751 | 178 | 4 | 0.50 | 3/5 | ENST00000333219/ENS | mild inhibition of secretion
EMILIN3 | ENSG00000183798 | 149047 | 0.752 | 297 | 4 | 0.75 | 1/1 | ENST00000332312 | mild inhibition of secretion
KCTD12 | ENSG00000178695 | 129129 | 0.752 | 277 | 4 | 0.50 | 2/2 | ENST00000317765/ENS | mild inhibition of secretion
GABARAPL1 | ENSG00000139112 | 109698 | 0.752 | 287 | 4 | 0.50 | 1/2 | ENST00000266458 | mild inhibition of secretion
SQLE | ENSG00000104549 | 118680 | 0.752 | 99 | 4 | 0.50 | 2/4 | ENST00000265896/ENS | mild inhibition of secretion
PITRM1 | ENSG00000107959 | 22207 | 0.752 | 163 | 4 | 0.50 | 8/11 | ENST00000380994/ENS | mild inhibition of secretion
LMF2 | ENSG00000100258 | 125791 | 0.752 | 181 | 4 | 0.75 | 4/4 | ENST00000474879/ENS | mild inhibition of secretion
CHD3 | ENSG00000170004 | 216241 | 0.752 | 166 | 4 | 0.50 | 4/8 | ENST00000358181/ENS | mild inhibition of secretion
BAT2L1 | ENSG00000130723 | 226775 | 0.752 | 217 | 4 | 0.50 | 3/9 | ENST00000320547/ENS | mild inhibition of secretion
SLFN13 | ENSG00000154760 | 141355 | 0.752 | 168 | 3 | 0.67 | 5/6 | ENST00000285013/ENS | mild inhibition of secretion
DLX6 | ENSG00000006377 | 224974 | 0.753 | 293 | 4 | 0.50 | 3/3 | ENST00000437638/ENS | mild inhibition of secretion
SHE | ENSG00000169291 | 228153 | 0.753 | 267 | 4 | 0.75 | 1/1 | ENST00000304760 | mild inhibition of secretion
PVALB | ENSG00000100362 | 12249 | 0.753 | 386 | 5 | 0.60 | 6/6 | ENST00000417718/ENS | mild inhibition of secretion
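As a rough illustration of how one row of such a results table could be captured as a typed record, here is a minimal Python sketch. It assumes a whitespace-separated export in which the first nine columns are single tokens and the phenotype is the free-text remainder; the class name `ScreenResult`, its field names and `parse_row` are hypothetical and not part of the prototype.

```python
from dataclasses import dataclass


@dataclass
class ScreenResult:
    """One siRNA result row (field names are illustrative, not the prototype's schema)."""
    gene_symbol: str
    ensembl_id: str
    sirna_id: str
    deviation: float
    cell_number: int
    valid_plates: int
    inhibitor_fraction: float
    transcripts_hit: int
    transcripts_total: int
    transcript_ids: str
    phenotype: str


def parse_row(line: str) -> ScreenResult:
    # Assumption: nine single-token columns, then the phenotype as free text.
    fields = line.split(maxsplit=9)
    hit, total = fields[7].split("/")  # e.g. "8/11" -> 8 of 11 transcripts targeted
    return ScreenResult(
        gene_symbol=fields[0],
        ensembl_id=fields[1],
        sirna_id=fields[2],
        deviation=float(fields[3]),
        cell_number=int(fields[4]),
        valid_plates=int(fields[5]),
        inhibitor_fraction=float(fields[6]),
        transcripts_hit=int(hit),
        transcripts_total=int(total),
        transcript_ids=fields[8],
        phenotype=fields[9].strip(),
    )


example = parse_row(
    "TRAM1L1 ENSG00000174599 149174 0.751 268 4 0.50 1/1 "
    "ENST00000310754 mild inhibition of secretion"
)
print(example.gene_symbol, example.transcripts_hit, example.phenotype)
```

Keeping the score columns as numbers rather than strings would make it straightforward to filter or re-threshold hits later.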
Search types currently supported
The current interface prototype supports five basic types of search:
1. for a gene, by gene symbol or Ensembl ID;
2. for a reagent or siRNA, by manufacturer or internal screen ID;
3. for a gene attribute, using Gene Ontology terms;
4. for a phenotype, or multiple phenotypes, within an individual screen and across screens; and
5. for a study, using keywords.
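Two of these search types (by gene, and by one or more phenotypes within or across screens) can be sketched over an in-memory list of records. This is only an illustration under stated assumptions: the record layout, the function names and the toy data (placeholder genes, IDs and screens) are invented here and do not reflect how the prototype is implemented.

```python
from collections import defaultdict

# Toy records with placeholder names; not real screen data.
RECORDS = [
    {"screen": "screen_1", "gene": "GENE_A", "ensembl_id": "ENSG_A", "phenotype": "large nucleus"},
    {"screen": "screen_1", "gene": "GENE_B", "ensembl_id": "ENSG_B", "phenotype": "metaphase arrest"},
    {"screen": "screen_2", "gene": "GENE_A", "ensembl_id": "ENSG_A", "phenotype": "metaphase arrest"},
]


def search_by_gene(records, query):
    """Return all records whose gene symbol or Ensembl ID equals the query."""
    return [r for r in records if query in (r["gene"], r["ensembl_id"])]


def search_by_phenotypes(records, phenotypes, screen=None):
    """Return genes showing every requested phenotype.

    With screen=None the phenotypes may come from different screens;
    otherwise only records from the named screen are considered.
    """
    seen = defaultdict(set)
    for r in records:
        if screen is None or r["screen"] == screen:
            seen[r["gene"]].add(r["phenotype"])
    wanted = set(phenotypes)
    return sorted(gene for gene, found in seen.items() if wanted <= found)


print(search_by_gene(RECORDS, "ENSG_A"))
print(search_by_phenotypes(RECORDS, ["large nucleus", "metaphase arrest"]))
print(search_by_phenotypes(RECORDS, ["large nucleus", "metaphase arrest"], screen="screen_1"))
```

The exact-string match on phenotype labels is the weak point of such a search: it only works across screens if the screens share a phenotype vocabulary.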
Gene summary view: provides information on a gene and the phenotypes associated with silencing that gene, across independent screens.
Reagent summary view: provides information on an siRNA reagent and the phenotypes associated with it, across independent studies.
Phenotype summary view across screens: provides a list of genes whose silencing with specific reagents has given rise to a particular set of phenotypes, across screens.
Challenges
Integrate data derived from independent studies and provide a meaningful representation of the experimental results.
Two levels of integration:
1. at the quantitative level, through the development of pipelines for data analysis; and
2. at the level of phenotypic descriptions, through the development of an ontology for cellular phenotypes.
An ontology would help to resolve naming ambiguities (e.g. large nucleus vs large nuclei) and to group related phenotypes together (e.g. mitotic phenotypes), facilitating the integration of independent datasets at the level of phenotypic description.
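The two benefits named here, resolving naming variants and grouping related phenotypes, can be illustrated with a small lookup-based sketch. The synonym and parent tables below are hypothetical stand-ins for what an ontology would provide, and "prometaphase delay" is an invented example label.

```python
# Hypothetical lookup tables standing in for ontology synonyms and parent classes.
SYNONYMS = {
    "large nucleus": "large nucleus",
    "large nuclei": "large nucleus",
    "metaphase arrest": "metaphase arrest",
    "prometaphase delay": "prometaphase delay",
}
PARENTS = {
    "large nucleus": "nuclear phenotype",
    "metaphase arrest": "mitotic phenotype",
    "prometaphase delay": "mitotic phenotype",
}


def normalise(label):
    """Map a free-text label to its canonical term, or None if unknown."""
    return SYNONYMS.get(label.strip().lower())


def group_by_parent(labels):
    """Group canonical terms under their parent class, skipping unknown labels."""
    groups = {}
    for label in labels:
        term = normalise(label)
        if term is not None:
            groups.setdefault(PARENTS[term], set()).add(term)
    return {parent: sorted(terms) for parent, terms in groups.items()}


print(group_by_parent(["Large nuclei", "large nucleus", "metaphase arrest", "prometaphase delay"]))
```

Labels reported differently by two screens collapse onto one term, and related terms become retrievable through their shared parent.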
Can we integrate phenotypic descriptions?
Fuchs et al., 2010, Molecular Systems Biology 6:370
Neumann, Walter et al., 2010, Nature 464:721
Cellular Phenotype Ontology (CPO)
Pre-composed ontology based on terms from GO BP, GO CC, GO extensions and PATO.
Split into structural (morphological) and physiological (process) abnormalities.
Mitochondrion (GO:0005739)
  Mitochondrion phenotype (CPO:XX0005739)
    Mitochondrion normal phenotype
    Mitochondrion abnormal phenotype
      Abnormal mitochondrion morphology
      Abnormal mitochondrion physiology
      Absence of mitochondrion
Hoehndorf R et al. Bioinformatics 2012;28:1783-1789
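The pre-composition pattern for a cellular component can be sketched as a function that expands one GO label into a small class hierarchy. The nesting used below (phenotype, then normal/abnormal, then morphology, physiology and absence under abnormal) is an assumption read off the mitochondrion example, and `precompose_component` is a hypothetical name; no CPO identifiers are generated.

```python
def precompose_component(label):
    """Return a child -> parent map of phenotype class labels for one GO CC term.

    Assumption: the nesting follows the mitochondrion example, with the three
    specific abnormalities placed under the abnormal phenotype class.
    """
    lower = label.lower()
    root = f"{label} phenotype"
    normal = f"{label} normal phenotype"
    abnormal = f"{label} abnormal phenotype"
    return {
        root: None,  # top of this small hierarchy
        normal: root,
        abnormal: root,
        f"Abnormal {lower} morphology": abnormal,
        f"Abnormal {lower} physiology": abnormal,
        f"Absence of {lower}": abnormal,
    }


for child, parent in precompose_component("Mitochondrion").items():
    print(f"{child}  <-  {parent}")
```

Applying such a template to every component term is what makes the class count grow quickly: each source term yields six classes under this pattern.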
Cellular Phenotype Ontology (CPO)
Physiological abnormalities are split into single and multiple occurrence processes; PATO is used to refine the qualities of each.
Single occurrence processes
  Durations (increased or decreased)
  Participants (increased or decreased)
Multiple occurrence processes
  Abnormal frequency (increased or decreased)
  Abnormal onset (increased or decreased)
Hoehndorf R et al. Bioinformatics 2012;28:1783-1789
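The split into single and multiple occurrence processes can be read as a small template: each process gets one class per (aspect, direction) pair. The sketch below enumerates those pairs; the label wording and the function name `precompose_process` are assumptions made for illustration, not the naming used in CPO.

```python
from itertools import product

# Aspects refined per occurrence type, as listed on the slide.
ASPECTS = {
    "single": ("duration", "participants"),
    "multiple": ("frequency", "onset"),
}
DIRECTIONS = ("increased", "decreased")


def precompose_process(process, occurrence):
    """Enumerate abnormality labels for a process (label wording is illustrative)."""
    return [
        f"{direction} {aspect} of {process}"
        for aspect, direction in product(ASPECTS[occurrence], DIRECTIONS)
    ]


for label in precompose_process("mitotic metaphase", "single"):
    print(label)
```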
Automatic pre-composition
Built a beast: 140K classes, 220K with imports.
Good underlying theoretical model.
Needs extending for the Cellular Phenotype Db use case.
Potential for unrealistic classes.
EL expressivity.
Should be scalable with enough computing power.
In practice, a struggle to work with on a modest PC.
Entities, processes and qualities
Entities and processes: cellular components (Gene Ontology Cellular Component), biological processes (Gene Ontology Biological Process), cell types (Cell type ontology, CTO)
Qualities: abnormal, absent, size, shapes, temporal quality (Phenotype and trait ontology, PATO)
Composing a phenotype description
Entity-Quality (EQ) pattern:
  Entity: a bearer of some quality
  Quality: some characteristic of the entity
Phenotype: Large nucleus
  Entity: nucleus (GO_000xxxx)
  Quality: large (PATO_000xxxx)
Phenotype: Cells stuck in metaphase due to metaphase arrest
  Entity: mitotic metaphase (GO_0000089)
  Quality: arrested (PATO_0000297)
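An Entity-Quality annotation is essentially a pair of ontology terms attached to the original description. A minimal sketch of that data structure, using the metaphase example and its identifiers from the slide, might look as follows; the class names `Term` and `EQAnnotation` and the `composed_label` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    term_id: str  # ontology identifier, e.g. "GO_0000089"
    label: str


@dataclass(frozen=True)
class EQAnnotation:
    description: str  # original free-text phenotype
    entity: Term      # bearer of the quality
    quality: Term     # characteristic of the entity

    def composed_label(self):
        # Illustrative convention: quality label followed by entity label.
        return f"{self.quality.label} {self.entity.label}"


metaphase_arrest = EQAnnotation(
    description="Cells stuck in metaphase due to metaphase arrest",
    entity=Term("GO_0000089", "mitotic metaphase"),
    quality=Term("PATO_0000297", "arrested"),
)
print(metaphase_arrest.composed_label())
```

Because the dataclasses are frozen, two annotations with the same entity and quality compare equal on those fields, which is what later makes matching across datasets possible.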
New strategy for ontology building
Annotation tool; ontology terms.
Distribute the tool to consortium members for phenotype annotation.
Map the phenotype terms annotated using ontologies to CPO.
Phenotypes annotation tool Original phenotypic description Ontology based annotations
Mapping phenotypes from different biological scales
Cellular phenotypes from single cultured mammalian cells
Cellular phenotypes from mouse tissues
Cellular phenotypes from human tissues
Collect the terms used to annotate cellular phenotypes in the different domains.
Map the resulting ontologies onto each other to enable correlative analysis.
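One simple way to relate terms collected in different domains is to group them by a shared entity and quality pair. The sketch below assumes each domain's terms have already been annotated with an entity ID and a quality ID; the domain names, the free-text labels and the `E_x`/`Q_y` placeholder are invented for illustration, and real ontology mapping would need more than exact ID matches.

```python
from collections import defaultdict

# (domain, label used in that domain, entity ID, quality ID); toy data only.
ANNOTATIONS = [
    ("cultured cells", "metaphase arrest", "GO_0000089", "PATO_0000297"),
    ("mouse tissue", "cells arrested in metaphase", "GO_0000089", "PATO_0000297"),
    ("human tissue", "some other phenotype", "E_x", "Q_y"),  # placeholder IDs
]


def align_by_eq(annotations):
    """Return EQ pairs seen in more than one domain, with each domain's label."""
    groups = defaultdict(dict)
    for domain, label, entity_id, quality_id in annotations:
        groups[(entity_id, quality_id)][domain] = label
    return {eq: labels for eq, labels in groups.items() if len(labels) > 1}


for eq, labels in align_by_eq(ANNOTATIONS).items():
    print(eq, labels)
```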
Open questions
Sometimes phenotypes are not linked to biological processes because you don't have enough information to make this association.
We mostly observe cell population phenotypes, but the technology is moving towards single cell observations.
How do we deal with quantitative phenotypes (cell size, nucleus size, actin content, DNA content, ...)?
Existing ontologies might not be granular enough to describe what we want to describe.
How do we deal with the temporal information?
Goal
Develop a data-driven, generic upper-level ontology for cellular phenotypes.
Open access tool for annotating phenotypes and capturing the necessary metadata.
Templates for new terms and ontology extension.
Pilot study with Systems Microscopy and BioMedBridges scientists to see how they can use an EQ-based annotation tool.
Acknowledgements
EMBL-EBI: Catherine Kirsanova, Simon Jupp, James Malone, Alvis Brazma
EMBL-Heidelberg: Jean-Karim Heriche, Beate Neumann, Bernd Fisher, Wolfgang Huber, Jan Ellenberg, Christoph Moehl, Mayumi Isokane, Celine Revenu
CU: Robert Hoehndorf, George Gkoutos
Prototype URL: http://www.ebi.ac.uk/fg/sym