Distributed Common Ground System Army (DCGS-A)
The Role of Ontology in the Era of Big (Military) Data
Barry Smith, Director, National Center for Ontological Research
Distributed Development of a Shared Semantic Resource (SSR) in support of the US Army's Distributed Common Ground System Standard Cloud (DSC) initiative
with thanks to: Tanya Malyuta, Ron Rudnicki
Background materials: http://x.co/yyxn
Making data (re-)usable through common controlled vocabularies
  Allow multiple databases to be treated as if they were a single data source by eliminating terminological redundancy in the ways data are described
    not Person, and Human, and Human Being, and Pn, and HB, but simply: person
  Allow development and use of common tools and techniques, common training, and single validation of data, focused around
    semantic technology
    coordinated ontology development and use
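As a rough illustration of how a controlled vocabulary removes this kind of terminological redundancy, the sketch below maps the variant labels named on this slide to one preferred term. The mapping table and the function name `normalize_label` are assumptions made for the example, not part of any DCGS-A tool.

```python
# Hypothetical synonym table: each variant label found in source data
# maps to the one preferred term of the controlled vocabulary.
PREFERRED_TERM = {
    "person": "person",
    "human": "person",
    "human being": "person",
    "pn": "person",
    "hb": "person",
}


def normalize_label(label: str) -> str:
    """Return the preferred term for a label, or the label unchanged if unknown."""
    key = " ".join(label.lower().split())  # ignore case and extra whitespace
    return PREFERRED_TERM.get(key, label)


# Labels as they might appear in three separate databases (illustrative only).
labels_from_three_databases = ["Person", "Human Being", "HB"]
unified = {normalize_label(label) for label in labels_from_three_databases}
print(unified)
```

All three labels collapse to the single term person, so a query phrased with that term can reach each database. Unknown labels pass through unchanged rather than being guessed at.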
Ontology =def. controlled vocabulary organized as a graph
  nodes in the graph are terms representing types in reality
  each node is associated with a definition and synonyms
  edges in the graph represent well-defined relations between these types
  the graph is structured hierarchically via subtype relations
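The definition above can be sketched as a small data structure: terms are nodes carrying a definition and synonyms, edges are named relations, and the hierarchy is the set of is_a edges. This is a minimal sketch, not a real ontology API. The class and method names are assumptions, the term definitions are borrowed from the vehicle example used later in this deck, and the synonym is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    definition: str
    synonyms: list = field(default_factory=list)


class Ontology:
    def __init__(self):
        self.terms = {}   # label -> Term (the nodes of the graph)
        self.edges = []   # (subject, relation, object) triples

    def add_term(self, term):
        self.terms[term.label] = term

    def relate(self, subject, relation, obj):
        # Only allow edges between terms that are already defined.
        for label in (subject, obj):
            if label not in self.terms:
                raise KeyError(f"undefined term: {label}")
        self.edges.append((subject, relation, obj))

    def supertypes(self, label):
        """All types reachable from `label` by following is_a edges upward."""
        found, frontier = [], [label]
        while frontier:
            current = frontier.pop()
            for subject, relation, obj in self.edges:
                if subject == current and relation == "is_a" and obj not in found:
                    found.append(obj)
                    frontier.append(obj)
        return found


ontology = Ontology()
ontology.add_term(Term("vehicle", "an object used for transporting people or goods"))
ontology.add_term(Term("tractor", "a vehicle that is used for towing",
                       synonyms=["tow vehicle"]))  # synonym is hypothetical
ontology.add_term(Term("wheeled tractor", "a tractor that has a wheeled platform"))
ontology.relate("tractor", "is_a", "vehicle")
ontology.relate("wheeled tractor", "is_a", "tractor")

print(ontology.supertypes("wheeled tractor"))
```

Because the hierarchy is stored as edges, anything asserted of vehicle can be found for wheeled tractor by walking upward, which is what makes the subtype structure useful for retrieval.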
Ontologies
  computer-tractable representations of types in specific areas of reality
  divided into more and less general
    upper = organizing ontologies, provide common architecture and thus promote interoperability
    lower = domain ontologies, provide grounding in reality
  reflecting top-down and bottom-up strategy
Success story in biomedicine
Goal: integration of biological and clinical data
  across different species
  across levels of granularity (organ, organism, cell, molecule)
  across different perspectives (physical, biological, clinical)
  within and across domains (growth, aging, environment, genetic disease, toxicity)
The Open Biomedical Ontologies (OBO) Foundry
Columns: RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT). Rows: GRANULARITY.
ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)
Population-level ontologies
Columns: RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT). Rows: GRANULARITY.
COMPLEX OF ORGANISMS
  Independent continuant: Family, Community, Population
  Dependent continuant: Population Phenotype
  Occurrent: Population Process
ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)
Environment Ontology
Columns: RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT). Rows: GRANULARITY.
ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)
Rationale of OBO Foundry coverage
Columns: RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT). Rows: GRANULARITY.
ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
  Occurrent: Cellular Process (GO)
MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)
OBO Foundry approach extended into other domains
  NIF Standard: Neuroscience Information Framework
  ISF Ontologies: Integrated Semantic Framework
  OGMS and Extensions: Ontology for General Medical Science
  IDO Consortium: Infectious Disease Ontology
  cROP: Common Reference Ontologies for Plants
Modular organization + Extension strategy
top level
  Basic Formal Ontology (BFO)
domain level
  Anatomy Ontology (FMA*, CARO)
  Cell Ontology (CL)
  Cellular Component Ontology (FMA*, GO*)
  Environment Ontology (EnvO)
  Subcellular Anatomy Ontology (SAO)
  Infectious Disease Ontology (IDO*)
  Phenotypic Quality Ontology (PaTO)
  Biological Process Ontology (GO*)
  Sequence Ontology (SO*)
  Protein Ontology (PRO*)
  Molecular Function (GO*)
~100 ontologies using BFO
  US Army Biometrics Ontology
  Brucella Ontology (IDO-BRU)
  eagle-i and VIVO (NCRR)
  Financial Report Ontology (to support SEC through XBRL)
  IDO Infectious Disease Ontology (NIAID)
  Malaria Ontology (IDO-MAL)
  Nanoparticle Ontology (NPO)
  Ontology for Risks Against Patient Safety (RAPS/REMINE)
  Parasite Experiment Ontology (PEO)
  Subcellular Anatomy Ontology (SAO)
  Vaccine Ontology (VO)
Basic Formal Ontology
BFO:Entity
  BFO:Continuant
    BFO:Independent Continuant
    BFO:Dependent Continuant
      BFO:Disposition
  BFO:Occurrent
    BFO:Process
Thursday, April 18, 2013
Basic Formal Ontology and Mental Functioning Ontology (MFO)
[Diagram: MFO types attached beneath BFO types]
BFO nodes: BFO:Entity; BFO:Continuant; BFO:Occurrent; BFO:Independent Continuant; BFO:Dependent Continuant; BFO:Process; BFO:Disposition; BFO:Quality
MFO nodes: Organism; Mental Functioning Related Anatomical Structure; Behaviour inducing state; Cognitive Representation; Affective Representation; Bodily Process; Mental Process
Emotion Ontology extends MFO
[Diagram: Emotion Ontology (MFO-EM) types attached beneath MFO and BFO types]
BFO nodes: BFO:Entity; BFO:Continuant; BFO:Occurrent; BFO:Independent Continuant; BFO:Dependent Continuant; BFO:Process; BFO:Disposition
MFO and MFO-EM nodes: Organism; Cognitive Representation; Affective Representation; Appraisal; Emotional Action Tendencies; Subjective Emotional Feeling; Bodily Process; Mental Process; Physiological Response to Emotion Process; Appraisal Process; Emotional Behavioural Process; Emotion Occurrent
Relations shown: inheres_in; is_output_of; has_part; agent_of
Sample from Emotion Ontology: Types of Feeling
The problem of joint / coalition operations
  Intelligence; Fire Support; Targeting; Maneuver & Blue Force Tracking; Air Operations; Civil-Military Operations; Logistics
US DoD Civil Affairs strategy for non-classified information sharing
Ontologies / semantic technology can help to solve this problem
  Intelligence; Fire Support; Targeting; Maneuver & Blue Force Tracking; Air Operations; Civil-Military Operations; Logistics
But if each community produces its own ontology, this will merely create new, semantic siloes
  Intelligence; Fire Support; Targeting; Maneuver & Blue Force Tracking; Air Operations; Civil-Military Operations; Logistics
What we are doing to avoid the problem of semantic siloes
  Distributed Development of a Shared Semantic Resource
  Pilot testing to demonstrate feasibility
Creating the analog of this in the military domain
top level
  Basic Formal Ontology (BFO)
domain level
  Anatomy Ontology (FMA*, CARO)
  Cell Ontology (CL)
  Cellular Component Ontology (FMA*, GO*)
  Environment Ontology (EnvO)
  Subcellular Anatomy Ontology (SAO)
  Infectious Disease Ontology (IDO*)
  Phenotypic Quality Ontology (PaTO)
  Biological Process Ontology (GO*)
  Sequence Ontology (SO*)
  Protein Ontology (PRO*)
  Molecular Function (GO*)
Semantic Enhancement
Annotation (tagging) of source data models using terms from coordinated ontologies
  data remain in their original state (are treated at arm's length)
  tagged using interoperable ontologies created in tandem
  can be as complete as needed, lossless, long-lasting because flexible and responsive
  big bang for buck: measurable benefit even from first small investments
Coordination through shared governance and training
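A minimal sketch of this arm's-length tagging is given below: the source tables are never modified, and a separate annotation layer maps each source column to an ontology term. The table contents, source names, and the helper `values_for_term` are all hypothetical and stand in for whatever the real source data models contain.

```python
# Two hypothetical source tables. They stay exactly as delivered.
SOURCES = {
    "db_a": {"columns": ["Pn", "Veh"], "rows": [("J. Doe", "T33")]},
    "db_b": {"columns": ["HumanBeing", "Transport"], "rows": [("A. Roe", "crane")]},
}

# The annotation layer lives outside the sources: (source, column) -> ontology term.
ANNOTATIONS = {
    ("db_a", "Pn"): "person",
    ("db_a", "Veh"): "vehicle",
    ("db_b", "HumanBeing"): "person",
    ("db_b", "Transport"): "vehicle",
}


def values_for_term(term, sources, annotations):
    """Collect (source, value) pairs from every column tagged with `term`."""
    results = []
    for (source_name, column), tagged_term in annotations.items():
        if tagged_term != term:
            continue
        table = sources[source_name]
        index = table["columns"].index(column)
        results.extend((source_name, row[index]) for row in table["rows"])
    return results


people = values_for_term("person", SOURCES, ANNOTATIONS)
print(people)
```

Adding a new source only requires new annotation entries, which is one way to read the claim that even small first investments bring measurable benefit.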
Main challenge: Will it scale?
The problem of scalability turns on
  the ability to accommodate ever increasing volumes and types of data and numbers of users
  can we preserve coordination (consistency, non-redundancy) as ever more domains become involved?
  can we respond in agile fashion to ever changing bodies of source data?
Strategy for agile ontology creation
  Identify or create carefully validated general purpose plug-and-play reference ontology modules for principal domains
  Develop a method whereby these reference ontologies can be extended very easily to cope with specific, local data through creation of application ontologies
Reference Ontology
  vehicle =def. an object used for transporting people or goods
  tractor =def. a vehicle that is used for towing
  crane =def. a vehicle that is used for lifting and moving heavy objects
  vehicle platform =def. means of providing mobility to a vehicle
  wheeled platform =def. a vehicle platform that provides mobility through the use of wheels
  tracked platform =def. a vehicle platform that provides mobility through the use of continuous tracks
Application Ontology
  artillery vehicle =def. vehicle designed for the transport of one or more artillery weapons
  wheeled tractor =def. a tractor that has a wheeled platform
  Russian wheeled tractor type T33 =def. a wheeled tractor of type T33 manufactured in Russia
  Ukrainian wheeled tractor type T33 =def. a wheeled tractor of type T33 manufactured in Ukraine
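One way to picture how an application ontology extends a reference ontology is sketched below. Each term is stored with its parent type, read off the definitions above, and the extension is accepted only if it leaves the reference terms untouched and hangs every new term off an existing one. The function names `extend` and `lineage` are assumptions for this example, and the check is deliberately simple (it does not detect cycles).

```python
# Parent (genus) of each term, read off the definitions above; None marks a root.
REFERENCE = {
    "vehicle": None,
    "tractor": "vehicle",
    "crane": "vehicle",
    "vehicle platform": None,
    "wheeled platform": "vehicle platform",
    "tracked platform": "vehicle platform",
}

APPLICATION = {
    "artillery vehicle": "vehicle",
    "wheeled tractor": "tractor",
    "Russian wheeled tractor type T33": "wheeled tractor",
    "Ukrainian wheeled tractor type T33": "wheeled tractor",
}


def extend(reference, application):
    """Merge application terms into a copy of the reference ontology."""
    for term in application:
        if term in reference:
            raise ValueError(f"{term!r} would redefine a reference term")
    merged = dict(reference)
    merged.update(application)
    # Every application term must hang off a term that exists after merging.
    for term, parent in application.items():
        if parent not in merged:
            raise ValueError(f"{term!r} has unknown parent {parent!r}")
    return merged


def lineage(term, parents):
    """Path from a term up to its root, following parent links."""
    chain = [term]
    while parents[chain[-1]] is not None:
        chain.append(parents[chain[-1]])
    return chain


merged = extend(REFERENCE, APPLICATION)
print(lineage("Russian wheeled tractor type T33", merged))
```

The local T33 terms never need to be added to the reference ontology, yet data tagged with them can still be retrieved by a query for tractor or vehicle.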
Basic Formal Ontology (BFO)
  Extended Relation Ontology
  Agent Ontology
  Artifact Ontology
  Event Ontology
  Geospatial Ontology
  Information Entity Ontology
  Quality Ontology
  Time Ontology
http://milportal.org
An example of agile application ontology development: The Bioweapons Ontology (BWO)
Kinds of chemical and biological weapons
Chemical
  Nerve agents (sarin gas)
  Blister agents (mustard gas)
  Blood agents (cyanide gas)
Biological
  Infectious agents: BWO(I)
  Toxic agents (botulinum toxin, ricin): BWO(T)
We focus here on BWO(I)
Infectious agents
  Bacterial (anthrax, bubonic plague, tularemia, brucellosis, cholera)
  Viral (Ebola, Marburg)
Examples of ontology terms
BFO                      IDO                         StaphIDO
Independent Continuant   Infectious disorder         Staph. aureus disorder
Dependent Continuant     Infectious disease          MRSA
                         Protective resistance       Methicillin resistance
Occurrent                Infectious disease course   MRSA course
Infectious Disease Ontology (IDO)
with thanks to Lindsay Cowell (University of Texas SW Medical Center) and Albert Goldfain (Blue Highway, Inc.)
IDO Core (Reference Ontology)
  General terms in the ID domain.
IDO Extensions (Application Ontologies)
  Disease-, host-, pathogen-specific.
  Developed by subject matter experts.
The hub-and-spokes strategy ensures that the logical content of IDO Core is automatically inherited by the IDO Extensions
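The hub-and-spokes idea can be loosely illustrated with Python's standard `collections.ChainMap`, where an extension holds only its own terms and falls through to the core for everything else. This is an analogy for term inheritance only, not a model of how logical axioms are inherited in OWL. The term labels come from the StaphIDO example in this deck, and the values are placeholder notes rather than real IDO definitions.

```python
from collections import ChainMap

# Hub: terms held by the core. Values are placeholder notes, not real definitions.
ido_core = {
    "infectious disorder": "held in IDO Core",
    "infectious disease": "held in IDO Core",
    "infectious disease course": "held in IDO Core",
}

# Spoke: only the terms the extension adds itself.
staph_local = {
    "Staph. aureus disorder": "added by StaphIDO",
    "MRSA": "added by StaphIDO",
    "MRSA course": "added by StaphIDO",
}

# Lookups try the extension first, then fall through to the core.
staph_ido = ChainMap(staph_local, ido_core)

# A later addition to the core is visible in the extension with no extra step.
ido_core["protective resistance"] = "held in IDO Core"

print(staph_ido["infectious disease"])
print("protective resistance" in staph_ido)
```

The extension never copies core content, so there is a single place where core terms are maintained and every spoke sees the same version.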
IDO Core
  Contains general terms in the ID domain, e.g., colonization, pathogen, infection
  A contract between IDO extension ontologies and the datasets that use them
  Intended to represent information along several dimensions:
    biological scale (gene, cell, organ, organism, population)
    discipline (clinical, immunological, microbiological)
    organisms involved (host, pathogen, and vector types)
IDO Extensions
  IDO Brucellosis
  IDO Dengue Fever
  IDO Influenza
  IDO Malaria
  IDO Staphylococcus Aureus Bacteremia
  IDO Vector Surveillance and Management
  IDO Plant
  VO Vaccine Ontology
  BWO(I) Bioweapons Ontology (Infectious Agents)
How IDO evolves: the case of Staph. aureus
[Diagram: IDOCore as hub, with extension ontologies arranged around it]
HUB and SPOKES: Domain ontologies
SEMI-LATTICE: By subject matter experts in different communities of interest.
Nodes shown: IDOCore; IDOMAL; IDOFLU; IDOHIV; IDOSa; IDOStrep; IDORatSa; IDORatStrep; IDOHumanSa; IDOHumanStrep; IDOMRSa; IDOAntibioticResistant; IDOHumanBacterial
BWO:disease by infectious agent = def. a disease that is the consequence of the presence of pathogenic microbial agents, including pathogenic viruses, pathogenic bacteria, fungi, protozoa, multicellular parasites, and aberrant proteins known as prions
Strategy used to build BWO(I)
with thanks to Lindsay Cowell and Oliver He (Michigan)
1. Start with a glossary such as: http://www.emedicinehealth.com/biological_warfare/
2. Select the corresponding terms needed to describe bioweapons from IDO Core and related ontologies such as the ChEBI chemistry ontology
3. All ontology terms keep their original definitions and IDs.
4. The result is a spreadsheet
5. Where glossary terms have no ontology equivalent, create BWO ontology terms and definitions as needed
6. Use the Ontofox tool to create the first version of the BWO(I) application ontology (http://ontofox.hegroup.org/)
7. Use BWO(I) in annotations, and where gaps are identified create extension terms, for instance:
  weaponized brucella
  aerosol anthrax
  smallpox incubation period
This establishes a virtuous cycle between ontology development and use in annotations
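The spreadsheet stage of this strategy (steps 2 to 5) can be sketched as a simple lookup: glossary terms that already exist in a source ontology are reused with their original IDs, and the rest are flagged as new BWO terms. The function name, the ID format for new terms, and every identifier shown are placeholders invented for this example. They are not real IDO or BWO IDs, and the sketch does not call Ontofox.

```python
def build_term_table(glossary_terms, existing_terms, prefix="BWO"):
    """Return one spreadsheet-style row per glossary term."""
    rows = []
    new_count = 0
    for term in glossary_terms:
        match = existing_terms.get(term.lower())
        if match is not None:
            # Steps 2-3: reuse the existing term and keep its original ID.
            rows.append({"term": term, "id": match, "status": "reused"})
        else:
            # Step 5: no equivalent exists, so create a new local term.
            new_count += 1
            rows.append({"term": term, "id": f"{prefix}:NEW-{new_count}",
                         "status": "new"})
    return rows


# Placeholder identifiers; these are NOT real IDO IDs.
EXISTING = {"pathogen": "IDO:EXAMPLE-1", "infection": "IDO:EXAMPLE-2"}
GLOSSARY = ["Pathogen", "Infection", "smallpox incubation period"]

table = build_term_table(GLOSSARY, EXISTING)
for row in table:
    print(row)
```

Rows marked "new" are the gaps that step 7 feeds back into the ontology, which is the virtuous cycle between development and annotation described above.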
Potential uses of BWO
  semantic enhancement of bioweapons intelligence data
    results will be automatically interoperable with relevant bioinformatics and public health IT tools for dealing with infections, epidemics, vaccines, forensics
  to annotate research literature and research data on bioweapons
  to create computable definitions to substitute for definitions in free text glossaries
Why do people think they need lexicons?
  Training
  Compiling lessons learned
  Compiling results of testing, e.g. of proposed new doctrine
  Collective inferencing
  Official reporting
  Doctrinal development
  Standard operating procedures
  Sharing of data
People need to (ensure that they) understand each other