Bio-Medical Text Mining with Machine Learning

Similar documents
ToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database

CLRG Biocreative V

Applied Natural Language Processing

Information Extraction from Biomedical Text. BMI/CS 776 Mark Craven

Artificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino

Deep Learning for Natural Language Processing. Sidharth Mudgal April 4, 2017

Special Topics in Computer Science

Colin Batchelor, Nicholas Bailey, Peter Corbett, Aileen Day, Jeff White and John Boyle

Lecture 5 Neural models for NLP

Conditional Language modeling with attention

Natural Language Processing and Recurrent Neural Networks

with Local Dependencies

Large Scale Evaluation of Chemical Structure Recognition 4 th Text Mining Symposium in Life Sciences October 10, Dr.

Explainable AI that Can be Used for Judgment with Responsibility

SABIO-RK Integration and Curation of Reaction Kinetics Data Ulrike Wittig

Chemical Named Entity Recognition: Improving Recall Using a Comprehensive List of Lexical Features

Research Article HomoKinase: A Curated Database of Human Protein Kinases

Medical Question Answering for Clinical Decision Support

APPLIED DEEP LEARNING PROF ALEXIEI DINGLI

Chemical Data Retrieval and Management

Chemical Ontologies. Chemical Ontologies. ChemAxon UGM May 23, 2012

Neural Networks in Structured Prediction. November 17, 2015

Formal Ontology Construction from Texts

Chemistry-specific Features and Heuristics for Developing a CRF-based Chemical Named Entity Recogniser

Deep Learning Sequence to Sequence models: Attention Models. 17 March 2018

Information Extraction from Text

Task-Oriented Dialogue System (Young, 2000)

Francisco M. Couto Mário J. Silva Pedro Coutinho

Subcellular Localisation of Proteins in Living Cells Using a Genetic Algorithm and an Incremental Neural Network

Recurrent Neural Networks. Jian Tang

Introduction to the EMBL-EBI Ontology Lookup Service

Two-Stream Bidirectional Long Short-Term Memory for Mitosis Event Detection and Stage Localization in Phase-Contrast Microscopy Images

Introduction to Pattern Recognition. Sequence structure function

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University

Permanent Link:

CSC321 Lecture 16: ResNets and Attention

Outline. Terminologies and Ontologies. Communication and Computation. Communication. Outline. Terminologies and Vocabularies.

Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks

Bayesian Hierarchical Classification. Seminar on Predicting Structured Data Jukka Kohonen

STRING: Protein association networks. Lars Juhl Jensen

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6

Cross Media Entity Extraction and Linkage for Chemical Documents

An Introduction to GLIF

Deep Learning for NLP

Hierachical Name Entity Recognition

Term Generalization and Synonym Resolution for Biological Abstracts: Using the Gene Ontology for Subcellular Localization Prediction

A Database of human biological pathways

Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning

Navigating between patents, papers, abstracts and databases using public sources and tools

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Language Models. Tobias Scheffer

PREDICTION OF HETERODIMERIC PROTEIN COMPLEXES FROM PROTEIN-PROTEIN INTERACTION NETWORKS USING DEEP LEARNING

Automated annotation of chemical names in the literature with tunable accuracy

Modelling Time Series with Neural Networks. Volker Tresp Summer 2017

86 Part 4 SUMMARY INTRODUCTION

Explaining Predictions of Non-Linear Classifiers in NLP

Topology based deep learning for biomolecular data

Araport, a community portal for Arabidopsis. Data integration, sharing and reuse. sergio contrino University of Cambridge

Gene mention normalization in full texts using GNAT and LINNAEUS

Machine Learning. Boris

Machine Learning Overview

Explaining Predictions of Non-Linear Classifiers in NLP

GENE ONTOLOGY (GO) Wilver Martínez Martínez Giovanny Silva Rincón

Modern Information Retrieval

Lecture 17: Neural Networks and Deep Learning

Reservoir Computing and Echo State Networks

(cell)/erik Bonsdorff (environment)

Information Extraction from Biomedical Text

Support Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM

Neural Architectures for Image, Language, and Speech Processing

Dialogue management: Parametric approaches to policy optimisation. Dialogue Systems Group, Cambridge University Engineering Department

Document Navigation: Ontologies or Knowledge Organisation Systems

Biological Systems: Open Access

Overview Today: From one-layer to multi layer neural networks! Backprop (last bit of heavy math) Different descriptions and viewpoints of backprop

Attention Based Joint Model with Negative Sampling for New Slot Values Recognition. By: Mulan Hou

Some Applications of Machine Learning to Astronomy. Eduardo Bezerra 20/fev/2018

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann

Context dependent visualization of protein function

Natural Language Processing

Integration of functional genomics data

NEURAL LANGUAGE MODELS

Recurrent Neural Networks. deeplearning.ai. Why sequence models?

Evolution of a Foundational Model of Physiology: Symbolic Representation for Functional Bioinformatics

A Protein Ontology from Large-scale Textmining?

Graphical models for part of speech tagging

Feature Based Gene Summary Extraction with Re-ranking

Application integration: Providing coherent drug discovery solutions

Segmentation of Cell Membrane and Nucleus using Branches with Different Roles in Deep Neural Network

Deep Learning and Lexical, Syntactic and Semantic Analysis. Wanxiang Che and Yue Zhang

Lecture 8: Recurrent Neural Networks

Using C-OWL for the Alignment and Merging of Medical Ontologies

An overview of deep learning methods for genomics

Lecture 13: Structured Prediction

Correlation Autoencoder Hashing for Supervised Cross-Modal Search

Dimensionality Reduction and Principle Components Analysis

EBI web resources II: Ensembl and InterPro. Yanbin Yin Spring 2013

CS612 - Algorithms in Bioinformatics

Cold-start knowledge base population using ontology-based information extraction with conditional random fields

Uncovering Protein-Protein Interactions in the Bibliome

Machine learning methods to infer drug-target interaction network

Transcription:

Sumit Madan Department of Bioinformatics - Fraunhofer SCAI Textual Knowledge PubMed Journals, Books Patents EHRs

What is Bio-Medical Text Mining? Phosphorylation of glycogen synthase kinase 3 beta at Threonine, 668 increases the degradation of amyloid precursor protein Biological Expression Language (BEL) BEL Functions Namespace Identifiers Relationship Entity Definitions 2

Biological Entity Classes Protein / Gene - HGNC - ENTREZGENE - UNIPROT - MGI - RGD - Chemical - MESH - ChEBI - ChEMBL - Small RNA - MIRBASE - pirnabank - Disease - MESH - DO - ICD10 - Further Classes - Organism (NCBITaxonomy) - Anatomy (UBERON) - Phenotype (HP) - Biological Process (GO) - Cell (CL) - Cell lines (CLO) - Several clinical features - Bold: Entity Class Normal: Controlled Vocabulary 3

Several Spelling Variants (Synonyms) for One Single Concept AD Over 28 Synonyms available just in Experimental Factor Ontology (EFO). Presenile Dementia Alzheimer s Dementia Alzheimer s Disease (EFO:0000249) Alzheimer s Disease (EFO:0000249) DAT Alzheimer s disorder Source: http://www.ebi.ac.uk/efo/efo_0000249 4

Biological Relations (Example) Source: PMID:10880445 Interestingly, BLP also activates caspase 1 through TLR2, resulting in proteolysis and secretion of mature IL-1beta. Entity Entity Entity p(hgnc:tlr2) increases cat(p(hgnc:casp1)) cat(p(hgnc:casp1)) increases p(hgnc:il1b) Subject Relationship Type Object 5

Relation Extraction Workflow based on Convolutional Neural Network Interestingly, BLP also activates caspase 1 through TLR2, resulting in proteolysis and secretion of mature IL-1beta Named Entity Recognition (NER) Interestingly, BLP also activates caspase 1 through TLR2, resulting in proteolysis and secretion of mature IL-1beta Pre-processing Interestingly, BLP also activates ENTITY-1 through ENTITY-2, resulting in proteolysis and secretion of mature IL-1beta (Source: PMID:10880445) Sentence-matrix creation (usage of Word2Vec model) Subject/object detection model 0.04-0.54......... -0.61... 0.082 Get Associations Merge Relation Association detection model Relationship type detection model Ali, M., Madan, S., Fischer, A., et al. (2017) Automatic Extraction of BEL-Statements based on Neural Networks, Proc. BioCreative VI Chall. Work. 6

Multichannel Convolutional Neural Network (CNN) Architecture Interestingly, BLP also activates ENTITY-1 through ENTITY-2, resulting in proteolysis and secretion of mature IL-1beta (PMID:10880445) Embedding layer Convolution and max-pooling Max-pooling Fully-connected and classification X^x^ Predictions Ali, M., Madan, S., Fischer, A., et al. (2017) Automatic Extraction of BEL-Statements based on Neural Networks, Proc. BioCreative VI Chall. Work. 7

Larger Workflow for Relation Extraction (Project BELIEF) Natural Language Processing Named Entity Recognition Rule-based & Machine Learning (CRF, NN) TXT SD SD etc. etc. SD, POS Tagger etc. ProMiner ProMiner JProMiner Entity Unifier Relations Relations Merger & Writer ProMiner ProMiner SimpleRE, TEES etc. Relation extraction Machine Learning (SVM, NN) Madan, S., Hodapp, S., Senger, P., et al. (2016) The BEL information extraction workflow (BELIEF): evaluation in the BioCreative V BEL and IAT track, Database, 2016, baw136. http://belief.scai.fraunhofer.de/beliefdashboard/ 8

BELIEF Dashboard Functionalities of Web-based Curation Interface Project and Document Management Export curated document User Authentication Document Curation View (visualizes text mining results and also integrates several semantic search tools) 9

Outlook Usage of bidirectional recurrent neural networks (BiLSTM) Use more training data from different tasks (such as BioNLP, and also BioCreative) Create further models to predict biological functions Automate hyper-parameter optimization Train new optimized word2vec models (use PubMed, PMC & ontologies) Build hybrid systems: machine learning + rules-based system (UIMA Ruta) 10