C.-A. Azencott, M. A. Kayala, and P. Baldi. Institute for Genomics and Bioinformatics Donald Bren School of Information and Computer Sciences

Learning Scoring Functions for Chemical Expert Systems
C.-A. Azencott, M. A. Kayala, and P. Baldi
Institute for Genomics and Bioinformatics, Donald Bren School of Information and Computer Sciences
237th ACS Meeting, March 24, 2009
Azencott (ACS Spring 09) Learning Chem Scoring Functions March 24, 2009 1 / 20

Goals
- Chemical space exploration
- Reproduce and augment the problem-solving ability of organic chemists

Reaction Simulator
- Input: starting materials, conditions (pH, temperature)
- Output: products, energy diagram
- Reaction favorability scoring function?

Energy of Activation ΔG‡
- Measure of reaction favorability
- Directly related to reaction rates
How can we compute ΔG‡?
- Direct computation not possible
- Laboratory experiment or QM/MM simulation
- Infer with machine learning from data
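The link between activation energy and reaction rate can be made concrete with a small sketch. The slides do not say which rate law they have in mind; the code below assumes the Eyring equation from transition state theory, k = (k_B T / h) exp(-ΔG‡ / RT), and further assumes both steps share the same prefactor, so that only the difference in barriers matters. The function name and the barrier values are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)

def relative_rate(delta_g_a, delta_g_b, temperature=298.15):
    """Ratio k_a / k_b for two steps, given activation free energies in kJ/mol.

    Assumes the Eyring equation with an identical prefactor for both steps,
    so the prefactor cancels in the ratio.
    """
    return math.exp(-(delta_g_a - delta_g_b) / (R * temperature))

# Hypothetical barriers, chosen only for illustration.
ratio = relative_rate(60.0, 70.0)
print(f"step A is about {ratio:.0f} times faster than step B")
```

Under these assumptions a ranking of barriers is equivalent to a ranking of rates, which is why relative favorability is already useful without absolute ΔG‡ values.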

ΔG‡ numerical data
- Experimental data
- Quantum Mechanical / Molecular Modeling
Problems:
- Time consuming
- Expensive
- Very little available
What can we do?

How do chemists solve problems?
Without:
- performing experiments
- producing numerical ΔG‡
chemists can (almost) always:
- tell which of several possible reactions is more favorable
- produce relative rates
How do we get this qualitative knowledge?

Reaction Explorer
Jonathan H. Chen: http://www.reactionexplorer.org
- Rule-based expert system
- Basic undergraduate organic chemistry
- Create pairs of elementary steps ordered by favorability

Approach 1: Priority Ordering
- Rule priority orders the elementary movements
- Each pair: most favorable electron movement vs. less favorable electron movement

Approach 2: Implicitly Unfavorable Reactions
- From the simple elementary movement that should happen,
- deduce the elementary movements that should not happen
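A minimal sketch of how such implicit pairs might be generated, assuming each reaction comes with one known favorable step and a list of enumerated candidate steps. The function name and the string labels are hypothetical; the slides do not describe the actual enumeration procedure.

```python
def implicit_pairs(favorable_step, candidate_steps):
    """Pair the known favorable step with every other candidate step.

    Each returned tuple is (more favorable, less favorable).
    """
    return [(favorable_step, other)
            for other in candidate_steps
            if other != favorable_step]

# Hypothetical candidate elementary steps for one set of reactants.
candidates = ["step_a", "step_b", "step_c", "step_d"]
pairs = implicit_pairs("step_b", candidates)
for favored, unfavored in pairs:
    print(favored, ">", unfavored)
```

Because every favorable step is paired with all of its alternatives, this approach yields far more pairs than explicit priority ordering.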

Data Sets
CHNO + halides, ionic reactions.

Data set               Number of pairs
Priority ordered       3,457
Implicitly unfavored   475,684 (from 16,806 most favorable reactions)

Feature Representation
f : Elementary Step → ℝ^n
- Electron movement between a pair of source / sink orbitals
- Principled choice of 150 features

Neural Network Architecture I

Neural Network Architecture II
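The architecture figures did not survive transcription, so the following is only a sketch of one common way to learn a scoring function from ordered pairs, not the authors' network. It assumes a shared linear scorer (a perceptron with no hidden nodes) applied to both steps of a pair, with a sigmoid of the score difference giving the probability that the first step is the more favorable one. All names, the three toy features, and the training settings are hypothetical.

```python
import math
import random

def score(weights, features):
    """Linear scoring function (a perceptron with no hidden nodes)."""
    return sum(w * x for w, x in zip(weights, features))

def prob_first_preferred(weights, fav, unfav):
    """Sigmoid of the score difference between the two steps of a pair."""
    diff = score(weights, fav) - score(weights, unfav)
    return 1.0 / (1.0 + math.exp(-diff))

def train(pairs, n_features, epochs=50, lr=0.1):
    """Stochastic gradient descent on the pairwise logistic loss.

    Each pair is (more favorable features, less favorable features).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for fav, unfav in pairs:
            p = prob_first_preferred(weights, fav, unfav)
            for i in range(n_features):
                weights[i] += lr * (1.0 - p) * (fav[i] - unfav[i])
    return weights

# Toy data: three hypothetical features; the first one drives favorability.
random.seed(0)

def make_pair():
    a = [random.random() for _ in range(3)]
    b = [random.random() for _ in range(3)]
    return (a, b) if a[0] > b[0] else (b, a)

pairs = [make_pair() for _ in range(200)]
weights = train(pairs, n_features=3)
accuracy = sum(score(weights, f) > score(weights, u) for f, u in pairs) / len(pairs)
print(f"training pair accuracy: {accuracy:.3f}")
```

Sharing the weights between both sides of the pair means the trained scorer can later rank any number of candidate steps individually, which is what a reaction simulator needs.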

Classification Performance
Cross-validated accuracy:

Data set               Architecture                   Accuracy
Priority ordered       Perceptron (no hidden nodes)   92.8%
                       15 hidden nodes                93.7%
Implicitly unfavored   Perceptron (no hidden nodes)   98.8%
                       15 hidden nodes                99.0%
Altogether             Perceptron (no hidden nodes)   98.6%
                       15 hidden nodes                99.0%

Performance in the Simulator I

Performance in the Simulator II

Performance in the Simulator III

Performance in the Simulator IV

Conclusion
- Lack of quantitative data
- Compensate with qualitative knowledge
- Applied to the prediction of ΔG‡
  - Ionic reactions
  - C, H, N, O + halides
- Qualitative trends / ranking order of reactivities are necessary to solve relevant chemistry problems
Future Work
- Expand coverage
- Validation on external reaction databases (e.g. SPRESI)

Thanks
- Matthew A. Kayala
- Jonathan H. Chen: "Reaction Simulation Expert System for Synthetic Organic Chemistry", talk Thursday, March 26th at 10:30 am (CINF General Papers)
- Prof. Pierre Baldi
- The Baldi Lab
- OpenEye Academic Software License
- ChemAxon Academic Software License
- NIH/NLM Biomedical Informatics Training Program
- NSF Grants 0321390 and 0513376
- The Dreyfus Foundation Special Grant Program in Chemical Sciences