Molecular Similarity Searching Using Inference Network


Molecular Similarity Searching Using Inference Network
Ammar Abdo, Naomie Salim*
Faculty of Computer Science & Information Systems, Universiti Teknologi Malaysia

Molecular Similarity Searching
- Search for chemical compounds with structures or properties similar to a known compound.
- A variety of methods are used in these searches: graph theory; 1D, 2D and 3D shape similarity; docking similarity; electrostatic similarity; and others.
- Machine learning methods, e.g. BKD, SVM, NBC, NN.
- The vector space model using 2D fingerprints and the Tanimoto coefficient is one of the most widely used molecular similarity measures.

Rationale for Chemical Similarity
- Similar property principle: structurally similar molecules are likely to have similar properties.
- Given a known active molecule, a similarity search can identify further molecules in the database for testing.

Probabilistic Models (an Alternative Approach)
Why probabilistic models?
- Information retrieval deals with uncertain information.
- Query and compound characterizations are incomplete.
- Probability theory seems to be the most natural way to quantify uncertainty.
- Already applied in IR for text documents.

Why Bayesian Networks?
- Bayesian networks are the most popular way of doing probabilistic inference in AI.
- They give a clear formalism for combining evidence and modularize the world (dependencies).
- Bayesian network models for IR: the Inference Network (Turtle & Croft, 1991) and the Belief Network (Ribeiro-Neto & Muntz, 1996), both simple models.

Bayesian Inference
Bayes' rule is the heart of Bayesian techniques:
P(H|E) = P(E|H) P(H) / P(E)
where H is a hypothesis and E is the evidence:
- P(H) : prior probability
- P(H|E) : posterior probability
- P(E|H) : probability of E if H is true
- P(E) : a normalizing constant, so we can also write P(H|E) ∝ P(E|H) P(H)
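As a minimal worked example of the rule (the probability values below are invented for illustration, not taken from the slides), the posterior can be computed directly in a few lines of Python:

# Minimal worked example of Bayes' rule (illustrative numbers only).
p_h = 0.01              # P(H): prior probability that the hypothesis is true
p_e_given_h = 0.90      # P(E|H): probability of the evidence if H is true
p_e_given_not_h = 0.05  # P(E|~H): probability of the evidence if H is false

# P(E): the normalizing constant, obtained by marginalizing over H and ~H
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# P(H|E) = P(E|H) P(H) / P(E)
p_h_given_e = p_e_given_h * p_h / p_e
print(round(p_h_given_e, 4))  # ~0.1538: weak evidence barely moves a small prior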

Bayesian Networks
What is a Bayesian network? It is a directed acyclic graph (DAG) in which nodes represent random variables and the parents of a node are those judged to be direct causes of it. The roots of the network are the nodes without parents. The arcs represent causal relationships between these variables, and the strengths of these causal influences are expressed by conditional probabilities.
- x_1, ..., x_n : parent nodes; X is the set of parents of y (in this case, root nodes)
- y : child node; each x_i is a cause of y
- The influence of X on y is quantified by a conditional probability function F(y, X) = P(y|X).

Bayesian Networks (continued)
Example network: roots a and b with priors p(a) and p(b), and a child c with conditional probability p(c|a,b) specified for all values of a, b and c (conditional dependence).
Running Bayesian nets: given probability distributions for the roots and conditional probabilities for the other nodes, we can compute the a priori probability of any instance. Changes in the parents (e.g., b was observed) cause the probabilities to be recomputed.
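A small sketch of this computation for the a, b -> c network above (the probability values are arbitrary placeholders, not taken from the slides): the prior probability of any complete instance follows from the chain rule, and observing a parent means conditioning on it.

# Tiny two-parent network: a -> c <- b (illustrative probabilities only).
p_a = 0.3                     # P(a = True)
p_b = 0.6                     # P(b = True)
p_c_given = {                 # P(c = True | a, b)
    (True, True): 0.9,
    (True, False): 0.5,
    (False, True): 0.4,
    (False, False): 0.1,
}

def joint(a, b, c):
    """P(a, b, c) by the chain rule for this DAG."""
    pa = p_a if a else 1 - p_a
    pb = p_b if b else 1 - p_b
    pc = p_c_given[(a, b)] if c else 1 - p_c_given[(a, b)]
    return pa * pb * pc

# A priori probability of one complete instance
print(joint(True, False, True))           # 0.3 * 0.4 * 0.5 = 0.06

# After observing b = True, the belief in c is recomputed by conditioning:
p_c_given_b = sum(joint(a, True, True) for a in (True, False)) / p_b
print(round(p_c_given_b, 3))              # 0.55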

Bayesian Networks Approach to Molecular Similarity Searching
How to describe and compare molecules:
- Network model generation: description of the system in a suitable network form.
- Representation of the importance of descriptors (weighting schemes).
- Probability estimation for the network model.
- Calculation of the similarity scores.

Bayesian Inference Network
Nodes: compounds (c_j), features (f_i), queries (q_1, q_2, ..., q_r) and the target (A).
Edges: an edge from c_j to its feature nodes f_i indicates that observing c_j increases the belief in the variables f_i.

Definitions
- f_i, c_j and q_k are random variables.
- F = (f_1, f_2, ..., f_n) is an n-dimensional vector (n equal to the fingerprint length); since each f_i ∈ {0, 1}, F has 2^n possible states.
- c_j ∈ {0, 1}; q ∈ {0, 1}.
- The rank of a compound c_j is computed as P(q = true | c_j = true), where "c_j = true" stands for the state in which c_j = true and c_i = false for all i ≠ j, because we observe one compound at a time.
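A minimal sketch of these definitions in code (the fingerprints and compound identifiers below are made up for illustration; a real system derives them from the 2D structures):

# Each compound is a binary feature vector F = (f_1, ..., f_n); here n = 8.
fingerprints = {
    "c1": [1, 0, 1, 1, 0, 0, 1, 0],
    "c2": [0, 1, 1, 0, 0, 1, 0, 0],
    "c3": [1, 1, 0, 1, 1, 0, 0, 1],
}

def instantiate(compound_id):
    """Evidence used at ranking time: c_j = True and every other compound False."""
    return {cid: (cid == compound_id) for cid in fingerprints}

print(instantiate("c2"))   # {'c1': False, 'c2': True, 'c3': False}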

Construct the Compound Network (once)
A directed acyclic graph (DAG) with:
- compound nodes as roots, which contain the prior probability of observing each compound;
- feature nodes as leaves, which contain the probability associated with the node given its set of parent compounds.

Construct a Query Network (for each query)
An inverted DAG with a single leaf for the target molecule and multiple roots corresponding to the features that express the query. A set of intermediate query nodes may also be used when multiple queries express the target. The query network is then attached to the compound network.

Similarity Calculation
- Find the probability that the target molecule (A) is satisfied given that compound c_j has been observed.
- Instantiate each c_j, which corresponds to attaching evidence to the network by stating that c_j is true and the rest of the compounds are false.
- Find the subset of c_j's that maximizes the probability value of node A (the best subset) and retrieve these c_j's as the answer to the query. A sketch of this ranking loop follows below.
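A compact sketch of the ranking loop, assuming a scoring function belief_in_target(compound_id) that runs the inference from an instantiated compound up to the target node A (the function name, the example scores and the top_k cutoff are illustrative, not from the slides):

def rank_compounds(compound_ids, belief_in_target, top_k=100):
    """Instantiate each compound in turn, score P(A=true | c_j=true), and rank."""
    scores = {}
    for cid in compound_ids:
        # attaching evidence: c_j is true, all other compounds are false
        scores[cid] = belief_in_target(cid)
    # compounds with the highest belief in the target are retrieved first
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]

# Illustrative usage with hard-coded beliefs standing in for the inference step:
demo_beliefs = {"c1": 0.2, "c2": 0.7, "c3": 0.4}
print(rank_compounds(["c1", "c2", "c3"], demo_beliefs.get, top_k=2))
# [('c2', 0.7), ('c3', 0.4)]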

Bayesian Inference Network
The retrieval of an active compound, compared to a given target structure, is obtained by means of an inference process through a network of dependences. To carry out the inference we need to estimate the strength of the relationships represented by the network, which involves estimating and encoding a set of conditional probability distributions.
The inference network described here comprises four different layers of nodes (four different random variables). The first layer comprises the compound nodes (roots); the probability associated with these nodes is defined as the prior probability P(c_j) = 1 / (collection size).

Bayesian Inference Network
The second layer comprises the feature nodes, so we need to compute P(f_i). P(f_i|c_j) is computed from the first layer (the parent nodes), on which the dependency is based. A weighting function is used to estimate the probability P(f_i|c_j), where α is a constant (experiments using the inference network (Turtle, 1991) show that the best value for α is 0.4), ff_ij is the frequency of the i-th feature within the j-th compound, icf_i is the inverse compound frequency of the i-th feature in the collection, cl_j is the size of the j-th compound, total_cl is the total length of the compounds in the collection, and m is the total number of compounds in the collection. This equation was adapted from the Okapi retrieval system (Robertson et al., 1995).
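The weighting equation itself appeared as an image on the original slide and is not reproduced in this transcription. The sketch below is therefore an assumption rather than the published formula: it implements an Okapi/BM25-style weighting consistent with the symbols defined above (α, ff_ij, icf_i, cl_j, total_cl, m come from the text; the damping constants 0.5 and 1.5 and the exact functional form are illustrative).

import math  # not strictly needed here, kept in case icf_i is derived from a log

ALPHA = 0.4  # constant reported as the best-performing value (Turtle, 1991)

def p_feature_given_compound(ff_ij, icf_i, cl_j, total_cl, m):
    """Okapi-style estimate of P(f_i | c_j): a sketch of the form described
    on the slide, not the exact published equation.

    ff_ij    : frequency of feature i in compound j
    icf_i    : inverse compound frequency of feature i (assumed normalized to [0, 1])
    cl_j     : size (length) of compound j
    total_cl : total length of all compounds in the collection
    m        : number of compounds in the collection
    """
    avg_cl = total_cl / m                                   # average compound length
    # term-frequency component damped by compound length (BM25-like)
    tf = ff_ij / (ff_ij + 0.5 + 1.5 * (cl_j / avg_cl))
    # mix with the constant alpha and scale by the inverse compound frequency
    return ALPHA + (1.0 - ALPHA) * tf * icf_i

# Example: a feature occurring twice in a compound of average length
print(round(p_feature_given_compound(ff_ij=2, icf_i=0.8, cl_j=30,
                                     total_cl=30000, m=1000), 3))  # 0.64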

Bayesian Inference Network
The third layer comprises only the query nodes, with probability p(q_k), where c_jk is the set of features in common between the j-th compound and the k-th query, cl_j is the size of the j-th compound, nff_ik is the normalized frequency of the i-th feature within the k-th query, nicf_i is the normalized inverse compound frequency of the i-th feature in the collection, p_i is the estimated probability at the i-th feature node, and ff_ij is the frequency of the i-th feature within the j-th compound.

Bayesian Inference Network
The last layer comprises only the activity-need node (the target). Its belief bel(A) is computed, in the case where more than one query is used, with a weighted MAX or a weighted SUM operator, where c_jk is the set of features in common between the j-th compound and the k-th query, ql_k is the size of the k-th query, p_jk is the estimated probability that the k-th query is met by the j-th compound, and r is the number of queries.
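The weighted MAX and weighted SUM operators also appeared only as images on the slide, so the sketch below gives one plausible reading rather than the published equations: it assumes the per-query beliefs p_jk are weighted by the query sizes ql_k, and the example values are invented.

def weighted_sum(beliefs, weights):
    """wsum operator: weighted average of the per-query beliefs p_jk."""
    return sum(w * p for w, p in zip(weights, beliefs)) / sum(weights)

def weighted_max(beliefs, weights):
    """wmax operator (one plausible form): the best weighted belief, rescaled."""
    return max(w * p for w, p in zip(weights, beliefs)) / max(weights)

# r = 3 reference queries; p_jk = belief that query k is met by compound j,
# weights taken here as the query sizes ql_k (an assumption, for illustration)
p_jk = [0.72, 0.40, 0.55]
ql_k = [18, 25, 20]
print(round(weighted_sum(p_jk, ql_k), 3))   # 0.539
print(round(weighted_max(p_jk, ql_k), 3))   # 0.518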

Experimental Details
- Subset of the MDDR with 40,751 molecules and 12 activity classes; in all, 6,804 actives in the 12 classes.
- 10 sets of 10 randomly chosen compounds from each activity class (to form the sets of queries).
- For comparison purposes, the similarity calculation was also done using the non-binary Tanimoto coefficient.
- Six different types of weighted fingerprints from SciTegic: atom type extended-connectivity counts (ECFC), functional class extended-connectivity counts (FCFC), atom type atom environment counts (EEFC), functional class atom environment counts (FEFC), atom type hashed atom environment counts (EHFC), and functional class hashed atom environment counts (FHFC).

MDDR Data

Code  Activity class                      Actives  Unique AF(a)  Unique MF(b)  Av. mols./AF  Av. mols./MF  Diversity mean  SD
5H3   5HT3 antagonists                        213           133            87          1.60          2.45          0.8537  0.008
5HA   5HT1A agonists                          116            67            54          1.73          2.15          0.8496  0.007
D2A   D2 antagonists                          143           109            75          1.31          1.91          0.8526  0.005
Ren   Renin inhibitors                        993           542           328          1.83          3.03          0.7188  0.002
Ang   Angiotensin II AT1 antagonists         1367           698           396          1.96          3.45          0.7762  0.002
Thr   Thrombin inhibitors                     885           528           335          1.68          2.64          0.8283  0.002
SPA   Substance P antagonists                 264           119            78          2.22          3.38          0.8284  0.006
HIV   HIV-1 protease inhibitors               715           455           330          1.57          2.17          0.8048  0.004
Cyc   Cyclooxygenase inhibitors               162            83            44          1.95          3.68          0.8717  0.006
Kin   Tyrosine protein kinase inhibitors      453           247           162          1.83          2.80          0.8699  0.006
PAF   PAF antagonists                         716           381           252          1.88          2.84          0.8669  0.004
HMG   HMG-CoA reductase inhibitors            777           337           168          2.31          4.63          0.8230  0.002

(a) Unique AF is the number of unique atomic frameworks present in the class.
(b) Unique MF is the number of unique molecular frameworks present in the class.

Use of a Single Reference Structure
(Results figure; the most diverse class is marked.)

Use of a Single Reference Structure
Comparison of the average percentage of unique atomic frameworks obtained in the top 5% of the ranked test set using BIN and Tan with EHFC_4 (results figure; the most diverse class is marked).

Use of Multiple Reference Structures
Comparison between BIN and Tan using the MAX rule and ECFC_4 (results figure).

Use of Multiple Reference Structures
Comparison of the average percentage of atomic frameworks retrieved in the top 5% of the ranked test set using BIN-MAX and Tan-MAX with ECFC_4 (results figure).

BIN with Multiple Molecular Descriptors
So far we have considered using just a single molecular descriptor and multiple reference structures as the basis for a search. Further work searches with multiple molecular descriptors (ECFC4, EHFC4, FHFC4, FPFC4, PHPFC3) with single and multiple reference structures.

Use of Multiple Molecular Descriptors and a Single Reference Structure
(Network diagram.) Compound nodes c_1, c_2, ..., c_m form the roots. For each descriptor D_1, D_2, ..., D_s there is a layer of feature nodes f_1, ..., f_n and a layer of query nodes q_1, ..., q_r. The per-descriptor beliefs are combined through weighted-max link matrices wmax_1, wmax_2, ..., wmax_s, which are in turn combined by a wsum operator into the target node A. A sketch of this fusion appears below.
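A minimal sketch of how the scores implied by this diagram could be fused, assuming a plain max within each descriptor and a weighted sum across descriptors; the function name, weights and belief values are illustrative, not taken from the slides.

def fuse_descriptors(per_descriptor_beliefs, descriptor_weights):
    """Combine the scores for one compound across s descriptors (sketch).

    per_descriptor_beliefs : list of s lists, each holding the beliefs p_jk
                             for the r queries under one descriptor
    descriptor_weights     : one weight per descriptor (illustrative values)
    """
    # wmax within each descriptor: keep the strongest query match
    wmax_scores = [max(beliefs) for beliefs in per_descriptor_beliefs]
    # wsum across descriptors: weighted average of the descriptor scores
    total_w = sum(descriptor_weights)
    return sum(w * sc for w, sc in zip(descriptor_weights, wmax_scores)) / total_w

# One compound scored under 3 descriptors (e.g. ECFC4, EHFC4, FHFC4), 2 queries each
beliefs = [[0.70, 0.55], [0.62, 0.66], [0.48, 0.51]]
print(round(fuse_descriptors(beliefs, [1.0, 1.0, 1.0]), 3))  # 0.623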

Use of Multiple Molecular Descriptors and a Single Reference Structure
Comparison between multiple descriptors and a single descriptor with a single reference structure using BIN (results figure).

Use of Multiple Molecular Descriptors and Multiple Reference Structures
Comparison between multiple descriptors and a single descriptor with multiple reference structures using BIN (results figure).

Summary I
- The BIN method with a single active reference structure outperforms the Tanimoto similarity method in 11 classes (by between 6% and 71%), a 19% overall improvement; only in one activity class (cyclooxygenase inhibitors) is BIN slightly inferior to Tan (-5%).
- BIN with multiple reference structures is superior to Tan in all activity classes (by between 5% and 118%), significantly outperforming Tan with a 35% improvement in the overall average recall rate.

Summary II
- BIN with multiple descriptors and a single reference structure slightly outperforms BIN with a single descriptor and a single reference.
- BIN with multiple descriptors and multiple reference structures slightly outperforms BIN with a single descriptor and multiple references.
- BIN with multiple descriptors enhances performance by a large margin when the sought actives are structurally heterogeneous, but only slightly when the sought actives are structurally homogeneous.

Summary III
- There is some evidence to suggest that BIN is more effective at scaffold hopping for the more diverse data sets.
- The networks do not impose additional computational costs because they do not include cycles.
- The major strength is the network's combination of distinct evidential sources to support the ranking of a given compound.
- BIN provides the ability to integrate several descriptors and several references into a single framework.

Thank you