QSAR in Green Chemistry

Similar documents
Next Generation Computational Chemistry Tools to Predict Toxicity of CWAs

Structure-Activity Modeling - QSAR. Uwe Koch

OECD QSAR Toolbox v.4.1. Tutorial on how to predict Skin sensitization potential taking into account alert performance

Priority Setting of Endocrine Disruptors Using QSARs

Quantitative Structure Activity Relationships: An overview

OECD QSAR Toolbox v.4.0. Tutorial on how to predict Skin sensitization potential taking into account alert performance

Plan. Lecture: What is Chemoinformatics and Drug Design? Description of Support Vector Machine (SVM) and its used in Chemoinformatics.

Data Mining in the Chemical Industry. Overview of presentation

Quantitative Structure-Activity Relationship (QSAR) computational-drug-design.html

Introduction to Chemoinformatics and Drug Discovery

Machine Learning Concepts in Chemoinformatics

OECD QSAR Toolbox v.4.1

OECD QSAR Toolbox v.3.3

OECD QSAR Toolbox v.4.1. Step-by-step example for predicting skin sensitization accounting for abiotic activation of chemicals

OECD QSAR Toolbox v.3.4

QMRF# Title. number and title in JRC QSAR Model Data base 2.0 (new) number and title in JRC QSAR Model Data base 1.0

An Integrated Approach to in-silico

OECD QSAR Toolbox v.3.3. Predicting skin sensitisation potential of a chemical using skin sensitization data extracted from ECHA CHEM database

Computational Methods and Drug-Likeness. Benjamin Georgi und Philip Groth Pharmakokinetik WS 2003/2004

Screening and prioritisation of substances of concern: A regulators perspective within the JANUS project

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller

Development of a Structure Generator to Explore Target Areas on Chemical Space

OECD QSAR Toolbox v.3.4. Example for predicting Repeated dose toxicity of 2,3-dimethylaniline

OECD QSAR Toolbox v.3.4

Structural biology and drug design: An overview

KATE2017 on NET beta version Operating manual

Docking. GBCB 5874: Problem Solving in GBCB

OECD QSAR Toolbox v.3.2. Step-by-step example of how to build and evaluate a category based on mechanism of action with protein and DNA binding

OECD QSAR Toolbox v.3.4. Step-by-step example of how to build and evaluate a category based on mechanism of action with protein and DNA binding

OECD QSAR Toolbox v.3.3. Step-by-step example of how to build a userdefined

OECD QSAR Toolbox v.4.1. Step-by-step example for building QSAR model

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

OECD QSAR Toolbox v.3.3. Step-by-step example of how to build and evaluate a category based on mechanism of action with protein and DNA binding

OECD QSAR Toolbox v.3.0

Receptor Based Drug Design (1)

OECD QSAR Toolbox v.3.3. Step-by-step example of how to categorize an inventory by mechanistic behaviour of the chemicals which it consists

QSAR Modeling of Human Liver Microsomal Stability Alexey Zakharov

Chapter 8: Introduction to QSAR

COMPUTER AIDED DRUG DESIGN (CADD) AND DEVELOPMENT METHODS

Using Self-Organizing maps to accelerate similarity search

OECD QSAR Toolbox v.4.1. Tutorial illustrating new options for grouping with metabolism

OECD QSAR Toolbox v.3.3. Predicting acute aquatic toxicity to fish of Dodecanenitrile (CAS ) taking into account tautomerism

Read-Across or QSARs?

molecules ISSN

HYPERGRAPH BASED SEMI-SUPERVISED LEARNING ALGORITHMS APPLIED TO SPEECH RECOGNITION PROBLEM: A NOVEL APPROACH

Condensed Graph of Reaction: considering a chemical reaction as one single pseudo molecule

Holdout and Cross-Validation Methods Overfitting Avoidance

Course in Data Science

Using AutoDock for Virtual Screening

PATTERN CLASSIFICATION

Notes of Dr. Anil Mishra at 1

bcl::cheminfo Suite Enables Machine Learning-Based Drug Discovery Using GPUs Edward W. Lowe, Jr. Nils Woetzel May 17, 2012

QSAR APPLICATION TOOLBOX ADVANCED TRAINING WORKSHOP. BARCELONA, SPAIN 3-4, June 2015 AGENDA

Contents 1 Open-Source Tools, Techniques, and Data in Chemoinformatics

In silico pharmacology for drug discovery

A Comparison of Probabilistic, Neural, and Fuzzy Modeling in Ecotoxicity

Exploring the black box: structural and functional interpretation of QSAR models.

Microarray Data Analysis: Discovery

Regulatory use of (Q)SARs under REACH

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland

Modern Information Retrieval

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata

Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction

QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression

Plan. Day 2: Exercise on MHC molecules.

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University

DISCOVERING new drugs is an expensive and challenging

A Review on Computational Methods in Developing Quantitative Structure-Activity Relationship (QSAR)

The In Silico Model for Mutagenicity

ECO24 Prediction of Non-Extractable Residues Using Structural Information ( Structural Alerts )

#33 - Genomics 11/09/07

In Silico Investigation of Off-Target Effects

AMBIT. Modules. Database. Chemical compounds. Table Error! No text of specified style in document..1

OECD QSAR Toolbox v.3.3

OECD QSAR Toolbox v.4.1. Tutorial illustrating new options of the structure similarity

Representation of molecular structures. Coutersy of Prof. João Aires-de-Sousa, University of Lisbon, Portugal

Translating Methods from Pharma to Flavours & Fragrances

Statistical concepts in QSAR.

Quantitative structure activity relationship and drug design: A Review

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

CHAPTER 6 QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIP (QSAR) ANALYSIS

Computational Approaches towards Life Detection by Mass Spectrometry

Final Exam, Machine Learning, Spring 2009

Introduction. OntoChem

Chemoinformatics and information management. Peter Willett, University of Sheffield, UK

1.QSAR identifier 1.1.QSAR identifier (title): QSAR for Relative Binding Affinity to Estrogen Receptor 1.2.Other related models:

ChemAxon. Content. By György Pirok. D Standardization D Virtual Reactions. D Fragmentation. ChemAxon European UGM Visegrad 2008

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92

Coefficient Symbol Equation Limits

QSAR/QSPR modeling. Quantitative Structure-Activity Relationships Quantitative Structure-Property-Relationships

Chemical Space: Modeling Exploration & Understanding

Computational Chemistry in Drug Design. Xavier Fradera Barcelona, 17/4/2007

Introduction to Machine Learning Midterm Exam

Molecular Descriptors Theory and tips for real-world applications

6.036 midterm review. Wednesday, March 18, 15

Analysis of a Large Structure/Biological Activity. Data Set Using Recursive Partitioning and. Simulated Annealing

MultiCASE CASE Ultra model for severe skin irritation in vivo

Drug Design 2. Oliver Kohlbacher. Winter 2009/ QSAR Part 4: Selected Chapters

Grouping and Read-Across for Respiratory Sensitisation. Dr Steve Enoch School of Pharmacy and Biomolecular Sciences Liverpool John Moores University

Quiz QSAR QSAR. The Hammett Equation. Hammett s Standard Reference Reaction. Substituent Effects on Equilibria

Transcription:

QSAR in Green Chemistry Activity Relationship QSAR is the acronym for Quantitative Structure-Activity Relationship Chemistry is based on the premise that similar chemicals will behave similarly The behavior/activity of a chemical is derived from its structure QSAR research searches for relationships between chemical structure and activity Why QSAR? Limited toxicity data on majority of chemicals Priority setting for testing In QSAR, estimating behavior of untested chemicals has been used for >60 years

QSAR basics QSARs associated with endpoint ( i.e.lc50) QSAR attempt to group chemical behavior in terms of toxicity QSAR predicts biological activity (endpoints) directly from models of chemical structure Each QSAR predicts a specific endpoint Discrete Endpoints (e.g., Carcinogenicity, Mutagenicity) Chemical + Structure + Data SAR Software Toxicity Response Predictions

What a QSAR needs Set of chemicals Molecular Descriptors for chemicals Data Training and Test Model or Algorithm linking descriptors to endpoint Independent Validation (hopefully)

QSAR flow diagram

Open Tox Examples

Biomolecular Interactions Chemical Interactions DH f products metabolic transformation Intrinsic Chemical Reactivity Structure 2D 3D SurfArea chirality planarity volume distances pka functional groups fragments bonds electrostatic potential Toxicity MW Risk Assessment solvation energies H-bond energies electron affinities frontier orbital energies increasing model sophistication bond energies radical formation transition state energies rxn w/ model nuclephiles increasing prior knowledge energy of rxn intermediates

Molecular Descriptors Descriptors representing properties of complete molecules Descriptors calculated from 2D structure Descriptors requiring 3D representations

Descriptors representing properties of complete molecules SMILES-simplified molecular-input lineentry system -a machine-readable format -2D representation of molecules as linear strings of alpha-numeric characters. LogP, Hydrophobicity LogP the logarithm of the partition coefficient between n-octanol and water ClogP (Leo and Hansch) based on small set of values from a small set of simple molecules

Molecular Descriptors SMILES-simplified molecular-input line-entry system

Descriptors calculated from 2D structure Single-valued descriptors calculated from the 2D graph of the molecule Simple counts of features Lipinski Rule of Five (H bonds, MW, etc.) Number of ring systems Number of rotatable bonds Characterize structures according to size, degree of branching, connectivity and overall shape Wiener Index counts the number of bonds between pairs of atoms and sums the distances between all pairs Randić (et al.) branching index - Defines a degree of an atom as the number of adjacent non-hydrogen atoms Bond connectivity value is the reciprocal of the square root of the product of the degree of the two atoms in the bond. Branching index is the sum of the bond connectivities over all bonds in the molecule. Chi indexes introduces valence values to encode sigma, pi, and lone pair electrons Kappa Shape Indexes -characterize aspects of molecular shape Compare the molecule with the extreme shapes possible for that number of atoms Range from linear molecules to completely connected graph

2D fingerprints Nodes do not represent atoms but features such as functionally important groups or whole ring system.

DESCRIPTORS BASED ON 3D REPRESENTATIONS Require the generation of 3D conformations Can be computationally time consuming with large data sets Usually must take into account conformational flexibility 3D fragment screens encode spatial relationships between atoms, ring centroids, and planes Toxicophores Based on atoms or substructures thought to be relevant for toxicity Expert driven

DRAGON Dragon - http://www.vcclab.org/lab/edragon/ an application for the calculation of molecular descriptors 1,600 molecular descriptors that are divided into 20 logical blocks. ACD Labs -http://www.acdlabs.com ACD Labs values now incorporated into the CAS Registry File for millions of compounds I-Lab: http://ilab.acdlabs.com/ Name generation NMR prediction Physical property prediction

Open Tox Descriptor Algorithms ToxDesc http://opentox-dev.informatik.tu-muenchen.de:8080/toxdesc 11 descriptor calculation algorithms Existing methodss OpenBabel, JOELib2, The Chemistry Development Kit (CDK) and gspan, New methods FMiner, MakeMNA, and MakeQNA. AMBIT The single descriptor calculation web service developed by OpenTox partner IDEA offers descriptors calculated by several packages http://www.opentox.org/dev/documentation/components/ambit http://ambit.sourceforge.net/

Open Tox Descriptor Algorithms

Open Tox Descriptor Algorithms

Open Tox Descriptor Algorithms

Open Tox Descriptor Algorithms

Open Tox Data Sets Textual databases (e.g., IARC [35], NTP [36]); Machine readable files (e.g.,.sdf) DSSTox, ISSCAN, AMBIT,RepDose Public Databases PubChem, ACToR, CPDBAS, DBPCAN, EPAFHM, KIERBL, IRISTR, FDAMDD, ECETOC skin irritation, LLNA skin sensitization and the Bioconcentration factor (BCF) Gold Standard Database

Open Tox Dataset Selection

Open Tox Dataset Selection

Open Tox Dataset Selection

Chemistry Space of the NCTR estrogen binding Dataset NCTR dataset: x Strong Weak inactive EPA dataset (58K) Competitive binding assays for the estrogen nuclear receptor proteins NCTR dataset cover a significant part of the chemistry space defined by the EPA dataset Most active compounds cluster together, whereas inactives scatter over the space.

Number of Chemicals Data sets NCTR Dataset (130 active and 100 inactive) 50 120 100 80 RBA=100 for E 2 40 30 60 40 20 0 Inactive <0.001 0.01~0.001 0.01~0.1 0.1~1 1~10 10~100 >100 Relative Binding Activity (RBA) 20 10 0 Steroids DESs Phytoestrogens DDTs PCBs Alkyphenols phthalates pesticides others Over six orders of magnitude of RBA range A wide range of chemical classes

Algorithms Linear QSAR Models Supervised classification Unsupervised classification Clustering approaches Machine Learning methods Genetic Algorithms Neural Networks Self Organizing Maps (SOM) Toxicophore search

Open Tox Algorithms Feature Selection Algorithms -algorithms for the reduction of the dimensionality of a dataset, by selecting only a subset of a full set of descriptors included in the dataset including a partial least-squares filter, principle components analysis (PCA), chi-squared attribute evaluation, information-gain attribute evaluation, a scaling filter, and a missing value replacer; Un Supervised Clustering Algorithms Standard algorithms - unsupervised learning algorithms that group objects of similar kind into respective categories. TUM s StructuralClustering procedure Supervised Classification and Regression Algorithm several common (Q)SAR modelling algorithms (Linear Regression, Multiple Linear Regression, PLS, Gaussian Processes, Neural Networks) standard machine learning methods (SVMs, KNN, M5, J48, Naïve Bayes) in house algorithms from partners (lazar, ToxTree, BBRC, LastPM, LoMoGraph, MaxTox, isar, Three Conditional Density Estimators, MakeSCR) Applicability domain estimation algorithms is the chemical space adequately assessed

Linear QSAR models Simple linear model ypred is the model output, xi and wi the ith descriptor (out of a total of M) and corresponding coefficient (weight) respectively, and w0 an offset term Multiple linear Regression models Principal component regression models Partial Least squares regression Linear Discriminant analysis Naïve Bayes Aromatic amines mutagenicity model log TA100 = 0.92 Kow + 1.17 HOMO 1.18 LUMO + 7.35 log TA100 is the mutagenic potency (revertants/nmol) Kow hydrophobicity HOMO is the energy of the highest occupied molecular orbital LUMO is the energy of the lowest unoccupied molecular orbital.

Supervised Classification Approach Chemical Structures (represented by descriptors) Pattern recognition Pattern recognition methods: Classification and regression tree (CART) K-nearest neighbor (KNN) Artificial neural network (ANN) SIMCA Similar model results obtained using various pattern recognition methods Overall correct prediction rate is between 70-80% The nature of descriptors are more critical Active Inactive

CART Model Classification and Regression Tree (CART) > < Unique contribution to the overall prediction Descriptors are selected based on genetic algorithm clogp (hydrophobicity) Phenolic ring indicator Shadow-XY (length of structure) RPCS (relative positive charge surface area) PNSA (total charge weighted negative surface area CART uses a decision tree to display how chemical may be classified or predicted through a series of rules. These rules are expressed as if then..

Similarity Search Similarity Index 1.0000 Triazolam Cluster Identification 0.8721 Br 0.9365 Safety Data on the Chemical and Similar Substances Test Compound 0.9315

Unsupervised clustering Compound Classification and Selection Agglomerative methods start at the bottom and merge similar clusters (bottom-up) Divisive hierarchical clustering starts with all compounds in a single cluster and partitions the data (top-down)

Machine Learning Methods Genetic algorithms Different parameters and model solutions to given problems are encoded in a chromosome and subjected to iterative random variation, thus generating a population. Solutions provided by these chromosomes are evaluated by fitness function that assign high scores to desired results. Chromosomes yielding best intermediate solutions are subjected to mutation and crossover operation that correspond to random genetic mutations and gene recombination events. The resulting modified chromosomes represent the next generation and the process is continued until the obtained results meet a satisfactory convergence criterion

Toxicophore Searching Template Structure Query Hits Searching Match Database with multiple conformers

Estimating Toxic Potential Using toxicophore searching Reduce test molecule to all possible 2-10 atom fragments Compare molecular fragments to active and inactive fragments in the control toxicity database Fragments Exclusively Associated with Toxicity (~200, e.g.) Fragments NOT Exclusively Associated with Toxicity (~500,000, e.g.) Identity of Chemicals with Structural Alert Fragments Toxic Potency Number of Compounds / SA Presence and Identity of Modulators QSAR Properties Estimate Toxic Potential

Open Tox Model/Algorithm Selection

Open Tox Model/Algorithm Selection

Open Tox Model/Algorithm Selection

Assignment for next week Apply ToxPredict on your set of chemicals Focus on algorithms/models which use QSAR/SAR based predictions Compare QSAR predictions with Toxicophore based predictions Modify your compound in similar ways as last week Run QSAR predictions again Compare with Toxicophore based predictions

Validation False Positives Actual Negative Predicted Positive Predictive Value Predicted Positive Actual Positive Specificity Actual Negative Predicted Negative Sensitivity Actual Positive Predicted Positive