Kernel-based Machine Learning for Virtual Screening

Kernel-based Machine Learning for Virtual Screening. Dipl.-Inf. Matthias Rupp, Beilstein Endowed Chair for Chemoinformatics, Johann Wolfgang Goethe-University, Frankfurt am Main, Germany. 2008-04-11, Helmholtz Center, Munich.

Outline: Virtual screening (setting, definition, aspects); Representation (descriptors, graphs, shape, densities); Methods (Gaussian process regression, novelty detection); Application (virtual screening for PPARγ agonists).

Virtual screening: Drug development. Pipeline: Disease → Target → Screening → Optimization → Preclinical → Clinical Phases I, II, III → Market authorization → Clinical Phase IV.

Virtual screening: Drug development. Screening is the systematic testing of compounds for activity against the target: biochemical assay, high-throughput screening, virtual screening (receptor-based versus ligand-based). Pipeline: Disease → Target → Screening → Optimization → Preclinical → Clinical Phases I, II, III → Market authorization → Clinical Phase IV. Example: COX-2 / Celecoxib.

Virtual screening: Ligand-based approach. Input: known ligands (training samples), compound library (test samples). Output: molecules with best predicted activity. Particularities: small training sets (10^1 to 10^3), large test sets (10^5 to 10^6), false positives worse than false negatives, only top predictions are of interest, available binding activity information varies. Key questions: How to represent (and compare) molecules? How to learn from the training data?

Representation: Descriptors. Computable properties in vector form; most frequently used representation; comparison by metric, inner product, or similarity coefficient. Example, 1-pentyl acetate: bonds in longest chain: 7; rotatable bonds: 4; negative partial charge surface fraction: 0.13; hydrogen bond acceptors: 1; ... Figure courtesy Dr. Michael Schmuker. M. Rupp, G. Schneider, P. Schneider: Distance phenomena in high-dimensional chemical descriptor spaces: consequences for similarity-based approaches, in preparation, 2008.
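
As a minimal sketch of the comparison step (the descriptor values for 1-pentyl acetate are taken from the slide; the second molecule's values and the choice of measures are illustrative assumptions):

```python
import numpy as np

# Descriptor vectors: 1-pentyl acetate values from the slide,
# the second molecule is made up for illustration.
x = np.array([7.0, 4.0, 0.13, 1.0])   # longest chain, rotatable bonds, charge fraction, HBA
y = np.array([6.0, 3.0, 0.10, 1.0])

# Comparison by metric (Euclidean distance), inner product, and a similarity coefficient.
euclidean = np.linalg.norm(x - y)
inner     = float(x @ y)
cosine    = inner / (np.linalg.norm(x) * np.linalg.norm(y))

print(euclidean, inner, cosine)
```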

Representation: Descriptors. Computable properties in vector form; most frequently used representation; comparison by metric, inner product, or similarity coefficient. Alternatives are structured data representations: graph models (structure graph), surface models (molecular shape), density models (spatial distribution), ... M. Rupp, G. Schneider, P. Schneider: Distance phenomena in high-dimensional chemical descriptor spaces: consequences for similarity-based approaches, in preparation, 2008.

Representation: ISOAK, the iterative similarity optimal assignment graph kernel. Iterative graph similarity: a $|V| \times |V'|$ matrix $X$ of pairwise vertex similarities; two vertices are similar if their neighbours are similar; recursive definition, iterative computation: $X_{ij} = (1-\alpha)\, k_v(v_i, v_j') + \alpha \max_{\pi} \sum_{v \in n(v_i)} X_{v,\pi(v)}\, k_e(\{v_i, v\}, \{v_j', \pi(v)\})$, where the maximum runs over assignments $\pi$ of the neighbours $n(v_i)$ to the neighbours $n(v_j')$. Optimal assignment: find an assignment $\rho : V \to V'$ such that $\sum_{i=1}^{|V|} X_{i,\rho(i)}$ is maximal. M. Rupp, E. Proschak, G. Schneider: Kernel Approach to Molecular Similarity Based on Iterative Graph Similarity, Journal of Chemical Information and Modeling 47(6): 2280-2286, 2007.

Representation: ISOAK example. ISOAK with α = 1/2, Dirac vertex kernel using element types and Dirac edge kernel using bond types, comparing glycine (5 heavy atoms) with serine (7 heavy atoms). Overall similarity is 4.64 / √(5 · 7) = 0.78. Pairwise atom similarities, 100 · X_ij (glycine atoms i = 1..5 in rows, serine atoms j = 1..7 in columns):
  98 50 00 00 00 00 50
  50 98 11 34 16 17 89
  00 11 96 14 68 78 13
  00 34 14 91 13 20 38
  00 24 67 17 81 77 20
M. Rupp, E. Proschak, G. Schneider: Kernel Approach to Molecular Similarity Based on Iterative Graph Similarity, Journal of Chemical Information and Modeling 47(6): 2280-2286, 2007.
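
A sketch of the two ISOAK steps above in code, using numpy and SciPy's Hungarian solver. The Dirac vertex kernel and the final √(|V|·|V'|) normalization follow the example slide; dropping the edge kernel and the neighbour normalization are simplifying assumptions of mine, not the published ISOAK implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def isoak_like_similarity(labels1, adj1, labels2, adj2, alpha=0.5, n_iter=20):
    """Sketch of iterative graph similarity with optimal assignment.

    labels*: vertex labels (e.g. element types), adj*: adjacency matrices.
    Dirac vertex kernel on labels; the edge kernel is implicitly a Dirac
    kernel on bond existence (a simplification of the published method).
    """
    n1, n2 = len(labels1), len(labels2)
    k_v = np.array([[1.0 if a == b else 0.0 for b in labels2] for a in labels1])

    X = np.ones((n1, n2))                      # initial pairwise vertex similarities
    for _ in range(n_iter):
        X_new = np.empty_like(X)
        for i in range(n1):
            for j in range(n2):
                ni = np.flatnonzero(adj1[i])   # neighbours of v_i
                nj = np.flatnonzero(adj2[j])   # neighbours of v'_j
                neigh = 0.0
                if len(ni) and len(nj):
                    C = X[np.ix_(ni, nj)]      # similarities of neighbour pairs
                    r, c = linear_sum_assignment(-C)       # best neighbour assignment
                    neigh = C[r, c].sum() / max(len(ni), len(nj))  # normalization assumed
                X_new[i, j] = (1 - alpha) * k_v[i, j] + alpha * neigh
        X = X_new

    # optimal assignment of all vertices; overall similarity normalized by
    # sqrt(|V| * |V'|), as in the glycine/serine example above
    r, c = linear_sum_assignment(-X)
    return X[r, c].sum() / np.sqrt(n1 * n2), X

# usage: propane vs. ethanol as toy chain graphs (labels are element symbols)
sim, X = isoak_like_similarity(list("CCC"), np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]),
                               list("CCO"), np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))
```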

Methods: Kernel-based machine learning. Linear algorithms and the kernel trick: 1. Transformation into a higher-dimensional space [figure: one-dimensional data on the x axis, not linearly separable]. 2. Implicit computation of inner products. 3. Rewrite linear algorithms using only inner products.

Methods: Kernel-based machine learning. Linear algorithms and the kernel trick: 1. Transformation into a higher-dimensional space [figure: the same one-dimensional data, not linearly separable, become linearly separable after the mapping x ↦ (x, sin(x))]. 2. Implicit computation of inner products. 3. Rewrite linear algorithms using only inner products.
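
As a sketch of step 1, the toy data below (two classes alternating along the x axis, my own construction rather than the data on the slide) become linearly separable under the mapping x ↦ (x, sin(x)) shown in the figure:

```python
import numpy as np

# Toy 1-D data: two classes alternating along the x axis,
# not separable by any single threshold on x alone.
x_pos = np.array([0.5 * np.pi + 2 * np.pi * k for k in range(-1, 2)])  # sin(x) = +1
x_neg = np.array([1.5 * np.pi + 2 * np.pi * k for k in range(-1, 2)])  # sin(x) = -1
x = np.concatenate([x_pos, x_neg])
y = np.array([+1] * len(x_pos) + [-1] * len(x_neg))

# Feature map Phi(x) = (x, sin(x)): in the 2-D feature space the classes
# are separated by the horizontal line "second coordinate = 0".
phi = np.column_stack([x, np.sin(x)])
print(np.all(np.sign(phi[:, 1]) == y))   # True: linearly separable in feature space
```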

Methods: Kernel-based machine learning. Linear algorithms and the kernel trick: 1. Transformation into a higher-dimensional space. 2. Implicit computation of inner products: a kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ with $k(x, x') = \langle \Phi(x), \Phi(x') \rangle$. Example, quadratic kernel: $\Phi : \mathbb{R}^n \to \mathbb{R}^{n^2}$, $x \mapsto (x_i x_j)_{i,j=1}^{n}$, so that $k(x, x') = \langle \Phi(x), \Phi(x') \rangle = \sum_{i,j=1}^{n} x_i x_j x_i' x_j' = \sum_{i=1}^{n} x_i x_i' \sum_{j=1}^{n} x_j x_j' = \langle x, x' \rangle^2$. 3. Rewrite linear algorithms using only inner products.
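
A quick numerical check of this identity (a sketch; the explicit n²-dimensional feature map is only built here to verify the implicit computation, which is exactly what the kernel trick avoids):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
x, xp = rng.normal(size=n), rng.normal(size=n)

# Explicit feature map Phi(x) = (x_i * x_j) for all i, j  (dimension n^2)
phi  = np.outer(x,  x ).ravel()
phip = np.outer(xp, xp).ravel()

explicit = phi @ phip          # <Phi(x), Phi(x')>
implicit = (x @ xp) ** 2       # <x, x'>^2, computed without the n^2-dimensional map

print(np.isclose(explicit, implicit))   # True
```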

Methods: Kernel-based machine learning. Linear algorithms and the kernel trick: 1. Transformation into a higher-dimensional space. 2. Implicit computation of inner products. 3. Rewrite linear algorithms using only inner products. Example, centering in feature space $\mathcal{H}$: $\tilde{k}(x, x') = \bigl\langle \Phi(x) - \tfrac{1}{n}\sum_{i=1}^{n} \Phi(x_i),\; \Phi(x') - \tfrac{1}{n}\sum_{i=1}^{n} \Phi(x_i) \bigr\rangle = k(x, x') - \tfrac{1}{n}\sum_{i=1}^{n} k(x_i, x') - \tfrac{1}{n}\sum_{i=1}^{n} k(x, x_i) + \tfrac{1}{n^2}\sum_{i,j=1}^{n} k(x_i, x_j)$.
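
As a sketch, the same centering applied to an n × n kernel matrix (the double-centering expression follows from the formula above; the RBF kernel and toy data are assumed for illustration):

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian RBF kernel matrix between the row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def center_kernel(K):
    """Center a symmetric n x n kernel matrix in feature space:
    Kc[i,j] = k(x_i,x_j) - mean_a k(x_a,x_j) - mean_b k(x_i,x_b) + mean_ab k(x_a,x_b)."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    return K - one @ K - K @ one + one @ K @ one

X = np.random.default_rng(1).normal(size=(6, 3))
K = rbf_kernel(X, X)
Kc = center_kernel(K)
print(np.allclose(Kc.sum(axis=0), 0))   # centered features have zero mean
```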

Methods: Gaussian process regression. Gaussian process as data model: generalization of the multivariate normal distribution to functions; determined by mean and covariance; kernel matrix as covariance matrix. Conditioning of the prior on training data yields the posterior distribution; the variance provides confidence estimates for predictions. [Figure: samples from the prior over functions (left) and from the posterior after conditioning on training points (right), target vs. input.]
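
The conditioning step can be sketched in a few lines using the standard GP posterior mean and variance; the RBF covariance, noise level, and toy data below are my assumptions, not the settings used in the talk:

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def gp_predict(X_train, y_train, X_test, noise=1e-2, gamma=0.5):
    """Posterior mean and variance of a zero-mean GP with RBF covariance."""
    K   = rbf(X_train, X_train, gamma) + noise * np.eye(len(X_train))
    Ks  = rbf(X_test, X_train, gamma)          # test-train covariances
    Kss = rbf(X_test, X_test, gamma)           # test-test covariances
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov  = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)                  # predictive mean and variance

X = np.linspace(-4, 4, 8).reshape(-1, 1)
y = np.sin(X).ravel()
mu, var = gp_predict(X, y, np.linspace(-4, 4, 50).reshape(-1, 1))
```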

Methods: Principal component analysis novelty detection. Orthogonal directions of maximum variance; dimensionality reduction; descriptive statistic. Non-linear variants recover underlying Riemannian manifolds. Novelty detection via projection error.
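
A minimal sketch of novelty detection via projection error in the linear PCA case (the kernelized, non-linear variant follows the same pattern; the number of components and the threshold below are my assumptions):

```python
import numpy as np

def pca_fit(X, n_components=2):
    """Fit PCA: mean and top principal directions of the training data."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_components]

def projection_error(X, mu, components):
    """Squared distance between each sample and its projection onto the PCA subspace."""
    Z = (X - mu) @ components.T               # coordinates in the subspace
    X_hat = Z @ components + mu               # reconstruction
    return ((X - X_hat) ** 2).sum(axis=1)

rng = np.random.default_rng(2)
train = rng.normal(size=(100, 2)) @ np.array([[3.0, 0.0], [0.0, 0.3]])  # flat cloud
mu, comps = pca_fit(train, n_components=1)

test = np.array([[2.0, 0.1], [0.0, 3.0]])     # second point lies off the main direction
err = projection_error(test, mu, comps)
novel = err > np.quantile(projection_error(train, mu, comps), 0.95)
print(err, novel)                             # the off-manifold point is flagged as novel
```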

Application: Material and methods. Target: PPARγ (peroxisome proliferator-activated receptor γ). Dataset: 144 published ligands with pKi values. Screening library: Asinex Gold and Platinum (360 000 compounds). Representation: vectorial descriptors (CATS2D, MOE 2D, Ghose-Crippen fragments) and the ISOAK molecular graph kernel. Method: Gaussian process regression, multiple kernel learning, leave-one-cluster-out cross-validation, fraction of actives (FA20) as success measure. T. Schroeter, M. Rupp, K. Hansen, E. Proschak, K.-R. Müller, G. Schneider: Virtual screening for PPARγ ligands using ISOAK molecular graph kernel and Gaussian processes, 4th German Conference on Chemoinformatics, 2008.
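
A minimal sketch of leave-one-cluster-out cross-validation as listed above (k-means on descriptor vectors is a stand-in for however the compound clusters were actually defined):

```python
import numpy as np
from sklearn.cluster import KMeans

def leave_one_cluster_out_splits(X, n_clusters=5, seed=0):
    """Yield (train_idx, test_idx) pairs where each test fold is one whole cluster,
    so that structurally similar compounds never appear in both train and test."""
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(X)
    for c in range(n_clusters):
        test = np.flatnonzero(labels == c)
        train = np.flatnonzero(labels != c)
        yield train, test

# usage: descriptor matrix X (n_compounds x n_features), activities y
X = np.random.default_rng(3).normal(size=(144, 10))
y = np.random.default_rng(4).normal(size=144)
for train_idx, test_idx in leave_one_cluster_out_splits(X):
    pass  # fit the model on X[train_idx], y[train_idx]; evaluate on the held-out cluster
```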

Application: Results. From the top 30 of the three best performing models, 16 compounds with novel scaffolds were cherry-picked for testing: one natural-product-related, PPARγ-selective activator (EC50 9.3 ± 0.3 µM); 3 dual PPARα/γ activators (µM range, two ≤ 10 µM); 4 selective PPARα activators (µM range, one ≤ 10 µM). In total, 8 out of 16 compounds are active, and 4 out of 16 compounds have EC50 ≤ 10 µM. Results are preliminary since testing is still ongoing. M. Rupp, T. Schroeter, R. Steri, E. Proschak, K. Hansen, O. Rau, M. Schubert-Zsilavecz, K.-R. Müller, G. Schneider, in preparation, 2008.

Summary: Virtual screening as a machine learning problem; importance of molecular representation; virtual screening using only positive samples.

Acknowledgements: Prof. Dr. Gisbert Schneider and the modlab team (molecular design laboratory, www.modlab.de); Prof. Dr. Klaus-Robert Müller, Timon Schroeter, Katja Hansen (TU Berlin and Fraunhofer FIRST); Prof. Dr. Manfred Schubert-Zsilavecz, Ramona Steri (University of Frankfurt); the Beilstein-Institute for the Advancement of Chemical Sciences; FIRST (Frankfurt International Research Graduate School on Translational Biomedicine). Thank you for your attention.