Kernel-based Machine Learning for Virtual Screening
Dipl.-Inf. Matthias Rupp
Beilstein Endowed Chair for Chemoinformatics
Johann Wolfgang Goethe-University Frankfurt am Main, Germany
2008-04-11, Helmholtz Center, Munich
Outline
- Virtual screening: setting, definition, aspects
- Representation: descriptors, graphs, shape, densities
- Methods: Gaussian process regression, novelty detection
- Application: virtual screening for PPARγ agonists
Virtual screening: Drug development
Disease → Target → Screening → Optimization → Preclinical → Clinical Phases I, II, III → Market authorization → Clinical Phase IV
Virtual screening: Drug development
Disease → Target → Screening → Optimization → Preclinical → Clinical Phases I, II, III → Market authorization → Clinical Phase IV
Screening: systematic testing of compounds for activity
- Biochemical assay: high-throughput screening
- Virtual screening: receptor-based versus ligand-based
Example: COX-2 (target), Celecoxib (drug)
Virtual screening: Ligand-based approach
Input: known ligands (training samples), compound library (test samples)
Output: molecules with best predicted activity
Particularities:
- Small training sets (10^1 to 10^3 samples)
- Large test sets (10^5 to 10^6 samples)
- False positives worse than false negatives
- Only top predictions are of interest
- Available binding activity information varies
Key questions:
- How to represent (and compare) molecules?
- How to learn from the training data?
Representation: Descriptors
- Computable properties in vector form
- Most frequently used representation
- Comparison by metric, inner product, or similarity coefficient
Example (1-pentyl acetate): bonds in longest chain 7; rotatable bonds 4; negative partial charge surface fraction 0.13; hydrogen bond acceptors 1; ...
(Figure courtesy of Dr. Michael Schmuker)
M. Rupp, G. Schneider, P. Schneider: Distance phenomena in high-dimensional chemical descriptor spaces: consequences for similarity-based approaches, in preparation, 2008.
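The three ways of comparing descriptor vectors named on this slide can be sketched in a few lines. The descriptor values below are made up for illustration, and the continuous Tanimoto form is one common variant of a similarity coefficient, not necessarily the one used in the cited work:

```python
import numpy as np

def tanimoto(a, b):
    """Continuous Tanimoto similarity coefficient for descriptor vectors."""
    ab = float(np.dot(a, b))
    return ab / (np.dot(a, a) + np.dot(b, b) - ab)

# hypothetical descriptor vectors for two molecules (values are illustrative)
x = np.array([7.0, 4.0, 0.13, 1.0])
y = np.array([6.0, 3.0, 0.20, 1.0])

dist = np.linalg.norm(x - y)   # metric (Euclidean distance)
dot = float(x @ y)             # inner product
sim = tanimoto(x, y)           # similarity coefficient, in (0, 1] for nonnegative vectors
```

For binary fingerprints the same formula reduces to the familiar "shared bits over total bits" Tanimoto coefficient.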
Representation: Descriptors
- Computable properties in vector form
- Most frequently used representation
- Comparison by metric, inner product, or similarity coefficient
Alternatives: structured data representations
- Graph models (structure graph)
- Surface models (molecular shape)
- Density models (spatial distribution)
- ...
Representation: ISOAK
Iterative similarity optimal assignment graph kernel

Iterative graph similarity
- |V| × |V′| matrix X of pairwise vertex similarities
- Two vertices are similar if their neighbours are similar
- Recursive definition; iterative computation:
  X_i,j = (1 − α) k_v(v_i, v′_j) + α max_π Σ_{v ∈ n(v_i)} X_v,π(v) k_e({v_i, v}, {v′_j, π(v)})
  where π ranges over assignments of the neighbours n(v_i) to the neighbours n(v′_j)

Optimal assignment
- Find an assignment ρ : V → V′ such that Σ_{i=1}^{|V|} X_i,ρ(i) is maximal

M. Rupp, E. Proschak, G. Schneider: Kernel Approach to Molecular Similarity Based on Iterative Graph Similarity, Journal of Chemical Information and Modeling 47(6): 2280-2286, 2007.
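The recursion and the assignment step above can be sketched in a few lines. This is a simplified illustration, not the published algorithm: it assumes a Dirac vertex kernel on element labels, a constant edge kernel, a crude neighbour normalization, and a fixed iteration count; `scipy.optimize.linear_sum_assignment` (Hungarian algorithm) solves both assignment problems.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def isoak(adj_a, lab_a, adj_b, lab_b, alpha=0.5, n_iter=20):
    """Simplified iterative-similarity / optimal-assignment sketch.

    adj_*: neighbour index lists per vertex; lab_*: vertex labels.
    Vertex kernel is Dirac on labels, edge kernel is constant 1."""
    na, nb = len(lab_a), len(lab_b)
    kv = np.array([[1.0 if la == lb else 0.0 for lb in lab_b] for la in lab_a])
    X = kv.copy()
    for _ in range(n_iter):
        Xn = np.empty_like(X)
        for i in range(na):
            for j in range(nb):
                ni, nj = adj_a[i], adj_b[j]
                best = 0.0
                if ni and nj:
                    # best assignment of neighbours of v_i to neighbours of v'_j
                    C = X[np.ix_(ni, nj)]
                    r, c = linear_sum_assignment(-C)
                    best = C[r, c].sum() / max(len(ni), len(nj))  # crude normalization
                Xn[i, j] = (1 - alpha) * kv[i, j] + alpha * best
        X = Xn
    # final optimal assignment between the two vertex sets
    r, c = linear_sum_assignment(-X)
    return X[r, c].sum() / np.sqrt(na * nb)

# two toy molecular graphs (adjacency lists, element labels)
glycine_like = ([[1], [0, 2], [1]], ["C", "C", "O"])
variant = ([[1], [0, 2], [1]], ["C", "C", "N"])
s_self = isoak(*glycine_like, *glycine_like)
s_diff = isoak(*glycine_like, *variant)
```

A graph compared with itself scores 1; changing one atom label lowers the similarity, mirroring the glycine/serine example on the next slide.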
Representation: ISOAK example
ISOAK with α = 1/2, Dirac vertex kernel using element types and Dirac edge kernel using bond types. Overall similarity is 4.64 / √(5 · 7) ≈ 0.78.

Pairwise atom similarities, 10² · X_ij (glycine atoms i = 1..5 vs. serine atoms j = 1..7):
       j=1  j=2  j=3  j=4  j=5  j=6  j=7
i=1     98   50   00   00   00   00   50
i=2     50   98   11   34   16   17   89
i=3     00   11   96   14   68   78   13
i=4     00   34   14   91   13   20   38
i=5     00   24   67   17   81   77   20
Methods: Kernel-based machine learning
Linear algorithms and the kernel trick
1. Transformation into higher-dimensional space
   [Plot: one-dimensional data x, not linearly separable]
2. Implicit computation of inner products
3. Rewrite linear algorithms using only inner products
Methods: Kernel-based machine learning
Linear algorithms and the kernel trick
1. Transformation into higher-dimensional space
   x ↦ (x, sin(x))
   [Plots: data not linearly separable in one dimension, linearly separable after the mapping]
2. Implicit computation of inner products
3. Rewrite linear algorithms using only inner products
Methods: Kernel-based machine learning
Linear algorithms and the kernel trick
1. Transformation into higher-dimensional space
2. Implicit computation of inner products
   Kernel k : X × X → ℝ, k(x, x′) = ⟨Φ(x), Φ(x′)⟩
   Example: quadratic kernel
   Φ : ℝ^n → ℝ^(n²), x ↦ (x_i x_j)_{i,j=1}^n
   k(x, x′) = ⟨Φ(x), Φ(x′)⟩ = Σ_{i,j=1}^n x_i x_j x′_i x′_j = (Σ_{i=1}^n x_i x′_i) (Σ_{j=1}^n x_j x′_j) = ⟨x, x′⟩²
3. Rewrite linear algorithms using only inner products
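The identity above is easy to verify numerically: the inner product of the explicit n²-dimensional feature vectors equals the squared inner product in the original space (a sketch; the dimension 5 is arbitrary).

```python
import numpy as np

def phi(x):
    # explicit quadratic feature map: all n^2 products x_i * x_j
    return np.outer(x, x).ravel()

rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)

explicit = phi(x) @ phi(y)   # inner product computed in R^(n^2)
implicit = (x @ y) ** 2      # kernel trick: <x, y>^2 computed in R^n
```

The kernel trick pays off because the implicit form costs O(n) instead of O(n²), and for many kernels the feature space is far larger still, or infinite-dimensional.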
Methods: Kernel-based machine learning
Linear algorithms and the kernel trick
1. Transformation into higher-dimensional space
2. Implicit computation of inner products
3. Rewrite linear algorithms using only inner products
   Example: centering in feature space H
   k̃(x, x′) = ⟨Φ(x) − (1/n) Σ_{i=1}^n Φ(x_i), Φ(x′) − (1/n) Σ_{i=1}^n Φ(x_i)⟩
            = ⟨Φ(x), Φ(x′)⟩ − (1/n) Σ_{i=1}^n ⟨Φ(x_i), Φ(x′)⟩ − (1/n) Σ_{i=1}^n ⟨Φ(x), Φ(x_i)⟩ + (1/n²) Σ_{i,j=1}^n ⟨Φ(x_i), Φ(x_j)⟩
            = k(x, x′) − (1/n) Σ_{i=1}^n k(x_i, x′) − (1/n) Σ_{i=1}^n k(x, x_i) + (1/n²) Σ_{i,j=1}^n k(x_i, x_j)
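Applied to the training Gram matrix, the expansion above takes the matrix form K_c = K − 1_n K − K 1_n + 1_n K 1_n, where 1_n is the n × n matrix with all entries 1/n. A quick numerical check with a linear kernel, where centering in feature space coincides with centering the inputs:

```python
import numpy as np

def center_kernel(K):
    """Center a Gram matrix in feature space (the four terms of the expansion)."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    return K - one @ K - K @ one + one @ K @ one

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
K = X @ X.T                    # linear kernel: the feature map is the identity
Kc = center_kernel(K)
Xc = X - X.mean(axis=0)        # explicit centering in (feature) space
```

For a non-linear kernel the explicit check is unavailable, which is exactly why the rewrite in terms of kernel evaluations matters.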
Methods: Gaussian process regression
- Gaussian process as data model
- Generalization of the multivariate normal distribution to functions
- Determined by mean and covariance; kernel matrix as covariance matrix
- Conditioning of the prior on training data yields the posterior distribution
- Variance as confidence estimate for predictions
[Plots: samples from the prior (left) and from the posterior after conditioning on training points (right)]
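The conditioning step can be sketched from scratch; the RBF covariance, length scale, and noise level below are placeholder choices for illustration, not those of the application later in the talk.

```python
import numpy as np

def rbf(a, b, length=1.0):
    """RBF covariance between two 1-D input arrays."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length**2)

def gp_predict(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP with RBF covariance."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_test, x_train)
    Kss = rbf(x_test, x_test)
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha                                      # k_*^T K^{-1} y
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss) - np.sum(v**2, axis=0)              # k_** - k_*^T K^{-1} k_*
    return mean, var

x = np.array([-2.0, 0.0, 1.5])
y = np.sin(x)
xs = np.linspace(-4, 4, 9)
mu, var = gp_predict(x, y, xs)
```

The variance shrinks near training points and reverts to the prior far from them, which is the confidence estimate the slide refers to.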
Methods: Principal component analysis for novelty detection
- Orthogonal directions of maximum variance
- Dimensionality reduction
- Descriptive statistic
- Non-linear variants recover underlying Riemannian manifolds
- Novelty detection via projection error
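The projection-error idea on this slide can be sketched with plain SVD-based PCA (synthetic data, illustrative only): a test point far from the subspace spanned by the leading principal components of the training set gets a large reconstruction error, and is therefore flagged as novel.

```python
import numpy as np

def pca_novelty(train, test, n_components=2):
    """Novelty score: squared reconstruction error after projecting
    onto the top principal components of the training data."""
    mu = train.mean(axis=0)
    _, _, Vt = np.linalg.svd(train - mu, full_matrices=False)
    W = Vt[:n_components]                  # principal directions
    Z = (test - mu) @ W.T                  # project
    recon = Z @ W + mu                     # back-project
    return np.sum((test - recon) ** 2, axis=1)

rng = np.random.default_rng(2)
# training data lies close to a 2-D plane embedded in 5-D space
basis = rng.normal(size=(2, 5))
train = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 5))
inlier = rng.normal(size=(1, 2)) @ basis        # on the plane
outlier = 3.0 * rng.normal(size=(1, 5))         # generic 5-D point
scores = pca_novelty(train, np.vstack([inlier, outlier]))
```

Kernelized variants replace the SVD of the centered data with an eigendecomposition of the centered kernel matrix, giving the non-linear behaviour mentioned above.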
Application: Material and methods
- Target: PPARγ (peroxisome proliferator-activated receptor γ)
- Dataset: 144 published ligands with pK_i values
- Screening library: Asinex Gold and Platinum (360 000 compounds)
- Representation: vectorial descriptors (CATS2D, MOE 2D, Ghose-Crippen fragments) and the ISOAK molecular graph kernel
- Method: Gaussian process regression, multiple kernel learning
- Validation: leave-one-cluster-out cross-validation; fraction of actives (FA_20) as success measure
T. Schroeter, M. Rupp, K. Hansen, E. Proschak, K.-R. Müller, G. Schneider: Virtual screening for PPARγ ligands using ISOAK molecular graph kernel and Gaussian processes, 4th German Conference on Chemoinformatics, 2008.
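A generic sketch of a fraction-of-actives score: the exact definition of FA_20 is given in the cited poster, so the function below (fraction of true actives among the k top-ranked compounds) is an assumed form, and the scores and labels are made-up numbers.

```python
import numpy as np

def fraction_of_actives(scores, is_active, k):
    """FA_k: fraction of true actives among the k top-ranked compounds."""
    top = np.argsort(scores)[::-1][:k]     # indices of the k highest scores
    return float(np.mean(is_active[top]))

# hypothetical screening outcome: higher score = higher predicted activity
pred = np.array([0.9, 0.1, 0.8, 0.4, 0.7, 0.2])
active = np.array([1, 0, 1, 0, 0, 1])
fa = fraction_of_actives(pred, active, 3)
```

Such early-recognition measures match the particularities of virtual screening noted earlier: only the top of the ranked library is ever tested, so accuracy over the whole library is beside the point.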
Application: Results
- Top 30 compounds of the three best-performing models; 16 cherry-picked compounds with novel scaffolds
- 1 PPARγ-selective activator (EC50 9.3 ± 0.3 µM), natural-product related
- 3 dual PPARα/γ activators (µM range, two below 10 µM)
- 4 selective PPARα activators (µM range, one below 10 µM)
- 8 out of 16 compounds are active
- 4 out of 16 compounds with EC50 below 10 µM
- Results are preliminary since testing is still ongoing
M. Rupp, T. Schroeter, R. Steri, E. Proschak, K. Hansen, O. Rau, M. Schubert-Zsilavecz, K.-R. Müller, G. Schneider, in preparation, 2008.
Summary
- Virtual screening as a machine learning problem
- Importance of molecular representation
- Virtual screening using only positive samples
Acknowledgements
- Prof. Dr. Gisbert Schneider and the modlab team (molecular design laboratory, www.modlab.de)
- Prof. Dr. Klaus-Robert Müller, Timon Schroeter, Katja Hansen (TU Berlin and Fraunhofer FIRST)
- Prof. Dr. Manfred Schubert-Zsilavecz, Ramona Steri (University of Frankfurt)
- Beilstein-Institut for the Advancement of Chemical Sciences
- FIRST (Frankfurt International Research Graduate School for Translational Biomedicine)

Thank you for your attention