Characterizing Activity Landscapes Using an Information-Theoretic Approach



1 Characterizing Activity Landscapes Using an Information-Theoretic Approach. Veer Shanmugasundaram & Gerry Maggiora, Computer-Aided Drug Discovery, Pharmacia Corporation, Kalamazoo, MI


2 What are Activity Landscapes? Activity landscapes are abstract surfaces drawn on a chemistry space containing compounds, where the height represents biological activity. Smooth landscape (Flint Hills, Kansas): gentle rising hills of activity represent a smooth landscape, where small structural changes produce gradual changes in activity. Rough landscape (Bryce Canyon, Utah): rough activity landscapes are characterized by cliffs, where small changes in structure lead to large changes in activity.

3 Why do we need to characterize them? What should be the minimum size of a representative dissimilarity subset of the corporate collection? Is it assay dependent? Develop stopping rules to assess whether we have screened enough. Compare activity landscapes of different biological targets. [Figure: a smooth landscape and a rough landscape, each plotted as Activity over Descriptor 1 and Descriptor 2]

4 Shannon's Theory of Communication. Transmission of messages. [Figure: four mappings from A to B: Perfect Mapping, Noisy Mapping, Equivocal Mapping, Mixed Mapping]

5 Shannon's Theory of Communication. Count table with senders $a_1, a_2, a_3, a_4$ as rows, receivers $b_1, b_2, b_3, b_4$ as columns, and entries $N_{ab}$, where $\sum_a \sum_b N_{ab} = N$. Row totals $N_a$: probabilities or frequencies with which messages are sent. Each row: how each sent message is received. Column totals $N_b$: probabilities or frequencies with which messages are received. Each column: how each received message was sent.

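6 Shannon's Theory of Communication. Probability table with senders $a_1, a_2, a_3, a_4$ as rows, receivers $b_1, b_2, b_3, b_4$ as columns, and entries $p_{ab}$, with marginals $p_a$ and $p_b$, conditionals $p_{b|a}$ and $p_{a|b}$, and $\sum_a \sum_b p_{ab} = 1$. Shannon's entropy: $H(X) = -\sum_x p_x \log_2 p_x$. Sender's entropy: $H(A) = -\sum_a p_a \log_2 p_a$. Receiver's entropy: $H(B) = -\sum_b p_b \log_2 p_b$. Joint entropy: $H(AB) = -\sum_a \sum_b p_{ab} \log_2 p_{ab}$.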
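The three entropies above can be computed directly from a sender-by-receiver count table. The following is a minimal sketch, not taken from the slides: the function names `entropy` and `entropies_from_counts` and the toy count table are illustrative assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropies_from_counts(counts):
    """Return (H(A), H(B), H(AB)) for a sender-by-receiver count table N_ab."""
    total = sum(sum(row) for row in counts)
    joint = [[n / total for n in row] for row in counts]   # p_ab
    p_a = [sum(row) for row in joint]                      # row marginals
    p_b = [sum(col) for col in zip(*joint)]                # column marginals
    h_ab = entropy(p for row in joint for p in row)
    return entropy(p_a), entropy(p_b), h_ab

# Hypothetical "perfect mapping": each sent message is received as exactly one message.
perfect = [[5, 0, 0, 0], [0, 5, 0, 0], [0, 0, 5, 0], [0, 0, 0, 5]]
h_a, h_b, h_ab = entropies_from_counts(perfect)

# Hypothetical fully mixed table: every sent message is equally likely to arrive as any message.
mixed = [[1, 1, 1, 1] for _ in range(4)]
m_a, m_b, m_ab = entropies_from_counts(mixed)
```

In the perfect table the joint entropy equals the sender's entropy, because knowing the sent message fixes the received one. In the fully mixed table the joint entropy is the sum of the two marginal entropies.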

7 Structure - Activity Mapping. Probability table with structural similarity bins $a_1, a_2, a_3, a_4$ and activity similarity bins $b_1, b_2, b_3, b_4$, with entries $p_{ab}$, marginals $p_a$ and $p_b$, conditionals $p_{b|a}$ and $p_{a|b}$, and $\sum_a \sum_b p_{ab} = 1$. Structural similarity: Tanimoto similarity ($S_{ij}$) or inter-compound distances in chemistry space. Activity similarity can be defined such that compounds that have similar IC50 or % inhibition values have a high similarity in activity.
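One way to build such a table from data is sketched below. The slides do not give a formula for activity similarity, so `activity_similarity` here is an assumption (one minus the range-scaled absolute difference), as are the set-of-on-bits fingerprint representation, the equal-width binning, and the toy compounds.

```python
from itertools import combinations

def tanimoto(fp_i, fp_j):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_i | fp_j)
    return len(fp_i & fp_j) / union if union else 1.0

def activity_similarity(act_i, act_j, act_range):
    """ASSUMPTION: 1 minus the range-scaled absolute activity difference."""
    return 1.0 - abs(act_i - act_j) / act_range

def similarity_map(fingerprints, activities, bins=4):
    """Joint frequencies p_ab over all compound pairs.

    Rows index structural similarity bins, columns index activity similarity bins.
    Needs at least two compounds so that there is at least one pair.
    """
    act_range = (max(activities) - min(activities)) or 1.0
    counts = [[0] * bins for _ in range(bins)]
    for i, j in combinations(range(len(fingerprints)), 2):
        s_m = tanimoto(fingerprints[i], fingerprints[j])
        s_a = activity_similarity(activities[i], activities[j], act_range)
        a = min(int(s_m * bins), bins - 1)   # a similarity of 1.0 falls in the top bin
        b = min(int(s_a * bins), bins - 1)
        counts[a][b] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]

# Hypothetical toy data: three compounds as sets of on-bit indices, with activity values.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}]
acts = [6.0, 9.0, 6.2]
p_ab = similarity_map(fps, acts)
```

In this toy set the first two compounds are structurally similar but differ strongly in activity, so their pair lands in a high structural similarity, low activity similarity cell, the kind of pair associated with a cliff.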

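8 Structure - Activity Similarity Map. [Figure: similarity map with Structural Similarity (LOW to HIGH) on the horizontal axis and Similarity in Activity (LOW to HIGH) on the vertical axis. Low structural similarity, high activity similarity: multiple pharmacophores or promiscuous compounds. High structural similarity, high activity similarity: smooth landscapes. Low structural similarity, low activity similarity: poor information content. High structural similarity, low activity similarity: rugged landscapes.]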

9 Structure - Activity Similarity Map. [Figure: a rough activity landscape (Activity over Descriptor 1 and Descriptor 2) shown next to its similarity map (Similarity in Activity against Structural Similarity, LOW to HIGH), with the rugged regions in the similarity map marked]

10 Information-theoretic measure. [Figure: two similarity maps, each plotting Similarity in Activity against Structural Similarity, LOW to HIGH] The Kullback-Leibler information-theoretic measure could be used as a global index to characterize the topographic character of an activity landscape and to compare the similarities between two different structure-activity maps.

11 Kullback-Leibler Index. $D(p \| q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}$, the expectation of $\log \frac{p}{q}$ under $p$. The Kullback-Leibler index is always non-negative. The index is zero if and only if p = q. It is not a true distance: it is not symmetric and does not satisfy the triangle inequality.
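The properties listed for the Kullback-Leibler index can be checked numerically. This is a minimal sketch under stated assumptions: the function name `kl_index`, the base-2 logarithm (the slide leaves the base unspecified), the handling of empty bins, and the two toy distributions are all illustrative choices, not the authors' implementation.

```python
import math

def kl_index(p, q):
    """D(p||q) = sum_x p(x) * log2(p(x) / q(x)) for two discrete distributions.

    Terms with p(x) == 0 contribute zero. If q(x) == 0 where p(x) > 0,
    the index is infinite.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px > 0:
            if qx == 0:
                return math.inf
            total += px * math.log2(px / qx)
    return total

# Hypothetical flattened similarity maps (each sums to 1).
p = [0.5, 0.25, 0.25, 0.0]
q = [0.25, 0.25, 0.25, 0.25]

d_pq = kl_index(p, q)   # finite: q is non-zero wherever p is
d_qp = kl_index(q, p)   # infinite: p has an empty bin where q does not
d_pp = kl_index(p, p)   # zero, since the two distributions are identical
```

The two directions give different values, which illustrates why the index is not a true distance. When comparing histograms built from real data, empty bins in the reference map make the index infinite, so some form of smoothing may be needed; that choice is outside what the slides specify.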

12 Biological Assay 1: 18 cpds, 1611 comparisons, 465 (S M 85). Biological Assay 2: 582 cpds, comparisons, 6569 (S M 85). Biological Assay 3: 19 cpds, comparisons, 958 (S M 85). [Figure: one similarity map per assay, axes S M and S A]

13 Biological Assay 1. [Figure: similarity map, axes S M and S A]

14 Biological Assay 1. [Figure panels: Similarity Map of an Idealized Rough Landscape; Similarity Map of an Idealized Smooth Landscape; Similarity Map of the Assay; axes S M and S A]

15 Biological Assay 1: 18 cpds, 1611 comparisons, 465 (S M 85). Biological Assay 2: 582 cpds, comparisons, 6569 (S M 85). Biological Assay 3: 19 cpds, comparisons, 958 (S M 85). [Table: Distances to Idealized Landscapes; columns Assay 1, Assay 2, Assay 3; rows Smooth, Rough]

16 Summary. Activity landscapes tend to have smooth and rugged regions. The Kullback-Leibler information-theoretic index can be used to measure the similarity of a given activity landscape to smooth and rough landscapes. If activity landscapes are like Bryce Canyon, we need to sample chemistry space more thoroughly to identify important peaks of activity.
