Development of K-step Yard sampling method and Apply to the ADME-T In Silico Screening. In Silico Data, Ltd.

Size: px

Start display at page:

Download "Development of K-step Yard sampling method and Apply to the ADME-T In Silico Screening. In Silico Data, Ltd."

Alexander Lambert
5 years ago
Views:

1 Development of K-step Yard sampling method and Apply to the ADME-T In ilico creening Kohtaro Yuta In ilico Data, Ltd.

2 1. Toxicity prediction and Pattern recognition (PR) 2. General features of data analysis by PR. 3. Building process to the features of KY-method *tep1 ;Yard sampling methods *tep2 ; K-step approach *tep3 ; Merge two approaches Yard sampling and K-step handling 4. Applicability statement of KY-method Classifying 7000 sample set of Ames test 5. ummary and conclusion

3 Approaches for toxicity screening Toxicity 1. Unexplained & complex mechanisms 2. High structural diversity CADD Multi-variate Pattern recognition Artificial intelligence Factorial analysis Black box Know-how Prediction

4 Problems of toxicity screening by pattern recognition Only a few methods can be applicable on toxicity screening Most of drug design methods can not be applied. Un-known mechanisms Inability of Hypothesis testing method Extremely high compounds diversity From methane to macrolide Large sample population Normal D.D. approach handle small samples

5 Basic concept of prediction by Pattern recognition Principle of Information Equivalence Compound Protein Genome tarting Black Box (Correlation) Information Equivalence Inevitability Explained by Pattern recognition Activity Toxicity ide-effects ADME Biodegradability Results

6 Basic concept of prediction by Pattern recognition Principle of Information Equivalence NH 2 Activity HO OH O F N N N N Cl Clofarabin: Antileukemia Initial Descriptors MW Atoms/Bonds HOMO/LUMO CP Others Black Box (Correlation) Information Equivalence Inevitability Feature election Final Descriptor et Toxicity ide-effects ADME Biodegradability

7 General features of data analysis by Pattern recognition techniques Linear / Non-linear and DA / Fitting

8 :Positive :Negative Discriminant function for perfect classification ample space :two cluster samples

9 :Positive :Negative ample space :highly overlapped space Discriminant function generated by various methods

10 :Positive :Negative ample space :highly overlapped space Discriminant function : Linear and non-linear

partitioning Classified by pattern space Feature selection Linear discriminant Pattern space classified

11 imple classification and scientific classification Pattern space impossible to be classified by linear discriminant N N N A N A A A N A N A N A N A A N N A N A A A A A A A N N A Neural network Recursive partitioning Classified by pattern space Feature selection Linear discriminant Pattern space classified by linear discriminant ADMEWORK N N N N N A A A N A N N A A N N A A A N N A A A A A Classified by scientific reasons

12 cience based approach Extrapolation outlier imple fitting and scientific fitting No science based approach U Interpolation Fitting by Neural Networks R=0.99>R=0.70 Feature selection Extrapolation ADMEWORK Y=a 1 x 1 +a 2 x 2 + +a n x n +const. Which is true outlier? Extrapolation Linear regression line

13 Non-linear approach Fit lines on existed sample space No-remake sample space cientific approach Linear approach Remake sample space trong feature selection is required Fit samples for individual end point

14 Building process to the features of K-step Yard sampling method tep1: Yard sampling methods

15 patial region on sample space Both side of sample space Pure and no-overlapping on this region

16 patial region on sample space Both side of sample space Pure and no-overlapping on this region Highly overlapped Two Discriminant function

17 Property of AP(All Positive)model :Positive :Negative All Positive samples were correctly classified No-classification on Negative samples

18 Property of AN(All Negative)model :Positive :Negative All Negative samples were correctly classified No-classificateion on Positive samples

19 Combination of AN and AP models Positive Zone Negative Zone Grey Zone High reliability High reliability Not to be classified AP AN

20 Linear and non-linear discriminant on AP and AN models Positive Zone Negative Zone Grey Zone AP AN High reliability High reliability Not to be classified

21 Relations between ample space & AN and AP models Positive Zone Negative Zone Grey Zone Determination Determination No conclusion Negative Positive AN model Positive Negative AP model

22 Class determination by AN and AP models ample Classification and prediction must be done by Combination of the results of AP and AN models. AP model AN model Results 1AP;POI,AN;POI POI 2AP;POI,AN;NEGA 3AP;NEGA,AN;POI 4AP;NEGA,AN;NEGA GREY GREY NEGA

23 Building steps to the features of K-step Yard sampling method tep2: K-step approach

24 Problems of Yard sampling methods 50% of samples 80% of samples 90% of samples The ratio of Grey zone:highly overlapped sample space

25 teps to the K-step methods Grey Zone 50% Expanded space of the former Grey zone Grey Zone 50% Expanded space of the former Grey zone Grey Zone 50%

26 K-step Yard sampling (KY)Method Improvement by repeated classification of Grey Zone samples Grey Zone (1) New Positive Zone Generate new sample pattern space Grey Zone (2) New Negative Zone

27 Relocation of Grey Zone samples on new sample space Grey Zone (1) K-step Yard sampling (KY)Method

28 Building steps to the features of K-step Yard sampling method tep3: Merge two approaches: Yard sampling and K-step handling

29 The way to perfect classification Partially realized by Yard sampling process Not determined class on Grey zone compounds Fixed up by K-step approach Perfect classification for all samples: any case, any time, any condition, others

30 K-step Yard sampling method Yard sampling process For perfect classification K-step repeated processes For no Grey zone

31 Applicability statement of K-step Yard sampling method Classifying 7000 sample set of Ames test

data analysis method The most difficult classification

32 Challenge for classification and prediction K-step Yard sampling method KY-method The most powerful and advanced data analysis method The most difficult classification problem 6,965 sample of Ames test samples were, Classified perfectly

33 Application test of K-step Yard sampling amples 1. Ames test data 2. ample population total :6,965 Mutagen; 2,932 Non-mutagen; 4,033 Result of KY-method 1. Number of steps : 23 steps ; 22 (2 models) + 1 (1 model) 2.Classification ratio : 100 % Used system ADMEWORK / ModelBuilder V Used parameters (Initial condition) Number of generated parameters : 838 Number of parameters for step 1 : 98 Confidency index (amples(6965) / Parameters(98)) : 71.1 > 4.0

34 Application test by various D.A. methods 1. Linear discriminant analysis with linear least-squares method Classification ratio : total; 73.50(6965), Mutagen;73.02(2932), Non mutagen;73.84(4033) Number of mis-classified : (1846), ( 791) (1055) Prediction ratio (L100 out) 72.58% deviance(0.92%) (L500 out) 73.32% deviance(0.18%) 2. VM (upport Vector Machine with Kernel) Classification ratio : total; 90.87(6965), Mutagen;86.83(2932) Non mutagen; 93.80(4033) Number of mis-classified : ( 636), ( 386) ( 250) Prediction ratio (L500 out) 80.99% deviance(9.88%) 3. AdaBoost Classification ratio : total; 77.24(6965), Mutagen;66.13(2932) Non-mutagen; 85.32(4033) Number of mis-classified : (1585) ( 993) ( 592) Prediction ratio (L500 out) 75.16% deviance(2.08%)

35 Classification results by AdaBoost ample distribution of 6,965 of 77.24% Mis-classified Correctryclassified Correctry-classified Mis-classified

36 K-step Yard sampling (KY)Method Total steps : 23 steps (2 models) + 1 step (1 model) ステップ ID(KY 法 ) tarting samples(total) Mutagen (Initial) Non-mutagen (Initial) Grey sample (Initial) Final samples Mutagen (Final) Non-mutagen (Final) Grey sample (Final) Determined samples(total) Determined samples(mut.) etermined samples(non-mu Grey ratio(%) (Grey/Total)

37 K-step Yard sampling (KY)Method (1 model)

38 K-step Yard sampling (KY)Method Classification results by 3 steps

39 ummary patial features of K-step Yard sampling Features 1. KY-method is a meta-method Used with various DA and Fitting methods. 2. Perfect classification is achieved in any condition Disadvantages 1. Relatively complex operation to generate discriminant functions 2. Need powerful computer power

40 ummary patial features of K-step Yard sampling Advantages 1. ample number free approach 2. ample distribution free approach 3. Perfect classification is achieved in any condition Disadvantages 1. Relatively complex operation to generate discriminant functions 2. Need powerful computer power

41 Kohtaro Yuta In ilico Data, Ltd.

Structure-Activity Modeling - QSAR. Uwe Koch

Structure-Activity Modeling - QSAR Uwe Koch QSAR Assumption: QSAR attempts to quantify the relationship between activity and molecular strcucture by correlating descriptors with properties Biological activity