Data Analysis in the Life Sciences - The Fog of Data -

Size: px

Start display at page:

Download "Data Analysis in the Life Sciences - The Fog of Data -"

Cuthbert Wade
6 years ago
Views:

1 ALTAA Chair for Bioinformatics & Information Mining Data Analysis in the Life Sciences - The Fog of Data - Michael R. Berthold ALTAA-Chair for Bioinformatics & Information Mining Konstanz University, Germany Michael.Berthold@uni-konstanz.de [ some (the cool) pictures thanks to Thomas Exner ] verview Motivation / Background Data Analysis in the Life Sciences: Understanding Biology Main Focus of Today: Drug Discovery Visual Clustering of Screening Data Different otions of Similarities Visual Clustering Using eighborgrams Finding Interesting Molecular Fragments Mining Graphs The Space of Discriminative Molecular Fragments Conclusions 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #2 1

2 Ultimate Goal: Towards Understanding Biology Complete Understanding (with Model!) of Interactions within human body Gene and Protein functions Protein-protein and cell interactions Metabolic pathways Clear model to understand (and avoid) side effects Design of new drugs in silico possible Personalized drugs for specific geno- and phenotype. But until then 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #3 Discovering Drugs Desired Effect First define a worth($)-while disease to tackle - ften driven by existing ideas - owadays also life style drugs 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #4 2

3 Discovering Drugs Desired Effect Target Identification Find points to attack (usually proteins) by analyzing - Gene expression changes - Changes in proteins - Information about biological connections (metabolic pathways) - Biological intuition and expert knowledge 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #5 Discovering Drugs Desired Effect Target Identification Target Validation Validate that identified targets have desired effect - Knockout specific genes, eliminate proteins - analyze e.g. gene expression for desired effect - doesn t work? Try to find new target(s) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #6 3

4 Discovering Drugs Desired Effect Target Identification Target Validation Assay Design Design an experiment that allows to quickly check if desired action happens - eeds to allow quick and easy testing of activity 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #7 Discovering Drugs Desired Effect Target Identification Target Validation Assay Design High Throughput Screening Test of hundreds of thousands of candidate molecules from a usually carefully designed (targeted library) set of candidates - Data is generally of rather bad quality (many false positives, unknown false negatives) Data Exploration 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #8 4

5 Discovering Drugs Desired Effect Target Identification Target Validation Assay Design High Throughput Screening Lead Finding From HTS results, try to identify lead structures, i.e. interesting pieces of molecules that may be responsible for the observed effects. Structured Data Mining 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #9 Discovering Drugs Desired Effect Target Identification Target Validation Assay Design High Throughput Screening Lead Finding Lead ptimization Prediction (Activity, Toxicity, ) ptimize active molecules: - Increase activity - Eliminate undesired side effects (toxicity!) - didn t work? Try to find new lead or new target 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #10 5

6 Discovering Drugs Desired Effect Target Identification Target Validation Assay Design High Throughput Screening Lead Finding Lead ptimization Animal Testing 1 st phase Clinical trials 2 nd phase Clinical trials Check if it really works and has no nasty side effects. - didn t work? March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #11 Discovering Drugs Desired Effect Target Identification Target Validation Assay Design High Throughput Screening Lead Finding Lead ptimization Animal Testing 1 st phase Clinical trials 2 nd phase Clinical trials ew Drug ~10-12 years later: $$$$ 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #12 6

7 Challenges Analysis of truly heterogeneous data Classical Data Analysis: production plants, shopping baskets, few, reasonably self-contained data sources Life Sciences: truly diverse data- and information sources complex data types (molecular structures, sequences, ) many different notions of similarity Data of rather different quality noisy, unreliable, faulty, outdated, Complicated Domain Usually: Goal of Analysis also understandable by on Specialist Life Sciences: Without continuous incorporation of expert feedback (biologist, biochemist) no chance for success. domain knowledge not explicitly known question often unclear / unknown at beginning need for true data exploration cycle incorporating the expert 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #13 verview Motivation / Background Data Analysis in the Life Sciences: Understanding Biology Main Focus of Today : Drug Discovery Visual Clustering of Screening Data Different otions of Similarities Visual Clustering Using eighborgrams Finding Interesting Molecular Fragments Mining Graphs The Space of Discriminative Molecular Fragments Conclusions 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #14 7

8 The Context of Today: High Throughput Screening Rapidly screen 100-thousand s of candidates. Problems ften thousands of actives Data extremely noisy (up to 50% false positives, unknown false negatives!) Positives almost always active for different reasons Separate, diverse clusters! Goal: Find subsets of similar active molecules (help user find clusters of actives) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #15 Example: Find similar cars: red Ferrari blue Porsche red Bobby car What is Similarity? First abstract representation (descriptor) = Speed Group 1: Ferrari & Porsche Second abstract representation (descriptor) = Color Group 1: Ferrari & Bobby car and many other notions of similarity can be used For molecular structures this is even more complicated! 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #16 8

What is Similarity? F F H Tropacocaine 1518-12246 Local Anesthetic 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R.

9 What is Similarity? F F H Tropacocaine Local Anesthetic 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #17 Types of Molecular Similarity Structural similarity: Same basic layout of overall graph or at least existence of a common subgraph Geometrical similarity: Roughly same shape in 3D, independent of exact atom matches Instead of simple shape, also other properties (surface charge ) can be compared Global properties: Molecular weight umber of hydrogen donors/acceptors And (mainly used) some of many heuristic approaches: Fingerprint based 3D grid comparisons (topomers ) see also: A. Bender, R. Glen: Molecular similarity: a key technique in molecular informatics, rg. Biomol. Chem., 2: , March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #18 9

Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #19 Similarity: Polarity

10 Similarity to Adrenaline: ther eurotransmitters Adrenalin (D) H H H H (C) H H H 2 (B) (E) H H H 2 H 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #19 Similarity: Polarity 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #20 10

Blood-Brain Barrier Adrenalin Amphetamin H H H H Ephedrin H H H 2 Dopamin MDMA H H H 2 H 08-March-07

11 Dissimilarity: Hydrophobic / Hydrophilic 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #21 Dissimilarity to Adrenaline: Crossing the Blood-Brain Barrier Adrenalin Amphetamin H H H H Ephedrin H H H 2 Dopamin MDMA H H H 2 H 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #22 11

12 Dissimilarity to Adrenaline: Crossing the Blood-Brain Barrier Adrenalin Amphetamin (Speed) H H H H Ephedrin H H H 2 Dopamin MDMA (Ecstasy) H H H 2 H 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #23 Different Similarities Comparing Molecular Structures depends on context: different properties molecules (drugs, proteins, ) often have more than one function different modes of action the same effect can be caused in various ways 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #24 12

13 Different ways to interact, e.g. drug candidate can block access to receptor site protein can lock into receptor site Example: Inhibitors of the Trypanothione Reductase Different Modes of Activity: Example 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #25 Different Similarities Comparing Molecular Structures depends on context: different properties molecules (drugs, proteins, ) often have more than one function different modes of action the same effect can be caused in various ways Problems: not a priori known which property / properties matter (biology poorly understood) not always only one relevant property (analysis needs to find patterns in various spaces in parallel ) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #26 13

14 Motivation: Parallel Universes Usually: Data given in a single feature space Mostly high-dimensional and numeric Definition of one, global similarity measure Here: Multiple representations for the data: Parallel Universes Different similarity measures nly subsets of data will form useful model in each universe Standard learning techniques not applicable 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #27 Similarities to other Similarity-Based Methods Feature Selection Global selection (entire data set) Global optimization goal (e.g. performance) Subgroup Mining Local selection (subset of data) Local groupings of features without semantic Feature/Cluster Weighting (e.g. Friedman et al) Individual feature weights for each cluster Interpretation difficult. Multi-Instance Learning multiple descriptions per object ne similarity function one or more descriptions must match Multi View Mining multiple descriptors per object independent hypotheses built on each view ensemble for prediction Parallel Universes: Several descriptions / distance metrics exist for objects Parts of Model operate on one (or some) universes only. 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #28 14

15 Parallel Universes: HTS Example Molecular Data Analysis: Descriptors based on Fingerprints umerical Features derived from 3D shapes, surface charge distribution, etc. Comparisons of chemical graphs Different descriptors encode different structural information one of them show satisfactory prediction results alone ne approach: interactive clustering algorithm using eighborgrams [M. Berthold, B. Wiswedel, D. Patterson, Interactive Exploration of Fuzzy Clusters Using eighborgrams, Fuzzy Sets and Systems, 149(1):21-37, 2005] 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #29 Parallel Universes eighborgram Clustering a few more active compounds close by the remaining of the 100 closest neighbors best (fuzzy) cluster suggested by interactive clustering algorithm. one active compound in origin of eighborgram. closest (also active) compound: distance d 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #30 15

Parallel Universes Interactive Clustering molecules in first potential cluster first universe (VolSurf features) second universe (Unity Fingerprints) 08-March-07 DIS/Faculty Seminar, University of

Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #31 Parallel Universes Mining in parallel universes challenging: Helps finding the right description for only a subset of the data

16 Parallel Universes Interactive Clustering molecules in first potential cluster first universe (VolSurf features) second universe (Unity Fingerprints) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #31 Parallel Universes Mining in parallel universes challenging: Helps finding the right description for only a subset of the data (David Hand: find patterns describing anomalous local features ) verlap between universes possible and desirable Resulting models may only describe parts of data in each universe User feedback to guide analysis even more important utlook Promising concept to mine heterogeneous data with diverse notions of similarity (works well in bioinformatics) Integrates focus of analysis, allows for context switch (via interactive Expert query refinement) Presents additional context information about model (pieces). Work in Progress: Fuzzy CU-Means: unsupervised clustering in parallel universes) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #32 16

17 verview Motivation / Background Data Analysis in the Life Sciences: Understanding Biology Main Focus of Today: Drug Discovery Visual Clustering of Screening Data Different otions of Similarities Visual Clustering Using eighborgrams Finding Interesting Molecular Fragments Mining Graphs The Space of Discriminative Molecular Fragments Conclusions 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #33 Remember?: Types of Molecular Similarity Structural similarity: Same basic layout of overall graph or at least existence of a common subgraph Geometrical similarity: Roughly same shape in 3D, independent of exact atom matches Instead of simple shape, also other properties (surface charge ) can be compared Global properties: Molecular weight umber of hydrogen donors/acceptors And (mainly used) some of many heuristic approaches: Fingerprint based 3D grid comparisons (topomers ) see also: A. Bender, R. Glen: Molecular similarity: a key technique in molecular informatics, rg. Biomol. Chem., 2: , March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #34 17

18 Motivation for Molecular Fragment Mining Goal: Find (and describe!) structural groups of molecules that share activity. For few molecules, manual inspection is feasible. 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #35 Goal: Find (and describe!) structural groups of molecules that share activity. For few molecules, manual inspection is feasible. For more molecules, automated methods are needed Motivation 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #36 18

19 Motivation 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #37 Motivation 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #38 19

synthesis result, ): Appear often in Positives: freq(high activity)>threshold Appear rarely in egatives: freq(low activity)<delta MoFa (and many variations: FSG, gspan, ): Based on Market Basket

20 Motivation 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #39 Molecular Fragment Miner (MoFa) Goal: Find Fragments that are discriminative for a class of interest (high activity, good synthesis result, ): Appear often in Positives: freq(high activity)>threshold Appear rarely in egatives: freq(low activity)<delta MoFa (and many variations: FSG, gspan, ): Based on Market Basket Analysis (Apriori or Eclat Algorithm) Grow Fragment-Candidates from scratch atom-by-atom nly report significant and unique fragments [Ch. Borgelt, M. Berthold: Mining Molecular Fragments: Finding Relevant Substructures of Molecules, Proc. IEEE Data Mining, 51-58, 2002] 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #40 20

21 Finding Frequent Molecular Fragments More complicated than itemset mining o clear order on items Eliminating duplicates is a problem (same fragment can be along different paths) Items can appear more than once (H--H) Graph topology crucial not just co-occurence in one molecule Different types of heuristics exist: MoFa: uses local order on atom/bond extensions Restricts extensions to recently added atoms MoSS: Uses search tree traversal to generate canonical form 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #41 #=6 #=6 #=4 #=6 C S #=6 #=4 #=4 #=4 S-C S- S= S= S-C-C S-C S-C S-C S- C S- S- S= C S= S= S= C S= Examples : (a) (b) (c) (d) (e) (f) C_ C _ S _ = = C _ C _ S= = _ C = C _ S= _ C _ C C S C S S= _ C S = 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #42 = = _ 21

22 Support of fragment A: Support Based Pruning supp(a) = Frequency of appearance in molecules Monotone conditions decline with size of fragment: fragment A is contained in fragment B supp(a) supp(b) If supp(node) in branch is below threshold then all child-nodes will also be below threshold. 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #43 #=6 #=6 #=4 #=6 C S #=6 #=4 #=4 #=4 S-C S- S= S= #=3 S-C-C #=4 S-C #=4 S-C #=4 S-C #=2 S- #=2 S- Examples : (a) (b) (c) (d) (e) (f) C_ C _ S _ = = C _ C _ S= = _ C = C _ S= _ C _ C C S C S _ C S = 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #44 = = _ #=2 S= 22

23 Discriminative Fragments Just finding frequent fragments usually not interesting Find fragments that are frequent in one class of molecules and infrequent in the remainder of molecules Discriminative Fragments summarize shared properties. umber of actives and inactives (and the ratio) that contain fragment indicates relevance. 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #45 Example: [CI HIV dataset ~45000 (~400 active) compounds, threshold=15%] % vs. 0.02% 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #46 23

A few more fragments 5.23% vs. 0.05% 4.92 vs. 0.07% 5.

Fog of Data" #47 Fragment lattices may still be big

24 A few more fragments 5.23% vs. 0.05% 4.92 vs. 0.07% 5.23% vs. 0.08% 9.85% vs. 0.07% 9.85% vs. 0.0% 10.15% vs. 0.04% 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #47 Fragment lattices may still be big 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #48 24

25 Example 2: CI Cancer Screen (~35000 mol. with several active groups) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #49 Example 2: CI Cancer Screen (~35000 mol. with several active groups) Colchicines (antimitotic agents) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #50 25

26 Example 2: CI Cancer Screen (~35000 mol. with several active groups) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #51 Example 2: CI Cancer Screen (~35000 mol. with several active groups) Anthracyclines (i.e., Adriamycin) induce apoptosis in tumor cells. 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #52 26

27 Example 2: CI Cancer Screen (~35000 mol. with several active groups) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #53 Example 2: CI Cancer Screen (~35000 mol. with several active groups) Podophyllotoxin derivatives inhibit Tubulin polymerization. 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #54 27

28 Example 2: CI Cancer Screen (~35000 mol. with several active groups) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #55 Example 2: CI Cancer Screen (~35000 mol. with several active groups) Camptothecins are Topoisomerase I inhibitors. Acridines are cytotoxic to tumor cells. 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #56 28

29 verview Motivation / Background Data Analysis in the Life Sciences: Understanding Biology Main Focus of Today: Drug Discovery Visual Clustering of Screening Data Different otions of Similarities Visual Clustering Using eighborgrams Finding Interesting Molecular Fragments Mining Graphs The Space of Discriminative Molecular Fragments Conclusions 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #57 Conclusions Life Science offers lots of challenging data Focus is not exclusively on building good predictors Discovered knowledge needs to be meaningful Users want understandable pieces of knowledge ( Information Mining ). Integration of expert is crucial: Allow interaction / exploration Two examples discussed today Explorative tool for potential clusters allows to incorporate expert feedback easily Fragments understandable to chemist (Better than e.g. rules/decision trees on mystic attributes) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #58 29

Conclusions Results only describe one part of the overall picture Successful tools present many different views on the data Goal is not outstanding prediction but

interesting! which is a moving target 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R.

30 Conclusions Results only describe one part of the overall picture Successful tools present many different views on the data Goal is not outstanding prediction but confirmation of hypotheses, triggering of new ideas, and quite often simply: cleanup of data o clearly defined target of analysis no real questions asked find something interesting! which is a moving target 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #59 Thank You! Want to try for yourself? 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #60 30

Mining Molecular Fragments: Finding Relevant Substructures of Molecules

Mining Molecular Fragments: Finding Relevant Substructures of Molecules Christian Borgelt, Michael R. Berthold Proc. IEEE International Conference on Data Mining, 2002. ICDM 2002. Lecturers: Carlo Cagli