Data Analysis in the Life Sciences - The Fog of Data -

Size: px
Start display at page:

Download "Data Analysis in the Life Sciences - The Fog of Data -"

Transcription

1 ALTAA Chair for Bioinformatics & Information Mining Data Analysis in the Life Sciences - The Fog of Data - Michael R. Berthold ALTAA-Chair for Bioinformatics & Information Mining Konstanz University, Germany Michael.Berthold@uni-konstanz.de [ some (the cool) pictures thanks to Thomas Exner ] verview Motivation / Background Data Analysis in the Life Sciences: Understanding Biology Main Focus of Today: Drug Discovery Visual Clustering of Screening Data Different otions of Similarities Visual Clustering Using eighborgrams Finding Interesting Molecular Fragments Mining Graphs The Space of Discriminative Molecular Fragments Conclusions 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #2 1

2 Ultimate Goal: Towards Understanding Biology Complete Understanding (with Model!) of Interactions within human body Gene and Protein functions Protein-protein and cell interactions Metabolic pathways Clear model to understand (and avoid) side effects Design of new drugs in silico possible Personalized drugs for specific geno- and phenotype. But until then 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #3 Discovering Drugs Desired Effect First define a worth($)-while disease to tackle - ften driven by existing ideas - owadays also life style drugs 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #4 2

3 Discovering Drugs Desired Effect Target Identification Find points to attack (usually proteins) by analyzing - Gene expression changes - Changes in proteins - Information about biological connections (metabolic pathways) - Biological intuition and expert knowledge 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #5 Discovering Drugs Desired Effect Target Identification Target Validation Validate that identified targets have desired effect - Knockout specific genes, eliminate proteins - analyze e.g. gene expression for desired effect - doesn t work? Try to find new target(s) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #6 3

4 Discovering Drugs Desired Effect Target Identification Target Validation Assay Design Design an experiment that allows to quickly check if desired action happens - eeds to allow quick and easy testing of activity 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #7 Discovering Drugs Desired Effect Target Identification Target Validation Assay Design High Throughput Screening Test of hundreds of thousands of candidate molecules from a usually carefully designed (targeted library) set of candidates - Data is generally of rather bad quality (many false positives, unknown false negatives) Data Exploration 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #8 4

5 Discovering Drugs Desired Effect Target Identification Target Validation Assay Design High Throughput Screening Lead Finding From HTS results, try to identify lead structures, i.e. interesting pieces of molecules that may be responsible for the observed effects. Structured Data Mining 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #9 Discovering Drugs Desired Effect Target Identification Target Validation Assay Design High Throughput Screening Lead Finding Lead ptimization Prediction (Activity, Toxicity, ) ptimize active molecules: - Increase activity - Eliminate undesired side effects (toxicity!) - didn t work? Try to find new lead or new target 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #10 5

6 Discovering Drugs Desired Effect Target Identification Target Validation Assay Design High Throughput Screening Lead Finding Lead ptimization Animal Testing 1 st phase Clinical trials 2 nd phase Clinical trials Check if it really works and has no nasty side effects. - didn t work? March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #11 Discovering Drugs Desired Effect Target Identification Target Validation Assay Design High Throughput Screening Lead Finding Lead ptimization Animal Testing 1 st phase Clinical trials 2 nd phase Clinical trials ew Drug ~10-12 years later: $$$$ 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #12 6

7 Challenges Analysis of truly heterogeneous data Classical Data Analysis: production plants, shopping baskets, few, reasonably self-contained data sources Life Sciences: truly diverse data- and information sources complex data types (molecular structures, sequences, ) many different notions of similarity Data of rather different quality noisy, unreliable, faulty, outdated, Complicated Domain Usually: Goal of Analysis also understandable by on Specialist Life Sciences: Without continuous incorporation of expert feedback (biologist, biochemist) no chance for success. domain knowledge not explicitly known question often unclear / unknown at beginning need for true data exploration cycle incorporating the expert 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #13 verview Motivation / Background Data Analysis in the Life Sciences: Understanding Biology Main Focus of Today : Drug Discovery Visual Clustering of Screening Data Different otions of Similarities Visual Clustering Using eighborgrams Finding Interesting Molecular Fragments Mining Graphs The Space of Discriminative Molecular Fragments Conclusions 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #14 7

8 The Context of Today: High Throughput Screening Rapidly screen 100-thousand s of candidates. Problems ften thousands of actives Data extremely noisy (up to 50% false positives, unknown false negatives!) Positives almost always active for different reasons Separate, diverse clusters! Goal: Find subsets of similar active molecules (help user find clusters of actives) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #15 Example: Find similar cars: red Ferrari blue Porsche red Bobby car What is Similarity? First abstract representation (descriptor) = Speed Group 1: Ferrari & Porsche Second abstract representation (descriptor) = Color Group 1: Ferrari & Bobby car and many other notions of similarity can be used For molecular structures this is even more complicated! 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #16 8

9 What is Similarity? F F H Tropacocaine Local Anesthetic 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #17 Types of Molecular Similarity Structural similarity: Same basic layout of overall graph or at least existence of a common subgraph Geometrical similarity: Roughly same shape in 3D, independent of exact atom matches Instead of simple shape, also other properties (surface charge ) can be compared Global properties: Molecular weight umber of hydrogen donors/acceptors And (mainly used) some of many heuristic approaches: Fingerprint based 3D grid comparisons (topomers ) see also: A. Bender, R. Glen: Molecular similarity: a key technique in molecular informatics, rg. Biomol. Chem., 2: , March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #18 9

10 Similarity to Adrenaline: ther eurotransmitters Adrenalin (D) H H H H (C) H H H 2 (B) (E) H H H 2 H 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #19 Similarity: Polarity 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #20 10

11 Dissimilarity: Hydrophobic / Hydrophilic 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #21 Dissimilarity to Adrenaline: Crossing the Blood-Brain Barrier Adrenalin Amphetamin H H H H Ephedrin H H H 2 Dopamin MDMA H H H 2 H 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #22 11

12 Dissimilarity to Adrenaline: Crossing the Blood-Brain Barrier Adrenalin Amphetamin (Speed) H H H H Ephedrin H H H 2 Dopamin MDMA (Ecstasy) H H H 2 H 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #23 Different Similarities Comparing Molecular Structures depends on context: different properties molecules (drugs, proteins, ) often have more than one function different modes of action the same effect can be caused in various ways 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #24 12

13 Different ways to interact, e.g. drug candidate can block access to receptor site protein can lock into receptor site Example: Inhibitors of the Trypanothione Reductase Different Modes of Activity: Example 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #25 Different Similarities Comparing Molecular Structures depends on context: different properties molecules (drugs, proteins, ) often have more than one function different modes of action the same effect can be caused in various ways Problems: not a priori known which property / properties matter (biology poorly understood) not always only one relevant property (analysis needs to find patterns in various spaces in parallel ) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #26 13

14 Motivation: Parallel Universes Usually: Data given in a single feature space Mostly high-dimensional and numeric Definition of one, global similarity measure Here: Multiple representations for the data: Parallel Universes Different similarity measures nly subsets of data will form useful model in each universe Standard learning techniques not applicable 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #27 Similarities to other Similarity-Based Methods Feature Selection Global selection (entire data set) Global optimization goal (e.g. performance) Subgroup Mining Local selection (subset of data) Local groupings of features without semantic Feature/Cluster Weighting (e.g. Friedman et al) Individual feature weights for each cluster Interpretation difficult. Multi-Instance Learning multiple descriptions per object ne similarity function one or more descriptions must match Multi View Mining multiple descriptors per object independent hypotheses built on each view ensemble for prediction Parallel Universes: Several descriptions / distance metrics exist for objects Parts of Model operate on one (or some) universes only. 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #28 14

15 Parallel Universes: HTS Example Molecular Data Analysis: Descriptors based on Fingerprints umerical Features derived from 3D shapes, surface charge distribution, etc. Comparisons of chemical graphs Different descriptors encode different structural information one of them show satisfactory prediction results alone ne approach: interactive clustering algorithm using eighborgrams [M. Berthold, B. Wiswedel, D. Patterson, Interactive Exploration of Fuzzy Clusters Using eighborgrams, Fuzzy Sets and Systems, 149(1):21-37, 2005] 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #29 Parallel Universes eighborgram Clustering a few more active compounds close by the remaining of the 100 closest neighbors best (fuzzy) cluster suggested by interactive clustering algorithm. one active compound in origin of eighborgram. closest (also active) compound: distance d 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #30 15

16 Parallel Universes Interactive Clustering molecules in first potential cluster first universe (VolSurf features) second universe (Unity Fingerprints) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #31 Parallel Universes Mining in parallel universes challenging: Helps finding the right description for only a subset of the data (David Hand: find patterns describing anomalous local features ) verlap between universes possible and desirable Resulting models may only describe parts of data in each universe User feedback to guide analysis even more important utlook Promising concept to mine heterogeneous data with diverse notions of similarity (works well in bioinformatics) Integrates focus of analysis, allows for context switch (via interactive Expert query refinement) Presents additional context information about model (pieces). Work in Progress: Fuzzy CU-Means: unsupervised clustering in parallel universes) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #32 16

17 verview Motivation / Background Data Analysis in the Life Sciences: Understanding Biology Main Focus of Today: Drug Discovery Visual Clustering of Screening Data Different otions of Similarities Visual Clustering Using eighborgrams Finding Interesting Molecular Fragments Mining Graphs The Space of Discriminative Molecular Fragments Conclusions 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #33 Remember?: Types of Molecular Similarity Structural similarity: Same basic layout of overall graph or at least existence of a common subgraph Geometrical similarity: Roughly same shape in 3D, independent of exact atom matches Instead of simple shape, also other properties (surface charge ) can be compared Global properties: Molecular weight umber of hydrogen donors/acceptors And (mainly used) some of many heuristic approaches: Fingerprint based 3D grid comparisons (topomers ) see also: A. Bender, R. Glen: Molecular similarity: a key technique in molecular informatics, rg. Biomol. Chem., 2: , March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #34 17

18 Motivation for Molecular Fragment Mining Goal: Find (and describe!) structural groups of molecules that share activity. For few molecules, manual inspection is feasible. 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #35 Goal: Find (and describe!) structural groups of molecules that share activity. For few molecules, manual inspection is feasible. For more molecules, automated methods are needed Motivation 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #36 18

19 Motivation 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #37 Motivation 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #38 19

20 Motivation 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #39 Molecular Fragment Miner (MoFa) Goal: Find Fragments that are discriminative for a class of interest (high activity, good synthesis result, ): Appear often in Positives: freq(high activity)>threshold Appear rarely in egatives: freq(low activity)<delta MoFa (and many variations: FSG, gspan, ): Based on Market Basket Analysis (Apriori or Eclat Algorithm) Grow Fragment-Candidates from scratch atom-by-atom nly report significant and unique fragments [Ch. Borgelt, M. Berthold: Mining Molecular Fragments: Finding Relevant Substructures of Molecules, Proc. IEEE Data Mining, 51-58, 2002] 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #40 20

21 Finding Frequent Molecular Fragments More complicated than itemset mining o clear order on items Eliminating duplicates is a problem (same fragment can be along different paths) Items can appear more than once (H--H) Graph topology crucial not just co-occurence in one molecule Different types of heuristics exist: MoFa: uses local order on atom/bond extensions Restricts extensions to recently added atoms MoSS: Uses search tree traversal to generate canonical form 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #41 #=6 #=6 #=4 #=6 C S #=6 #=4 #=4 #=4 S-C S- S= S= S-C-C S-C S-C S-C S- C S- S- S= C S= S= S= C S= Examples : (a) (b) (c) (d) (e) (f) C_ C _ S _ = = C _ C _ S= = _ C = C _ S= _ C _ C C S C S S= _ C S = 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #42 = = _ 21

22 Support of fragment A: Support Based Pruning supp(a) = Frequency of appearance in molecules Monotone conditions decline with size of fragment: fragment A is contained in fragment B supp(a) supp(b) If supp(node) in branch is below threshold then all child-nodes will also be below threshold. 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #43 #=6 #=6 #=4 #=6 C S #=6 #=4 #=4 #=4 S-C S- S= S= #=3 S-C-C #=4 S-C #=4 S-C #=4 S-C #=2 S- #=2 S- Examples : (a) (b) (c) (d) (e) (f) C_ C _ S _ = = C _ C _ S= = _ C = C _ S= _ C _ C C S C S _ C S = 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #44 = = _ #=2 S= 22

23 Discriminative Fragments Just finding frequent fragments usually not interesting Find fragments that are frequent in one class of molecules and infrequent in the remainder of molecules Discriminative Fragments summarize shared properties. umber of actives and inactives (and the ratio) that contain fragment indicates relevance. 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #45 Example: [CI HIV dataset ~45000 (~400 active) compounds, threshold=15%] % vs. 0.02% 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #46 23

24 A few more fragments 5.23% vs. 0.05% 4.92 vs. 0.07% 5.23% vs. 0.08% 9.85% vs. 0.07% 9.85% vs. 0.0% 10.15% vs. 0.04% 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #47 Fragment lattices may still be big 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #48 24

25 Example 2: CI Cancer Screen (~35000 mol. with several active groups) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #49 Example 2: CI Cancer Screen (~35000 mol. with several active groups) Colchicines (antimitotic agents) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #50 25

26 Example 2: CI Cancer Screen (~35000 mol. with several active groups) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #51 Example 2: CI Cancer Screen (~35000 mol. with several active groups) Anthracyclines (i.e., Adriamycin) induce apoptosis in tumor cells. 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #52 26

27 Example 2: CI Cancer Screen (~35000 mol. with several active groups) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #53 Example 2: CI Cancer Screen (~35000 mol. with several active groups) Podophyllotoxin derivatives inhibit Tubulin polymerization. 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #54 27

28 Example 2: CI Cancer Screen (~35000 mol. with several active groups) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #55 Example 2: CI Cancer Screen (~35000 mol. with several active groups) Camptothecins are Topoisomerase I inhibitors. Acridines are cytotoxic to tumor cells. 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #56 28

29 verview Motivation / Background Data Analysis in the Life Sciences: Understanding Biology Main Focus of Today: Drug Discovery Visual Clustering of Screening Data Different otions of Similarities Visual Clustering Using eighborgrams Finding Interesting Molecular Fragments Mining Graphs The Space of Discriminative Molecular Fragments Conclusions 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #57 Conclusions Life Science offers lots of challenging data Focus is not exclusively on building good predictors Discovered knowledge needs to be meaningful Users want understandable pieces of knowledge ( Information Mining ). Integration of expert is crucial: Allow interaction / exploration Two examples discussed today Explorative tool for potential clusters allows to incorporate expert feedback easily Fragments understandable to chemist (Better than e.g. rules/decision trees on mystic attributes) 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #58 29

30 Conclusions Results only describe one part of the overall picture Successful tools present many different views on the data Goal is not outstanding prediction but confirmation of hypotheses, triggering of new ideas, and quite often simply: cleanup of data o clearly defined target of analysis no real questions asked find something interesting! which is a moving target 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #59 Thank You! Want to try for yourself? 08-March-07 DIS/Faculty Seminar, University of Bolzano - Michael R. Berthold: "Data Analysis in the Life Sciences - The Fog of Data" #60 30

Mining Molecular Fragments: Finding Relevant Substructures of Molecules

Mining Molecular Fragments: Finding Relevant Substructures of Molecules Mining Molecular Fragments: Finding Relevant Substructures of Molecules Christian Borgelt, Michael R. Berthold Proc. IEEE International Conference on Data Mining, 2002. ICDM 2002. Lecturers: Carlo Cagli

More information

Molecular Fragment Mining for Drug Discovery

Molecular Fragment Mining for Drug Discovery Molecular Fragment Mining for Drug Discovery Christian Borgelt 1, Michael R. Berthold 2, and David E. Patterson 2 1 School of Computer Science, tto-von-guericke-university of Magdeburg, Universitätsplatz

More information

Data Mining in the Chemical Industry. Overview of presentation

Data Mining in the Chemical Industry. Overview of presentation Data Mining in the Chemical Industry Glenn J. Myatt, Ph.D. Partner, Myatt & Johnson, Inc. glenn.myatt@gmail.com verview of presentation verview of the chemical industry Example of the pharmaceutical industry

More information

Retrieving hits through in silico screening and expert assessment M. N. Drwal a,b and R. Griffith a

Retrieving hits through in silico screening and expert assessment M. N. Drwal a,b and R. Griffith a Retrieving hits through in silico screening and expert assessment M.. Drwal a,b and R. Griffith a a: School of Medical Sciences/Pharmacology, USW, Sydney, Australia b: Charité Berlin, Germany Abstract:

More information

Introduction to Chemoinformatics and Drug Discovery

Introduction to Chemoinformatics and Drug Discovery Introduction to Chemoinformatics and Drug Discovery Irene Kouskoumvekaki Associate Professor February 15 th, 2013 The Chemical Space There are atoms and space. Everything else is opinion. Democritus (ca.

More information

Computational chemical biology to address non-traditional drug targets. John Karanicolas

Computational chemical biology to address non-traditional drug targets. John Karanicolas Computational chemical biology to address non-traditional drug targets John Karanicolas Our computational toolbox Structure-based approaches Ligand-based approaches Detailed MD simulations 2D fingerprints

More information

Novel Methods for Graph Mining in Databases of Small Molecules. Andreas Maunz, Retreat Spitzingsee,

Novel Methods for Graph Mining in Databases of Small Molecules. Andreas Maunz, Retreat Spitzingsee, Novel Methods for Graph Mining in Databases of Small Molecules Andreas Maunz, andreas@maunz.de Retreat Spitzingsee, 05.-06.04.2011 Tradeoff Overview Data Mining i Exercise: Find patterns / motifs in large

More information

DISCOVERING new drugs is an expensive and challenging

DISCOVERING new drugs is an expensive and challenging 1036 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 17, NO. 8, AUGUST 2005 Frequent Substructure-Based Approaches for Classifying Chemical Compounds Mukund Deshpande, Michihiro Kuramochi, Nikil

More information

Chemical Space: Modeling Exploration & Understanding

Chemical Space: Modeling Exploration & Understanding verview Chemical Space: Modeling Exploration & Understanding Rajarshi Guha School of Informatics Indiana University 16 th August, 2006 utline verview 1 verview 2 3 CDK R utline verview 1 verview 2 3 CDK

More information

In silico pharmacology for drug discovery

In silico pharmacology for drug discovery In silico pharmacology for drug discovery In silico drug design In silico methods can contribute to drug targets identification through application of bionformatics tools. Currently, the application of

More information

Receptor Based Drug Design (1)

Receptor Based Drug Design (1) Induced Fit Model For more than 100 years, the behaviour of enzymes had been explained by the "lock-and-key" mechanism developed by pioneering German chemist Emil Fischer. Fischer thought that the chemicals

More information

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany Syllabus Fri. 21.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 27.10. (2) A.1 Linear Regression Fri. 3.11. (3) A.2 Linear Classification Fri. 10.11. (4) A.3 Regularization

More information

Chemical Space. Space, Diversity, and Synthesis. Jeremy Henle, 4/23/2013

Chemical Space. Space, Diversity, and Synthesis. Jeremy Henle, 4/23/2013 Chemical Space Space, Diversity, and Synthesis Jeremy Henle, 4/23/2013 Computational Modeling Chemical Space As a diversity construct Outline Quantifying Diversity Diversity Oriented Synthesis Wolf and

More information

Introduction. OntoChem

Introduction. OntoChem Introduction ntochem Providing drug discovery knowledge & small molecules... Supporting the task of medicinal chemistry Allows selecting best possible small molecule starting point From target to leads

More information

Early Stages of Drug Discovery in the Pharmaceutical Industry

Early Stages of Drug Discovery in the Pharmaceutical Industry Early Stages of Drug Discovery in the Pharmaceutical Industry Daniel Seeliger / Jan Kriegl, Discovery Research, Boehringer Ingelheim September 29, 2016 Historical Drug Discovery From Accidential Discovery

More information

Chemoinformatics and information management. Peter Willett, University of Sheffield, UK

Chemoinformatics and information management. Peter Willett, University of Sheffield, UK Chemoinformatics and information management Peter Willett, University of Sheffield, UK verview What is chemoinformatics and why is it necessary Managing structural information Typical facilities in chemoinformatics

More information

Structural biology and drug design: An overview

Structural biology and drug design: An overview Structural biology and drug design: An overview livier Taboureau Assitant professor Chemoinformatics group-cbs-dtu otab@cbs.dtu.dk Drug discovery Drug and drug design A drug is a key molecule involved

More information

Frequent Sub-Structure-Based Approaches for Classifying Chemical Compounds

Frequent Sub-Structure-Based Approaches for Classifying Chemical Compounds A short version of this paper appears in IEEE ICDM 2003 Frequent ub-tructure-based Approaches for Classifying Chemical Compounds Mukund Deshpande, Michihiro Kuramochi and George Karypis University of Minnesota,

More information

Computational Chemistry in Drug Design. Xavier Fradera Barcelona, 17/4/2007

Computational Chemistry in Drug Design. Xavier Fradera Barcelona, 17/4/2007 Computational Chemistry in Drug Design Xavier Fradera Barcelona, 17/4/2007 verview Introduction and background Drug Design Cycle Computational methods Chemoinformatics Ligand Based Methods Structure Based

More information

Next Generation Computational Chemistry Tools to Predict Toxicity of CWAs

Next Generation Computational Chemistry Tools to Predict Toxicity of CWAs Next Generation Computational Chemistry Tools to Predict Toxicity of CWAs William (Bill) Welsh welshwj@umdnj.edu Prospective Funding by DTRA/JSTO-CBD CBIS Conference 1 A State-wide, Regional and National

More information

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre Dr. Sander B. Nabuurs Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre The road to new drugs. How to find new hits? High Throughput

More information

EMPIRICAL VS. RATIONAL METHODS OF DISCOVERING NEW DRUGS

EMPIRICAL VS. RATIONAL METHODS OF DISCOVERING NEW DRUGS EMPIRICAL VS. RATIONAL METHODS OF DISCOVERING NEW DRUGS PETER GUND Pharmacopeia Inc., CN 5350 Princeton, NJ 08543, USA pgund@pharmacop.com Empirical and theoretical approaches to drug discovery have often

More information

Using Self-Organizing maps to accelerate similarity search

Using Self-Organizing maps to accelerate similarity search YOU LOGO Using Self-Organizing maps to accelerate similarity search Fanny Bonachera, Gilles Marcou, Natalia Kireeva, Alexandre Varnek, Dragos Horvath Laboratoire d Infochimie, UM 7177. 1, rue Blaise Pascal,

More information

COMBINATORIAL CHEMISTRY: CURRENT APPROACH

COMBINATORIAL CHEMISTRY: CURRENT APPROACH COMBINATORIAL CHEMISTRY: CURRENT APPROACH Dwivedi A. 1, Sitoke A. 2, Joshi V. 3, Akhtar A.K. 4* and Chaturvedi M. 1, NRI Institute of Pharmaceutical Sciences, Bhopal, M.P.-India 2, SRM College of Pharmacy,

More information

Using AutoDock for Virtual Screening

Using AutoDock for Virtual Screening Using AutoDock for Virtual Screening CUHK Croucher ASI Workshop 2011 Stefano Forli, PhD Prof. Arthur J. Olson, Ph.D Molecular Graphics Lab Screening and Virtual Screening The ultimate tool for identifying

More information

A Tiered Screen Protocol for the Discovery of Structurally Diverse HIV Integrase Inhibitors

A Tiered Screen Protocol for the Discovery of Structurally Diverse HIV Integrase Inhibitors A Tiered Screen Protocol for the Discovery of Structurally Diverse HIV Integrase Inhibitors Rajarshi Guha, Debojyoti Dutta, Ting Chen and David J. Wild School of Informatics Indiana University and Dept.

More information

Course plan Academic Year Qualification MSc on Bioinformatics for Health Sciences. Subject name: Computational Systems Biology Code: 30180

Course plan Academic Year Qualification MSc on Bioinformatics for Health Sciences. Subject name: Computational Systems Biology Code: 30180 Course plan 201-201 Academic Year Qualification MSc on Bioinformatics for Health Sciences 1. Description of the subject Subject name: Code: 30180 Total credits: 5 Workload: 125 hours Year: 1st Term: 3

More information

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME

Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME Virtual Libraries and Virtual Screening in Drug Discovery Processes using KNIME Iván Solt Solutions for Cheminformatics Drug Discovery Strategies for known targets High-Throughput Screening (HTS) Cells

More information

Pooling Experiments for High Throughput Screening in Drug Discovery

Pooling Experiments for High Throughput Screening in Drug Discovery Pooling Experiments for High Throughput Screening in Drug Discovery Jacqueline M. Hughes-Oliver hughesol@stat.ncsu.edu Department of Statistics North Carolina State University Spring Research Conference,

More information

Cheminformatics Role in Pharmaceutical Industry. Randal Chen Ph.D. Abbott Laboratories Aug. 23, 2004 ACS

Cheminformatics Role in Pharmaceutical Industry. Randal Chen Ph.D. Abbott Laboratories Aug. 23, 2004 ACS Cheminformatics Role in Pharmaceutical Industry Randal Chen Ph.D. Abbott Laboratories Aug. 23, 2004 ACS Agenda The big picture for pharmaceutical industry Current technological/scientific issues Types

More information

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms

More information

Machine Learning Concepts in Chemoinformatics

Machine Learning Concepts in Chemoinformatics Machine Learning Concepts in Chemoinformatics Martin Vogt B-IT Life Science Informatics Rheinische Friedrich-Wilhelms-Universität Bonn BigChem Winter School 2017 25. October Data Mining in Chemoinformatics

More information

FRAUNHOFER IME SCREENINGPORT

FRAUNHOFER IME SCREENINGPORT FRAUNHOFER IME SCREENINGPORT Design of screening projects General remarks Introduction Screening is done to identify new chemical substances against molecular mechanisms of a disease It is a question of

More information

Notes of Dr. Anil Mishra at 1

Notes of Dr. Anil Mishra at   1 Introduction Quantitative Structure-Activity Relationships QSPR Quantitative Structure-Property Relationships What is? is a mathematical relationship between a biological activity of a molecular system

More information

Analysis of a Large Structure/Biological Activity. Data Set Using Recursive Partitioning and. Simulated Annealing

Analysis of a Large Structure/Biological Activity. Data Set Using Recursive Partitioning and. Simulated Annealing Analysis of a Large Structure/Biological Activity Data Set Using Recursive Partitioning and Simulated Annealing Student: Ke Zhang MBMA Committee: Dr. Charles E. Smith (Chair) Dr. Jacqueline M. Hughes-Oliver

More information

Interactive Feature Selection with

Interactive Feature Selection with Chapter 6 Interactive Feature Selection with TotalBoost g ν We saw in the experimental section that the generalization performance of the corrective and totally corrective boosting algorithms is comparable.

More information

Encyclopedia of Machine Learning Chapter Number Book CopyRight - Year 2010 Frequent Pattern. Given Name Hannu Family Name Toivonen

Encyclopedia of Machine Learning Chapter Number Book CopyRight - Year 2010 Frequent Pattern. Given Name Hannu Family Name Toivonen Book Title Encyclopedia of Machine Learning Chapter Number 00403 Book CopyRight - Year 2010 Title Frequent Pattern Author Particle Given Name Hannu Family Name Toivonen Suffix Email hannu.toivonen@cs.helsinki.fi

More information

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland Navigation in Chemical Space Towards Biological Activity Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland Data Explosion in Chemistry CAS 65 million molecules CCDC 600 000 structures

More information

Automated detection of structural alerts (chemical fragments) in (eco)toxicology

Automated detection of structural alerts (chemical fragments) in (eco)toxicology , http://dx.doi.org/10.5936/csbj.201302013 CSBJ Automated detection of structural alerts (chemical fragments) in (eco)toxicology Alban Lepailleur a,b, Guillaume Poezevara a,c, Ronan Bureau a,b,* Abstract:

More information

Reaxys Medicinal Chemistry Fact Sheet

Reaxys Medicinal Chemistry Fact Sheet R&D SOLUTIONS FOR PHARMA & LIFE SCIENCES Reaxys Medicinal Chemistry Fact Sheet Essential data for lead identification and optimization Reaxys Medicinal Chemistry empowers early discovery in drug development

More information

Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D.

Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D. Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D. Sánchez 18th September 2013 Detecting Anom and Exc Behaviour on

More information

Chemical Data Retrieval and Management

Chemical Data Retrieval and Management Chemical Data Retrieval and Management ChEMBL, ChEBI, and the Chemistry Development Kit Stephan A. Beisken What is EMBL-EBI? Part of the European Molecular Biology Laboratory International, non-profit

More information

Frequent Pattern Mining: Exercises

Frequent Pattern Mining: Exercises Frequent Pattern Mining: Exercises Christian Borgelt School of Computer Science tto-von-guericke-university of Magdeburg Universitätsplatz 2, 39106 Magdeburg, Germany christian@borgelt.net http://www.borgelt.net/

More information

mrna proteins DNA predicts certain possible future health issues Limited info concerning evolving health; advanced measurement technologies

mrna proteins DNA predicts certain possible future health issues Limited info concerning evolving health; advanced measurement technologies DA predicts certain possible future health issues mra Limited info concerning evolving health; advanced measurement technologies proteins Potentially comprehensive information concerning evolving health;

More information

COMBINATORIAL CHEMISTRY IN A HISTORICAL PERSPECTIVE

COMBINATORIAL CHEMISTRY IN A HISTORICAL PERSPECTIVE NUE FEATURE T R A N S F O R M I N G C H A L L E N G E S I N T O M E D I C I N E Nuevolution Feature no. 1 October 2015 Technical Information COMBINATORIAL CHEMISTRY IN A HISTORICAL PERSPECTIVE A PROMISING

More information

Combinatorial Heterogeneous Catalysis

Combinatorial Heterogeneous Catalysis Combinatorial Heterogeneous Catalysis 650 μm by 650 μm, spaced 100 μm apart Identification of a new blue photoluminescent (PL) composite material, Gd 3 Ga 5 O 12 /SiO 2 Science 13 March 1998: Vol. 279

More information

arxiv: v1 [cs.ds] 25 Jan 2016

arxiv: v1 [cs.ds] 25 Jan 2016 A Novel Graph-based Approach for Determining Molecular Similarity Maritza Hernandez 1, Arman Zaribafiyan 1,2, Maliheh Aramon 1, and Mohammad Naghibi 3 1 1QB Information Technologies (1QBit), Vancouver,

More information

Simplifying Drug Discovery with JMP

Simplifying Drug Discovery with JMP Simplifying Drug Discovery with JMP John A. Wass, Ph.D. Quantum Cat Consultants, Lake Forest, IL Cele Abad-Zapatero, Ph.D. Adjunct Professor, Center for Pharmaceutical Biotechnology, University of Illinois

More information

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller Chemogenomic: Approaches to Rational Drug Design Jonas Skjødt Møller Chemogenomic Chemistry Biology Chemical biology Medical chemistry Chemical genetics Chemoinformatics Bioinformatics Chemoproteomics

More information

DivCalc: A Utility for Diversity Analysis and Compound Sampling

DivCalc: A Utility for Diversity Analysis and Compound Sampling Molecules 2002, 7, 657-661 molecules ISSN 1420-3049 http://www.mdpi.org DivCalc: A Utility for Diversity Analysis and Compound Sampling Rajeev Gangal* SciNova Informatics, 161 Madhumanjiri Apartments,

More information

Structure-Activity Modeling - QSAR. Uwe Koch

Structure-Activity Modeling - QSAR. Uwe Koch Structure-Activity Modeling - QSAR Uwe Koch QSAR Assumption: QSAR attempts to quantify the relationship between activity and molecular strcucture by correlating descriptors with properties Biological activity

More information

TRAINING REAXYS MEDICINAL CHEMISTRY

TRAINING REAXYS MEDICINAL CHEMISTRY TRAINING REAXYS MEDICINAL CHEMISTRY 1 SITUATION: DRUG DISCOVERY Knowledge survey Therapeutic target Known ligands Generate chemistry ideas Chemistry Check chemical feasibility ELN DBs In-house Analyze

More information

The Intersection of Chemistry and Biology: An Interview with Professor W. E. Moerner

The Intersection of Chemistry and Biology: An Interview with Professor W. E. Moerner The Intersection of Chemistry and Biology: An Interview with Professor W. E. Moerner Joseph Nicolls Stanford University Professor W.E Moerner earned two B.S. degrees, in Physics and Electrical Engineering,

More information

Cross Discipline Analysis made possible with Data Pipelining. J.R. Tozer SciTegic

Cross Discipline Analysis made possible with Data Pipelining. J.R. Tozer SciTegic Cross Discipline Analysis made possible with Data Pipelining J.R. Tozer SciTegic System Genesis Pipelining tool created to automate data processing in cheminformatics Modular system built with generic

More information

Ignasi Belda, PhD CEO. HPC Advisory Council Spain Conference 2015

Ignasi Belda, PhD CEO. HPC Advisory Council Spain Conference 2015 Ignasi Belda, PhD CEO HPC Advisory Council Spain Conference 2015 Business lines Molecular Modeling Services We carry out computational chemistry projects using our selfdeveloped and third party technologies

More information

A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery

A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery AtomNet A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery Izhar Wallach, Michael Dzamba, Abraham Heifets Victor Storchan, Institute for Computational and

More information

Drug Informatics for Chemical Genomics...

Drug Informatics for Chemical Genomics... Drug Informatics for Chemical Genomics... An Overview First Annual ChemGen IGERT Retreat Sept 2005 Drug Informatics for Chemical Genomics... p. Topics ChemGen Informatics The ChemMine Project Library Comparison

More information

networks in molecular biology Wolfgang Huber

networks in molecular biology Wolfgang Huber networks in molecular biology Wolfgang Huber networks in molecular biology Regulatory networks: components = gene products interactions = regulation of transcription, translation, phosphorylation... Metabolic

More information

Emerging patterns mining and automated detection of contrasting chemical features

Emerging patterns mining and automated detection of contrasting chemical features Emerging patterns mining and automated detection of contrasting chemical features Alban Lepailleur Centre d Etudes et de Recherche sur le Médicament de Normandie (CERMN) UNICAEN EA 4258 - FR CNRS 3038

More information

CS5112: Algorithms and Data Structures for Applications

CS5112: Algorithms and Data Structures for Applications CS5112: Algorithms and Data Structures for Applications Lecture 19: Association rules Ramin Zabih Some content from: Wikipedia/Google image search; Harrington; J. Leskovec, A. Rajaraman, J. Ullman: Mining

More information

Classification Based on Logical Concept Analysis

Classification Based on Logical Concept Analysis Classification Based on Logical Concept Analysis Yan Zhao and Yiyu Yao Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2 E-mail: {yanzhao, yyao}@cs.uregina.ca Abstract.

More information

Data Analytics Beyond OLAP. Prof. Yanlei Diao

Data Analytics Beyond OLAP. Prof. Yanlei Diao Data Analytics Beyond OLAP Prof. Yanlei Diao OPERATIONAL DBs DB 1 DB 2 DB 3 EXTRACT TRANSFORM LOAD (ETL) METADATA STORE DATA WAREHOUSE SUPPORTS OLAP DATA MINING INTERACTIVE DATA EXPLORATION Overview of

More information

Building innovative drug discovery alliances. Just in KNIME: Successful Process Driven Drug Discovery

Building innovative drug discovery alliances. Just in KNIME: Successful Process Driven Drug Discovery Building innovative drug discovery alliances Just in KIME: Successful Process Driven Drug Discovery Berlin KIME Spring Summit, Feb 2016 Research Informatics @ Evotec Evotec s worldwide operations 2 Pharmaceuticals

More information

PIOTR GOLKIEWICZ LIFE SCIENCES SOLUTIONS CONSULTANT CENTRAL-EASTERN EUROPE

PIOTR GOLKIEWICZ LIFE SCIENCES SOLUTIONS CONSULTANT CENTRAL-EASTERN EUROPE PIOTR GOLKIEWICZ LIFE SCIENCES SOLUTIONS CONSULTANT CENTRAL-EASTERN EUROPE 1 SERVING THE LIFE SCIENCES SPACE ADDRESSING KEY CHALLENGES ACROSS THE R&D VALUE CHAIN Characterize targets & analyze disease

More information

Using Web Technologies for Integrative Drug Discovery

Using Web Technologies for Integrative Drug Discovery Using Web Technologies for Integrative Drug Discovery Qian Zhu 1 Sashikiran Challa 1 Yuying Sun 3 Michael S. Lajiness 2 David J. Wild 1 Ying Ding 3 1 School of Informatics and Computing, Indiana University,

More information

QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression

QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression APPLICATION NOTE QSAR Modeling of ErbB1 Inhibitors Using Genetic Algorithm-Based Regression GAINING EFFICIENCY IN QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIPS ErbB1 kinase is the cell-surface receptor

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Systems biology Introduction to Bioinformatics Systems biology: modeling biological p Study of whole biological systems p Wholeness : Organization of dynamic interactions Different behaviour of the individual

More information

Unsupervised Anomaly Detection for High Dimensional Data

Unsupervised Anomaly Detection for High Dimensional Data Unsupervised Anomaly Detection for High Dimensional Data Department of Mathematics, Rowan University. July 19th, 2013 International Workshop in Sequential Methodologies (IWSM-2013) Outline of Talk Motivation

More information

Frequent Itemsets and Association Rule Mining. Vinay Setty Slides credit:

Frequent Itemsets and Association Rule Mining. Vinay Setty Slides credit: Frequent Itemsets and Association Rule Mining Vinay Setty vinay.j.setty@uis.no Slides credit: http://www.mmds.org/ Association Rule Discovery Supermarket shelf management Market-basket model: Goal: Identify

More information

Reductionist Paradox Are the laws of chemistry and physics sufficient for the discovery of new drugs?

Reductionist Paradox Are the laws of chemistry and physics sufficient for the discovery of new drugs? Quantitative Biology Colloquium, University of Arizona, March 1, 2011 Reductionist Paradox Are the laws of chemistry and physics sufficient for the discovery of new drugs? Gerry Maggiora, Ph.D. Adjunct

More information

Advanced Medicinal Chemistry SLIDES B

Advanced Medicinal Chemistry SLIDES B Advanced Medicinal Chemistry Filippo Minutolo CFU 3 (21 hours) SLIDES B Drug likeness - ADME two contradictory physico-chemical parameters to balance: 1) aqueous solubility 2) lipid membrane permeability

More information

Structural Bioinformatics (C3210) Molecular Docking

Structural Bioinformatics (C3210) Molecular Docking Structural Bioinformatics (C3210) Molecular Docking Molecular Recognition, Molecular Docking Molecular recognition is the ability of biomolecules to recognize other biomolecules and selectively interact

More information

Molecular Complexity Effects and Fingerprint-Based Similarity Search Strategies

Molecular Complexity Effects and Fingerprint-Based Similarity Search Strategies Molecular Complexity Effects and Fingerprint-Based Similarity Search Strategies Dissertation zur Erlangung des Doktorgrades (Dr. rer. nat.) der Mathematisch-aturwissenschaftlichen Fakultät der Rheinischen

More information

JCICS Major Research Areas

JCICS Major Research Areas JCICS Major Research Areas Chemical Information Text Searching Structure and Substructure Searching Databases Patents George W.A. Milne C571 Lecture Fall 2002 1 JCICS Major Research Areas Chemical Computation

More information

Similarity Search. Uwe Koch

Similarity Search. Uwe Koch Similarity Search Uwe Koch Similarity Search The similar property principle: strurally similar molecules tend to have similar properties. However, structure property discontinuities occur frequently. Relevance

More information

Frequent Itemset Mining

Frequent Itemset Mining ì 1 Frequent Itemset Mining Nadjib LAZAAR LIRMM- UM COCONUT Team (PART I) IMAGINA 17/18 Webpage: http://www.lirmm.fr/~lazaar/teaching.html Email: lazaar@lirmm.fr 2 Data Mining ì Data Mining (DM) or Knowledge

More information

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics GENOME Bioinformatics 2 Proteomics protein-gene PROTEOME protein-protein METABOLISM Slide from http://www.nd.edu/~networks/ Citrate Cycle Bio-chemical reactions What is it? Proteomics Reveal protein Protein

More information

FCModeler: Dynamic Graph Display and Fuzzy Modeling of Regulatory and Metabolic Maps

FCModeler: Dynamic Graph Display and Fuzzy Modeling of Regulatory and Metabolic Maps FCModeler: Dynamic Graph Display and Fuzzy Modeling of Regulatory and Metabolic Maps Julie Dickerson 1, Zach Cox 1 and Andy Fulmer 2 1 Iowa State University and 2 Proctor & Gamble. FCModeler Goals Capture

More information

Discriminative Direction for Kernel Classifiers

Discriminative Direction for Kernel Classifiers Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering

More information

An Integrated Approach to in-silico

An Integrated Approach to in-silico An Integrated Approach to in-silico Screening Joseph L. Durant Jr., Douglas. R. Henry, Maurizio Bronzetti, and David. A. Evans MDL Information Systems, Inc. 14600 Catalina St., San Leandro, CA 94577 Goals

More information

bcl::cheminfo Suite Enables Machine Learning-Based Drug Discovery Using GPUs Edward W. Lowe, Jr. Nils Woetzel May 17, 2012

bcl::cheminfo Suite Enables Machine Learning-Based Drug Discovery Using GPUs Edward W. Lowe, Jr. Nils Woetzel May 17, 2012 bcl::cheminfo Suite Enables Machine Learning-Based Drug Discovery Using GPUs Edward W. Lowe, Jr. Nils Woetzel May 17, 2012 Outline Machine Learning Cheminformatics Framework QSPR logp QSAR mglur 5 CYP

More information

In Silico Investigation of Off-Target Effects

In Silico Investigation of Off-Target Effects PHARMA & LIFE SCIENCES WHITEPAPER In Silico Investigation of Off-Target Effects STREAMLINING IN SILICO PROFILING In silico techniques require exhaustive data and sophisticated, well-structured informatics

More information

Machine learning for ligand-based virtual screening and chemogenomics!

Machine learning for ligand-based virtual screening and chemogenomics! Machine learning for ligand-based virtual screening and chemogenomics! Jean-Philippe Vert Institut Curie - INSERM U900 - Mines ParisTech In silico discovery of molecular probes and drug-like compounds:

More information

Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2012 University of California, Berkeley

Integrative Biology 200A PRINCIPLES OF PHYLOGENETICS Spring 2012 University of California, Berkeley Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2012 University of California, Berkeley B.D. Mishler Feb. 7, 2012. Morphological data IV -- ontogeny & structure of plants The last frontier

More information

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference

hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference CS 229 Project Report (TR# MSB2010) Submitted 12/10/2010 hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference Muhammad Shoaib Sehgal Computer Science

More information

Association Rules Information Retrieval and Data Mining. Prof. Matteo Matteucci

Association Rules Information Retrieval and Data Mining. Prof. Matteo Matteucci Association Rules Information Retrieval and Data Mining Prof. Matteo Matteucci Learning Unsupervised Rules!?! 2 Market-Basket Transactions 3 Bread Peanuts Milk Fruit Jam Bread Jam Soda Chips Milk Fruit

More information

CS 273 Prof. Serafim Batzoglou Prof. Jean-Claude Latombe Spring Lecture 12 : Energy maintenance (1) Lecturer: Prof. J.C.

CS 273 Prof. Serafim Batzoglou Prof. Jean-Claude Latombe Spring Lecture 12 : Energy maintenance (1) Lecturer: Prof. J.C. CS 273 Prof. Serafim Batzoglou Prof. Jean-Claude Latombe Spring 2006 Lecture 12 : Energy maintenance (1) Lecturer: Prof. J.C. Latombe Scribe: Neda Nategh How do you update the energy function during the

More information

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it? Proteomics What is it? Reveal protein interactions Protein profiling in a sample Yeast two hybrid screening High throughput 2D PAGE Automatic analysis of 2D Page Yeast two hybrid Use two mating strains

More information

Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval. Sargur Srihari University at Buffalo The State University of New York

Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval. Sargur Srihari University at Buffalo The State University of New York Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval Sargur Srihari University at Buffalo The State University of New York 1 A Priori Algorithm for Association Rule Learning Association

More information

RECENT TRENDS IN PHARMACEUTICAL CHEMISTRY FOR DRUG DISCOVERY

RECENT TRENDS IN PHARMACEUTICAL CHEMISTRY FOR DRUG DISCOVERY INTERNATIONAL JOURNAL OF RESEARCH IN PHARMACY AND CHEMISTRY Available online at www.ijrpc.com Review Article RECENT TRENDS IN PHARMACEUTICAL CHEMISTRY FOR DRUG DISCOVERY Sathyaraj A Department of Chemistry,

More information

Developing CAS Products for Substructure Searching by Chemists. Linda Toler

Developing CAS Products for Substructure Searching by Chemists. Linda Toler Developing CAS Products for Substructure Searching by Chemists Linda Toler Developing CAS Products for Substructure Searching Evolution of the CAS Registry Development of substructure searching for CAS

More information

LigandScout. Automated Structure-Based Pharmacophore Model Generation. Gerhard Wolber* and Thierry Langer

LigandScout. Automated Structure-Based Pharmacophore Model Generation. Gerhard Wolber* and Thierry Langer LigandScout Automated Structure-Based Pharmacophore Model Generation Gerhard Wolber* and Thierry Langer * E-Mail: wolber@inteligand.com Pharmacophores from LigandScout Pharmacophores & the Protein Data

More information

COMPARISON OF SIMILARITY METHOD TO IMPROVE RETRIEVAL PERFORMANCE FOR CHEMICAL DATA

COMPARISON OF SIMILARITY METHOD TO IMPROVE RETRIEVAL PERFORMANCE FOR CHEMICAL DATA http://www.ftsm.ukm.my/apjitm Asia-Pacific Journal of Information Technology and Multimedia Jurnal Teknologi Maklumat dan Multimedia Asia-Pasifik Vol. 7 No. 1, June 2018: 91-98 e-issn: 2289-2192 COMPARISON

More information

Networks & pathways. Hedi Peterson MTAT Bioinformatics

Networks & pathways. Hedi Peterson MTAT Bioinformatics Networks & pathways Hedi Peterson (peterson@quretec.com) MTAT.03.239 Bioinformatics 03.11.2010 Networks are graphs Nodes Edges Edges Directed, undirected, weighted Nodes Genes Proteins Metabolites Enzymes

More information

Genetic. Maxim Thibodeau

Genetic. Maxim Thibodeau Maxim Thibodeau INDEX The meaning of words...3 The golden number...4 Gregor Mendel...5 Structure of DNA...6 The posology of the the recipe...7 Conclusion...10 II The meaning of words We live in a world,

More information

Computational Cognitive Science

Computational Cognitive Science Computational Cognitive Science Lecture 9: A Bayesian model of concept learning Chris Lucas School of Informatics University of Edinburgh October 16, 218 Reading Rules and Similarity in Concept Learning

More information

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Chapter 6. Frequent Pattern Mining: Concepts and Apriori Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Pattern Discovery: Definition What are patterns? Patterns: A set of

More information

Interesting Patterns. Jilles Vreeken. 15 May 2015

Interesting Patterns. Jilles Vreeken. 15 May 2015 Interesting Patterns Jilles Vreeken 15 May 2015 Questions of the Day What is interestingness? what is a pattern? and how can we mine interesting patterns? What is a pattern? Data Pattern y = x - 1 What

More information

Keywords: anti-coagulants, factor Xa, QSAR, Thrombosis. Introduction

Keywords: anti-coagulants, factor Xa, QSAR, Thrombosis. Introduction PostDoc Journal Vol. 2, No. 3, March 2014 Journal of Postdoctoral Research www.postdocjournal.com QSAR Study of Thiophene-Anthranilamides Based Factor Xa Direct Inhibitors Preetpal S. Sidhu Department

More information

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16 600.463 Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16 25.1 Introduction Today we re going to talk about machine learning, but from an

More information