Pooling Experiments for High Throughput Screening in Drug Discovery

1 Pooling Experiments for High Throughput Screening in Drug Discovery
Jacqueline M. Hughes-Oliver, Department of Statistics, North Carolina State University
Spring Research Conference, June

2 Outline
- Motivation
- What is a Pooling Experiment?
  + synergism & blocking; saves money, time, materials
  − logistics are difficult; needs careful design & analysis
- Analysis of Pooling Experiments
- Issues
- Current Work

3 High Throughput Screening
- 500,000+ molecules available for screening
- < 5% are active (high potencies)
- Must find m diverse leads: leads → toxicity → Phase I clinical trials → etc.
- Search for Structure-Activity Relationships (SARs): relate activity to chemical structure
- Often, n = #responses << p = #descriptors
  Assay 1: n = 1000, p = 1873
  Assay 2: n ≈ 500,000, p > 1 million
- Testing done by liquid-handling robotic systems

5 State of the Art
- Test all molecules in training set
- Recursive partitioning (RP)
  - Nonlinear, fragmented relationships
  - Use hypothesis testing to split nodes
  - Excellent for n << p
  - Needs large n, since Pr(active) is small
- Predict for untested molecules, then do ordered testing
- Accumulation curves: # actives found vs. # tests performed
Can we increase efficiency? Discover combination therapies in vitro?
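The accumulation curve mentioned on this slide can be sketched in a few lines. This is only an illustration: the function name `accumulation_curve`, the scores and the activity labels are hypothetical, and any predictive model (RP or otherwise) could supply the scores.

```python
from itertools import accumulate

def accumulation_curve(scores, is_active):
    """Cumulative number of actives found when molecules are tested
    in descending order of predicted score (ordered testing)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return list(accumulate(int(is_active[i]) for i in order))

# Hypothetical predicted scores and true activity labels for six molecules.
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
is_active = [True, False, False, False, True, False]
curve = accumulation_curve(scores, is_active)
print(curve)  # curve[t-1] = number of actives found after t tests
```

A steeper early rise than the random-testing curve means the ranking finds actives with fewer tests.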

8 Pooling Experiment
Test molecules in mixtures, not individually.
[Figure: HTS plate showing individual compounds and the pools formed from them.]
Figure 1: One-way pooling experiment where pooling is by column.

9 Pooling Experiment: Dorfman Assumptions
Test molecules in mixtures, not individually.
Notation: p = Pr(active), k = pool size, n = #pools, X = #active pools, with
$$X \sim \mathrm{bin}(n, \theta), \qquad \theta = 1 - (1 - p)^k$$
Assumptions:
- p same for all molecules
- No errors in interpreting pooled responses
  - all molecules in pool are inactive ⇒ inactive pool
  - 1+ molecule active ⇒ active pool
- Pooling does not alter behavior of individuals
  - No degeneration of activity
  - No enhancement of activity
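Under the Dorfman assumptions, the pool-level probability θ and the expected cost of a two-stage (pool, then retest active pools) experiment follow directly from p and k. The sketch below is illustrative only: the function names are hypothetical, and the values p = 0.04, k = 10 and 1000 molecules are used purely as example inputs.

```python
def pool_active_prob(p, k):
    """theta = 1 - (1 - p)^k: probability that a pool of k molecules is active
    when each molecule is independently active with probability p."""
    return 1.0 - (1.0 - p) ** k

def dorfman_expected_tests(n_molecules, p, k):
    """Expected number of tests: one test per pool, plus k individual
    retests for every pool that turns out active."""
    n_pools = n_molecules / k
    return n_pools + n_pools * pool_active_prob(p, k) * k

theta = pool_active_prob(0.04, 10)
expected = dorfman_expected_tests(1000, 0.04, 10)
print(round(theta, 4), round(expected, 1))
```

Comparing `expected` with the number of molecules gives a rough sense of the savings; the calculation ignores blocking, synergism and testing errors.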

10 Pooling Experiment: Dorfman Assumptions Violated
Test molecules in mixtures, not individually.
Notation: p = Pr(active), k = pool size, n = #pools, X = #active pools, with
$$X \sim \mathrm{bin}(n, \theta), \qquad \theta = 1 - (1 - p)^k$$
Assumptions, with the concern raised against each in parentheses:
- p same for all molecules (SAR)
- No errors in interpreting pooled responses
  - all molecules inactive ⇒ inactive pool (Specificity+)
  - 1+ molecule active ⇒ active pool (Sensitivity+)
- Pooling does not alter behavior of individuals
  - No degeneration of activity (Dilution? Blocking?)
  - No enhancement of activity (Additivity? Synergism?)

11 [Figure: legend distinguishes active, inactive, and blocker compounds; panels show individual compounds and pools, with annotations where synergism and blocking occur.]
Figure 2: One-way pooling experiment where pooling is by column. Pool 1 illustrates synergism and Pool 8 illustrates blocking. Pools 4 and 11 show regular activity.

12 Assay 1
- y = % inhibition relative to reference molecule
- n = 100 pools, each of size k = 10
- Pooling by dissimilarity according to Burden numbers: avoid additivity
- conc. for pool = 10 × conc. for individual: avoid dilution
- Control over design
- Active is y ≥ 60
- Active pools: 4 of 100 (4%)
- Active molecules: 40 of 1000 (4%)

13 [Two 10 × 10 data grids, each with a "pools" margin; numeric entries not recoverable.]
Blocking

14 Pool along the rows, using activity thresholds 60 (individuals) and [value missing] (pools)
[10 × 10 data grid with a "pools" margin; numeric entries not recoverable.]
Synergism

15 Pooling Experiment: Decoding
Retesting?
- Dorfman: individually test all molecules in active pools
- Random: individually test some molecules in active pools and some molecules in inactive pools
  Can estimate synergism and blocking probabilities
No retesting?
- Saves time and money
- Lose information
- Not a good idea
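The two retesting schemes can be sketched as simple selection rules. Everything here is hypothetical illustration: the function names, the pool layout (a dict from pool id to member molecule ids) and the sample sizes are assumptions, not part of the talk.

```python
import random

def dorfman_retest_list(pools, pool_is_active):
    """Dorfman decoding: retest every molecule that sits in an active pool."""
    return [m for pid, members in pools.items() if pool_is_active[pid] for m in members]

def random_retest_list(pools, pool_is_active, n_from_active, n_from_inactive, seed=0):
    """Random decoding: retest a sample of molecules from active pools and a
    sample from inactive pools (the latter is what lets blocking be detected)."""
    rng = random.Random(seed)
    active = [m for pid, ms in pools.items() if pool_is_active[pid] for m in ms]
    inactive = [m for pid, ms in pools.items() if not pool_is_active[pid] for m in ms]
    return (rng.sample(active, min(n_from_active, len(active)))
            + rng.sample(inactive, min(n_from_inactive, len(inactive))))

# Hypothetical layout: two pools of three molecules each.
pools = {"A": [1, 2, 3], "B": [4, 5, 6]}
pool_is_active = {"A": True, "B": False}
dorfman = dorfman_retest_list(pools, pool_is_active)
mixed = random_retest_list(pools, pool_is_active, n_from_active=2, n_from_inactive=1)
```

Under Dorfman decoding no molecule from an inactive pool is ever retested, so a blocked active in an inactive pool stays hidden; the random scheme trades some retests for that information.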

16 Analysis of Pooling Experiments
- Nonparametric
- Fully parametric
- Semiparametric: model as a missing data problem; Yi et al. (2003, JSM)
Chemical descriptors: atom pairs, BCUT numbers, molecular weight, etc.

17 Nonparametric: RP on Pools
- Pooled descriptors: binary (atom pair in pool?)
- Need large number of pools for this to be effective
- Useful for determining preliminary covariate classes for (semi-)parametric models
- Excellent indicator of synergism
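A binary pooled descriptor ("is this atom pair present anywhere in the pool?") amounts to a logical OR over the molecules in the pool. The sketch below assumes each molecule already has a 0/1 descriptor vector; the function name and the toy data are hypothetical.

```python
def pooled_descriptors(molecule_descriptors, pool_members):
    """OR together the binary descriptor vectors of the molecules in a pool:
    a pooled descriptor is 1 if any member molecule has that feature."""
    vectors = [molecule_descriptors[m] for m in pool_members]
    return [int(any(column)) for column in zip(*vectors)]

# Hypothetical 0/1 atom-pair indicators for three molecules.
molecule_descriptors = {
    "m1": [1, 0, 0],
    "m2": [0, 0, 1],
    "m3": [0, 0, 0],
}
pool_vector = pooled_descriptors(molecule_descriptors, ["m1", "m2", "m3"])
print(pool_vector)
```

The resulting pool-level vectors can then serve as the splitting variables when a tree is grown on pooled responses.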

18 [RP tree on pools; each node lists n, u, s and, where shown, ap and bp]
N: n=140, u=13, s=29, ap=7.28E-004, bp=6.74E-001; split on x.1348
  NO  -> N1: n=52, u=24, s=38, ap=3.16E-003, bp=1.00E+000; split on x.1637
    NO  -> N11: n=38, u=14, s=28, ap=9.24E-004, bp=2.45E-001; split on x.106
      NO  -> N111: n=31, u=8, s=17, ap=4.43E-003, bp=7.85E-001; split on x.583
        NO  -> N1111: n=26, u=4, s=10
        YES -> N1112: n=5, u=27, s=34
      YES -> N112: n=7, u=44, s=45
    YES -> N12: n=14, u=49, s=50, ap=3.42E-004, bp=6.46E-002; split on x.1392
      NO  -> N121: n=7, u=88, s=42
      YES -> N122: n=7, u=9, s=7
  YES -> N2: n=88, u=7, s=21, ap=9.12E-005, bp=7.43E-002; split on x.1048
    NO  -> N21: n=83, u=4, s=17
    YES -> N22: n=5, u=40, s=44
min node size is 5; splits forced; based on n = 140 tests

19 Atom Pairs In Tree
[Table: individuals by class (tree classes and "the rest"), with columns class, active, total; values not recoverable.]
Synergism in class 2?

20 [Plot: Number of Actives Found vs. Number of Tests; curves: random testing; RP on pools (PT value not recoverable).]
- RP on only 140 tests; need more data
- Testing order within a node?

21 [Plot: Number of Actives Found vs. Number of Tests; curves: random testing; RP on pools, PT=60; RP on pools (second PT value not recoverable).]
- RP on 390 tests when PT=13.14
- Why PT=13.14?

22 Fully Parametric
Model at the individual molecule level:
- Trinomial (active, blocker, other), with class probabilities dependent on chemical features; see Zhu et al. (2001)
- Binomial (active or not), conditioned on interactions in a pool
  - Blocking probability same across all classes
  - Synergism probability same across all classes
  - Activity probabilities dependent on chemical features
Scale up to obtain model on pooled responses
Predict activities of untested molecules
Test molecules according to rank from predictions

23 Parametric: Conditional Binomial
For $i = 1, \ldots, n$ and $l = 1, \ldots, L$:
- $s_{il}$ = # molecules in pool $i$ and covariate class $l$
- $W_{il}$ = # active molecules in pool $i$ and class $l$
- $Y_i = I(\text{pool } i \text{ active})$
- $W_{il} \sim \mathrm{bin}(s_{il}, p_l)$, independent over $i$ and $l$
- $\sum_l s_{il} = k$
- $b = \Pr(Y_i = 0 \mid \sum_l W_{il} > 0)$, constant blocking
- $g = \Pr(Y_i = 1 \mid \sum_l W_{il} = 0)$, constant synergism
Can also model sensitivity and specificity in this manner
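To make the conditional binomial model concrete, one pool can be simulated directly from the definitions on this slide. This is a sketch under those definitions only; the function name `simulate_pool` and the parameter values in the example are hypothetical.

```python
import random

def simulate_pool(s, p, b, g, rng):
    """Simulate one pool.
    s[l]: number of molecules from covariate class l in the pool (s_il)
    p[l]: activity probability for class l (p_l)
    b: blocking probability, g: synergism probability
    Returns (w, y): per-class active counts W_il and the pool response Y_i."""
    w = [sum(rng.random() < p_l for _ in range(s_l)) for s_l, p_l in zip(s, p)]
    if sum(w) > 0:
        # At least one active molecule: pool reads active unless blocking occurs.
        y = 0 if rng.random() < b else 1
    else:
        # No active molecule: pool reads active only through synergism.
        y = 1 if rng.random() < g else 0
    return w, y

rng = random.Random(1)
# Hypothetical pool of k = 10 molecules split over two covariate classes.
w, y = simulate_pool(s=[6, 4], p=[0.02, 0.30], b=0.3, g=0.1, rng=rng)
```

Setting b = g = 0 recovers the error-free Dorfman rule, in which a pool is active exactly when it contains at least one active molecule.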

24 Dorfman: test all molecules in active pools. Then
$$L(\theta) = \prod_i \phi_i^{y_i} (1 - \psi_i)^{1 - y_i},$$
where
$$\psi_i = \Pr(Y_i = 1) = (1 - b) + (g + b - 1) \prod_l (1 - p_l)^{s_{il}}$$
and
$$\phi_i = \begin{cases} (1 - b) \prod_l \binom{s_{il}}{w_{il}} p_l^{w_{il}} (1 - p_l)^{s_{il} - w_{il}}, & \sum_l w_{il} > 0 \\ g \prod_l (1 - p_l)^{s_{il}}, & \sum_l w_{il} = 0 \end{cases}$$
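The likelihood for the Dorfman design can be evaluated numerically as a log-likelihood. The sketch below follows the conditional binomial model with constant blocking b and synergism g; the data layout (a list of dicts with keys `s`, `y`, `w`) and the function names are assumptions for illustration, and the parameters are assumed to lie strictly inside (0, 1) so that no logarithm of zero occurs.

```python
from math import comb, log

def pool_active_prob(s, p, b, g):
    """psi_i = (1 - b) + (g + b - 1) * prod_l (1 - p_l)^s_il."""
    q = 1.0
    for s_l, p_l in zip(s, p):
        q *= (1.0 - p_l) ** s_l
    return (1.0 - b) + (g + b - 1.0) * q

def log_likelihood(pools, p, b, g):
    """pools: list of dicts with 's' (class counts), 'y' (pool response) and,
    for active pools only, 'w' (active counts found by individual retesting)."""
    total = 0.0
    for pool in pools:
        s = pool["s"]
        if pool["y"] == 0:
            # Inactive pool: only the pooled response is observed.
            total += log(1.0 - pool_active_prob(s, p, b, g))
            continue
        w = pool["w"]
        if sum(w) > 0:
            # Active pool with actives inside: no blocking, binomial counts.
            phi = 1.0 - b
            for s_l, w_l, p_l in zip(s, w, p):
                phi *= comb(s_l, w_l) * p_l ** w_l * (1.0 - p_l) ** (s_l - w_l)
        else:
            # Active pool with no actives inside: synergism.
            phi = g
            for s_l, p_l in zip(s, p):
                phi *= (1.0 - p_l) ** s_l
        total += log(phi)
    return total

# Hypothetical data: one inactive pool and one active pool with one active found.
pools = [{"s": [2], "y": 0}, {"s": [2], "y": 1, "w": [1]}]
ll = log_likelihood(pools, p=[0.5], b=0.2, g=0.1)
```

Maximizing this function over the class probabilities, b and g with a general-purpose optimizer is one way to obtain maximum likelihood estimates; the optimization itself is left out of the sketch.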

25 Assay 1: Dorfman Experiment, PT=13.14
[Table: by class, observed active, total and Pr(active), alongside conditionally binomial Pr(active); per-class values not recoverable.]
Pr(blocking) = .292
Pr(synergism) = .101

26 [Plot: Number of Actives Found vs. Number of Tests; curves: random testing; RP on pools, PT=13.14; MLE (PT value not recoverable).]

27 Issues
- Design of Pooling Experiments
  - Large flawed designs are not as informative as small but carefully selected designs (additivity, dilution); Zhu et al. (2002), Remlinger et al. (2002)
  - Dilution effect may be unavoidable. Model it.
- Can we truly disentangle (synergism, blocking), (additivity, dilution), (effect of activity threshold), (sensitivity, specificity)? Yi et al. (2002)
- Variable selection under parametric models
- Large dataset

28 Pooling experiments can be risky: pharmaceutical industry is cautious
Pooling experiments can pay off in big ways:
- Reduce testing costs
- Shorten testing and development cycle
- Discover synergistic relationships
- Discover blocking relationships

29 Acknowledgements
- Katja Remlinger, NC State
- Bingming Yi, Merck
- Stan Young, NISS & CGStat
- Ke Zhang, NC State
- Lei Zhu, GlaxoSmithKline

30 Current Work
- Design
  - Random retesting schemes
  - Effect of pool threshold for activity
- Semi-parametric model, data missing at random
- Explore pairs/triplets of chemical descriptors, stochastic search
- Multiple trees

Analysis of a Large Structure/Biological Activity. Data Set Using Recursive Partitioning and. Simulated Annealing

Analysis of a Large Structure/Biological Activity. Data Set Using Recursive Partitioning and. Simulated Annealing Analysis of a Large Structure/Biological Activity Data Set Using Recursive Partitioning and Simulated Annealing Student: Ke Zhang MBMA Committee: Dr. Charles E. Smith (Chair) Dr. Jacqueline M. Hughes-Oliver

More information

Statistical Learning in Drug Discovery via Clustering and Mixtures

Statistical Learning in Drug Discovery via Clustering and Mixtures Statistical Learning in Drug Discovery via Clustering and Mixtures by Xu Wang A thesis presented to the University of Waterloo in fulfilment of the thesis requirement for the degree of Doctor of Philosophy

More information

EMPIRICAL VS. RATIONAL METHODS OF DISCOVERING NEW DRUGS

EMPIRICAL VS. RATIONAL METHODS OF DISCOVERING NEW DRUGS EMPIRICAL VS. RATIONAL METHODS OF DISCOVERING NEW DRUGS PETER GUND Pharmacopeia Inc., CN 5350 Princeton, NJ 08543, USA pgund@pharmacop.com Empirical and theoretical approaches to drug discovery have often

More information

Lecture 9 Two-Sample Test. Fall 2013 Prof. Yao Xie, H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech

Lecture 9 Two-Sample Test. Fall 2013 Prof. Yao Xie, H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech Lecture 9 Two-Sample Test Fall 2013 Prof. Yao Xie, yao.xie@isye.gatech.edu H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech Computer exam 1 18 Histogram 14 Frequency 9 5 0 75 83.33333333

More information

Data Mining in the Chemical Industry. Overview of presentation

Data Mining in the Chemical Industry. Overview of presentation Data Mining in the Chemical Industry Glenn J. Myatt, Ph.D. Partner, Myatt & Johnson, Inc. glenn.myatt@gmail.com verview of presentation verview of the chemical industry Example of the pharmaceutical industry

More information

Drug Combination Analysis

Drug Combination Analysis Drug Combination Analysis Gary D. Knott, Ph.D. Civilized Software, Inc. 12109 Heritage Park Circle Silver Spring MD 20906 USA Tel.: (301)-962-3711 email: csi@civilized.com URL: www.civilized.com abstract:

More information

A Sequential Approach for Identifying Lead Compounds in Large Chemical Databases

A Sequential Approach for Identifying Lead Compounds in Large Chemical Databases Statistical Science 2001, Vol. 16, No. 2, 154 168 A Sequential Approach for Identifying Lead Compounds in Large Chemical Databases Markus Abt, YongBin Lim, Jerome Sacks, Minge Xie and S. Stanley Young

More information

CS Lecture 19. Exponential Families & Expectation Propagation

CS Lecture 19. Exponential Families & Expectation Propagation CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces

More information

Electrical and Computer Engineering Department University of Waterloo Canada

Electrical and Computer Engineering Department University of Waterloo Canada Predicting a Biological Response of Molecules from Their Chemical Properties Using Diverse and Optimized Ensembles of Stochastic Gradient Boosting Machine By Tarek Abdunabi and Otman Basir Electrical and

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

the long tau-path for detecting monotone association in an unspecified subpopulation

the long tau-path for detecting monotone association in an unspecified subpopulation the long tau-path for detecting monotone association in an unspecified subpopulation Joe Verducci Current Challenges in Statistical Learning Workshop Banff International Research Station Tuesday, December

More information

Introduction to Chemoinformatics and Drug Discovery

Introduction to Chemoinformatics and Drug Discovery Introduction to Chemoinformatics and Drug Discovery Irene Kouskoumvekaki Associate Professor February 15 th, 2013 The Chemical Space There are atoms and space. Everything else is opinion. Democritus (ca.

More information

Chapter 14 Combining Models

Chapter 14 Combining Models Chapter 14 Combining Models T-61.62 Special Course II: Pattern Recognition and Machine Learning Spring 27 Laboratory of Computer and Information Science TKK April 3th 27 Outline Independent Mixing Coefficients

More information

Lecture 01: Introduction

Lecture 01: Introduction Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction

More information

Early Stages of Drug Discovery in the Pharmaceutical Industry

Early Stages of Drug Discovery in the Pharmaceutical Industry Early Stages of Drug Discovery in the Pharmaceutical Industry Daniel Seeliger / Jan Kriegl, Discovery Research, Boehringer Ingelheim September 29, 2016 Historical Drug Discovery From Accidential Discovery

More information

DivCalc: A Utility for Diversity Analysis and Compound Sampling

DivCalc: A Utility for Diversity Analysis and Compound Sampling Molecules 2002, 7, 657-661 molecules ISSN 1420-3049 http://www.mdpi.org DivCalc: A Utility for Diversity Analysis and Compound Sampling Rajeev Gangal* SciNova Informatics, 161 Madhumanjiri Apartments,

More information

Interactive Feature Selection with

Interactive Feature Selection with Chapter 6 Interactive Feature Selection with TotalBoost g ν We saw in the experimental section that the generalization performance of the corrective and totally corrective boosting algorithms is comparable.

More information

FRAUNHOFER IME SCREENINGPORT

FRAUNHOFER IME SCREENINGPORT FRAUNHOFER IME SCREENINGPORT Design of screening projects General remarks Introduction Screening is done to identify new chemical substances against molecular mechanisms of a disease It is a question of

More information

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre

Dr. Sander B. Nabuurs. Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre Dr. Sander B. Nabuurs Computational Drug Discovery group Center for Molecular and Biomolecular Informatics Radboud University Medical Centre The road to new drugs. How to find new hits? High Throughput

More information

Applications of Basu's TheorelTI. Dennis D. Boos and Jacqueline M. Hughes-Oliver I Department of Statistics, North Car-;'lina State University

Applications of Basu's TheorelTI. Dennis D. Boos and Jacqueline M. Hughes-Oliver I Department of Statistics, North Car-;'lina State University i Applications of Basu's TheorelTI by '. Dennis D. Boos and Jacqueline M. Hughes-Oliver I Department of Statistics, North Car-;'lina State University January 1997 Institute of Statistics ii-limeo Series

More information

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data

A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data A Multivariate Two-Sample Mean Test for Small Sample Size and Missing Data Yujun Wu, Marc G. Genton, 1 and Leonard A. Stefanski 2 Department of Biostatistics, School of Public Health, University of Medicine

More information

COMBINATORIAL CHEMISTRY IN A HISTORICAL PERSPECTIVE

COMBINATORIAL CHEMISTRY IN A HISTORICAL PERSPECTIVE NUE FEATURE T R A N S F O R M I N G C H A L L E N G E S I N T O M E D I C I N E Nuevolution Feature no. 1 October 2015 Technical Information COMBINATORIAL CHEMISTRY IN A HISTORICAL PERSPECTIVE A PROMISING

More information

Lecture 21: Spectral Learning for Graphical Models

Lecture 21: Spectral Learning for Graphical Models 10-708: Probabilistic Graphical Models 10-708, Spring 2016 Lecture 21: Spectral Learning for Graphical Models Lecturer: Eric P. Xing Scribes: Maruan Al-Shedivat, Wei-Cheng Chang, Frederick Liu 1 Motivation

More information

Priority Setting of Endocrine Disruptors Using QSARs

Priority Setting of Endocrine Disruptors Using QSARs Priority Setting of Endocrine Disruptors Using QSARs Weida Tong Manager of Computational Science Group, Logicon ROW Sciences, FDA s National Center for Toxicological Research (NCTR), U.S.A. Thanks for

More information

Statistical concepts in QSAR.

Statistical concepts in QSAR. Statistical concepts in QSAR. Computational chemistry represents molecular structures as a numerical models and simulates their behavior with the equations of quantum and classical physics. Available programs

More information

Using AutoDock for Virtual Screening

Using AutoDock for Virtual Screening Using AutoDock for Virtual Screening CUHK Croucher ASI Workshop 2011 Stefano Forli, PhD Prof. Arthur J. Olson, Ph.D Molecular Graphics Lab Screening and Virtual Screening The ultimate tool for identifying

More information

sphericity, 5-29, 5-32 residuals, 7-1 spread and level, 2-17 t test, 1-13 transformations, 2-15 violations, 1-19

sphericity, 5-29, 5-32 residuals, 7-1 spread and level, 2-17 t test, 1-13 transformations, 2-15 violations, 1-19 additive tree structure, 10-28 ADDTREE, 10-51, 10-53 EXTREE, 10-31 four point condition, 10-29 ADDTREE, 10-28, 10-51, 10-53 adjusted R 2, 8-7 ALSCAL, 10-49 ANCOVA, 9-1 assumptions, 9-5 example, 9-7 MANOVA

More information

Retrieving hits through in silico screening and expert assessment M. N. Drwal a,b and R. Griffith a

Retrieving hits through in silico screening and expert assessment M. N. Drwal a,b and R. Griffith a Retrieving hits through in silico screening and expert assessment M.. Drwal a,b and R. Griffith a a: School of Medical Sciences/Pharmacology, USW, Sydney, Australia b: Charité Berlin, Germany Abstract:

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information

10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification

10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification 10-810: Advanced Algorithms and Models for Computational Biology Optimal leaf ordering and classification Hierarchical clustering As we mentioned, its one of the most popular methods for clustering gene

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Data Analysis in the Life Sciences - The Fog of Data -

Data Analysis in the Life Sciences - The Fog of Data - ALTAA Chair for Bioinformatics & Information Mining Data Analysis in the Life Sciences - The Fog of Data - Michael R. Berthold ALTAA-Chair for Bioinformatics & Information Mining Konstanz University, Germany

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Distance between multinomial and multivariate normal models

Distance between multinomial and multivariate normal models Chapter 9 Distance between multinomial and multivariate normal models SECTION 1 introduces Andrew Carter s recursive procedure for bounding the Le Cam distance between a multinomialmodeland its approximating

More information

Spatial Bayesian Nonparametrics for Natural Image Segmentation

Spatial Bayesian Nonparametrics for Natural Image Segmentation Spatial Bayesian Nonparametrics for Natural Image Segmentation Erik Sudderth Brown University Joint work with Michael Jordan University of California Soumya Ghosh Brown University Parsing Visual Scenes

More information

Zoe Blaxill Analytical Sciences Discovery Research. High Frequency Acoustic Technology: Evaluation for Compound Mixing and Dissolution in HTS.

Zoe Blaxill Analytical Sciences Discovery Research. High Frequency Acoustic Technology: Evaluation for Compound Mixing and Dissolution in HTS. Introduction Zoe Blaxill Analytical Sciences Discovery Research High Frequency Acoustic Technology: Evaluation for Compound Mixing and Dissolution in HTS. Introduction Current Issues Technology Overview

More information

PubH 7405: REGRESSION ANALYSIS INTRODUCTION TO LOGISTIC REGRESSION

PubH 7405: REGRESSION ANALYSIS INTRODUCTION TO LOGISTIC REGRESSION PubH 745: REGRESSION ANALYSIS INTRODUCTION TO LOGISTIC REGRESSION Let Y be the Dependent Variable Y taking on values and, and: π Pr(Y) Y is said to have the Bernouilli distribution (Binomial with n ).

More information

Data Quality Issues That Can Impact Drug Discovery

Data Quality Issues That Can Impact Drug Discovery Data Quality Issues That Can Impact Drug Discovery Sean Ekins 1, Joe Olechno 2 Antony J. Williams 3 1 Collaborations in Chemistry, Fuquay Varina, NC. 2 Labcyte Inc, Sunnyvale, CA. 3 Royal Society of Chemistry,

More information

Extrapolating New Approaches into a Tiered Approach to Mixtures Risk Assessment

Extrapolating New Approaches into a Tiered Approach to Mixtures Risk Assessment Extrapolating New into a Tiered Approach to Mixtures Risk Assessment Michael L. Dourson, PhD, DABT, FATS, FSRA Toxicology Excellence for Risk Assessment (TERA) dourson@tera.org Conflict of Interest Statement

More information

Clustering using Mixture Models

Clustering using Mixture Models Clustering using Mixture Models The full posterior of the Gaussian Mixture Model is p(x, Z, µ,, ) =p(x Z, µ, )p(z )p( )p(µ, ) data likelihood (Gaussian) correspondence prob. (Multinomial) mixture prior

More information

Infinitely Imbalanced Logistic Regression

Infinitely Imbalanced Logistic Regression p. 1/1 Infinitely Imbalanced Logistic Regression Art B. Owen Journal of Machine Learning Research, April 2007 Presenter: Ivo D. Shterev p. 2/1 Outline Motivation Introduction Numerical Examples Notation

More information

Society for Biomolecular Screening 10th Annual Conference, Orlando, FL, September 11-15, 2004

Society for Biomolecular Screening 10th Annual Conference, Orlando, FL, September 11-15, 2004 Society for Biomolecular Screening 10th Annual Conference, Orlando, FL, September 11-15, 2004 Advanced Methods in Dose-Response Screening of Enzyme Inhibitors Petr uzmič, Ph.D. Bioin, Ltd. TOPICS: 1. Fitting

More information

Bayesian estimation of the discrepancy with misspecified parametric models

Bayesian estimation of the discrepancy with misspecified parametric models Bayesian estimation of the discrepancy with misspecified parametric models Pierpaolo De Blasi University of Torino & Collegio Carlo Alberto Bayesian Nonparametrics workshop ICERM, 17-21 September 2012

More information

FRAGMENT SCREENING IN LEAD DISCOVERY BY WEAK AFFINITY CHROMATOGRAPHY (WAC )

FRAGMENT SCREENING IN LEAD DISCOVERY BY WEAK AFFINITY CHROMATOGRAPHY (WAC ) FRAGMENT SCREENING IN LEAD DISCOVERY BY WEAK AFFINITY CHROMATOGRAPHY (WAC ) SARomics Biostructures AB & Red Glead Discovery AB Medicon Village, Lund, Sweden Fragment-based lead discovery The basic idea:

More information

Lecture 13 and 14: Bayesian estimation theory

Lecture 13 and 14: Bayesian estimation theory 1 Lecture 13 and 14: Bayesian estimation theory Spring 2012 - EE 194 Networked estimation and control (Prof. Khan) March 26 2012 I. BAYESIAN ESTIMATORS Mother Nature conducts a random experiment that generates

More information

The Conformation Search Problem

The Conformation Search Problem Jon Sutter Senior Manager Life Sciences R&D jms@accelrys.com Jiabo Li Senior Scientist Life Sciences R&D jli@accelrys.com CAESAR: Conformer Algorithm based on Energy Screening and Recursive Buildup The

More information

Kernel-based Machine Learning for Virtual Screening

Kernel-based Machine Learning for Virtual Screening Kernel-based Machine Learning for Virtual Screening Dipl.-Inf. Matthias Rupp Beilstein Endowed Chair for Chemoinformatics Johann Wolfgang Goethe-University Frankfurt am Main, Germany 2008-04-11, Helmholtz

More information

Data Mining. CS57300 Purdue University. Bruno Ribeiro. February 8, 2018

Data Mining. CS57300 Purdue University. Bruno Ribeiro. February 8, 2018 Data Mining CS57300 Purdue University Bruno Ribeiro February 8, 2018 Decision trees Why Trees? interpretable/intuitive, popular in medical applications because they mimic the way a doctor thinks model

More information

Decision Trees. CSC411/2515: Machine Learning and Data Mining, Winter 2018 Luke Zettlemoyer, Carlos Guestrin, and Andrew Moore

Decision Trees. CSC411/2515: Machine Learning and Data Mining, Winter 2018 Luke Zettlemoyer, Carlos Guestrin, and Andrew Moore Decision Trees Claude Monet, The Mulberry Tree Slides from Pedro Domingos, CSC411/2515: Machine Learning and Data Mining, Winter 2018 Luke Zettlemoyer, Carlos Guestrin, and Andrew Moore Michael Guerzhoy

More information

Using Historical Experimental Information in the Bayesian Analysis of Reproduction Toxicological Experimental Results

Using Historical Experimental Information in the Bayesian Analysis of Reproduction Toxicological Experimental Results Using Historical Experimental Information in the Bayesian Analysis of Reproduction Toxicological Experimental Results Jing Zhang Miami University August 12, 2014 Jing Zhang (Miami University) Using Historical

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

Tutorial: Sparse Signal Recovery

Tutorial: Sparse Signal Recovery Tutorial: Sparse Signal Recovery Anna C. Gilbert Department of Mathematics University of Michigan (Sparse) Signal recovery problem signal or population length N k important Φ x = y measurements or tests:

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Isothermal Titration Calorimetry in Drug Discovery. Geoff Holdgate Structure & Biophysics, Discovery Sciences, AstraZeneca October 2017

Isothermal Titration Calorimetry in Drug Discovery. Geoff Holdgate Structure & Biophysics, Discovery Sciences, AstraZeneca October 2017 Isothermal Titration Calorimetry in Drug Discovery Geoff Holdgate Structure & Biophysics, Discovery Sciences, AstraZeneca October 217 Introduction Introduction to ITC Strengths / weaknesses & what is required

More information

Human or Cylon? Group testing on the Battlestar Galactica

Human or Cylon? Group testing on the Battlestar Galactica Human or Cylon? Group testing on the Statistics and The story so far Video Christopher R. Bilder Department of Statistics University of Nebraska-Lincoln chris@chrisbilder.com Slide 1 of 37 Slide 2 of 37

More information

Scaling up Bayesian Inference

Scaling up Bayesian Inference Scaling up Bayesian Inference David Dunson Departments of Statistical Science, Mathematics & ECE, Duke University May 1, 2017 Outline Motivation & background EP-MCMC amcmc Discussion Motivation & background

More information

Generalizing the MCPMod methodology beyond normal, independent data

Generalizing the MCPMod methodology beyond normal, independent data Generalizing the MCPMod methodology beyond normal, independent data José Pinheiro Joint work with Frank Bretz and Björn Bornkamp Novartis AG ASA NJ Chapter 35 th Annual Spring Symposium June 06, 2014 Outline

More information

CSC 411 Lecture 3: Decision Trees
