Chemistry Informatics in Academic Laboratories: Lessons Learned
Michael Hudock
Center for Biophysics & Computational Biology
University of Illinois at Urbana-Champaign
My Background
Ph.D. candidate, Biophysics & Computational Biology, University of Illinois at Urbana-Champaign.
Associate Research Scientist, Discovery Technologies group at Bristol-Myers Squibb, prior to graduate school.
Strong interest in the interface of computers and chemistry; graduate work in computational modeling with chemoinformatics.
Talk Outline
Chemoinformatics System
  Our Basic Requirements
  Registration / Results / Reports / Research
Build vs. Buy
  Infrastructure / Cost / Maintenance / Development
Results & Lessons Learned
  Short-term impact / long-term impact
Future Directions
  New, advanced SAR modules
Our Laboratory
Our Basic Requirements
Registration
Results
Reports
Research
~50 assays
[Figure: schematic with an illustrative equation of the form Y = ...]
A Decision Point
Commercial Solution
  "Out of the box" functionality
  Restrictive infrastructure requirements
  Expensive, perhaps recurring costs
  Completely customizable?
Custom Solution
  Programming expertise
  Testing & deployment
  Data backup
Decision: develop a custom solution that would meet our most basic requirements at first, with the capability to expand at a later date.
Client-Server Architecture
Multiple client platforms supported
All code resides on the server
All data stored in one location
Specific Implementation
Modular architecture allows new components to be added quickly and easily.
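One common way to get this kind of modularity is a registry that the core code dispatches through, so adding a component means adding one function. The sketch below is only an illustration in Python; the slides do not show the actual implementation, and the names `MODULES`, `register`, and `run` are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical registry: module name -> function implementing that analysis.
AnalysisFn = Callable[[List[float]], float]
MODULES: Dict[str, AnalysisFn] = {}


def register(name: str) -> Callable[[AnalysisFn], AnalysisFn]:
    """Decorator that adds an analysis function to the registry under `name`."""
    def wrap(fn: AnalysisFn) -> AnalysisFn:
        if name in MODULES:
            raise ValueError(f"module {name!r} is already registered")
        MODULES[name] = fn
        return fn
    return wrap


@register("mean")
def mean(values: List[float]) -> float:
    """Average of a list of assay values."""
    return sum(values) / len(values)


@register("range")
def value_range(values: List[float]) -> float:
    """Spread between the largest and smallest assay value."""
    return max(values) - min(values)


def run(name: str, values: List[float]) -> float:
    """Dispatch to a registered module; this function never needs editing."""
    return MODULES[name](values)
```

With this layout, a new tool is one decorated function, and `run` picks it up without further changes.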
Database Architecture
Compound Registration
Input Results
Structures & Data United
Using the ChemAxon Marvin Java applet
Retrieve Data Easily
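The register, input, and retrieve steps can be pictured as two linked tables: one for compounds and one for assay results. The following is a minimal sketch using Python's built-in `sqlite3`, not the schema from the talk. The table and column names are assumptions, and treating a SMILES string as the uniqueness key is a simplification (a real system would canonicalize structures first).

```python
import sqlite3
from typing import List, Tuple


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE compound (
            id     INTEGER PRIMARY KEY,
            smiles TEXT NOT NULL UNIQUE
        );
        CREATE TABLE result (
            compound_id INTEGER NOT NULL REFERENCES compound(id),
            assay       TEXT NOT NULL,
            value       REAL NOT NULL
        );
        """
    )
    return conn


def register_compound(conn: sqlite3.Connection, smiles: str) -> int:
    """Return the existing id for a structure, or register it and return the new id."""
    row = conn.execute(
        "SELECT id FROM compound WHERE smiles = ?", (smiles,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute("INSERT INTO compound (smiles) VALUES (?)", (smiles,))
    return cur.lastrowid


def add_result(conn: sqlite3.Connection, compound_id: int, assay: str, value: float) -> None:
    """Store one assay measurement for a registered compound."""
    conn.execute(
        "INSERT INTO result (compound_id, assay, value) VALUES (?, ?, ?)",
        (compound_id, assay, value),
    )


def results_for(conn: sqlite3.Connection, compound_id: int) -> List[Tuple[str, float]]:
    """Retrieve all (assay, value) pairs for a compound, ordered by assay name."""
    return conn.execute(
        "SELECT assay, value FROM result WHERE compound_id = ? ORDER BY assay",
        (compound_id,),
    ).fetchall()
```

Keeping structures and results in separate tables joined by a compound id is what lets one registration serve many assays.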
Real-Time Data Analysis
New analysis tools can be added quickly in response to user requests.
Finding Patterns in a Few Clicks
Provide SAR tools to all users, help detect trends.

  === Stratified cross-validation ===
  === Summary ===

  Correctly Classified Instances          23               88.4615 %
  Incorrectly Classified Instances         3               11.5385 %
  Kappa statistic                          0.7692
  Mean absolute error                      0.1839
  Root mean squared error                  0.3543
  Relative absolute error                 36.4346 %
  Root relative squared error             70.1844 %
  Total Number of Instances               26

  === Detailed Accuracy By Class ===

  TP Rate   FP Rate   Precision   Recall   F-Measure   Class
  0.923     0.154     0.857       0.923    0.889       cluster1
  0.846     0.077     0.917       0.846    0.88        cluster2

  === Confusion Matrix ===

    a   b   <-- classified as
   12   1   a = cluster1
    2  11   b = cluster2
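The headline numbers in this report follow directly from the confusion matrix. As a check, the sketch below recomputes accuracy, the kappa statistic, and per-class precision, recall, and F-measure from the four counts, using the standard textbook definitions. The function name `summarize` is illustrative and not part of any classifier's API.

```python
from typing import List, Tuple


def summarize(matrix: List[List[int]]) -> Tuple[float, float, List[Tuple[float, float, float]]]:
    """Compute accuracy, Cohen's kappa, and per-class (precision, recall, F).

    Rows are the actual classes and columns are the predicted classes,
    matching the layout of the confusion matrix above.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(row) for row in matrix]                         # actual counts
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # predicted counts

    accuracy = sum(matrix[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)

    per_class = []
    for i in range(k):
        tp = matrix[i][i]
        precision = tp / col_tot[i]
        recall = tp / row_tot[i]
        f_measure = 2 * precision * recall / (precision + recall)
        per_class.append((precision, recall, f_measure))
    return accuracy, kappa, per_class


accuracy, kappa, per_class = summarize([[12, 1], [2, 11]])
print(f"accuracy = {accuracy:.4%}, kappa = {kappa:.4f}")
for name, (p, r, f) in zip(["cluster1", "cluster2"], per_class):
    print(f"{name}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Here 23 of 26 instances lie on the diagonal, and chance agreement from the marginals is 0.5, which gives kappa = (0.8846 - 0.5) / 0.5 = 0.7692, in agreement with the report.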
Additional Modules Easily Added
Additional modules added over time as needed.
Initial Impact
Initial development: 1 FTE, 1 month
Updates & new code: 1 FTE, 3 days/month
Intuitive interface, short end-user training

  Question                                          Chemoinformatics   Pre-Chemoinformatics
  What is the structure of compound 700?            20 sec.            20 min.
  Correlate assay A with assay B                    5 sec.             30 min.
    for compounds 65% similar to cpd 700            10 sec.            45 min.
    or instead, with assays B through N             15 sec.            5 hours
  Will addition of CH2 to 352 decrease activity?    10 sec.            25 min.
  Is assay A activity related to TPSA?              5 sec.             20 min.

An informatics solution, commercial or custom, can have a large positive impact on productivity, even for relatively small amounts of data.
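Two of the questions in the table, restricting to compounds similar to a query and correlating one assay with another, reduce to short calculations once structures and results sit in one place. The sketch below shows one plausible form, not the system's actual code. It assumes similarity means the Tanimoto coefficient over fingerprint bit sets (the slides do not say which measure was used), and the fingerprints and compound ids in the usage are made up for illustration.

```python
from math import sqrt
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared 'on' bits divided by total distinct 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_to(query_id: int, fingerprints: Dict[int, Set[int]],
               threshold: float = 0.65) -> List[int]:
    """Ids of compounds at least `threshold` similar to the query (query included)."""
    query = fingerprints[query_id]
    return sorted(cid for cid, fp in fingerprints.items()
                  if tanimoto(query, fp) >= threshold)


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length lists (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical fingerprints: sets of 'on' bit positions keyed by compound id.
fps = {700: {1, 2, 3, 4}, 701: {1, 2, 3, 4, 5}, 702: {1, 9}}
print(similar_to(700, fps))
```

Correlating assay A with assay B for the similar subset is then a matter of passing the two result lists for those ids to `pearson`.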
Longer-Term Impact
>1,000 unique compounds, >11,000 fittings
Used daily by group members (~30)
Data easily shared with entire group
Trends now routinely identified, leading to publications
Mindset shift: paper to electronic
How can I do this?
Identify and implement basic requirements first; don't go overboard.
Programming typically requires a functional understanding of databases and of a programming language such as PHP*.
CS students, temporary help, or computer-savvy graduate students might be able to help.
Use third-party components when appropriate, e.g. for plotting or displaying structures.
The system can evolve over time, with sophisticated capabilities added as experience grows.

*Good books:
PHP and MySQL Web Development, Welling & Thompson, 2007.
Web Database Applications with PHP & MySQL, Williams & Lane, 2004.
Acknowledgements
CINF Division for the invitation to present
National Institutes of Health
Professor Eric Oldfield and members of the Oldfield Research Group, Department of Chemistry, University of Illinois at Urbana-Champaign:
  Yongcheng Song, Yonghui Zhang, Fenglin Yin, Kilannin Krysiak, Sujoy Mukherjee, Dushyant Mukkamala, Rong Cao, Kyle Bergan