Cheminformatics. Docking. Drug Design
Bioinformatics (3rd year), Lecture 4 (Part 1)
File Formats in Cheminformatics
Chemical information is usually provided as files or streams, and many formats have been created, with varying degrees of documentation. A program can recognize the format in two ways:
- by file extension (usually 3 letters). This is widely used, but fragile, because common suffixes such as ".mol" and ".dat" are used by many systems, including non-chemical ones;
- from self-describing files, where the format information is included in the file itself. Examples are CIF and CML.
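The two strategies above can be contrasted in a short Python sketch. The extension table, the function names and the content checks (an XML root element named `cml` or `molecule` for CML, a first non-comment line starting with `data_` for CIF) are simplifying assumptions for illustration, not the way any particular toolkit detects formats.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical extension table (illustration only); real tools know many more.
EXTENSION_GUESSES = {
    ".mol": "MDL molfile (but also used by non-chemical software)",
    ".sdf": "MDL SD file",
    ".xyz": "XYZ coordinates",
    ".cif": "CIF",
    ".cml": "CML",
    ".dat": "unknown (too generic to trust)",
}


def guess_by_extension(filename: str) -> str:
    """Guess the format from the file suffix alone: fast, but fragile."""
    return EXTENSION_GUESSES.get(Path(filename).suffix.lower(), "unknown")


def guess_by_content(text: str) -> str:
    """Guess the format of a self-describing file from its content."""
    stripped = text.lstrip()
    if stripped.startswith("<"):
        try:
            root = ET.fromstring(stripped)
        except ET.ParseError:
            return "unknown"
        local_name = root.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" part
        return "CML" if local_name in ("cml", "molecule") else "other XML"
    # Assumed CIF check: the first non-comment line opens a "data_" block
    for line in stripped.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        return "CIF" if line.lower().startswith("data_") else "unknown"
    return "unknown"


print(guess_by_extension("results.dat"))
print(guess_by_content('<cml xmlns="http://www.xml-cml.org/schema"><molecule id="m1"/></cml>'))
```

The extension check never looks inside the file, so a non-chemical ".dat" or ".mol" file is misclassified silently; the content check costs a parse but uses information the file carries about itself.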
Converting Between File Formats
OpenBabel and JOELib are freely available open-source tools designed specifically for converting between file formats. Their chemical expert systems are backed by large atom-type conversion tables. A number of tools intended for viewing and editing molecular structures can also read files in several formats and write them out in others: JChemPaint (based on the Chemistry Development Kit), XDrawChem (based on OpenBabel), Chime, Jmol, Mol2mol and Discovery Studio fall into this category.
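To make the idea of an atom-type conversion table concrete, here is a minimal sketch that maps a handful of Tripos/Sybyl mol2 atom types to element symbols. The table is a small hypothetical excerpt chosen for illustration; it is not the actual table used by OpenBabel or JOELib, which cover many more types and formats.

```python
from collections import Counter

# Hypothetical excerpt of an atom-type conversion table:
# Tripos/Sybyl mol2 atom type -> element symbol.
SYBYL_TO_ELEMENT = {
    "C.3": "C", "C.2": "C", "C.1": "C", "C.ar": "C", "C.cat": "C",
    "N.3": "N", "N.2": "N", "N.1": "N", "N.ar": "N", "N.am": "N",
    "N.pl3": "N", "N.4": "N",
    "O.3": "O", "O.2": "O", "O.co2": "O",
    "S.3": "S", "S.2": "S", "P.3": "P",
    "H": "H", "F": "F", "Cl": "Cl", "Br": "Br", "I": "I",
}


def sybyl_to_element(atom_type: str) -> str:
    """Look up the element for a Sybyl atom type; unknown types are an error."""
    try:
        return SYBYL_TO_ELEMENT[atom_type]
    except KeyError:
        raise ValueError(f"no conversion rule for atom type {atom_type!r}") from None


def element_counts(atom_types):
    """Count elements for a list of Sybyl atom types (e.g. read from a mol2 file)."""
    return Counter(sybyl_to_element(t) for t in atom_types)


# Acetic acid, CH3-COOH, as its atoms might be typed in a mol2 file
acetic_acid = ["C.3", "C.2", "O.2", "O.3", "H", "H", "H", "H"]
print(element_counts(acetic_acid))
```

Raising an error for unknown types (such as the dummy type `Du`) is a design choice: silently guessing an element would hide gaps in the table.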
Machine Languages for Entering Chemical Formulas
Chemical Machine Languages
Chemistry has defined three simple languages for encoding chemical information: InChI, SMILES and CML. They can be written by hand or generated automatically. InChI and SMILES represent a molecule as a single string (character array), which makes them useful as database keys and as search queries in Google. SMILES and InChI can be converted into each other with OpenBabel, OELib or JOELib. CML is an XML format; it is more verbose, but benefits from the XML community's tools.
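The point about database keys can be illustrated with Python's built-in `sqlite3` module. In this sketch (the table layout and the helper names are hypothetical) the standard InChI is the primary key, while SMILES is stored as an ordinary column: ethanol can be written as `CCO`, `OCC` or `C(O)C`, so a hand-written SMILES is only a safe key after canonicalization.

```python
import sqlite3

# Standard InChI strings, normally computed by InChI software; written out here.
WATER_INCHI = "InChI=1S/H2O/h1H2"
ETHANOL_INCHI = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"


def make_db():
    """Create an in-memory compound table keyed by InChI (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE compound (inchi TEXT PRIMARY KEY, name TEXT, smiles TEXT)")
    with conn:  # commit the initial rows
        conn.executemany(
            "INSERT INTO compound VALUES (?, ?, ?)",
            [(WATER_INCHI, "water", "O"), (ETHANOL_INCHI, "ethanol", "CCO")],
        )
    return conn


def add(conn, inchi, name, smiles):
    """Insert a compound; the PRIMARY KEY rejects a second copy of a structure."""
    with conn:
        conn.execute("INSERT INTO compound VALUES (?, ?, ?)", (inchi, name, smiles))


def lookup(conn, inchi):
    """Return the stored name for an InChI, or None if it is not in the table."""
    row = conn.execute("SELECT name FROM compound WHERE inchi = ?", (inchi,)).fetchone()
    return row[0] if row else None


conn = make_db()
print(lookup(conn, ETHANOL_INCHI))  # -> ethanol
try:
    # Same structure, different name and a different SMILES spelling
    add(conn, ETHANOL_INCHI, "ethyl alcohol", "OCC")
except sqlite3.IntegrityError:
    print("duplicate structure rejected")
```

Because the InChI column is the primary key, a second record for the same structure is rejected even when it arrives under another name or another SMILES spelling.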
A CML Example
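As an illustration, the sketch below embeds a small CML document for water (an `atomArray` whose atoms carry `elementType` and `x3`/`y3`/`z3` coordinates, and a `bondArray` whose bonds carry `atomRefs2` and `order`) and reads it back with Python's standard `xml.etree.ElementTree`. It then derives the molecular formula in Hill order (C, then H, then the other elements alphabetically; strictly alphabetical when there is no carbon). The coordinates are approximate values chosen for the example, assumed to be in ångströms, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CML_NS = "http://www.xml-cml.org/schema"

# Water: one oxygen, two hydrogens, two single O-H bonds.
# Coordinates are approximate and assumed to be in angstroms.
WATER_CML = """
<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.000"/>
    <atom id="a2" elementType="H" x3="0.757" y3="0.586" z3="0.000"/>
    <atom id="a3" elementType="H" x3="-0.757" y3="0.586" z3="0.000"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>
"""


def parse_cml(text):
    """Return ({atom id: element}, [(atom id, atom id, order), ...])."""
    root = ET.fromstring(text.strip())
    ns = {"cml": CML_NS}
    atoms = {a.get("id"): a.get("elementType") for a in root.iterfind(".//cml:atom", ns)}
    bonds = []
    for b in root.iterfind(".//cml:bond", ns):
        first, second = b.get("atomRefs2").split()
        bonds.append((first, second, b.get("order")))
    return atoms, bonds


def hill_formula(elements):
    """Molecular formula in Hill order: C, H, then the rest alphabetically
    (all elements alphabetically if there is no carbon)."""
    counts = Counter(elements)
    if "C" in counts:
        order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(e if counts[e] == 1 else f"{e}{counts[e]}" for e in order if counts[e])


atoms, bonds = parse_cml(WATER_CML)
print(hill_formula(atoms.values()), len(bonds))  # -> H2O 2
```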
SMILES: Simplified Molecular Input Line Entry System
A language for describing the structure of chemical molecules using ASCII strings.
http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
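A sketch of how a SMILES string breaks into tokens. The regular expression below covers only a common subset of the language (bracket atoms, the aliphatic and aromatic organic subset, the `*` wildcard, bonds, branches, ring-closure labels and the `.` separator); it is an assumption for illustration, not the full Daylight or OpenSMILES grammar, and it does not check that branches and rings are balanced. The function names are hypothetical.

```python
import re

# Simplified SMILES token grammar (assumed subset, see lead-in).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"            # bracket atom, e.g. [NH4+] or [C@@H]
    r"|Br|Cl|[BCNOPSFI]"     # aliphatic organic subset (two-letter symbols first)
    r"|[bcnops]"             # aromatic organic subset
    r"|\*"                   # wildcard atom
    r"|%\d{2}|\d"            # ring-closure labels
    r"|[-=#$:/\\.()]"        # bonds, branches and the '.' separator
)


def tokenize(smiles):
    """Split a SMILES string into tokens; raise ValueError on unknown input."""
    tokens, pos = [], 0
    while pos < len(smiles):
        m = SMILES_TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at position {pos}")
        tokens.append(m.group(0))
        pos = m.end()
    return tokens


def count_atoms(smiles):
    """Count explicit atoms; implicit hydrogens are not tokens and are not counted."""
    return sum(1 for t in tokenize(smiles) if t[0] == "[" or t[0].isalpha() or t == "*")


print(tokenize("CC(=O)O"))        # acetic acid
print(count_atoms("c1ccccc1"))    # benzene: 6 aromatic carbons
```

Implicit hydrogens are not written in SMILES, so `count_atoms("CCO")` counts the two carbons and the oxygen only.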
SMARTS http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
SMIRKS http://www.daylight.com/dayhtml/doc/theory/theory.smirks.html
OpenSMILES http://www.opensmiles.org/
InChI: International Chemical Identifier
An IUPAC and NIST standard, similar in purpose to SMILES, that encodes structural information about compounds. It is based on an open standard and open algorithms.
http://wwmm.ch.cam.ac.uk/inchifaq/
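A standard InChI string is divided into layers separated by `/`: a version (`1S` marks a standard InChI), the molecular formula, and then layers identified by a one-letter prefix, such as `c` for atom connections and `h` for hydrogen atoms. The minimal splitter below (a hypothetical function with no validation of the layer contents) shows this structure on ethanol.

```python
def inchi_layers(inchi):
    """Split an InChI into version, formula and prefixed layers (no validation)."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len(prefix):].split("/")
    return {
        "version": version,
        "standard": version.endswith("S"),  # "1S" = standard InChI, version 1
        "formula": formula,
        # Remaining layers: the first character is the layer prefix (e.g. c, h)
        "layers": [(layer[0], layer[1:]) for layer in rest if layer],
    }


ethanol = inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol["formula"], ethanol["layers"])
```

Further layers (for example charge, stereochemistry or isotopes) can follow the ones shown here, and the same prefix can occur more than once in longer InChIs, which is why the sketch keeps an ordered list of layers instead of a dictionary.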
InChI in Public Chemistry Databases
- US National Institute of Standards and Technology (NIST): 150,000 structures
- NIH/NCBI PubChem project: >3.2 million structures
- Thomson ISI: 2+ million structures
- US National Cancer Institute (NCI) Database: 23+ million structures
- US Environmental Protection Agency (EPA) DSSTox Database: 1,450 structures
- Kyoto Encyclopedia of Genes and Genomes (KEGG): 9,584 structures
- ZINC (University of California, San Francisco): >3.3 million structures
- BRENDA enzyme information system (University of Cologne): 36,000 structures
- Chemical Entities of Biological Interest (ChEBI), European Bioinformatics Institute: 5,000 structures
- University of California Carcinogenic Potency Project: 1,447 structures
- Compendium of Pesticide Common Names: 1,437 structures (as of 2005-03-03)
Journals and Software Using InChI
Journals: Nature Chemical Biology; Beilstein Journal of Organic Chemistry.
Software: ACD/Labs ACD/ChemSketch; ChemAxon Marvin; SciTegic Pipeline Pilot; CACTVS Chemoinformatics Toolkit by Xemistry GmbH.
Chemistry Markup Language
CML is an XML markup language for encoding chemical information, developed by Peter Murray-Rust, Henry Rzepa and others. It actually dates from the SGML days, before XML. It is more verbose than InChI and SMILES, but it inherits the XML toolchain: schemas, namespaces, parsers, XPath, and language-binding tools such as XMLBeans. CML is not limited to structural information, and it is supported by OpenBabel.
http://cml.sourceforge.net/, http://cml.sourceforge.net/wiki/index.php/main_page
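The benefit of inheriting XML tooling can be shown with the Python standard library alone: the sketch below builds a CML `molecule` with `xml.etree.ElementTree`, serializes it with CML as the default namespace, and queries it with ElementTree's limited XPath subset. The namespace URI `http://www.xml-cml.org/schema` is the one commonly used for CML; the helper names `to_cml` and `atom_ids` are hypothetical.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
# Serialize CML elements in the default namespace instead of "ns0:".
# Note: register_namespace changes a module-wide ElementTree setting.
ET.register_namespace("", CML_NS)


def to_cml(mol_id, atoms):
    """Build a CML <molecule> from (element, x, y, z) tuples (hypothetical helper)."""
    molecule = ET.Element(f"{{{CML_NS}}}molecule", id=mol_id)
    atom_array = ET.SubElement(molecule, f"{{{CML_NS}}}atomArray")
    for i, (element, x, y, z) in enumerate(atoms, start=1):
        ET.SubElement(
            atom_array, f"{{{CML_NS}}}atom",
            id=f"a{i}", elementType=element,
            x3=f"{x:.3f}", y3=f"{y:.3f}", z3=f"{z:.3f}",
        )
    return molecule


def atom_ids(molecule, element):
    """XPath-style query: ids of all atoms of one element."""
    path = f".//cml:atom[@elementType='{element}']"
    return [a.get("id") for a in molecule.findall(path, {"cml": CML_NS})]


water = to_cml("water", [("O", 0.0, 0.0, 0.0), ("H", 0.757, 0.586, 0.0), ("H", -0.757, 0.586, 0.0)])
print(ET.tostring(water, encoding="unicode"))
print(atom_ids(water, "H"))  # -> ['a2', 'a3']
```

The same document could equally be validated against a schema or queried with a full XPath engine; that reuse of generic XML tools is the trade-off CML makes for its verbosity.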
Cheminformatics Resources
http://www.ebi.ac.uk/chebi/advancedsearchforward.do
http://pubchem.ncbi.nlm.nih.gov/
http://www.ncbi.nlm.nih.gov/pccompound?tabcmd=limits
http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?p=heat
http://www.emolecules.com/
http://www.smpdb.ca/
http://ctdbase.org/
http://zinc.docking.org/