Computational Approaches towards Life Detection by Mass Spectrometry

Similar documents
Workshop II: Optimization, Search and Graph-Theoretical Algorithms for Chemical Compound Space

Markus Meringer. Generation of Molecular Graphs and Applications in Astrobiology

Machine learning for ligand-based virtual screening and chemogenomics!

Kernels for small molecules

Welcome! Course 7: Concepts for LC-MS

RMassBank: Automatic Recalibration and Processing of Tandem HR-MS Spectra for MassBank

Maximum Common Substructures of Organic Compounds Exhibiting Similar Infrared Spectra

2. Separate the ions based on their mass to charge (m/e) ratio. 3. Measure the relative abundance of the ions that are produced

MOLECULES IN SILICO: POTENTIAL VERSUS KNOWN ORGANIC COMPOUNDS

QSAR in Green Chemistry

Interpretation of Organic Spectra. Chem 4361/8361

Rapid Application Development using InforSense Open Workflow and Daylight Technologies Deliver Discovery Value

CHEMISTRY Topic #3: Using Spectroscopy to Identify Molecules: Radicals and Mass Spectrometry (MS) Spring 2018 Dr.

SDBS Integrated Spectral Database for Organic Compounds

Machine Learning (CS 567) Lecture 2

Data Mining in Chemometrics Sub-structures Learning Via Peak Combinations Searching in Mass Spectra

Molecules in Silico: The Generation of Structural Formulae and Its Applications

Machine Learning Alternatives to Manual Knowledge Acquisition

Lecture notes in EI-Mass spectrometry. By Torben Lund

Development and application of methodology for designer drugs

MS Interpretation-1: Introduction + Elemental Composition I

Computational Methods for Mass Spectrometry Proteomics

5. Carbon-13 NMR Symmetry: number of chemically different Carbons Chemical Shift: chemical environment of Carbons (e- rich or e- poor)

SUSPECT AND NON-TARGET SCREENING OF ORGANIC MICROPOLLUTANTS IN WASTEWATER THROUGH THE DEVELOPMENT OF A LC-HRMS BASED WORKFLOW

CHEM 241 UNIT 5: PART A DETERMINATION OF ORGANIC STRUCTURES BY SPECTROSCOPIC METHODS [MASS SPECTROMETRY]

Qualitative Analysis of Unknown Compounds

Mass Spectrometry (MS)

Choosing the metabolomics platform

NUCLEAR MAGNETIC RESONANCE AND INTRODUCTION TO MASS SPECTROMETRY

Mass Spectrometry - Background

Open Research Online The Open University s repository of research publications and other research outputs

Chapter 20. Mass Spectroscopy

Information Extraction from Chemical Images. Discovery Knowledge & Informatics April 24 th, Dr. Marc Zimmermann

Presentation given to computer science undergraduate students at the University of Houston July 2007

Fast similarity searching making the virtual real. Stephen Pickett, GSK

Propose a structure for an alcohol, C4H10O, that has the following

GAS CHROMATOGRAPHY MASS SPECTROMETRY. Pre-Lab Questions

Mechanisms of Ion Fragmentation (McLafferty Chapter 4) Business Items

Condensed Graph of Reaction: considering a chemical reaction as one single pseudo molecule

MOLGEN-CID - A Canonizer for Molecules and Graphs Accessible through the Internet

Chemical Reaction Databases Computer-Aided Synthesis Design Reaction Prediction Synthetic Feasibility

Introduction to Mass Spectrometry (MS)

Poster at CC 2017 Conferentia Chemometrica, 3 6 Sep 2017, Gyöngyös-Farkasmály, Hungary

An Integrated Approach to in-silico

BMC Bioinformatics. Open Access. Abstract. Background Metabolomics seeks to identify and quantify all metabolites. BioMed Central

The model of solar wind polytropic flow patterns

Face Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi

ROSETTA: ESA's Comet Orbiter and Lander Mission. Hermann Boehnhardt Max-Planck Institute for Solar System Research Katlenburg-Lindau, Germany

Mass Spectrometry. Ionizer Mass Analyzer Detector

Show all work. Remember: a point a minute! This exam is MUCH longer than the upcoming Final.

Mass spectrometry prosess

Mass spectroscopy ( Mass spec )

Applied Statistics. Multivariate Analysis - part II. Troels C. Petersen (NBI) Statistics is merely a quantization of common sense 1

Lecture Interp-3: The Molecular Ion (McLafferty & Turecek 1993, Chapter 3)

QSAR/QSPR modeling. Quantitative Structure-Activity Relationships Quantitative Structure-Property-Relationships

Background: Imagine it is time for your lunch break, you take your sandwich outside and you sit down to enjoy your lunch with a beautiful view of

What is the smallest particle of the element gold (Au) that can still be classified as gold? A. atom B. molecule C. neutron D.

ORGANIC - EGE 5E CH UV AND INFRARED MASS SPECTROMETRY

ORGANIC - BRUICE 8E CH MASS SPECT AND INFRARED SPECTROSCOPY

Mass Spectrometry. Introduction EI-MS and CI-MS Molecular mass & formulas Principles of fragmentation Fragmentation patterns Isotopic effects


A Physical, Chemical, Isotopic and Astrobiology Instrumentation Package

Navigation in Chemical Space Towards Biological Activity. Peter Ertl Novartis Institutes for BioMedical Research Basel, Switzerland

Gas chromatography/mass spectrometry (GC MS) Interpretation of EI spectra

Characterisation of allergens in cosmetics by GC GC TOF MS with Select-eV variable-energy electron ionisation

Data Mining. 3.6 Regression Analysis. Fall Instructor: Dr. Masoud Yaghini. Numeric Prediction

Support Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM

Introduction. OntoChem

KNIME-based scoring functions in Muse 3.0. KNIME User Group Meeting 2013 Fabian Bös

ECE 661: Homework 10 Fall 2014

Mining Molecular Fragments: Finding Relevant Substructures of Molecules

Iterative Laplacian Score for Feature Selection

Development of a Structure Generator to Explore Target Areas on Chemical Space

Data Mining Prof. Pabitra Mitra Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

Intro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation

Data Mining Classification: Basic Concepts and Techniques. Lecture Notes for Chapter 3. Introduction to Data Mining, 2nd Edition

Life Science 1a Review Notes: Basic Topics in Chemistry

Computational mass spectrometry for small molecules

Global Scene Representations. Tilke Judd

19.1 Bonding and Molecules

DATA ANALYTICS IN NANOMATERIALS DISCOVERY

More information can be found in Chapter 12 in your textbook for CHEM 3750/ 3770 and on pages in your laboratory manual.

8.1 THE LANGUAGE OF MOTION

PAC-learning, VC Dimension and Margin-based Bounds

MIRA, SVM, k-nn. Lirong Xia

Building blocks for automated elucidation of metabolites: Machine learning methods for NMR prediction

Solved and Unsolved Problems in Chemoinformatics

Chapter 8. Ions and the Noble Gas. Chapter Electron transfer leads to the formation of ionic compounds

Name: 1. The mass of a proton is approximately equal to the mass of (1) an alpha particle (2) a beta particle (3) a positron (4) a neutron

Chapter 5. Complexation of Tholins by 18-crown-6:

ORGANIC - CLUTCH CH ANALYTICAL TECHNIQUES: IR, NMR, MASS SPECT

An ion source performs the following two functions:

WADA Technical Document TD2003IDCR

Cerno Application Note Extending the Limits of Mass Spectrometry

College of Science (CSCI) CSCI EETF Assessment Year End Report, June, 2017

THE VIBRATIONAL SPECTRUM OF A POLYATOMIC MOLECULE (Revised 4/7/2004)

Data Structures and Algorithms

ORGANIC - CLUTCH CH ANALYTICAL TECHNIQUES: IR, NMR, MASS SPECT

Fragmentation trees reloaded

SIFT-MS. technology overview

Transcription:

Markus Meringer Computational Approaches towards Life Detection by Mass Spectrometry International Workshop on Life Detection Technology: For Mars, Encheladus and Beyond Earth-Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan October 5-6, 2017

Outline The Past - DENDRAL - COSAC The Present - from mass to formula - from spectrum to structure The Future - from structure to life - from spectrum to life - from masses to life Conclusion The DENDRAL Project driven by Joshua Lederberg initiated in the mid 1960 s aim: identify unknown organic compounds by analyzing their mass spectra computationally perspective: processing of MS recorded on mars missions Slide 2 / 15

Detecting Life in Space: Types of Measurements and Missions Remote sensing In-situ measurements Sample return missions Mass spectrometry Slide 3 / 15

Processing of the COSAC Data COSAC measurements on ROSETTA mission NIST DB filter fit F. Goesmann et al, Organic compounds on comet 67P/Churyumov-Gerasimenko revealed by COSAC mass spectrometry, Sciene 349 (2015) J. H. Bredehöft et al, The organics on the nucleus of 67P/C-G and how they might have gotten there, XVIIIth International Conference on the Origin of Life (2017) Slide 4 / 15

Revisiting the COSAC Data use non-negative linear least squares (NNLS) fit unexplained part: 13.5 % of TIC instead of 14.4 % similar results Slide 5 / 15

From Mass to Formula Higher mass resolution enables better assignment of molecular formulas Which mass resolution is required to enable unambiguous assignment of molecular formulas?* Procedure: 1. Generate all molecular formulas up to a certain nominal mass (151) 2. Sort formulas by increasing exact masses: m 1 m 2 m 3 3. Compute for each mass m i the minimum distance to the neighbored masses: delta(m i ) = min(m i+1 -m i, m i -m i-1 ) 4. Plot m i /delta(m i ) vs. m i *question received from the Europa lander science definition team Slide 6 / 15

From Mass to Formula Settings: Element set: C, H, N, O, F, P, I, S, Si, Br, Cl Constraint: atom-valence rules (LEWIS and SENIOR check of the 7 Golden Rules ) 14138 molecular formulas T. Kind and O. Fiehn, Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry, BMC Bioinf. 8 (2007), 105. Slide 7 / 15

From Mass to Formula More settings: Other element sets: C, H, N, O, F, P, I, S, Si C, H, N, O, F, P, I C, H, N, O Additional heuristic filters: element ratios by Heuerding & Clerc, Kind & Fiehn FT-ICR S. Heuerding and J. T. Clerc, Simple tools for the computer-aided interpretation of mass spectra, Chemom. Intell. Lab. Syst. 20 (1993), 57{69. Slide 8 / 15

From Spectrum to Structure Established approach: use spectral data as molecular fingerprint for a database search FBI DB NIST DB O O Problem: only such data can be found that is stored in the database Slide 9 / 15 Meringer > Molecular Structure Generation > CITE > UFZ > June 22, 2011

Number of Structures 1 100 10000 1000000 100000000 The Database Chemical Space Gap Structures: elements C, H, N, O at least 1 C-atom NIST MS Library Beilstein Registry Molecular Graphs standard valences no charges, no radicals no stereoisomers only connected structures Plus lists of forbidden substructures for chemical spaces with and without hydrolysis 50 70 90 110 130 150 Molecular Mass Structure generator: MOLGEN A. Kerber, R. Laue, M. Meringer, C. Rücker: Molecules in Silico: Potential versus Known Organic Compounds. MATCH 54 (2), 301-312, 2005. Slide 10 / 15

Approach: From Structure to Life Use machine learning methods to classify structures of biotic and abiotic compounds, e.g. - linear discriminant analysis - neural networks, - support vector machines, - decision trees, Input variables: molecular descriptors, representing structural properties, e.g. - size (atom count, nominal weight, ), - complexity (e.g. information content), - degree of branching (e.g. walk counts), Target variable: - biotic compound (Y/N) L. R. Cronin, Towards the Universal Life Detection System, Astrobiology Science Conference 2015 Slide 11 / 15

Approach: From Spectra to Life Use machine learning methods to classify spectra of biotic and abiotic compounds, e.g. - linear discriminant analysis - neural networks, - support vector machines, - decision trees, Input variables: spectral descriptors, e.g. - ion series descriptors, - autocorrelation descriptors, - spectra shape descriptors, - logarithmic intensity ratios, Target variable: - biotic compound (Y/N) W. Werther et al, Classification of mass spectra, Chemom. Intell. Lab. Syst. 22, 63-76, 1994 Slide 12 / 15

Approach: From Masses to Life Pattern recognition via Kendrick plots H. J. Cleaves, C. Giri, Universal Mass Spectrometry-Based Life Detection, Planetary Science Vision 2050 Workshop 2017 Slide 13 / 15

Conclusions / Recommendations Find a good trade-off between compound characterization and life detection, provide - soft ionization mode to preserve the molecular ion - EI ionization mode the get a fragmentation spectrum - chromatography methods for compound separation Use simulations to define requirements (e.g. regarding mass resolution) For biotic/abiotic classification use - data mining - pattern recognition - machine learning Slide 14 / 15

Acknowledgements Jim Cleaves Caitanya Giri next talk: Tue Oct 10, 13:30 Generation of Molecules EON WS on Comput. Chem. MOLGEN Team former Mathematics II University of Bayreuth www.molgen.de THANKS FOR YOUR ATTENTION! Slide 15 / 15