Mass decomposition with the Rdisop package


Steffen Neumann, Anton Pervukhin, Sebastian Böcker
April 30, 2018

Leibniz Institute of Plant Biochemistry, Department of Stress and Developmental Biology, sneumann@ipb-halle.de
Bioinformatics, Friedrich-Schiller-University Jena, {apervukh,boecker}@minet.uni-jena.de

Contents
1 Introduction
2 Decomposing isotope patterns
2.1 Chemical background
2.2 Identification schema
3 Working with molecules and isotope peaklists
3.1 Handling of Molecules
3.2 decomposeMass and decomposeIsotopes
3.3 Interaction with other Bioconductor packages

1 Introduction

The Bioconductor package Rdisop is designed to determine the sum formula of metabolites solely from their exact mass and isotope pattern, as obtained from high-resolution mass spectrometry measurements. The algorithms are described in Böcker et al. (2006, 2008, 2009) and Böcker and Lipták (2007). The package is designed for compatibility with the Bioconductor packages XCMS, MassSpecWavelet and rpubchem.

2 Decomposing isotope patterns

After preprocessing, the output of a mass spectrometer is a list of peaks that correspond to the masses of the sample molecules and their abundances, i.e., the amount of sample compounds with a certain mass. In principle, the sum formulas of small molecules can be identified from accurate masses alone. However, even with very high mass accuracy (< 1 ppm), many chemically possible formulas are found in higher mass regions. It has been shown that mass information alone therefore does not suffice to identify a compound, and that additional information, such as isotopic abundances, must be taken into account. High-resolution mass spectrometry allows us to obtain the isotope pattern of a sample molecule with outstanding accuracy.

2.1 Chemical background

Atoms are composed of electrons, protons, and neutrons. The number of protons (the atomic number) is fixed and defines which element the atom is. The number of neutrons, on the other hand, can vary: atoms with the same number of protons but different numbers of neutrons are called isotopes of the element. Each of these isotopes occurs in nature with a certain abundance.

The nominal mass of a molecule is the total number of protons and neutrons of its constituent atoms. The (exact) mass of the molecule is the sum of the masses of these atoms. The monoisotopic (nominal) mass of a molecule is the sum of the (nominal) masses of its constituent atoms, where for every element its most abundant natural isotope is chosen. Clearly, both nominal mass and mass depend on the isotopes the molecule consists of, and thus on the isotope species of the molecule. In general, present-day instruments do not resolve isotope species with identical nominal mass; instead, these isotope species appear as one single peak in the mass spectrometry output. For this reason, we merge isotope species with identical nominal mass and refer to the resulting distribution as the molecule's isotope pattern.
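To make these definitions concrete, here is a minimal base-R sketch (not part of Rdisop) that computes the nominal mass, the monoisotopic mass and the first four peaks of the isotope pattern of ethanol, C2H6O. Isotope species are combined one atom at a time, and all species with the same nominal mass are merged into a single peak whose mass is the abundance-weighted mean. The isotope masses and abundances are standard values hard-coded as an assumption for this illustration, so the abundances will differ slightly from those Rdisop reports for ethanol in Section 3.2.

```r
# Isotopes of C, H and O: exact masses (u) and natural abundances (standard
# IUPAC values, hard-coded for this sketch). Entry k lies k - 1 nominal mass
# units above the lightest isotope, which here is also the most abundant one.
isotopes <- list(
  C = list(mass = c(12, 13.0033548378), abund = c(0.9893, 0.0107)),
  H = list(mass = c(1.00782503207, 2.0141017778), abund = c(0.999885, 0.000115)),
  O = list(mass = c(15.99491461956, 16.9991317, 17.999161),
           abund = c(0.99757, 0.00038, 0.00205))
)
ethanol <- c(C = 2, H = 6, O = 1)

# Monoisotopic mass: most abundant isotope of each element. Nominal mass:
# the same isotopes counted by protons + neutrons (their rounded masses).
monoisotopic_mass <- sum(sapply(names(ethanol), function(e)
  ethanol[[e]] * isotopes[[e]]$mass[1]))
nominal_mass <- sum(sapply(names(ethanol), function(e)
  ethanol[[e]] * round(isotopes[[e]]$mass[1])))

# Isotope pattern: add one atom at a time (a discrete convolution), merging
# all isotope species with the same nominal mass into one peak.
p <- 1   # abundance per nominal-mass offset
pm <- 0  # abundance-weighted mass per nominal-mass offset
for (e in names(ethanol)) {
  a <- isotopes[[e]]$abund
  am <- a * isotopes[[e]]$mass
  for (k in seq_len(ethanol[[e]])) {
    np <- numeric(length(p) + length(a) - 1); npm <- np
    for (i in seq_along(p)) for (j in seq_along(a)) {
      np[i + j - 1] <- np[i + j - 1] + p[i] * a[j]
      npm[i + j - 1] <- npm[i + j - 1] + pm[i] * a[j] + p[i] * am[j]
    }
    p <- np; pm <- npm
  }
}
pattern <- data.frame(nominal = nominal_mass + 0:3,
                      mass = pm[1:4] / p[1:4],      # mean mass of each merged peak
                      abundance = p[1:4] / sum(p))  # normalized to the whole pattern
print(pattern)
```

The first peak's mass equals the monoisotopic mass (about 46.04186), while the second lies near 47.0453 because it merges the 13C, 2H and 17O species, which all share the nominal mass 47.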
2.2 Identification schema

Having obtained an accurate isotope pattern from a high-resolution mass spectrometer, we use this information to identify the elemental composition of the sample molecule. Our input is a list of masses with normalized abundances that corresponds to the isotope pattern of the sample molecule. We want to find the elemental composition whose isotope pattern best matches the input.
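What "best matches" means can be sketched in base R. This example normalizes the measured intensities, computes each candidate's theoretical pattern by merging isotope species of identical nominal mass, and scores candidates with independent Gaussian errors on peak mass (in ppm) and on normalized abundance, turned into posterior-like scores under a uniform prior. This is a simplified stand-in chosen for illustration, not the Bayesian scoring model Rdisop implements; the error widths sd_ppm and sd_ab, the isotope table and the helper names theoretical_pattern() and score_candidates() are all hypothetical. The measured values are the two glutamic acid peaks from Section 3.2, and the candidates are the two formulas decomposeIsotopes() returns for them.

```r
# Isotopes per element: nominal offset from the lightest isotope, exact mass (u)
# and natural abundance (standard IUPAC values, hard-coded for this sketch).
iso <- list(
  C = data.frame(off = 0:1, mass = c(12, 13.0033548378), ab = c(0.9893, 0.0107)),
  H = data.frame(off = 0:1, mass = c(1.00782503207, 2.0141017778), ab = c(0.999885, 0.000115)),
  N = data.frame(off = 0:1, mass = c(14.0030740048, 15.0001088982), ab = c(0.99636, 0.00364)),
  O = data.frame(off = 0:2, mass = c(15.99491461956, 16.9991317, 17.999161),
                 ab = c(0.99757, 0.00038, 0.00205)),
  P = data.frame(off = 0, mass = 30.97376163, ab = 1),
  S = data.frame(off = c(0, 1, 2, 4),
                 mass = c(31.972071, 32.97145876, 33.9678669, 35.96708076),
                 ab = c(0.9499, 0.0075, 0.0425, 0.0001))
)

# Theoretical pattern of a named count vector: masses and abundances of the
# first `peaks` nominal-mass peaks, renormalized to sum one over those peaks.
theoretical_pattern <- function(counts, peaks) {
  p <- 1; pm <- 0                        # abundance, abundance-weighted mass
  for (el in names(counts)) {
    a <- numeric(max(iso[[el]]$off) + 1); am <- a
    a[iso[[el]]$off + 1] <- iso[[el]]$ab
    am[iso[[el]]$off + 1] <- iso[[el]]$ab * iso[[el]]$mass
    for (k in seq_len(counts[[el]])) {   # add one atom at a time
      np <- numeric(length(p) + length(a) - 1); npm <- np
      for (i in seq_along(p)) for (j in seq_along(a)) {
        np[i + j - 1] <- np[i + j - 1] + p[i] * a[j]
        npm[i + j - 1] <- npm[i + j - 1] + pm[i] * a[j] + p[i] * am[j]
      }
      p <- np; pm <- npm
    }
  }
  list(mass = pm[1:peaks] / p[1:peaks], ab = p[1:peaks] / sum(p[1:peaks]))
}

# Simplified score: Gaussian errors on mass (ppm) and normalized abundance,
# uniform prior over the candidates. Not Rdisop's scoring model.
score_candidates <- function(mz, intensity, candidates, sd_ppm = 2, sd_ab = 0.01) {
  obs <- intensity / sum(intensity)      # the normalized input pattern
  loglik <- sapply(candidates, function(f) {
    th <- theoretical_pattern(f, length(mz))
    ppm <- (mz - th$mass) / th$mass * 1e6
    sum(dnorm(ppm, sd = sd_ppm, log = TRUE)) +
      sum(dnorm(obs - th$ab, sd = sd_ab, log = TRUE))
  })
  post <- exp(loglik - max(loglik))
  sort(post / sum(post), decreasing = TRUE)
}

candidates <- list(C5H9NO4 = c(C = 5, H = 9, N = 1, O = 4),
                   C3H17P2S = c(C = 3, H = 17, P = 2, S = 1))
scores <- score_candidates(c(147.053, 148.056), c(93, 5.8), candidates)
print(scores)
```

With these assumed error widths, C5H9NO4 ranks first, as in Rdisop's output, mainly because its predicted second-peak abundance is much closer to the measured one; the numerical scores themselves are not comparable to Rdisop's.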

Solving this task is divided into three parts. First, all elemental compositions are calculated that share some property with the input spectrum, for example the monoisotopic mass. Second, to remove compositions that cannot exist in nature, chemical bonding rules are applied, discarding formulas that have a negative or non-integer degree of unsaturation. Third, the theoretical isotope pattern of every remaining composition is calculated and compared to the measured isotope pattern. Candidate patterns are ranked using Bayesian statistics, and the one with the highest score is chosen.

3 Working with molecules and isotope peaklists

The central object in Rdisop is the molecule, a list containing the (sum) formula, its isotope pattern, a score and other information. Molecules can be created either explicitly through getMolecule() or initializeXXX(), or through decomposeMass() and decomposeIsotopes(). Most functions operate only on a subset of the periodic system of elements (PSE), given as the elements argument.

3.1 Handling of Molecules

The getMolecule() function returns a list object containing the information for a named single atom or a more complex molecule.

> library(Rdisop)
> molecule <- getMolecule("C2H5OH")
> getFormula(molecule)
[1] "C2H6O"
> getMass(molecule)
[1] 46.04186

Note that the formula is returned in a canonical form, and that the mass includes the decimals (the nominal mass of ethanol would be just 46). Without further arguments, only the elements C, H, N, O, P and S are available; for metabolomics research, these are the most relevant ones. A different subset of the PSE can be returned and passed to the functions, but keep in mind that a larger set of elements yields a (much) larger result set when decomposing masses later.
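Steps one and two of the identification schema, and the effect of the element set on the result size, can be illustrated with a deliberately naive base-R sketch. Instead of the money-changing algorithm Rdisop builds on (Böcker and Lipták, 2007), it enumerates every count vector of the non-hydrogen elements with expand.grid(), lets hydrogen fill the remaining mass, keeps compositions within the ppm window, and discards those with a negative or non-integer DBE. The element masses are standard monoisotopic values, the valences P = 3 and S = 2 are assumptions, and decompose_brute() and hill_string() are hypothetical helpers, not Rdisop functions.

```r
# Monoisotopic masses (u) of the CHNOPS elements (standard IUPAC values).
mono <- c(C = 12, H = 1.00782503207, N = 14.0030740048,
          O = 15.99491461956, P = 30.97376163, S = 31.972071)
# Valences used by the DBE rule; P = 3 and S = 2 are assumptions of this sketch.
valence <- c(C = 4, H = 1, N = 3, O = 2, P = 3, S = 2)

# Formula string from a named count vector in C, H, then other order, e.g. "C2H6O".
hill_string <- function(counts) {
  counts <- counts[counts > 0]
  paste0(names(counts), ifelse(counts == 1, "", counts), collapse = "")
}

# Brute-force decomposition (hypothetical helper, not Rdisop's algorithm).
decompose_brute <- function(target, ppm = 20, heavy = c("C", "N", "O", "P", "S")) {
  tol <- target * ppm * 1e-6
  # Step 1a: every count vector of the non-hydrogen elements that fits the mass.
  bounds <- lapply(heavy, function(e) 0:floor((target + tol) / mono[[e]]))
  names(bounds) <- heavy
  grid <- expand.grid(bounds)
  partial <- as.vector(as.matrix(grid) %*% mono[heavy])
  # Step 1b: hydrogen fills the remainder with the nearest integer count.
  nH <- round((target - partial) / mono[["H"]])
  mass <- partial + nH * mono[["H"]]
  keep <- nH >= 0 & abs(mass - target) <= tol
  hits <- grid[keep, , drop = FALSE]
  hits$H <- nH[keep]
  els <- c(intersect("C", heavy), "H", setdiff(heavy, "C"))
  counts <- as.matrix(hits[, els, drop = FALSE])
  # Step 2: DBE = 1 + sum_i n_i (v_i - 2) / 2; drop negative or non-integer values.
  dbe <- 1 + as.vector(counts %*% ((valence[els] - 2) / 2))
  ok <- dbe >= 0 & dbe == floor(dbe)
  res <- data.frame(formula = unname(apply(counts[ok, , drop = FALSE], 1, hill_string)),
                    mass = mass[keep][ok], DBE = dbe[ok], stringsAsFactors = FALSE)
  res[order(abs(res$mass - target)), , drop = FALSE]
}

print(decompose_brute(46.042, ppm = 20))
```

For 46.042 at 20 ppm this returns only C2H6O with DBE 0, in line with the decomposeMass() example in Section 3.2. Restricting heavy to c("C", "N", "O") can only shrink the candidate list for a given mass, which is the element-set effect described above; brute force also scales poorly with mass, which is why dedicated decomposition algorithms are used in practice.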

For example, the following creates a protonated chlorophyll ion using an extended element set that includes Mg, K, Ca and Fe:

> essentialElements <- initializeCHNOPSMgKCaFe()
> chlorophyll <- getMolecule("C55H72MgN4O5H", z=1,
+                           elements=essentialElements)
> isotopes <- getIsotope(chlorophyll, seq(1,4))
> isotopes
            [,1]        [,2]       [,3]         [,4]
[1,] 893.5431390 894.5459934 895.546247 896.54752708
[2,]   0.4140648   0.3171228   0.178565   0.06773657
> plot(t(isotopes), type="h", xlab="m/z", ylab="intensity")

In this case we have created a complex molecule with a charge (z = +1) containing a metal ion, and checked its first four isotope peaks. For visual inspection, the isotope pattern can be plotted; see Figure 1.

[Figure 1: Isotope pattern for a protonated chlorophyll ion, which could be observed on a high-resolution mass spectrometer in positive mode. Axes: m/z (x) against intensity (y).]

3.2 decomposeMass and decomposeIsotopes

The decomposeMass() function returns a list of molecules that have a given exact mass (within an error window in ppm):

> molecules <- decomposeMass(46.042, ppm=20)
> molecules
$formula
[1] "C2H6O"

$score
[1] 1

$exactmass
[1] 46.04186

$charge
[1] 0

$parity
[1] "e"

$valid
[1] "Valid"

$DBE
[1] 0

$isotopes
$isotopes[[1]]
           [,1]        [,2]        [,3]         [,4]         [,5]         [,6]
[1,] 46.0418648 47.04534542 48.04631711 4.904960e+01 5.005324e+01 5.105929e+01
[2,]  0.9749152  0.02293559  0.00210353 4.540559e-05 2.816635e-07 2.324709e-10
             [,7]         [,8]         [,9]        [,10]
[1,] 5.206548e+01 5.307171e+01 5.407796e+01 5.508422e+01
[2,] 8.458096e-14 1.665930e-17 1.857005e-21 1.107409e-25

This call produces a list of potential molecules (with a single element in this case). The larger the mass, the allowed ppm deviation and the list of allowed elements, the larger the result list will grow. For each hypothesis, the list holds its formula, exact mass and score. The parity, the validity (using the nitrogen rule) and the double bond equivalents (DBE) are simple, yet commonly used hints for the plausibility of a solution, and can be used to filter the result list. For an amino acid, this simple method already returns more than one hypothesis. Because the result is a list of parallel fields, count the hypotheses via one of its fields; length() of the list itself counts the fields (formula, score, ..., isotopes), not the hypotheses:

> length(getFormula(decomposeMass(147.053)))

On modern mass spectrometers, a full isotope pattern can be obtained for a molecule, and the additional masses and intensities improve the accuracy of the sum formula prediction. Accessor functions return subsets of the molecule data structure:

> # glutamic acid (C5H9NO4)
> masses <- c(147.053, 148.056)
> intensities <- c(93, 5.8)
> molecules <- decomposeIsotopes(masses, intensities)
> cbind(getFormula(molecules), getScore(molecules), getValid(molecules))
     [,1]       [,2]                   [,3]
[1,] "C5H9NO4"  "0.999999998064664"    "Valid"
[2,] "C3H17P2S" "1.93533578193563e-09" "Invalid"

The first-ranked solution already has a score close to one, and with a nitrogen-rule filter only one solution would remain. Such cases are not removed by default, because a few compound classes do not obey the nitrogen rule, which after all is just a simple heuristic.

If the masses were obtained by LC-ESI-MS, it is likely that the measured mass signal actually corresponds to an adduct ion, such as [M+H]+. The sum formula obtained through decomposeIsotopes() will then contain one H too many, and will not be found in PubChem or other libraries unless the adduct has been removed:

> queryMolecule <- subMolecules("C5H10NO4", "H")
> getFormula(queryMolecule)
[1] "C5H9NO4"

Similarly, if in-source fragmentation occurred during ionisation, the lost fragment can be added back with addMolecules() before querying.

3.3 Interaction with other Bioconductor packages

This section gives some suggestions on how the Rdisop functionality can be combined with other Bioconductor packages. Usually the masses and intensities will be obtained from a high-resolution mass spectrometer such as an FTICR-MS or QTOF-MS. Bioconductor currently has two packages dealing with peak picking on raw machine data, MassSpecWavelet and XCMS. The latter contains a wrapper for MassSpecWavelet, so we need to deal with XCMS peak lists only. The ESI package (not part of Bioconductor, see http://msbi.ipb-halle.de/) can extract a set of isotope clusters from peak lists.

After Rdisop has created a set of candidate molecular formulae, the open-access compound databases PubChem or ChEBI can be queried to find out whether any information about a compound exists. Nota bene: a hit or non-hit does not indicate a correct or incorrect formula, but merely helps in further verification or structure elucidation steps. For other cheminformatics functionality in Bioconductor, see e.g. RCDK.

Acknowledgments

AP was supported by the Deutsche Forschungsgemeinschaft (BO 1910/1). Additional programming was done by Marcel Martin, whom we thank for his unfailing support, and by Marco Kortkamp.

References

Sebastian Böcker and Zsuzsanna Lipták. A fast and simple algorithm for the Money Changing Problem. Algorithmica, 48(4):413-432, 2007. doi: 10.1007/s00453-007-0162-8.

Sebastian Böcker, Matthias Letzel, Zsuzsanna Lipták, and Anton Pervukhin. Decomposing metabolomic isotope patterns. In Proc. of Workshop on Algorithms in Bioinformatics (WABI 2006), volume 4175 of Lect. Notes Comput. Sci., pages 12-23. Springer, Berlin, 2006. URL http://bio.informatik.uni-jena.de/bib2html/downloads/2006/boeckeretal_DecomposingMetabolomicIsotopePatterns_WABI_2006.pdf.

Sebastian Böcker, Zsuzsanna Lipták, Marcel Martin, Anton Pervukhin, and Henner Sudek. DECOMP: from interpreting mass spectrometry peaks to solving the Money Changing Problem. Bioinformatics, 24(4):591-593, 2008. doi: 10.1093/bioinformatics/btm631. URL http://bioinformatics.oxfordjournals.org/cgi/reprint/24/4/591?ijkey=1lM50Bkzz4SCLsa&keytype=ref.

Sebastian Böcker, Matthias Letzel, Zsuzsanna Lipták, and Anton Pervukhin. SIRIUS: Decomposing isotope patterns for metabolite identification. Bioinformatics, 25(2):218-224, 2009. doi: 10.1093/bioinformatics/btn603. URL http://bioinformatics.oxfordjournals.org/cgi/content/full/25/2/218.