NEW TOOLS FOR FINDING AND IDENTIFYING METABOLITES IN A METABOLOMICS WORKFLOW

Julia E. Wingate¹; Elliott Jones²; Armin Graber³; Klaus Weinberger³
¹Applied Biosystems, Toronto, Canada; ²Applied Biosystems, Foster City, California; ³BIOCRATES life sciences GmbH, Innsbruck, Austria

INTRODUCTION
One of the current bottlenecks in metabolomics is finding and identifying potential biomarkers in complex data sets. In this technical note, we used several software tools designed for mass spectrometry data analysis to interpret multiple complex data sets. The goal was not only to find potential biomarkers but also to identify these compounds and view the metabolic pathways in which they occur. The ability to interactively view the relevant biological pathway information increases confidence in compound identification and helps in understanding the biological significance of each biomarker.

MATERIALS AND METHODS
Urine samples were collected from three different human subjects and frozen until use. Prior to analysis, acetonitrile was added to the thawed urine to precipitate macromolecules. Samples were centrifuged, and the supernatant was removed, dried, and reconstituted in water prior to analysis. After sample clean-up, each sample was divided into 3 aliquots, giving 3 replicate injections for each subject.

Chromatographic separation was performed using a Tempo™ ht LC System (Applied Biosystems/MDS Sciex) with a Wakosil C18 column (0.3 × 50 mm, 3 µm, Eksigent). A gradient from 95% to 5% aqueous was run at several different flow rates. A QSTAR Elite system (Applied Biosystems/MDS Sciex) was used to collect MS and MS/MS data in TurboIonSpray positive ion mode. Continuous TOF calibration using the AutoCal feature of Analyst QS 2.0 software gave mass accuracy better than 3 ppm.
MarkerView 1.1 software and Analyst QS 2.0 software, along with a Biocrates prototype software package referred to as MarkerIDQ, were used for data processing.
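Mass accuracy figures such as the 3 ppm quoted above are conventionally computed as (measured m/z − theoretical m/z) / theoretical m/z × 10^6. The minimal Python sketch below shows that arithmetic; the helper names `ppm_error` and `within_tolerance` are hypothetical and are not part of Analyst or MarkerView.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """True if the absolute ppm error does not exceed tol_ppm."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm
```

At m/z 377, for example, a 3 ppm window corresponds to roughly ±0.0011 Da, so the absolute tolerance in Da scales with the m/z of the ion.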
RESULTS AND DISCUSSION
MarkerView software finds peaks within complex data and aligns them to correct for drift in mass and retention time. Once a list of all peaks present in the data has been created, MarkerView can perform several types of statistical analysis to reveal the inherent groupings within the data and find which compounds are most likely to be potential biomarkers. In this work, we used Principal Components Analysis (PCA), an unsupervised multivariate statistical method. As the scores plot on the left in Figure 1 shows, PCA clearly differentiated the 3 individuals. The loadings plot, shown on the right in Figure 1, shows which compounds account for the separation between the 3 groups. The results are straightforward to interpret: compounds falling in the upper-left quadrant of the loadings plot (negative PC1 and positive PC2) are increased in sample 2F, those in the lower-left quadrant (negative PC1 and negative PC2) are increased in sample E2, and so on.

Figure 1. PCA output from MarkerView software showing the scores and loadings plots

To mine the data further, one can create a trends plot, shown in the bottom panel of Figure 2, which plots the intensity of a given compound across all samples. Figure 2 shows this plot for a compound at m/z 377 with a retention time of 16.6 minutes. As both the loadings and trends plots make clear, this compound is greatly elevated in the E2 samples.

Figure 2. Trends plot in MarkerView software showing elevation of the compound at m/z 377 and retention time 16.6 minutes only in samples E2
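MarkerView's internal PCA implementation is not described in this note. As a generic illustration, PCA on an aligned peak table (rows = injections, columns = m/z and retention time features) can be computed with a singular value decomposition: the scores place samples, as in the left panel of Figure 1, and the loadings place features, as in the right panel. The function name `pca_scores_loadings` is hypothetical, and the choice of mean-centering plus Pareto scaling is an assumption, not necessarily what MarkerView applies.

```python
import numpy as np


def pca_scores_loadings(X: np.ndarray, n_components: int = 2, pareto: bool = True):
    """PCA of a samples x features intensity table via SVD.

    Returns (scores, loadings, explained_variance_ratio).
    scores:   one row per sample (injection)
    loadings: one row per feature (an aligned m/z, RT pair)
    """
    Xc = X - X.mean(axis=0)                  # mean-center each feature
    if pareto:
        sd = X.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0                    # leave constant features unscaled
        Xc = Xc / np.sqrt(sd)                # Pareto scaling: divide by sqrt(std dev)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T
    var = S ** 2                             # variance captured by each component
    return scores, loadings, var[:n_components] / var.sum()
```

A feature with a large loading on the same component along which one subject's replicates separate in the scores plot is a candidate marker for that subject. SVD component signs are arbitrary, so quadrants in the loadings plot should always be read together with the matching scores plot.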
From within MarkerView software, both the raw chromatographic data and the mass spectral data can be viewed. In addition, if MS/MS data were acquired using Information Dependent Acquisition (IDA), these data can also be displayed. In this experiment, we acquired data using the new Dynamic Background Subtraction (DBS) functionality available on QSTAR Elite instruments. Acquiring data with DBS greatly increases the number of relevant MS/MS spectra collected. In this experiment, MS/MS data were acquired automatically on m/z 377, as shown in the bottom pane of Figure 3.

Figure 3. MS/MS spectrum of m/z 377 acquired automatically using Dynamic Background Subtraction

After a potential biomarker is found, the next step is usually to determine the elemental formula of the unknown compound. Standard elemental calculators typically use only molecular weight information to calculate the possible elemental formulae that fall within a user-specified mass tolerance. However, these lists are often long, and the correct formula frequently ranks low. Here, we instead used the Formula Finder in Analyst QS 2.0 to help determine the elemental formula. This calculator uses molecular weight information together with isotope ratios and chemical logic to narrow down the correct formula. As Figure 4 shows, Formula Finder returned only 4 possible formulae matching the specified criteria, even with a tolerance of 10 ppm.

Figure 4. Formula Finder showing potential elemental formulae for the compound at m/z 377
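Formula Finder's own scoring is not described here. The sketch below is a generic stand-in that shows why chemical logic shortens the candidate list: it enumerates CHNO formulae for a singly protonated ion within a ppm window and keeps only those with a non-negative ring-plus-double-bond equivalent (RDBE) that ends in .5, as expected for an even-electron [M+H]+ ion. The element set, the count limits, and the function names are assumptions for illustration, and isotope-ratio scoring is omitted.

```python
from typing import List, Tuple

# Monoisotopic masses (Da) of the most abundant isotopes
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
ELECTRON = 0.00054857990946


def ion_mass(c: int, h: int, n: int, o: int) -> float:
    """Monoisotopic m/z of the singly charged cation CcHhNnOo+."""
    return c * MASS["C"] + h * MASS["H"] + n * MASS["N"] + o * MASS["O"] - ELECTRON


def rdbe(c: int, h: int, n: int) -> float:
    """Rings plus double bonds; oxygen does not contribute."""
    return c - h / 2 + n / 2 + 1


def candidate_formulae(mz: float, tol_ppm: float = 10.0, max_c: int = 40,
                       max_n: int = 10, max_o: int = 15
                       ) -> List[Tuple[int, int, int, int, float]]:
    """Return (C, H, N, O, ppm error) tuples for [M+H]+ ions near mz."""
    hits = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                rest = mz + ELECTRON - c * MASS["C"] - n * MASS["N"] - o * MASS["O"]
                # With a window far narrower than 1 Da, only the nearest H count can fit
                h = round(rest / MASS["H"])
                if h < 0:
                    continue
                theo = ion_mass(c, h, n, o)
                err = (mz - theo) / theo * 1e6
                if abs(err) > tol_ppm:
                    continue
                r = rdbe(c, h, n)
                # Chemical logic: RDBE >= 0 and half-integer for an even-electron ion
                if r < 0 or (r - 0.5) % 1 != 0:
                    continue
                hits.append((c, h, n, o, err))
    return sorted(hits, key=lambda t: abs(t[4]))
```

Called with 377.1456 (the theoretical [M+H]+ of riboflavin), the list includes C17H21N4O6 among the hits. Adding isotope-pattern scoring, as Formula Finder does, would rank the remaining candidates further.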
The next step, going from the possible elemental formulae to a structure, can be the most challenging in metabolomics studies. To help with this process, we used new prototype software from Biocrates life sciences, currently referred to as MarkerIDQ. Among other capabilities, this software can search the KEGG database by molecular weight, which helps identify potential biomarkers. In addition, MarkerIDQ contains interactive maps showing the biological pathways in which different metabolites are involved. Here we used the search module, shown in Figure 5, to look for potential metabolites with a molecular weight of 376 Da.

Figure 6. Search Module from MarkerIDQ. For mass 376, only one compound is proposed, riboflavin

The search module showed that only one human metabolite in the KEGG database has a molecular weight of 376: riboflavin (C17H20N4O6). Its protonated molecule, [M+H]+, has the formula C17H21N4O6, which corresponds to the second formula proposed by Formula Finder, with a mass error of 2.75 ppm. To further confirm the identity of the potential biomarker, we compared the structure of riboflavin with the MS/MS spectrum acquired automatically in our analysis. The Molecular Editor module in MarkerIDQ displays the structure of riboflavin, as shown in Figure 7. For comparison, an enlarged view of the MS/MS spectrum of m/z 377 is shown in Figure 8. Based on the structure, one would expect a large fragment peak at m/z 243, corresponding to loss of the hydroxylated (ribityl) side chain. Figure 8 clearly shows this fragment in the MS/MS data.

Figure 7. From the molecular editor within MarkerIDQ, the structure of riboflavin can be displayed
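The mass arithmetic behind this confirmation can be checked directly. The sketch below computes the theoretical [M+H]+ of riboflavin and of the m/z 243 fragment. It assumes, as is commonly reported but not stated in this note, that the fragment is protonated lumichrome (C12H10N4O2), formed by neutral loss of C5H10O4 from the ribityl side chain; the variable and function names are illustrative only.

```python
# Monoisotopic masses (Da)
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727646688  # hydrogen atom mass minus one electron


def mono_mass(formula: dict) -> float:
    """Monoisotopic mass of a neutral formula given as {element: count}."""
    return sum(MASS[el] * count for el, count in formula.items())


riboflavin = {"C": 17, "H": 20, "N": 4, "O": 6}
# Assumption: the m/z 243 fragment is protonated lumichrome
lumichrome = {"C": 12, "H": 10, "N": 4, "O": 2}
# Neutral loss from the ribityl side chain (works out to C5H10O4)
ribityl_loss = {el: riboflavin[el] - lumichrome[el] for el in riboflavin}

precursor_mz = mono_mass(riboflavin) + PROTON   # [M+H]+ of riboflavin
fragment_mz = mono_mass(lumichrome) + PROTON    # proposed fragment ion
neutral_loss = mono_mass(ribityl_loss)          # precursor minus fragment
```

The precursor works out to about m/z 377.1456, so a 2.75 ppm error corresponds to roughly 0.001 Da, and the fragment to about m/z 243.0877, consistent with the peak observed at m/z 243.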
Figure 8. Enlarged view of the MS/MS spectrum of m/z 377

After identifying the potential biomarker at m/z 377 as riboflavin, one can use the Browser Module in MarkerIDQ to take a closer look at the biochemical pathways involving riboflavin, as shown in Figure 9. This pathway information can be used to look for other compounds related to riboflavin metabolism that would be expected to be similarly up- or down-regulated. Such compounds provide further confirmation of riboflavin as the biomarker and can also point to additional biomarkers.

Figure 9. Browser Module of Biocrates MarkerIDQ software provides interactive pathway information about selected compounds

CONCLUSIONS
QSTAR Elite was used, along with MarkerView, Formula Finder and Biocrates MarkerIDQ software, to find and identify a potential biomarker.
After the work was completed, the volunteers were questioned about their nutritional habits. The E2 samples, which showed the high level of riboflavin, were provided by an individual taking high doses of multivitamins, giving further confidence in the identification of the biomarker as riboflavin.

TRADEMARKS/LICENSING
MARKERVIEW and TEMPO are trademarks and QSTAR, TURBOIONSPRAY, and ANALYST are registered trademarks of Applied Biosystems/MDS Sciex, a joint venture between Applera Corporation and MDS Inc. © 2006 Applera Corporation and MDS Inc. All rights reserved.