Computer-assisted analysis of complex natural product extracts

Detection of known and identification of unknown compounds from Q-TOF mass spectrometry with the Agilent MassHunter Metabolite ID software

Application Note

Drug Discovery

Author
Edgar Nägele
Agilent Technologies
Waldbronn, Germany

Abstract

This Application Note demonstrates:

- The computer-assisted analysis of natural product extracts for known compounds and for new, unknown compounds.
- Use of the Agilent MassHunter Metabolite Identification (MetID) software, which includes a molecular feature extraction (MFE) algorithm to locate compounds in the complex data, as well as several additional algorithms to analyze the natural product extracts.
- The separation of components in complex natural product extracts with the Agilent 1200 Series Rapid Resolution LC (RRLC) system.
- The measurement of accurate molecular masses by electrospray ionization quadrupole time-of-flight (Q-TOF) mass spectrometry.
- A comparative data analysis of extracts from two ginseng subspecies to find the compounds that are unique to a given subspecies or that differ in concentration between the subspecies.
Introduction

Crude extracts of herbal origin have been used for the medical treatment of diseases since prehistoric times by all ancient cultures around the globe. The ability to treat different diseases was found by trial and error over hundreds of years, and the knowledge of these medicines was passed from generation to generation. Herbal-based traditional Chinese medicine (TCM) is a good example of the efficiency achieved during this optimization process. Because these drugs are often complex mixtures containing hundreds of chemicals with different effects or synergisms, assigning a pharmaceutical effect to a discrete compound is difficult.

In Western medicine, drugs of natural origin are gaining importance because of their potential. However, Western pharmaceutical quality standards require detailed knowledge of the ingredients of natural-product-based medicines. Computer-assisted identification could be used in a modern workflow to identify known compounds and to detect new, unknown compounds in complex extracts of plant and herbal origin. This workflow could comprise a comparison of complex MS and MS/MS data with libraries to find the known compounds, and special computer-based algorithms to identify the new compounds.

A famous Asian herb that has been used as an herbal medicine for more than 5 years is the ginseng root (Panax species). The main active compounds, the ginsenosides, are triterpene saponins, of which more than 8 have been isolated and characterized using electrospray mass spectrometry during the past years. The method of choice for the analysis of complex natural product extracts, such as those from the ginseng root, is high performance liquid chromatography (HPLC) [2]. To determine the complex and similar structures of ginsenosides, modern LC/MS equipment for structure elucidation by accurate-mass measurement, MS/MS, and MSn is currently in use [3].

This Application Note demonstrates a computer-assisted workflow, based on the Agilent MassHunter Metabolite ID software, for the detection of known and identification of unknown compounds in complex natural product extracts. The ingredients in a ginseng root extract are used as an example. The analysis is based on the measurement of accurate masses by the Agilent 6520 Accurate-Mass Q-TOF LC/MS system.

Experimental

Equipment

- Agilent 1200 Series Rapid Resolution LC (RRLC) system, comprising an Agilent 1200 Series binary pump SL with degasser, an Agilent 1200 Series high performance autosampler SL with thermostat, an Agilent 1200 Series thermostatted column compartment (TCC), and an Agilent 1200 Series diode-array detector SL (DAD SL)
- Agilent 6520 Accurate-Mass Q-TOF LC/MS system with dual-sprayer interface for mass calibration
- Agilent ZORBAX SB-C8 column, 2. x 5 mm, .8 µm
- Agilent MassHunter Workstation software used for data acquisition and data analysis (qualitative analysis software, MetID software)

Sample preparation

Two samples of powdered freeze-dried ginseng root (g each) were sonicated for 3 minutes in ml methanol, filtered, and used directly for analysis. The samples were:

1. Asian ginseng (Panax ginseng) obtained from ILHWA Co., LTD, Korea
2. American ginseng (Panax quinquefolius) obtained from Sigma-Aldrich, Taufkirchen, Germany

An internal standard of reserpine (C33H40N2O9), with a molecular weight (MW) of 68.2733, was prepared at ng/ml. The natural product extract was mixed : with this standard solution to give a final concentration of ng/ml reserpine. The µl injection delivered pg reserpine on-column.

Methods

LC conditions:

- Solvent A: Water + . % formic acid (FA)
- Solvent B: Acetonitrile + . % FA
- Flow rate: .5 ml/min
- Gradient: min 5 %B, min 5 %B, 3 min 95 %B
- Stop time: 3 min
- Post time: 5 min
- Injection volume: µl
- Thermostatted autosampler: 4 ºC
- Automated delay volume reduction: N
- Column temperature: 5 ºC
- DAD SL: 22 nm/4, Ref. 36 nm/6
- Flow cell: 2 µl ( mm path length)

The Agilent 6520 Accurate-Mass Q-TOF LC/MS system was operated under the following conditions:

- Source: Electrospray (ESI) in positive mode with dual spray for reference masses (2.587 m/z and 922.98 m/z)
- Dry gas: 2 L/min
- Dry gas temperature: 2 ºC
- Nebulizer: 6 psi
- Scan: 2 3
- Fragmentor: 5 V
- Skimmer: 6 V
- Capillary: 3 V
- Data dependent MS/MS: 2 MS and 2 MS/MS spectra/sec, 2 compounds per MS for MS/MS, and active exclusion for .25 min after 2 MS/MS spectra
Data analysis

In the first step of data analysis, the Q-TOF data file from the analysis of Asian ginseng (Panax ginseng) was opened in the Agilent MassHunter Metabolite ID software and the compounds were extracted using the molecular feature extraction (MFE) algorithm. The extraction of the molecular features was optimized on the low-level spiked compound, reserpine. After molecular feature extraction, the extracted ion chromatograms (EICs) were drawn for all MFE compounds. This set of compounds was compared with those in a custom Agilent METLIN Personal metabolite database of natural products. The database included accurate masses and retention times. Finally, the molecular formulae were calculated for all compounds, based on accurate masses.

In the second step of data analysis, the unknown compounds in the sample were examined more closely to identify new natural product compounds. This identification started with the assumption that a large number of natural product compounds in a given source are derivatives and can be identified by comparison with the known compounds. Therefore, one of the known compounds was used as a reference and the remaining compounds were searched for similarities in the isotopic pattern, the MS/MS fragmentation pattern, and characteristic mass shifts. These searches used algorithms that are included in the Agilent MassHunter Metabolite ID software.

In the third step, the previously analyzed sample was compared with a related but new sample (American ginseng, Panax quinquefolius). In this comparison, the compounds in the new sample were divided into three groups: 1) completely new compounds, 2) compounds that increased in response, and 3) compounds that decreased in response. Then the group of new compounds in the American ginseng was searched for compounds that were related to a given reference compound in the Asian ginseng, as described above.

Results and discussion

Step 1: Identification of known compounds

The ingredients of an extract from an Asian ginseng root (Panax ginseng) were separated on the Agilent 1200 Series RRLC system using an Agilent ZORBAX Rapid Resolution High Throughput (RRHT) column, with subsequent measurement of the accurate masses of the eluting compounds by means of the Agilent 6520 Accurate-Mass Q-TOF LC/MS system. The high-resolution LC provided an excellent separation of the major and minor ingredients of the natural product extract. The resulting data files were processed with the described data analysis method using the Agilent MassHunter Metabolite ID software. The database, which was based on the Agilent METLIN Personal metabolite database, contained the accurate masses, the retention times (RTs), and the structural information for the most important ginsenoside compounds.

As indicated in figure 1, the displayed results showed the EIC (extracted ion chromatogram), the ECC (extracted compound chromatogram), the UV trace, the molecular masses of all extracted molecular features, and the matches from the comparison with the database. In this particular example, the reported compound # 2 matched the known compound ginsenoside Rb1 (C54H92O23, MW = 8.629, RT = 6.447). Based on the measured masses of the protonated molecule (m/z 9.67) and its isotope pattern, the molecular mass was calculated, which confirmed the molecular formula in the database with a very low relative mass error of .37 ppm. Additionally, the retention time error relative to the database entry was calculated as .7 minutes in this particular case.

Figure 1 A) At-a-glance result table that displays all filtered compounds identified by METLIN database search and other search criteria. B) Detailed database search results for the selected compound, including molecular formula and relative mass error, retention time and retention time error, and structural information. C) Detailed chromatograms, including the ECC and EIC of the selected compound, and the UV trace. D) Detailed mass spectrum, displaying the isotopic pattern of the selected compound. E) Calculated formulae and mass accuracies for the selected compound, including its isotopic pattern.

Step 2: Detailed analysis of the unknown compounds

For the analysis of the unknown compounds in the natural product extract, the assumption was made that these compounds are similar to the compounds that were already identified. For example, they might belong to the same chemical family or be derivatives of the known compounds. For this comparative analysis, the identified compound ginsenoside Rb1 was used as a reference.

The first attribute that was used for the comparison was the measured isotopic pattern of each individual compound. In this comparison, all measured isotopic patterns were compared with the calculated isotopic pattern (CIP) of the reference compound ginsenoside Rb1 (C54H93O23, m/z = 9.68, RT = 6.447). The isotopic pattern-matching algorithm in the Agilent MassHunter Metabolite ID software marks all compounds that exceed a predefined matching score. One of those was the compound with a molecular weight of 94.634 at a retention time of 6.62 to 6.89 minutes (figure 2). The calculated formula C57H94O26 matches best, with a low relative mass error of .9 ppm (figure 3). Relative to the reference compound, this unknown adds the equivalent of C3H2O3.

Figure 2 Isotopic pattern-matching of the measured compound spectrum (RT 6.62 to 6.89 minutes) at m/z 95.69 to the calculated isotopic pattern (CIP, green boxes) of ginsenoside Rb1.
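The isotopic pattern comparison described above can be illustrated with a small sketch. It is not the MassHunter algorithm, whose scoring is not described in this note. It assumes a simple nominal-mass convolution of natural isotope abundances to obtain a calculated isotopic pattern for the protonated reference compound (C54H93O23), and it uses cosine similarity as a stand-in matching score. The function names and the "measured" intensities are hypothetical and serve only as an illustration.

```python
# Natural isotope abundances by nominal mass offset (+0, +1, +2); rounded literature values.
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "O": [0.99757, 0.00038, 0.00205],
}


def convolve(a, b, n_keep):
    """Multiply two isotope distributions, keeping only the first n_keep peaks."""
    out = [0.0] * min(len(a) + len(b) - 1, n_keep)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def isotope_pattern(formula, n_peaks=4):
    """Relative intensities of the M, M+1, ... peaks, scaled to the largest peak."""
    dist = [1.0]
    for element, count in formula.items():
        for _ in range(count):
            dist = convolve(dist, ABUNDANCE[element], n_peaks)
    top = max(dist)
    return [p / top for p in dist]


def match_score(measured, calculated):
    """Cosine similarity of two intensity vectors (assumed score, 1.0 = same shape)."""
    dot = sum(m * c for m, c in zip(measured, calculated))
    norm_m = sum(m * m for m in measured) ** 0.5
    norm_c = sum(c * c for c in calculated) ** 0.5
    return dot / (norm_m * norm_c)


# Calculated isotopic pattern of the protonated reference compound, C54H93O23.
cip = isotope_pattern({"C": 54, "H": 93, "O": 23})

# Hypothetical measured intensities (made up for illustration, not taken from the data).
hypothetical_measured = [1.00, 0.62, 0.24, 0.07]
score = match_score(hypothetical_measured, cip)
print([round(p, 3) for p in cip], round(score, 4))
```

A compound would be marked as related when its score exceeds a chosen threshold. The threshold itself is a user setting and is not specified here.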
Figure 3 Calculated formula for the compound at m/z 95.69, including isotope pattern and calculated absolute and relative mass errors.
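The absolute and relative mass errors mentioned for figure 3 follow from simple arithmetic, sketched below under stated assumptions. The monoisotopic atomic masses are rounded literature values, the helper names are hypothetical, and the "measured" mass is a made-up number that only shows how a ppm error is obtained. The formulae are those discussed in the text (reference compound C54H92O23, unknown C57H94O26, difference C3H2O3).

```python
import re

# Monoisotopic atomic masses in Da (rounded literature values).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def parse_formula(formula):
    """Turn a string such as 'C54H92O23' into {'C': 54, 'H': 92, 'O': 23}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def monoisotopic_mass(formula):
    """Sum of the monoisotopic atomic masses for a neutral formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def ppm_error(measured, calculated):
    """Relative mass error in parts per million."""
    return (measured - calculated) / calculated * 1e6


reference = monoisotopic_mass("C54H92O23")   # reference compound
unknown = monoisotopic_mass("C57H94O26")     # best-matching formula of the unknown
delta = monoisotopic_mass("C3H2O3")          # formula difference

print(f"reference {reference:.4f}, unknown {unknown:.4f}, delta {delta:.4f}")

# Hypothetical measured mass, chosen only to illustrate the calculation.
hypothetical_measured = unknown + 0.0012
print(f"{ppm_error(hypothetical_measured, unknown):.2f} ppm")
```

The check that the unknown equals the reference plus C3H2O3 holds at the formula level and therefore also at the mass level, independent of any measured value.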
The second attribute that was searched for similarities was the MS/MS spectrum. For that purpose, all acquired MS/MS spectra were compared with the MS/MS spectrum of the chosen reference compound, ginsenoside Rb1 (figure 4). Within this spectrum, typical fragments were identified that should also be found in similar compounds. In this case, these fragments were typical sugar moieties and fragments coming from the central steroidal system. For all fragments, formulae and mass accuracies were calculated (table 1).

To make this comparison manageable by hand, a special fragments overview graph was generated by the fragment pattern-matching algorithm (figure 5). In this overview, all MS/MS fragments of compounds related to the ginsenoside Rb1 MS/MS spectrum were displayed. The graph shows the fragment mass on the x-axis and the compound retention time on the y-axis. Only compounds with fragment masses that matched those of the comparison compound ginsenoside Rb1 were selected. This is indicated by the grey bands in the graph. New compounds that had a fragment at a nominal mass of m/z 325 were investigated further. One compound at a retention time of 6.7 minutes showed additional typical fragments at nominal masses of m/z 47, 425, and 443.

Figure 4 MS/MS spectrum of ginsenoside Rb1 and interpreted fragments. (See also table 1.)
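The selection of compounds by shared fragments can be sketched as follows. This is a simplified stand-in for the fragment pattern-matching algorithm, not its actual implementation. It assumes that a candidate is kept when enough of its fragment masses fall within a tolerance of the reference fragments. The nominal masses 325, 425, and 443 are the ones named in the text. The candidate spectra, the retention-time labels, the tolerance, and the function names are hypothetical.

```python
def shared_fragments(reference, candidate, tol=0.5):
    """Reference fragment masses that also occur in the candidate within +/- tol."""
    return [r for r in reference if any(abs(r - c) <= tol for c in candidate)]


def select_related(reference, spectra, min_shared=2, tol=0.5):
    """Keep candidates that share at least min_shared fragments with the reference."""
    related = {}
    for label, fragments in spectra.items():
        hits = shared_fragments(reference, fragments, tol)
        if len(hits) >= min_shared:
            related[label] = hits
    return related


# Nominal fragment masses of the reference compound that are named in the text.
reference_fragments = [325, 425, 443]

# Hypothetical candidate MS/MS spectra, keyed by an arbitrary label (illustration only).
candidate_spectra = {
    "candidate_A": [249.2, 325.1, 425.4, 443.4, 853.5],
    "candidate_B": [163.1, 325.1, 601.3],
    "candidate_C": [191.0, 277.2, 512.3],
}

related = select_related(reference_fragments, candidate_spectra)
for label, hits in related.items():
    print(label, hits)
```

Plotting the matched fragment masses against retention time would give an overview comparable in spirit to the fragment overview graph, with one horizontal band per reference fragment.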
Table 1 MS/MS fragments of ginsenoside Rb1, calculated formulae, and mass accuracies.

m/z / Ion formula / Calculated m/z / [mDa] / [ppm] / Neutral loss / Loss formula / Loss mass
63.596 C 6 H 5 63.6.48 2.92 946.555 C 48 H 82 8 946.55
325.29 C 2 H 2 325.292..2 784.4982 C 42 H 72 3 784.4973
343.232 C 2 H 23 343.2349.24.7 766.4878 C 42 H 7 2 766.4867
47.3668 C 3 H 47 47.36723.45. 72.2443 C 24 H 46 23 72.243
425.3773 C 3 H 49 425.37779.5.2 684.2338 C 24 H 44 22 684.2324
443.389 C 3 H 5 2 443.38836.69.56 666.222 C 24 H 42 2 666.229
487.655 C 8 H 3 5 487.6575.26.54 622.4456 C 36 H 62 8 622.4445
55.757 C 8 H 33 6 55.763.65.3 64.4354 C 36 H 6 7 64.4339
65.4397 C 36 H 6 7 65.448.5 2.48 54.74 C 8 H 32 6 54.69
649.284 C 24 H 4 2 649.2857.6.24 46.3927 C 3 H 52 3 46.396
667.2273 C 24 H 43 2 667.2293.87 2.8 442.3838 C 3 H 5 2 442.38
767.4925 C 42 H 7 2 767.494.53.99 342.86 C 2 H 22 342.62
785.538 C 42 H 73 3 785.5457.75.96 324.73 C 2 H 2 324.56
929.5454 C 48 H 8 7 929.54683.43.54 8.657 C 6 H 2 6 8.634
9.68 C 54 H 93 23 9.62.32.9
A closer examination of the MS/MS spectrum of this compound showed that the fragments at m/z 325.38, 47.3666, 425.3778, and 443.396 could be explained, by comparison with the MS/MS spectrum of the reference compound ginsenoside Rb1, as a glucose moiety and fragments of the steroidal framework of the molecule (figure 6).

At this point of the analysis, the whole set of data was searched for derivatives of the reference compound ginsenoside Rb1. For that purpose, the change in the molecular formula of the parent was entered in a transformation search list. The software calculated the resulting mass and formula and assigned the identified compounds. In this way, the compound at m/z 95.69 and retention time 6.7 was assigned as the malonyl derivative of ginsenoside Rb1 (mRb1), with a difference in the formula of C3H2O3 (mass delta 86.4) [4]. With this additional information, the fragments in the MS/MS spectrum at m/z 4.9 and 249.59 were explained and the structure was assigned (figure 6). The complete information for all MS/MS fragments, such as mass, formula, neutral loss to the parent, mass accuracies, mass shifts by derivatization, and mass shifts to reference, is shown in table 2.

Figure 5 Fragment overview graph. Compounds with fragments at a nominal mass of m/z 325 (highlighted in green) were further investigated. One compound at RT 6.7 minutes showed additional typical fragments at nominal masses of m/z 47, 425, and 443 (highlighted in blue).

Figure 6 MS/MS spectrum of compound at m/z 95.65 and RT 6.7 minutes, with proposed structure and explained main fragments.
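The transformation search described above can be sketched in a few lines. This is an illustration under assumptions, not the software's implementation. The reference mass and the mass shifts are calculated from the formulae C54H92O23, C3H2O3 (malonyl), and C2H2O (acetyl, added here only as a second hypothetical list entry) using monoisotopic atomic masses. The compound list, the tolerance, and all names are hypothetical.

```python
# Neutral monoisotopic mass of the reference compound, calculated from C54H92O23.
REFERENCE_MASS = 1108.6029

# Transformation search list: name -> mass shift in Da, calculated from the formula change.
TRANSFORMATIONS = {
    "malonyl (+C3H2O3)": 86.0004,
    "acetyl (+C2H2O)": 42.0106,  # hypothetical extra entry, not discussed in the text
}


def find_derivatives(reference_mass, transformations, compounds, tol_ppm=5.0):
    """Return (compound, transformation, ppm error) for every match within tol_ppm."""
    hits = []
    for name, shift in transformations.items():
        expected = reference_mass + shift
        for compound_id, mass in compounds.items():
            error_ppm = (mass - expected) / expected * 1e6
            if abs(error_ppm) <= tol_ppm:
                hits.append((compound_id, name, round(error_ppm, 2)))
    return hits


# Hypothetical list of extracted compound masses (made up for illustration).
extracted_compounds = {
    "cpd_A": 1194.6040,
    "cpd_B": 946.5501,
    "cpd_C": 1000.5000,
}

hits = find_derivatives(REFERENCE_MASS, TRANSFORMATIONS, extracted_compounds)
for compound_id, name, error_ppm in hits:
    print(compound_id, name, error_ppm)
```

Each hit would still need confirmation from the MS/MS spectrum, as is done in the text for the malonyl derivative.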
Table 2 MS/MS fragments of malonyl ginsenoside Rb1 (mRb1), masses, calculated formulae with mass accuracies, neutral losses to the parent, loss formulae with mass accuracies, mass shifts by derivatization, and mass shifts to reference.

m/z / Ion formula / Calculated m/z / [mDa] / [ppm] / Neutral loss / Loss formula / Loss mass / FPM m/z* / Shift m/z** / Δ Shift [mDa]*** / Shift formula
27. C 8 H 5 27.74.74 5.8 68.545 C 49 H 8 25 68.49887
249.59 C 9 H 3 8 249.649.53 6.4 946.55249 C 48 H 82 8 946.552
325.38 C 2 H 2 325.292.9 2.77 87.49763 C 45 H 74 6 87.49769 325.29.9
375.886 C 5 H 9 375.929 3.57 9.53 82.52284 C 42 H 76 5 82.5842
393.23 C 5 H 2 2 393.275.43.8 82.592 C 42 H 74 4 82.5786
47.3666 C 3 H 47 47.36723.58.42 788.2448 C 27 H 48 26 788.24338 47.3668.3
4.9 C 5 H 23 3 4.332.44 3.5 784.49957 C 42 H 72 3 784.49729 325.29 86.4.43 C 3 H 2 3
425.3778 C 3 H 49 425.37779.5. 77.2336 C 27 H 46 25 77.23282 425.3773.56
443.396 C 3 H 5 2 443.38836 2.25 5.7 752.2285 C 27 H 44 24 752.22225 443. 389.56
573.636 C 2 H 33 8 573.664 2.5 4.36 622.4478 C 36 H 62 8 622.44447 487. 655 86.4 2.24 C 3 H 2 3
735.264 C 27 H 43 23 735.2896 2.5 3.42 46.395 C 3 H 52 3 46.3965 649.284 86.4 2.36 C 3 H 2 3
835.4832 C 45 H 7 4 835.48383.68.82 36.283 C 2 H 24 2 36.2678
853.4924 C 45 H 73 5 853.4944 2.2 2.37 342.97 C 2 H 22 342.62 767.4925 86.4.49 C 3 H 2 3

*FPM = the m/z value of the analogous fragment in the original (reference) compound.
**Shift m/z = mass shift between the fragment from the original (reference) compound and the fragment that includes derivatization = mass shift by derivatization.
***Δ Shift [mDa] = [(FPM m/z + Shift m/z) − m/z] × 1000 = mass shift to reference.
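The third footnote of table 2 can be made concrete with a short worked example. It is a sketch under assumptions: the formula is read as the difference between the expected fragment mass (reference fragment plus derivatization shift) and the measured fragment mass, converted from Da to mDa. The fragment values 853.4924 and 767.4925 are taken from the last row of the table, and the shift is the mass of C3H2O3 calculated from monoisotopic atomic masses. The function name is hypothetical.

```python
def delta_shift_mda(fragment_mz, fpm_mz, shift_mz):
    """Mass shift to reference in mDa: [(FPM m/z + Shift m/z) - m/z] * 1000."""
    return ((fpm_mz + shift_mz) - fragment_mz) * 1000.0


# Mass of C3H2O3 (malonyl shift), calculated from monoisotopic atomic masses.
MALONYL_SHIFT = 86.0004

# Last row of table 2: measured fragment and its analogous reference fragment (FPM).
delta = delta_shift_mda(fragment_mz=853.4924, fpm_mz=767.4925, shift_mz=MALONYL_SHIFT)
print(f"{delta:.2f} mDa")
```

A small value means that the fragment of the derivative sits where the shifted reference fragment is expected, which supports assigning the derivatization to that part of the molecule.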
Step 3: Differential analysis of the components in two samples

The content of natural product extracts often differs between various species in a plant family, or even between the same plants if grown under different conditions. The ginseng plant exists in various subspecies, whose composition and concentrations of ingredients can differ [5]. It is possible to distinguish the subspecies and their related pharmaceutical products by LC/MS analysis and determination of the content and quantity of different ginsenosides [6].

To examine such natural product extracts, the investigated extract is compared with a related, well-investigated natural product extract. This comparison can be performed with the assistance of the Agilent MassHunter Metabolite ID software. A list of compounds is extracted from each sample data file by the molecular feature extractor and the two compound lists are compared. In the comparison, three classes of compounds are assigned: 1) compounds that are new, 2) compounds that are present in both samples but show increased response in the sample under investigation, and 3) compounds that are present in both samples but show decreased response in the sample under investigation.

In this work, an extract of the American ginseng (Panax quinquefolius) was compared with the previously investigated extract of the Asian ginseng (Panax ginseng). To align and normalize both compound lists, reserpine and ginsenoside Rb1 (which is present in an equal amount in both samples) were used as internal standards. As an example, the list of compounds that increased in the American ginseng sample included a compound at m/z 8.52 that increased by 8.5-fold relative to the Asian ginseng sample (figure 7). The increase in response is clearly shown in the comparison of the extracted compound chromatograms and extracted ion chromatograms (ECCs and EICs) of the American ginseng sample and the control sample from Asian ginseng. For this compound at m/z 8.52, the formula C42H72O14 and a molecular mass of 8.494 were calculated with a relative mass error of 2.22 ppm. The formula and molecular mass match those of the known pseudoginsenoside F11 (figure 8). The fragments obtained in the MS/MS spectrum are consistent with the structure of this compound (figure 9), and the formulae of these fragments were calculated and showed low relative mass errors (table 3).

Figure 7 Comparison of Asian and American ginseng for the compound at m/z 8.52 at RT 5.5 minutes. A) ECCs of the compound at mass 8.494, with increased response in the American ginseng sample. B) EICs of the compound at m/z 8.52. C) Isotopic pattern of compound at m/z 8.52.

Figure 8 Measured and calculated masses, calculated formula, and calculated mass errors of the compound pseudoginsenoside F11, which is present in 8.5-fold excess in the American ginseng sample.

Figure 9 MS/MS spectrum and structure interpretation of pseudoginsenoside F11.

Table 3 MS/MS fragment masses of pseudoginsenoside F11, calculated formulae, and calculated mass accuracies.

m/z / Ion formula / Calculated m/z / [mDa] / [ppm] / Neutral loss / Loss formula / Loss mass
43.7 C 8 H 5 2 43.666.44 3.9 658.3929 C 34 H 58 2 658.39283
39.86 C 2 H 2 9 39.8.58.88 492.384 C 3 H 52 5 492.3847
42.347 C 3 H 45 42.34649.49.6 38.532 C 2 H 28 3 38.5299
439.358 C 3 H 47 2 439.3576.94 2.3 362.42 C 2 H 26 2 362.4243
457.3686 C 3 H 49 3 457.36762.99 2.7 344.338 C 2 H 24 344.386
8.54 C 42 H 72 4 8.4995.92.4

Conclusion

This Application Note demonstrates a workflow concept based on software-assisted identification of natural products in highly complex extracts of plant origin. In this example, an extremely complex extract from ginseng root was analyzed using the Agilent 6520 Accurate-Mass Q-TOF LC/MS system. The data file was processed using the Agilent MassHunter Metabolite ID software. In the first step of the software-assisted analysis, known compounds were identified by database search. In the second step, the unknown compounds were identified by a software-assisted similarity search against the previously identified compounds. For this purpose, the isotopic pattern-matching algorithm and the MS/MS fragment pattern-matching algorithm were used. The complex structures of ginsenosides, which are the main compounds in the extract, were elucidated by interpreting the results from the fragment pattern-matching and the accurate-mass-based formula calculation. In the third step, the elucidated sample was compared with a new sample obtained from another ginseng species, and new compounds and those that increased in amount were identified.

References

1. Liu, S., Cui, M., Liu, Z., Song, F., Mob, W., Structural analysis of saponins from medical herbs using electrospray ionization tandem mass spectrometry, J. Am. Soc. Mass Spectrom. 5:33-44, 24.
2. Fuzzati, N., Analysis methods of ginsenosides, J. Chrom. B 82:9-33, 24.
3. Wang, X., Sakuma, T., Asafu-Adjaye, E., Shiu, G. K., Determination of ginsenosides in plant extracts from Panax ginseng and Panax quinquefolius L. by LC/MS/MS, Anal. Chem. 7:579-584, 999.
4. Kite, G. C., Howes, M. J. R., Leon, C. J., Simmonds, M. S. J., Liquid chromatography/mass spectrometry of malonylginsenosides in the authentication of ginseng, Rapid Commun. Mass Spectrom. 7:238-244, 23.
5. Li, W., Gu, C., Zhang, H., Awang, D. V. C., Fitzloff, J. F., Fong, H. H. S., van Breemen, R. B., Use of high performance liquid chromatography-tandem mass spectrometry to distinguish Panax ginseng C. A. Meyer (Asian ginseng) and Panax quinquefolius L. (North American ginseng), Anal. Chem. 72:547-5422, 2.
6. Chan, T. D. W., But, P. P. H., Cheng, S. W., Kwok, I. M. Y., Lau, F. W., Xu, H. X., Differentiation and authentication of Panax ginseng, Panax quinquefolius, and ginseng products by using HPLC/MS, Anal. Chem. 72:28-287, 2.

www.agilent.com/chem/metid
Agilent Technologies, Inc., 29
Published January, 29
Publication Number 599-3234EN