Improved peptide sequencing using isotope information inherent in tandem mass spectra


RAPID COMMUNICATIONS IN MASS SPECTROMETRY
Rapid Commun. Mass Spectrom. 2003; 17:
Published online in Wiley InterScience (DOI: /rcm.1119)

William R. Cannon(1)* and Kenneth D. Jarman(2)
(1) Computational Biosciences, Pacific Northwest National Laboratory, Richland, WA 99352, USA
(2) Applied Mathematics, Pacific Northwest National Laboratory, Richland, WA 99352, USA

*Correspondence to: W. R. Cannon, Computational Biosciences, Pacific Northwest National Laboratory, Richland, WA 99352, USA. E-mail: William.Cannon@pnl.gov
Contract/grant sponsor: The Office of Biological and Environmental Research in the US Department of Energy; contract/grant number: 41966A.

Received 26 September 2002; Revised 2 June 2003; Accepted 2 June 2003

We demonstrate here the use of natural isotopic labels in peptides to aid in the identification of peptides with a de novo algorithm. Using data from ion trap tandem mass spectrometric (MS/MS) analysis of 102 tryptic peptides, we have analyzed multiple series of peaks within LCQ MS/MS spectra that spell peptide sequences. Isotopic peaks from naturally abundant isotopes are particularly prominent on y- and b-series ions, even after peak centroiding, and lead to increased confidence in the identification of the precursor peptides. Sequence analysis of the MS/MS data is accomplished by finding sequences and subsequences in a hierarchical manner within the spectra.

Copyright © 2003 John Wiley & Sons, Ltd.

The identification of peptides derived from complex mixtures of proteins is a prerequisite for several high-throughput proteomics technologies [1-4]. Typically, protein mixtures are digested with trypsin and the resulting peptides are sequenced using tandem mass spectrometry (MS/MS). The sequence information provided by MS/MS analysis ideally consists of the sequential residue masses, or more precisely the mass-to-charge ratios (m/z, in units of Th), of the fragments produced as the peptide breaks along its backbone.
Recent developments in peptide-sequencing technologies include chemically and physically labeling the peptides so that either the N-terminal or the C-terminal fragments resulting from collision-induced dissociation (CID) can be distinguished by m/z shifts of their peaks in the spectrum. Chemical approaches have included derivatization of both the N- and C-terminal groups [5-7]. The prominent b- or y-ion series can then be discerned more easily, provided that the spectrum of the unmodified or differentially modified peptide is also obtained. In one approach, the carboxylic acid groups of glutamic acid, aspartic acid and the C-terminus are methyl-esterified using either -OCH3 or -OCD3 [5]. When the spectrum of a singly charged peptide modified with the methyl ester only at the C-terminal carboxylic acid is compared with the spectrum of the analogous peptide carrying the deuterated methyl ester, the y-ion series of the latter appears shifted by 3 Th. In a second approach, two populations of peptides are obtained and one population is differentially modified with O-methylisourea on the C-terminal lysine [7]. When the spectra of the two populations are compared, the y-ion peaks of the derivatized peptides are shifted by the mass difference of the functional groups (42 Da for singly charged peptides). Although chemically labeling peptides can have additional advantages for quantitating peptides, for the purpose of sequencing it requires additional experimental and computational steps that may slow down high-throughput sequencing.
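The differential-labeling comparison described above amounts to pairing peaks across two spectra by a fixed m/z shift. The Python sketch below is a hypothetical illustration, not a procedure from the cited work: the helper `find_shifted_peaks`, the 0.5 Th tolerance and the example peak lists are all assumptions. With `delta=3.0` it mimics the -OCH3/-OCD3 comparison for singly charged y-ions.

```python
from bisect import bisect_left


def find_shifted_peaks(reference, labeled, delta, tol=0.5):
    """Hypothetical helper: pair reference peaks with labeled peaks shifted by +delta.

    reference, labeled: peak m/z values (Th).
    delta: expected label-induced shift (Th), e.g. 3.0 for a deuterated methyl
           ester on a singly charged y-ion.
    tol: matching tolerance in Th (an assumed value for unit-resolution data).
    Returns a list of (reference_mz, labeled_mz) pairs.
    """
    labeled = sorted(labeled)
    pairs = []
    for mz in sorted(reference):
        target = mz + delta
        i = bisect_left(labeled, target - tol)
        best = None
        # Scan the tolerance window and keep the labeled peak closest to the target.
        while i < len(labeled) and labeled[i] <= target + tol:
            if best is None or abs(labeled[i] - target) < abs(best - target):
                best = labeled[i]
            i += 1
        if best is not None:
            pairs.append((mz, best))
    return pairs


# Hypothetical peak lists: three y-ions shift by +3 Th, one b-ion (500.0) does not.
unlabeled = [175.1, 288.2, 401.3, 500.0]
deuterated = [178.1, 291.2, 404.3, 500.0]
pairs = find_shifted_peaks(unlabeled, deuterated, delta=3.0)
shifted_ions = [ref for ref, _ in pairs]  # peaks flagged as C-terminal (y-type) ions
```

Peaks that find a shifted partner are the label-carrying fragments; unshifted peaks such as the 500.0 Th ion are left unassigned, which is why the second spectrum is required in this kind of experiment.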
Physically labeling peptides involves introducing heavy isotopes during processing steps. With mass resolution (Δ(m/z)/(m/z)) as fine as one part in 2000 in the mass range observed by ion trap mass detectors, it is reasonable to expect that the isotopic distribution of a given peptide can be detected. Typically, proteins are digested with trypsin both in normal solution and in solution containing 18O-labeled water. For peptide sequencing, physical labeling has the advantage that no second spectrum is needed to observe shifts in the C-terminal ion series, because the tryptic digestion can be done in a mixture of 50% 18O-labeled water. As a result, two y-ion peaks appear in the spectrum for each C-terminal fragment: one containing the naturally abundant distribution of oxygen and the other containing the 18O-labeled oxygen. The presence of two adjacent peaks for each y-ion fragment allows y-ions to be readily distinguished from peaks due to other fragments. The net result is that, for complete spectra, the sequence of the peptide can easily be determined from the mass differences of consecutive y-ion peaks. In addition to aiding peptide sequencing, isotopic labels are used in high-throughput proteomic studies to quantitate peptides [4,16-18]. In these studies, cell populations are grown on normal media and on 15N-enriched media. The cell lysates from the two populations are pooled and analyzed by mass spectrometry, and relative abundances are determined from the abundances of the 14N- and 15N-labeled forms of each peptide. Identification of the peptides still relies on MS/MS analysis. To analyze both 14N- and 15N-enriched peptides with currently available programs, the data must be analyzed twice: the first time using parameters for the natural 14N isotopic distribution and the second time using parameters for the 15N-enriched isotopic distribution. In both cases (18O labeling for sequence analysis and 15N labeling for quantitation), many existing spectral analysis packages must be modified or run with a separate set of parameters in order to accommodate both isotope-labeled and unlabeled peptides.
With regard to the mathematical and computational analysis of peptide sequences, many advances in de novo algorithms have been made in the last decade. Graph theory approaches are a natural fit for the problem of using spectral peak information to build up the evidence for a particular peptide sequence. In the process of building up this evidence, the peaks can be associated with known fragment ion types. Initial scoring functions were largely ad hoc and used empirically derived parameters for the various ion types that might arise from CID. Recently, work has been done on making the identification and scoring of peptides more rigorous [20].
We demonstrate here the use of natural-abundance isotopic labels that survive the centroiding process to aid in the identification of peptides with a de novo algorithm. The data are from electrospray ionization and ion trap MS/MS analysis of 102 tryptic peptides of charge +2. Observation of isotopic peaks for the relatively abundant b- and y-series ions increases confidence in the identification of the precursor peptide. The computational analysis of the spectra consists of a novel method of exploiting this additional isotopic information that does not depend on classifying product ions into ion types such as y- or b-ions.
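Why 15N-labeled peptides need their own parameter set in the 14N/15N quantitation scheme described earlier in this introduction can be seen from a short calculation: the mass difference between the two forms scales with the number of nitrogen atoms, which differs from peptide to peptide (and from fragment to fragment). A minimal Python sketch, assuming complete 15N enrichment and unmodified peptides; `nitrogen_count` and `n15_shift` are illustrative names.

```python
# Nitrogen atoms per amino-acid residue (standard residue compositions).
NITROGENS = {
    "A": 1, "R": 4, "N": 2, "D": 1, "C": 1, "E": 1, "Q": 2, "G": 1,
    "H": 3, "I": 1, "L": 1, "K": 2, "M": 1, "F": 1, "P": 1, "S": 1,
    "T": 1, "W": 2, "Y": 1, "V": 1,
}

MASS_14N = 14.0030740048  # monoisotopic mass of 14N (Da)
MASS_15N = 15.0001088982  # monoisotopic mass of 15N (Da)


def nitrogen_count(sequence: str) -> int:
    """Nitrogen atoms in an unmodified peptide; the terminal H and OH add none."""
    return sum(NITROGENS[aa] for aa in sequence)


def n15_shift(sequence: str, charge: int = 1) -> float:
    """m/z shift (Th) between the fully 15N-labeled and the 14N form of an ion.

    Assumes 100% 15N enrichment (an idealization of real labeling experiments).
    """
    return nitrogen_count(sequence) * (MASS_15N - MASS_14N) / charge


# The two example peptides from the Results, as doubly charged precursors.
shifts = {seq: n15_shift(seq, charge=2) for seq in ("ALEALQSNPK", "ANHWLAQGAQPTDTAR")}
```

The two precursors shift by about 6.48 and 12.46 Th, respectively, so no single offset describes both labeled forms; a method that looks for peak series spelling a sequence, rather than for fixed ion-type offsets, sidesteps this.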
In addition, since the computational method relies on finding a series of mass peaks within a spectrum that spells a peptide, and does not rely on ion-type classification per se, it is readily adaptable to proteomic studies in which peptides are differentially labeled with 14N/15N for quantitation [4,16-18] but MS/MS analysis of both the 14N- and 15N-labeled peptides is required. That is, in addition to maximally exploiting the isotope information that survives peak centroiding, the method can simultaneously analyze 14N- and 15N-labeled peptides, as well as other isotopically or chemically labeled peptides.

METHODS

Description of spectra

Peptides were derived from Deinococcus radiodurans by tryptic digestion and mass analyzed in the laboratory of Richard Smith in the William R. Wiley Environmental Molecular Sciences Laboratory at the Pacific Northwest National Laboratory [24]. The CID spectra for the 102 peptides discussed herein were obtained using an electrospray ionization source feeding a Finnigan LCQ Classic ion trap. All spectra were output in centroid mode. Independent identifications were performed using SEQUEST [25], searching tryptic peptides derived from an organism-specific sequence database. SEQUEST scores for this data set ranged from 3.1 to 6.7. In addition, each peptide was reanalyzed multiple times on multiple days with the LCQ, and the mass of each peptide precursor ion was confirmed to within 1 part per million (ppm), with a 15% elution time tolerance, using an 11.5 Tesla ion-cyclotron resonance mass spectrometer [26].

Analysis of isotopic distributions

Our approach to identifying the peptide fragment ion types and their isotopic distributions in the mass spectral data follows that outlined by Dancik et al. [20]. Let $A$ be the set of amino acids and let $m(a)$ represent the residue mass of each amino acid $a \in A$.
Let $P$ represent a parent peptide consisting of a string of $N$ amino acids, $P = a_1 \ldots a_N$, where $a_i \in A$ for $i = 1, \ldots, N$. Let $P_{ij} = a_i \ldots a_j$ be a partial peptide ($1 \le i \le j \le N$), and define the N-terminal and C-terminal partial peptides by $P_i = P_{1,i}$ and $\bar{P}_i = P_{i,N}$, respectively, for $1 \le i \le N$. The residue mass of a peptide $P$ is

$$m(P) = \sum_{i=1}^{N} m(a_i)$$

Likewise, the residue mass of an N-terminal or C-terminal partial peptide is the sum of the respective residue masses:

$$m(P_i) = \sum_{k=1}^{i} m(a_k), \qquad m(\bar{P}_i) = \sum_{k=i}^{N} m(a_k)$$

Now we can define ion types in terms of mass offsets from the peptide residue mass. Let $\delta_k$ be the mass offset associated with the $k$th ion type, as listed in Table 1. The mass of the peptide fragment of ion type $k$ is then $m(P_i) + \delta_k$ for N-terminal fragments and $m(\bar{P}_i) + \delta_k$ for C-terminal fragments.

Next, we want to find automatically the offsets $\delta_k$ of peptide fragments produced by MS/MS fragmentation processes such as CID. We do this by examining the spectra of the 102 identified peptides. Let $S = \{s_1, \ldots, s_M\}$ be a spectrum of $M$ peaks that results from CID of the peptide $P$. Define the difference between a spectral peak and a partial-peptide residue mass as $d_{ij} = s_j - m(P_i)$ (with $m(\bar{P}_i)$ in place of $m(P_i)$ for C-terminal scans). If a difference $d_{ij}$ corresponds to an ion type offset $\delta_k$, it will be a common feature in the mass spectral data sets that we examine. A histogram of the differences $d_{ij}$ will therefore reveal which values of $d_{ij}$ correspond to ion type offsets $\delta_k$, through a frequency of appearance above the background.

Table 1. Ion types in terms of mass offsets: $\delta_k$ is the mass offset from the peptide residue mass associated with the $k$th ion type. The mass of the peptide fragment of ion type $k$ is $m(P_i) + \delta_k$ for N-terminal and $m(\bar{P}_i) + \delta_k$ for C-terminal peptides.

Ion type    Offset, $\delta_k$
a
b
b-NH3
b-H2O
y
y-NH3
y-H2O

De novo analysis of mass spectra

Each peak in a mass spectrum is a node in a spectrum graph, and each node is labeled by the m/z value of the corresponding peak. The graph is ordered from lowest to highest m/z value. An edge between two nodes is allowed if their m/z difference equals the monoisotopic mass of an amino acid within a margin of error. After the graph is constructed, all paths through it are found by a depth-first search. This process is rapid because, unlike other implementations of spectrum graphs [20,22], we do not expand the graph by adding nodes for each possible ion type that can be derived from peptide fragmentation. All nodes are initially colored white. The search starts from the node with the lowest m/z value; this node and all nodes subsequently visited from it are colored black. Once the search from this node is complete, the next white node is chosen and another search is initiated, until all nodes have been visited. Since the graph is sorted by m/z, searching from lowest to highest spells out N-terminal fragment series (a-, b- and c-type ions) in sequence order, but spells out C-terminal series (x-, y- and z-type ions) in reverse.

Construction of a hierarchical sequence graph for scoring

Next, a sequence graph is constructed in which each node is labeled with one of the sequences (strings) found above. An edge is placed between two nodes if one node's sequence label is a substring of the other's. In this manner, and as shown in Fig. 1, child-parent relationships between nodes are established in a sequence graph, and from each parent node all child nodes can be examined for scoring. An amino acid sequence $A_k$ of length $l$ occurs randomly with probability $p^{l+1}$, since $l+1$ peaks make up $A_k$. This amino acid sequence can appear either forward or backward in the spectrum, and the number of ways $A_k$ can appear in the spectrum is given by $M$.
The number of ways $A_k$ can appear $n$ times in the spectrum is $M!/(n!\,(M-n)!)$. The probability that $A_k$ appears exactly $n$ times is then

$$P(A_k \text{ appears } n \text{ times}) = \frac{M!}{n!\,(M-n)!}\left(p^{l+1}\right)^{n}\left(1-p^{l+1}\right)^{M-n}$$

Subsequences that appear in the sequence tree (Fig. 1) for $A_k$ are accounted for by conditioning each subsequence on its immediate supersequence in the tree. The overall score reflects the likelihood that the sequence $A_k$ would arise by chance alone. Further details of the scoring algorithm are presented elsewhere [27].

Figure 1. An example sequence tree. Top nodes correspond to parent sequences, and subnodes correspond to partial sequences of the parent node. Each sequence or subsequence corresponds to at least one path through the spectrum graph. Each parent node is connected to its children, forming a sequence hierarchy that is used later for scoring. Each node contains information on all paths through the spectrum graph that are consistent with the node label (sequence).

RESULTS

Natural abundance isotope distributions in MS/MS spectra

We have employed a computational model developed by Dancik et al. [20] to discover isotopic information contained within MS/MS spectra. The method was originally used to discover prominent ion types in MS/MS spectra regardless of the particular fragmentation process employed. As noted by Bartels [19], the mass of each fragment is the sum of the residue masses and a mass offset characteristic of each ion type, which implicitly includes a mass value for either the C- or N-terminal moiety. For example, b-series ions have a mass offset of +1 Da, owing to a hydrogen atom beyond what would be expected for the sum of the residue masses plus the N-terminal hydrogen atom. Likewise, for tryptic peptides, y-series ions have a mass offset of +19 Da relative to the sum of the amino acid residue masses. This offset is due to a C-terminal mass of +17 Da for the terminal hydroxyl group, an additional hydrogen on the N-terminus, and the ionizing proton on the lysine or arginine side chain.

Figures 2(A) and 2(B) show scans of ion type offsets from the N- and C-terminus, respectively, for 102 data sets from a Finnigan LCQ ion trap, all collected on the same instrument. The residue mass itself would have an offset of zero in Figs. 2(A) and 2(B). As expected, no peak occurs at zero, indicating that fragments corresponding to unmodified residue masses were not seen. The highest peak in the N-terminal scan in Fig. 2(A) appears approximately 1 Da above zero and corresponds to b-series ions. In addition, peaks due to loss of water and ammonia from b-series ions can be seen at approximately −17 and −16 Da, respectively, and peaks at approximately −34 and −35 Da likely correspond to loss of combinations of water and ammonia. Likewise, in Fig. 2(B), the highest peak in the C-terminal scan occurs at +19 Da and corresponds to y-ions. The peaks near +1 and +2 Da in Fig. 2(B) correspond to loss of water and ammonia, respectively, from the y-series ions.

Figure 2. Histogram of ion type offsets for 102 data sets from a Finnigan LCQ ion trap. (A) Abundance of mass shifts from the mass of N-terminal peptide fragments. The prominent peak at approximately +1 u corresponds to b-series ions. (B) Abundance of mass shifts from the mass of C-terminal peptide fragments. The prominent peak at +19 u corresponds to y-series ions.

Figures 3(A) and 3(B) show the same scans, magnified around the b- and y-ion regions, respectively. Both major peaks show significant shoulders located 1 and 2 Da above the peak apices. These shoulders indicate that naturally abundant isotopes can commonly be identified using the LCQ mass spectrometer. This is not unexpected, since the contribution of isotopes to the relative abundance increases with the number of atoms in the molecular formula (and hence with mass).
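The offset scan behind Figs. 2 and 3 can be sketched in a few lines. This is a simplified Python illustration under stated assumptions, not the authors' code: fragment peaks are assumed singly charged, offsets are binned at 1 Da within a ±40 Da window, and `offset_histogram` and `by_ladder` (a generator of idealized b/y ladders used only for testing) are hypothetical helpers. Each peak $s_j$ is compared with every partial-peptide residue mass, exactly as in the $d_{ij} = s_j - m(P_i)$ definition of the Methods.

```python
from collections import Counter

# Monoisotopic residue masses (Da); I is isobaric with L and is not listed.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056


def partial_masses(sequence, terminus):
    """Residue masses of the N-terminal (prefix) or C-terminal (suffix) partial peptides."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    if terminus == "N":
        return [sum(masses[:i]) for i in range(1, len(masses))]
    return [sum(masses[i:]) for i in range(1, len(masses))]


def offset_histogram(examples, terminus="N", bin_width=1.0, window=(-40.0, 40.0)):
    """Count d_ij = s_j - m(P_i) over (sequence, peak list) pairs, in bins of bin_width."""
    lo, hi = window
    counts = Counter()
    for sequence, peaks in examples:
        for m in partial_masses(sequence, terminus):
            for s in peaks:
                d = s - m
                if lo <= d < hi:
                    counts[round(d / bin_width) * bin_width] += 1
    return counts


def by_ladder(sequence):
    """Idealized singly charged b- and y-ion m/z values (testing only, no noise)."""
    b = [m + PROTON for m in partial_masses(sequence, "N")]
    y = [m + WATER + PROTON for m in partial_masses(sequence, "C")]
    return b + y


examples = [(seq, by_ladder(seq)) for seq in ("ALEALQSNPK", "ANHWLAQGAQPTDTAR")]
n_scan = offset_histogram(examples, "N")  # expected maximum near +1 (b-ions)
c_scan = offset_histogram(examples, "C")  # expected maximum near +19 (y-ions)
```

With real spectra, the same counts would also show the −17/−16 Da neutral-loss peaks and, when isotope peaks survive centroiding, shoulders one and two bins above the b- and y-ion maxima.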
For an average peptide of 1200 Da with molecular formula C53H97N15O16 [28], the monoisotopic peak is the most abundant; the first isotope peak is expected to be 65% as abundant as the monoisotopic peak, and the second isotope peak 24% as abundant.
Next, we show how the additional information from naturally occurring isotopes can be exploited in analyzing the spectra. The spectrum for the first example is shown in Fig. 4. The full-length parent peptide is ALEALQSNPK, as determined independently by SEQUEST (Xcorr = 3.57, with the second highest scoring peptide having Xcorr = 1.72 and a ΔCn of 0.52) and by our own sequence comparison and database search of tryptic peptides from the organism-specific sequence database. Additional MS/MS spectra collected on this peptide gave SEQUEST Xcorr scores up to 4.1, confirming its identity. Furthermore, the mass of the parent peptide was confirmed to within 1 ppm with an 11.5 Tesla ion-cyclotron resonance mass spectrometer [26].

Figure 3. Magnification of the histogram shown in Fig. 2 around (A) the b-series ions and (B) the y-series ions. The resolution of the spectra is high enough to observe mass shifts due to isotopes in both ion series. The observed isotope distribution matches the expected distribution for peptides with masses of the order of 1000 u.
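The expected isotope ratios quoted above for an average 1200 Da peptide (C53H97N15O16) can be approximated from natural isotope abundances alone. The Python sketch below convolves per-atom isotope distributions on a nominal (integer) mass grid; the abundance table holds standard IUPAC representative values, and `isotope_ratios` is an illustrative name. It models the intact molecule and ignores any distortion introduced by peak centroiding.

```python
# Relative isotope abundances indexed by nominal mass shift (0, +1, +2, ...).
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a, b, n_peaks):
    """Distribution of the summed mass shift, truncated to the first n_peaks entries.

    Shifts are non-negative, so truncation leaves the retained peaks exact.
    """
    out = [0.0] * n_peaks
    for i, x in enumerate(a[:n_peaks]):
        for j, y in enumerate(b[: n_peaks - i]):
            out[i + j] += x * y
    return out


def isotope_ratios(formula, n_peaks=3):
    """Abundances of the M, M+1, M+2, ... peaks relative to the monoisotopic peak M."""
    dist = [1.0] + [0.0] * (n_peaks - 1)
    for element, count in formula.items():
        for _ in range(count):  # add one atom at a time
            dist = convolve(dist, ABUNDANCE[element], n_peaks)
    return [p / dist[0] for p in dist]


ratios = isotope_ratios({"C": 53, "H": 97, "N": 15, "O": 16})
```

The computed ratios are about 0.65 and 0.24, in line with the values cited above. Because each added carbon contributes roughly 1.1% to the M+1 peak, larger fragments show proportionally stronger isotope peaks, consistent with the shoulders seen in Fig. 3.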

Figure 4. MS/MS spectrum for the peptide ALEALQSNPK. (A) Major peaks identified. The mass differences between consecutive peaks in a series, such as the y- or b-ion series, correspond to the masses of amino acid residues. Isotopic resolution of ion series peaks, as demonstrated in Fig. 2, enables the peptide sequence to be discovered multiple times in the spectrum. (B) Close-up view of isotopic resolution of the y6- and b7-ions and of peaks due to loss of water and ammonia.

The de novo algorithm determined that the sequence of this peptide is partially spelled out by five series of peaks, as shown in Table 2. The partial sequence EALQSN was obtained both from the y-series ions and from a series of peaks that occur consistently 1 Da above the y-series ions. For the most part, the intensities of peaks in the latter series are of the same order of magnitude as those of the y-series ions. As one might expect, the natural isotopic distribution is modified by the peak centroiding process, so it is not straightforward to compare natural distributions with those that survive centroiding. However, we do expect the two to be of the same order: the distributions should not differ by an order of magnitude (when an isotope peak is actually called by the centroiding process). As expected, the relative peak heights shown in Fig. 4 and Table 2 are in approximate agreement with the expected isotopic distribution for the peptide. That is, as the number of atoms in the peptide increases, the monoisotopic peak decreases in relative intensity and the higher-mass peaks increase in relative abundance. For the full-length peptide ALEALQSNPK, with a nominal mass of 1070, the first isotopic peak above the monoisotopic peak is expected to be approximately 60% of the abundance of the monoisotopic peak, and the second isotopic peak approximately 20%.

Table 2. ALEALQSNPK-associated peaks. The sequence EALQSNP and four subsequences of at least five peaks each are present in the spectrum shown in Fig. 4. Isotopic resolution of ion series peaks enables these multiple instances of sequences to be observed and analyzed as related families, as shown in Fig. 1.

Amino acid   Ion type
E            y8
A            y7
L            y6
Q            y5
S            y4
N            y3
Peak         y2

E            y8 + 1
A            y7 + 1
L            y6 + 1
Q            y5 + 1
S            y4 + 1
N            y3 + 1
Peak         y2 + 1

P            b9
N            b8
S            b7
Q            b6
L            b5
A            b4
E            b3
Peak         b2

P            b9 + 1
N            b8 + 1
S            b7 + 1
Q            b6 + 1
L            b5 + 1
Peak         b4 + 1

N            b - NH3
S            b - NH3
Q            b - NH3
L            b - NH3
Peak         b - NH3

In addition, the de novo algorithm found an isotopically shifted series of peaks adjacent to the b-series ion peaks. While the b-series peaks spell out the partial sequence EALQSNP, the isotopically shifted series spells out the shorter partial sequence LQSNP. Again, the peak heights approximately reflect the natural abundance of the isotopes. b-series ions that have additionally undergone neutral loss of ammonia make up a series of five peaks that spell out NSQL. Other, shorter partial sequences consistent with the parent ALEALQSNPK were also obtained by the de novo algorithm but are not shown, as they are more likely to arise by random chance alone and are less informative. Using a minimum sequence length of 2 in the sequence tree (Fig. 1), 79% of the peaks in Fig. 4 are accounted for.
The spectrum for the second example is shown in Fig. 5.
The full-length parent peptide is ANHWLAQGAQPTDTAR, as determined independently by SEQUEST (Xcorr = 4.34, with the second highest scoring peptide having a score of 0.96 and a ΔCn of 0.78) and by our own sequence comparison and database search of tryptic peptides from the organism-specific sequence database. Additional MS/MS spectra collected for this peptide on different days gave SEQUEST scores up to 5.1. Again, the mass of the parent peptide was confirmed to within 1 ppm using an 11.5 Tesla ion-cyclotron resonance mass spectrometer [26]. As shown in Table 3, the de novo algorithm obtained the subsequence WLAQGAQ of the parent from the y-series ions, from the b-series ions, and from additional series of peaks occurring 1 Th above both the y-ion and the b-ion series peaks. Again, the relative peak heights of the isotopic fragments are in approximate agreement with the expected isotopic distribution for the peptide. For the full-length peptide ANHWLAQGAQPTDTAR, with a nominal mass of 1736, the first isotope peak is expected to be approximately 92% as abundant as the monoisotopic peak, and the second isotope peak approximately 47%. In the series of peaks formed by the first heavy isotope of the y-ion series (y-ion + 1 Th), the peak corresponding to (y10 + 1 Th) is missing. This breaks the 7-mer consisting of eight peaks into a 2-mer of three peaks and a 3-mer of four peaks. This somewhat reduces the significance of this series of peaks, but a method of accounting for the full-length 7-mer, or for any incompletely observed sequence, is discussed below. In addition to the sequence WLAQGAQ, the partial sequence WLA was observed in a series of peaks 17 Th below the b-series ions, corresponding to loss of ammonia from the b-series ions.
Again, other shorter partial sequences consistent with the parent were also found but are not shown, as they are more likely to arise by random chance alone and are less informative. Using a minimum sequence length of 2 in the sequence tree (Fig. 1), 77% of the peaks in Fig. 5 are accounted for.

DISCUSSION

As can be seen from the data in Fig. 2, isotopic shifts in the y- and b-ion series are nearly as common as the y- and b-ions themselves, and much more common than the a-, c-, x- and z-ions and than ions due to loss of neutral fragments such as ammonia or water. This may seem odd, since it raises the question of why b-ions would have a different isotope distribution from (b - H2O) or (b - NH3) ions. In fact, the isotopic distributions of these peptide fragments are the same, apart from small differences due to the loss of water or ammonia. However, centroided mass spectrometer output does not report actual isotope distributions, but rather distributions that have been filtered by the centroiding process. Because b-ions are more abundant than (b - H2O) or (b - NH3) ions, the raw (profile-mode) spectrum likely contains more information about them, so the centroiding process can detect and process the isotopic peaks of b-ions better than those of (b - H2O) or (b - NH3) ions.

Figure 5. MS/MS spectrum for the peptide ANHWLAQGAQPTDTAR. (A) Major peaks identified. Again, isotopic resolution of ion series peaks, as demonstrated in Fig. 2, enables the peptide sequence to be discovered multiple times in the spectrum. (B) Close-up view of isotopic resolution of the y6-, b5- and b6-ions, along with peaks due to neutral loss of water and ammonia from these same ions.

As a result, and as demonstrated in Tables 2 and 3, peptide sequencing from MS/MS data using the de novo analysis method described herein can, in favorable cases, achieve high confidence without resorting to chemical derivatization or isotopic enrichment. Although the method requires further development for precursor ions with charges of +1 or +3, it may offer a significant advantage as proteomic research becomes high-throughput, since chemical derivatization of peptides adds steps to the peptide/protein identification pipeline and requires twice as many spectra to be recorded, analyzed and stored. Furthermore, with unit m/z resolution available from mass analyzers, together with advanced analysis tools, the need to isotopically label the C-terminus of peptides with 18O is alleviated. 18O labeling does, however, provide additional information about which peaks in the spectrum are due to N-terminal fragments and which are due to C-terminal fragments. This is important for manual interpretation of the data, but less so for automated interpretation such as de novo methods. The computational method used to analyze the data naturally takes advantage of the additional isotope information by calculating the probability that the peptide sequence or its subsequences would show up in the data purely by chance. Any over-represented sequences in the sequence hierarchy (Fig. 1) will then score much higher.
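The effect of repeated occurrences on this chance probability can be illustrated with the binomial expression from the Methods: a sequence of length $l$ (that is, $l + 1$ peaks) matches at random with probability $p^{l+1}$ at each of $M$ positions. The Python sketch below evaluates that expression only; the per-peak match probability `p` and the numbers `M = 100`, `p = 0.1` are hypothetical, and the conditioning of subsequences on their supersequences [27] is not modeled.

```python
from math import comb


def prob_exact_occurrences(n, M, l, p):
    """P(a length-l sequence appears exactly n times by chance among M positions).

    Each occurrence needs l + 1 matching peaks, each matching with probability p,
    so a single position matches with probability q = p**(l + 1).
    """
    q = p ** (l + 1)
    return comb(M, n) * q**n * (1.0 - q) ** (M - n)


# Hypothetical numbers: a 7-residue sequence (8 peaks) such as WLAQGAQ.
M, l, p = 100, 7, 0.1
once = prob_exact_occurrences(1, M, l, p)
four_times = prob_exact_occurrences(4, M, l, p)  # e.g. y, y+1, b and b+1 copies
```

Under these assumed values, a single chance occurrence has probability of about 1e-6, whereas four occurrences (as isotope-shifted copies of the y- and b-series allow) are roughly twenty orders of magnitude less likely, which is why the repeated series raise confidence in the sequence.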

Table 3. ANHWLAQGAQPTDTAR-associated peaks. With the exception of a single missing peak due to the +1 isotope at y10, the subsequence WLAQGAQ occurs four times in the spectrum shown in Fig. 5 because of the isotopic resolution of peptide fragments. In addition, smaller subsequences such as WLA, shown here associated with neutral loss of ammonia from the b-ions, can occur multiple times in the spectrum.

Amino acid   Ion type
W            y13
L            y12
A            y11
Q            y10
G            y9
A            y8
Q            y7
Peak         y6

W            y13 + 1
L            y12 + 1
A            y11 + 1
Q            y10 + 1 (missing)
G            y9 + 1
A            y8 + 1
Q            y7 + 1
Peak         y6 + 1

Q            b10
A            b9
G            b8
Q            b7
A            b6
L            b5
W            b4
Peak         b3

Q            b10 + 1
A            b9 + 1
G            b8 + 1
Q            b7 + 1
A            b6 + 1
L            b5 + 1
W            b4 + 1
Peak         b3 + 1

A            b - NH3
L            b - NH3
W            b - NH3
Peak         b - NH3

It should be noted that the problem addressed here is peptide sequencing, not peptide identification. The peptide-sequencing problem is to derive the sequence of a peptide from the spectrum alone. The peptide-identification problem is the significantly easier problem of identifying, from a list of candidate peptides, the one that best explains the MS/MS spectrum. In the former approach, no prior knowledge of the peptide sequence is used, and all distances between pairs of peaks in the spectrum are examined to see whether they match the mass of an amino acid. A significant problem for all current de novo sequencing methods is that up to 50% of MS/MS spectra do not contain sufficient information for a complete sequence to be found [29]. For this reason, de novo peptide sequencing is also more computationally intensive than peptide identification from a list of possible peptides. One prudent scenario for high-throughput proteomic pipelines would be to use peptide identification software as a first pass to reliably identify as many peptides as possible, and to use peptide-sequencing algorithms to tackle the problematic spectra from which a peptide could not be reliably identified.
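The pairwise-difference search described above is the spectrum-graph construction of the Methods. The Python sketch below builds that graph and enumerates maximal paths with the white/black coloring scheme. It is an illustration under assumptions, not the authors' implementation: the 0.01 Th tolerance suits the idealized synthetic peaks used here (real LCQ data would need a wider window), all fragments are assumed singly charged, and `build_spectrum_graph`, `enumerate_paths` and `y_ion_mz` are hypothetical names. The example adds a +1.00336 Th isotope copy of each y-ion of ALEALQSNPK, plus two noise peaks, so the same partial sequence is spelled twice.

```python
# Monoisotopic residue masses (Da); I is isobaric with L and is not listed.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
ISOTOPE_SPACING = 1.00336  # 13C - 12C mass difference (Da)


def build_spectrum_graph(peaks, tol=0.01):
    """Nodes are peaks sorted by m/z; edge i -> j is labeled with every residue
    whose mass matches s_j - s_i within tol (singly charged fragments assumed)."""
    nodes = sorted(peaks)
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            diff = nodes[j] - nodes[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    edges[i].append((j, aa))
    return nodes, edges


def enumerate_paths(edges, n_nodes, min_length=2):
    """Depth-first enumeration of maximal paths, started only from still-white nodes.

    Strings are read from low to high m/z, so C-terminal series come out reversed.
    """
    color = ["white"] * n_nodes
    paths = []

    def dfs(i, residues):
        color[i] = "black"
        if not edges[i]:  # maximal path: no outgoing edge
            if len(residues) >= min_length:
                paths.append("".join(residues))
            return
        for j, aa in edges[i]:
            dfs(j, residues + [aa])

    for start in range(n_nodes):
        if color[start] == "white":
            dfs(start, [])
    return paths


def y_ion_mz(sequence, k):
    """Singly charged y_k m/z: last k residues + water + proton."""
    return sum(RESIDUE_MASS[aa] for aa in sequence[-k:]) + 18.01056 + 1.00728


y_ions = [y_ion_mz("ALEALQSNPK", k) for k in range(2, 9)]
peaks = y_ions + [mz + ISOTOPE_SPACING for mz in y_ions] + [300.0, 500.0]
nodes, edges = build_spectrum_graph(peaks)
paths = enumerate_paths(edges, len(nodes))
```

Here `paths` holds "NSQLAE" twice: once from the monoisotopic y-series and once from its isotope copy, i.e. EALQSN read in reverse, as expected for a C-terminal series. Exhaustive path enumeration can grow quickly on dense spectra, so a production version would need pruning.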
One advantage of peptide sequencing over peptide identification lies in its inherent flexibility. Clearly, peptide sequencing can discover sequences that are not present in the sequence databases used as input to peptide identification programs. Furthermore, since this method does not rely specifically on associating a peak with a particular ion series, but rather on the presence of peaks that spell a peptide sequence or its subsequences multiple times, it can be adapted to analyze spectra of peptides with post-translational modifications. In fact, neither the modification nor the mass shift due to the modification needs to be known in advance; both can potentially be determined from the analysis, as done in the program SALSA [30]. One complication in the analysis of peptides with post-translational modifications is that chemical moieties such as phosphate groups are labile and can fragment off, resulting in mass shifts in ion series peaks. The hierarchical nature of the chemical sequence of the peptide, including post-translational modifications, should allow this analytical method to be easily adapted to the problem. The main disadvantage of this approach as implemented here is its greater reliance on sequential peak information. That is, an individual peak cannot be accounted for by itself; each peak must be part of a sequence of peaks, although the sequence can be as short as two peaks. However, this reliance on sequential peaks can be alleviated somewhat by inferring a ghost parent sequence from the common child of two intermediate sequences. That is, missing parents in the sequence tree shown in Fig. 1 will not be observed by searching paths in the spectrum graph, but can be inferred from the existence of their children. Consider the parent ABCDE in Fig. 1.
If this peptide is not actually observed in the spectrum graph because of a missing peak, its presence can be inferred from its immediate children, ABCD and BCDE. The problem is then to align ABCD and BCDE in order to infer ABCDE. However, this alignment is done automatically in the sequence tree through the common child BCD. The net effect is to allow for the presence of a parent sequence even though consecutive ion series peaks were not found for the entire sequence. This may somewhat alleviate the problem of incomplete peak information in MS/MS spectra. Further research is needed into the problem of sequencing or identifying a peptide using de novo methods when faced with incomplete data.

Acknowledgements

This research was funded by the Office of Biological and Environmental Research in the US Department of Energy under contract 41966A and performed in the William R. Wiley Environmental Molecular Sciences Laboratory at the Pacific Northwest National Laboratory. The EMSL is funded by the Office of Biological and Environmental Research in the US Department of Energy. PNNL is operated by Battelle for the US Department of Energy under contract DE-AC06-76RLO 1830.

REFERENCES

1. Gavin A-C, Bösche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon A-M, Cruciat C-M, Remor M, Höfert C, Schelder M, Brajenovic M, Ruffner H, Merino A, Klein K, Hudak M, Dickson D, Rudi T, Gnau V, Bauch A, Bastuck S, Huhse B, Leutwein C, Heurtier M-A, Copley RR, Edelmann A, Querfurth E, Rybin V, Drewes G, Raida M, Bouwmeester T, Bork P, Seraphin B, Kuster B, Neubauer G, Superti-Furga G. Nature 2002; 415:
2. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams S-L, Millar A, Taylor P, Bennett K, Boutilier K, Yang L-Y, Wolting C, Donaldson I, Schandorff S, Shewnarane J, Vo M, Taggart J, Goudreault M, Muskat B, Alfarano C, Dewar D, Lin Z, Michalickova K, Willems AR, Sassi H, Nielsen PA, Rasmussen KJ, Andersen JR, Johansen LE, Hansen LH, Jespersen H, Podtelejnikov A, Nielsen E, Crawford J, Poulsen V, Sørensen BD, Matthiesen J, Hendrickson RC, Gleeson F, Pawson T, Moran MF, Durocher D, Mann M, Hogue CWV, Figeys D, Tyers M. Nature 2002; 415:
3. Link AJ, Eng J, Schieltz DM, Carmack E, Mize GJ, Morris DR, Garvik BM, Yates JR III. Nat. Biotechnol. 1999; 17:
4. Pasa-Tolic L, Jensen PK, Anderson GA, Lipton MS, Peden KK, Martinovic S, Tolic N, Bruce JE, Smith RD. J. Am. Chem. Soc. 1999; 121:
5. Goodlett DR, Keller A, Watts JD, Newitt R, Yi EC, Purvine S, Eng JK, von Haller P, Aebersold R, Kolker E. Rapid Commun. Mass Spectrom. 2001; 15:
6. Cardenas MS, van der Heeft E, de Jong APJM. Rapid Commun. Mass Spectrom. 1997; 11:
7. Cagney G, Emili A. Nat. Biotechnol. 2002; 20:
8. Desiderio DM, Kai M. Biomed. Mass Spectrom. 1983; 10:
9. Qin J, Herring CJ, Zhang X. Rapid Commun. Mass Spectrom. 1998; 12:
10. Rose K, Simona MG, Offord RE, Prior CP, Otto B, Thatcher DR. Biochem. J. 1983; 215:
11. Gaskell SJ, Haroldsen PE, Reilly MH. Biomed. Environ. Mass Spectrom. 1988; 16:
12. Schnolzer M, Jedrzejewski P, Lehmann WD. Electrophoresis 1996; 17:
13. Shevchenko A, Chernushevich I, Ens W, Standing KG, Thomson B, Wilm M, Mann M. Rapid Commun. Mass Spectrom. 1997; 11:
14. Takao T, Hori H, Okamoto K, Harada A, Kamachi M, Shimonishi Y. Rapid Commun. Mass Spectrom. 1991; 5:
15. Whaley B, Caprioli RM. Biol. Mass Spectrom. 1991; 20:
16. Oda Y, Huang K, Cross FR, Cowburn D, Chait BT. Proc. Natl. Acad. Sci. USA 1999; 96:
17. Washburn MP, Ulaszek R, Deciu C, Schieltz DM, Yates JR III. Anal. Chem. 2002; 74:
18. Conrads TP, Alving K, Veenstra TD, Belov ME, Anderson GA, Anderson DJ, Lipton MS, Pasa-Tolic L, Udseth HR, Chrisler WB, Thrall BD, Smith RD. Anal. Chem. 2001; 73:
19. Bartels C. Biomed. Environ. Mass Spectrom. 1990; 19:
20. Dancik V, Addona TA, Clauser KR, Vath JE, Pevzner PA. J. Comput. Biol. 1999; 6:
21. de-cossio JF, Gonzalez J, Satomi Y, Shima T, Okumura N, Besada V, Betancourt L, Padron G, Shimonishi Y, Takao T. Electrophoresis 2000; 21:
22. Hines WM, Falick AM, Burlingame AL, Gibson BW. J. Am. Soc. Mass Spectrom. 1992; 3:
23. Taylor JA, Johnson RS. Rapid Commun. Mass Spectrom. 1997; 11:
24. Smith RD, Anderson GA, Lipton MS, Pasa-Tolic L, Shen Y, Conrads TP, Veenstra TD, Udseth HR. Proteomics 2002; 2:
25. Eng JK, McCormack AL, Yates JR III. J. Am. Soc. Mass Spectrom. 1994; 5:
26. Harkewicz R, Belov ME, Anderson GA, Pasa-Tolic L, Masselon CD, Prior DC, Udseth HR, Smith RD. J. Am. Soc. Mass Spectrom. 2002; 13:
27. Jarman KD, et al. A Model of Random Sequences for de novo Peptide Sequencing. In Third IEEE Symp. Bioinformatics and Bioengineering, Bethesda, MD, IEEE Computer Society,
28. Senko MW, Beu SC, McLafferty FW. J. Am. Soc. Mass Spectrom. 1995; 6:
29. Kinter M, Sherman NE. Protein Sequencing and Identification Using Tandem Mass Spectrometry. Wiley-Interscience Series on Mass Spectrometry, Wiley-Interscience: New York, 2000; xvi,
30. Liebler DC, Hansen BT, Davey SW, Tiscareno L, Mason DE. Anal. Chem. 2002; 74: 203.

More information

There has been growing effort to study modified

There has been growing effort to study modified Charge Effects for Differentiation of Oligodeoxynucleotide Isomers Containing 8-oxo-dG Residues SHORT COMMUNICATION Hai Luo, Mary S. Lipton, and Richard D. Smith Environmental Molecular Sciences Laboratory,

More information

Proteome-wide label-free quantification with MaxQuant. Jürgen Cox Max Planck Institute of Biochemistry July 2011

Proteome-wide label-free quantification with MaxQuant. Jürgen Cox Max Planck Institute of Biochemistry July 2011 Proteome-wide label-free quantification with MaxQuant Jürgen Cox Max Planck Institute of Biochemistry July 2011 MaxQuant MaxQuant Feature detection Data acquisition Initial Andromeda search Statistics

More information

A New Hybrid De Novo Sequencing Method For Protein Identification

A New Hybrid De Novo Sequencing Method For Protein Identification A New Hybrid De Novo Sequencing Method For Protein Identification Penghao Wang 1*, Albert Zomaya 2, Susan Wilson 1,3 1. Prince of Wales Clinical School, University of New South Wales, Kensington NSW 2052,

More information

SQID: An Intensity-Incorporated Protein Identification Algorithm for Tandem Mass Spectrometry

SQID: An Intensity-Incorporated Protein Identification Algorithm for Tandem Mass Spectrometry pubs.acs.org/jpr SQID: An Intensity-Incorporated Protein Identification Algorithm for Tandem Mass Spectrometry Wenzhou Li, Li Ji, Jonathan Goya, Guanhong Tan, and Vicki H. Wysocki* Department of Chemistry

More information

Intensity-based protein identification by machine learning from a library of tandem mass spectra

Intensity-based protein identification by machine learning from a library of tandem mass spectra Intensity-based protein identification by machine learning from a library of tandem mass spectra Joshua E Elias 1,Francis D Gibbons 2,Oliver D King 2,Frederick P Roth 2,4 & Steven P Gygi 1,3,4 Tandem mass

More information

A Suboptimal Algorithm for De Novo Peptide Sequencing via Tandem Mass Spectrometry. BINGWEN LU and TING CHEN ABSTRACT

A Suboptimal Algorithm for De Novo Peptide Sequencing via Tandem Mass Spectrometry. BINGWEN LU and TING CHEN ABSTRACT JOURNAL OF COMPUTATIONAL BIOLOGY Volume 10, Number 1, 2003 Mary Ann Liebert, Inc. Pp. 1 12 A Suboptimal Algorithm for De Novo Peptide Sequencing via Tandem Mass Spectrometry BINGWEN LU and TING CHEN ABSTRACT

More information

Isotope correction of mass spectrometry profiles

Isotope correction of mass spectrometry profiles RAPID COMMUNICATIONS IN MASS SPECTROMETRY Rapid Commun. Mass Spectrom. 2008; 22: 2248 2252 Published online in Wiley InterScience (www.interscience.wiley.com).3591 Isotope correction of mass spectrometry

More information

The Pitfalls of Peaklist Generation Software Performance on Database Searches

The Pitfalls of Peaklist Generation Software Performance on Database Searches Proceedings of the 56th ASMS Conference on Mass Spectrometry and Allied Topics, Denver, CO, June 1-5, 2008 The Pitfalls of Peaklist Generation Software Performance on Database Searches Aenoch J. Lynn,

More information

TANDEM mass spectrometry (MS/MS) is an essential and

TANDEM mass spectrometry (MS/MS) is an essential and IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, VOL. 2, NO. 3, JULY-SEPTEMBER 2005 217 Predicting Molecular Formulas of Fragment Ions with Isotope Patterns in Tandem Mass Spectra Jingfen

More information

De Novo Peptide Sequencing: Informatics and Pattern Recognition applied to Proteomics

De Novo Peptide Sequencing: Informatics and Pattern Recognition applied to Proteomics De Novo Peptide Sequencing: Informatics and Pattern Recognition applied to Proteomics John R. Rose Computer Science and Engineering University of South Carolina 1 Overview Background Information Theoretic

More information

DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics

DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics Chih-Chiang Tsou 1,2, Dmitry Avtonomov 2, Brett Larsen 3, Monika Tucholska 3, Hyungwon Choi 4 Anne-Claude Gingras

More information

MALDI-HDMS E : A Novel Data Independent Acquisition Method for the Enhanced Analysis of 2D-Gel Tryptic Peptide Digests

MALDI-HDMS E : A Novel Data Independent Acquisition Method for the Enhanced Analysis of 2D-Gel Tryptic Peptide Digests -HDMS E : A Novel Data Independent Acquisition Method for the Enhanced Analysis of 2D-Gel Tryptic Peptide Digests Emmanuelle Claude, 1 Mark Towers, 1 and Rachel Craven 2 1 Waters Corporation, Manchester,

More information

Yifei Bao. Beatrix. Manor Askenazi

Yifei Bao. Beatrix. Manor Askenazi Detection and Correction of Interference in MS1 Quantitation of Peptides Using their Isotope Distributions Yifei Bao Department of Computer Science Stevens Institute of Technology Beatrix Ueberheide Department

More information

HOWTO, example workflow and data files. (Version )

HOWTO, example workflow and data files. (Version ) HOWTO, example workflow and data files. (Version 20 09 2017) 1 Introduction: SugarQb is a collection of software tools (Nodes) which enable the automated identification of intact glycopeptides from HCD

More information

Mass Spectrometry. Hyphenated Techniques GC-MS LC-MS and MS-MS

Mass Spectrometry. Hyphenated Techniques GC-MS LC-MS and MS-MS Mass Spectrometry Hyphenated Techniques GC-MS LC-MS and MS-MS Reasons for Using Chromatography with MS Mixture analysis by MS alone is difficult Fragmentation from ionization (EI or CI) Fragments from

More information

Mass Spectrometry Based De Novo Peptide Sequencing Error Correction

Mass Spectrometry Based De Novo Peptide Sequencing Error Correction Mass Spectrometry Based De Novo Peptide Sequencing Error Correction by Chenyu Yao A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics

More information

CSE182-L8. Mass Spectrometry

CSE182-L8. Mass Spectrometry CSE182-L8 Mass Spectrometry Project Notes Implement a few tools for proteomics C1:11/2/04 Answer MS questions to get started, select project partner, select a project. C2:11/15/04 (All but web-team) Plan

More information

Chapter 1. Introduction

Chapter 1. Introduction 1-1 Chapter 1. Introduction 1.1. Background Non covalent interactions are important for the structures and reactivity of biological molecules in the gas phase as well as in the solution phase. 1 It is

More information

Tandem Mass Spectrometry: Generating function, alignment and assembly

Tandem Mass Spectrometry: Generating function, alignment and assembly Tandem Mass Spectrometry: Generating function, alignment and assembly With slides from Sangtae Kim and from Jones & Pevzner 2004 Determining reliability of identifications Can we use Target/Decoy to estimate

More information

Powerful Scan Modes of QTRAP System Technology

Powerful Scan Modes of QTRAP System Technology Powerful Scan Modes of QTRAP System Technology Unique Hybrid Triple Quadrupole Linear Ion Trap Technology Provides Powerful Workflows to Answer Complex Questions with No Compromises While there are many

More information

Towards the Prediction of Protein Abundance from Tandem Mass Spectrometry Data

Towards the Prediction of Protein Abundance from Tandem Mass Spectrometry Data Towards the Prediction of Protein Abundance from Tandem Mass Spectrometry Data Anthony J Bonner Han Liu Abstract This paper addresses a central problem of Proteomics: estimating the amounts of each of

More information

Choosing the metabolomics platform

Choosing the metabolomics platform GBS 748 Choosing the metabolomics platform Stephen Barnes, PhD 4 7117; sbarnes@uab.edu So, I have my samples what s next? You ve collected your samples and you may have extracted them Protein precipitation

More information