Algorithms for Identifying Protein Cross-Links via Tandem Mass Spectrometry
JOURNAL OF COMPUTATIONAL BIOLOGY, Volume 8, Number 6, 2001. Mary Ann Liebert, Inc. Pp.

TING CHEN,1 JACOB D. JAFFE,2 and GEORGE M. CHURCH3

ABSTRACT

Cross-linking technology combined with tandem mass spectrometry (MS-MS) is a powerful method that provides a rapid solution to the discovery of protein-protein interactions and protein structures. We studied the problem of detecting cross-linked peptides and cross-linked amino acids from tandem mass spectral data. Our method consists of two steps: the first step finds two protein subsequences whose mass sum equals a given mass measured from the mass spectrometry; and the second step finds the best cross-linked amino acids in these two peptide sequences that are optimally correlated to a given tandem mass spectrum. We designed fast and space-efficient algorithms for these two steps and implemented and tested them on experimental data of cross-linked hemoglobin proteins. An interchain cross-link between two subunits was found in two tandem mass spectra. The length of the cross-linker (7.7 Å) is very close to the actual distance (8.18 Å) obtained from the molecular structure in PDB.

Key words: proteomics, mass spectrometry, algorithms, protein cross-linking, protein-protein interactions, protein folding.

1. INTRODUCTION

In recent years, more and more genomes of model organisms have been sequenced. Using these genomic sequences, researchers have focused on the identification of genes on the genome, the study of gene regulation and gene regulatory networks, the discovery of signal transduction pathways, the determination of protein structures, the detection of protein-protein, protein-DNA, and protein-metabolite interactions, and the elucidation of functions of genes and their protein products.
1 Department of Molecular Biology, University of Southern California, Los Angeles, CA. 2 Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA. 3 Department of Genetics, Harvard Medical School, Boston, MA.

A method which combines chemical cross-linking of proteins with mass spectrometry or tandem mass spectrometry may be useful in discovering protein complexes, determining their structure (Young et al., 2000) and/or quantitating in vivo concentrations. This paper focuses on new algorithms for interpretation of complex experimental data generated by protein cross-linking and tandem mass spectrometry.

Traditionally, three-dimensional structures of proteins are solved by x-ray crystallography and NMR. However, generating an accurate structure that satisfies constraints of experimental data can be extremely difficult. There has been some success in other computational methods that predict structures from energy functions, multiple alignments, and threading. However, the accuracy and general applicability of these methods lag far behind the rates at which new protein sequences are being identified. Classical methods of studying protein-protein interactions often scale up poorly, make assumptions about the number of interacting proteins or their modification state, or lose details about environmental effects on complex formation.

Cross-linking technology combined with mass spectrometry provides an alternative approach to detecting protein-protein interactions and adding reliable distance constraints to protein structures. Previous studies of cross-linking have been able to produce low-resolution interatomic distance constraints, which, in conjunction with threading, have led to the determination of the three-dimensional structure of a model protein (Young et al., 2000). A technique called isotopic labeling (Kellogg et al., 2001) has been proposed to separate peptides and cross-links in mass spectrometry. Through a proper design of labeling, the cross-links can be easily identified by checking their masses in the spectra. Similar techniques can be applied to identify the docking structure of two interacting proteins. However, these ideas mostly focus on the individual protein or interaction, and it is not clear how to scale them up to the whole proteome level. In this paper, we applied tandem mass spectrometry to detect cross-linked peptides and cross-linked amino acids, which has the potential for large-scale applications.

Tandem mass spectrometry plays a powerful role in the identification of cross-linking sites. Combined with high performance liquid chromatography (HPLC), tandem mass spectrometry has been widely used to identify peptides and analyze protein sequences.
Peptides, i.e., of an amino acid sequence $\mathrm{HNHCHR_1CO{-}NHCHR_2CO{-}\cdots{-}NHCHR_nCOOH}$, resulting from proteolytic digestion of proteins are separated by HPLC and then analyzed in a mass spectrometer. The latter is a two-step process which consists of measuring the mass-to-charge (m/z) ratio of an ionized peptide (the parent ion) and then measuring the m/z of fragmentation products (the daughter ions) of the peptide after collision-induced dissociation (CID). For each molecule of this peptide, a randomly selected peptide bond is broken, and as a result, CID produces a ladder of ions, typically N-terminal b-ions ($\mathrm{HNHCHR_1CO{-}\cdots{-}NHCHR_iCO^+}$) and C-terminal y-ions ($\mathrm{HNHCHR_{i+1}CO{-}\cdots{-}NHCHR_nCOOH + H^+}$), where two consecutive b-ions (or y-ions) differ by exactly one amino acid. These ions display a characteristic pattern for this peptide on the tandem mass spectrum.

Computer programs such as SEQUEST (Eng et al., 1994) correlate peptide sequences in a protein database with the tandem mass spectrum and report all peptides with significant correlation scores. Some programs (Pevzner et al., 2000) allow searching the database with amino acid modifications. An alternative approach (Taylor and Johnson, 1997; Dancik et al., 2000; Chen et al., 2001), called de novo peptide sequencing, extracts candidate peptide sequences from the spectral data before they are validated and/or available in a database.

We performed an experiment that first chemically cross-linked interacting proteins, then digested them proteolytically (using trypsin), and finally separated and identified cross-linking sites through HPLC-tandem mass spectrometry. This paper focuses on the algorithmic solutions to the identification of cross-linked peptides from the tandem mass spectral data.

Cross-link structures are shown in Fig. 1. Figure 1(a) shows an H-structure cross-link where two amino acids in two peptides are connected to a linker by covalent bonds, Fig. 1(b) shows that the linker may simply decorate some amino acid of a peptide, and Fig. 1(c) shows a self-cross-linked peptide where two amino acids on this peptide are connected by a linker. Figure 1(d) shows two doubly cross-linked peptides.

Our main interest is to identify an H-structure cross-link (Fig. 1(a)). Our approach to finding this cross-link from a tandem mass spectrum uses the following two steps:

• Step 1: Given the mass of the parent cross-link molecules measured in mass spectrometry, find every pair of peptides whose mass sum (plus the mass of the linker) equals this mass.
• Step 2: Given a pair of peptides, find the cross-linked amino acid pair that is optimally correlated to the tandem mass spectrum.

Finally, the algorithm reports the pair of peptides with the maximum correlation score. For simplicity, assuming that we are given $k$ protein sequences each with $n$ proteolytic amino acids (the targets for proteolytic digestion, i.e., for trypsin, they are Ks and Rs), we can

• Solve Step 1 in $O(kn^2 \log(kn))$ time and $O(kn)$ space.
FIG. 1. Four cross-link structures: (a) two cross-linked peptides, (b) one decorated peptide, (c) one self-cross-linked peptide, (d) two doubly cross-linked peptides.

If two peptide sequences of $m$ amino acids are given and the tandem mass spectrum has $h$ mass peaks, we can

• Solve Step 2 in $O(m + h)$ time and $O(m + h)$ space. The algorithm does not depend on the number of possible cross-linked amino acid pairs.

Our paper is organized as follows. Section 2 describes the algorithm to find two protein subsequences whose mass sum equals a given mass. Section 3 describes the algorithm for finding the best cross-linking site between two peptides. Section 4 discusses implementation issues and the probabilities of solutions. Section 5 reports the implementation of our algorithms and the test on hemoglobin cross-linking experimental data.

2. IDENTIFYING TWO SUBSEQUENCES FOR A MASS SUM

There are $n(n + 1)/2$ possible subsequences corresponding to a protein with $n$ proteolytic amino acids. For example, trypsin cuts a protein sequence right after a lysine (K) or an arginine (R). The exact place of each cut depends on the distribution of Ks and Rs: if two Ks or Rs are very close, the first one may not be cut. Moreover, local sequence influences and modifications may protect some Ks and Rs from being cleaved by trypsin. Therefore, two protein sequences have $O(n^4)$ possible pairs of subsequences.

To speed up the computation, we translate a protein sequence $P$ into an array $A$ of $n + 1$ masses, each of which corresponds to a unique subsequence between two adjacent proteolytic amino acids. For example, $P$ = NRDNKT, when trypsin is used for digestion, is translated into an array $A = (A_1, A_2, A_3)$ where $A_1$ = the mass for NR, $A_2$ = the mass for DNK, and $A_3$ = the mass for T. Thus, any proteolytic subsequence of $P$ has the mass sum equal to the sum of the elements in the corresponding interval of $A$. For example, the subsequence DNKT has the mass of $A_2 + A_3$. Since the mass of the linker is fixed, we focus on finding two protein subsequences for a given mass $M$, which is equivalent to the Subsequence Sum Problem (SSP) defined below.

Definition 1. Subsequence Sum Problem (SSP). Given two positive $n$-number sequences $A = (a_1, \ldots, a_n)$ and $B = (b_1, \ldots, b_n)$, and a number $M$, find all possible pairs of subsequences, $(a_i, \ldots, a_j)$, $1 \le i \le j \le n$, and $(b_k, \ldots, b_l)$, $1 \le k \le l \le n$, such that

$$\sum_{s=i}^{j} a_s + \sum_{t=k}^{l} b_t = M.$$

First, we consider an $n$-number sequence $A$ and a mass $M$, and we are asked to find a subsequence of $A$ such that its sum equals $M$. The following Algorithm Find-A solves this problem.

Algorithm Find-A($A$, $M$)
1. $i = 1$, $j = 1$;   # 1st seq: $(a_i, \ldots, a_j) = (a_1)$
2. $sum = a_1$;   # sum is the sum of $(a_1)$
3. While $j < n$ or $sum \ge M$
4.   If $sum = M$   # Solution: $sum = M$
5.     then output $(i, j)$;
6.   If $sum < M$   # Add $a_{j+1}$ to $(a_i, \ldots, a_j)$
7.     then $j = \min(j + 1, n)$, $sum = sum + a_j$;
8.   Else   # Delete $a_i$ from $(a_i, \ldots, a_j)$
9.     $sum = sum - a_i$, $i = i + 1$.

Theorem 1. Given a number $M$ and a positive number sequence $A = (a_1, \ldots, a_n)$, Algorithm Find-A finds all the subsequences $(a_i, \ldots, a_j)$, $1 \le i \le j \le n$, satisfying $\sum_{s=i}^{j} a_s = M$, in $O(n)$ time and $O(n)$ space.

Proof. Straightforward.

Following Algorithm Find-A and Theorem 1, we have

Lemma 2. The SSP can be solved in $O(n^3)$ time and $O(n + p)$ space, where $p$ is the number of solutions, and the number of solutions for SSP is at most $O(n^3)$.

Proof. For every subsequence $(a_i, \ldots, a_j)$ of $A$, Algorithm Find-A finds all subsequences of $B$ that satisfy $\sum_{t=k}^{l} b_t = M - \sum_{s=i}^{j} a_s$ in $O(n)$ time. There are $O(n^2)$ subsequences of $A$, and thus SSP can be solved in $O(n^3)$ time. This algorithm stores all the solutions in $O(p)$ space and requires $O(n)$ space for $A$ and $B$, a total of $O(n + p)$ space.
Since there are at most $O(n)$ solutions (subsequences of $B$) for every subsequence of $A$, the number of solutions for SSP is at most $O(n^3)$.

If $O(n^2)$ space is allowed, then the following holds.

Lemma 3. The SSP can be solved in $O(n^2 \log n + p)$ time and $O(n^2 + p)$ space, where $p$ is the number of solutions.

Proof. We compute the sums of all subsequences of $B$ in $O(n^2)$ time and store them into an array of $O(n^2)$ space. Then, the sums are sorted in $O(n^2 \log n)$ time. For every subsequence $(a_i, \ldots, a_j)$ of $A$, we can find all the subsequences of $B$ that satisfy $\sum_{t=k}^{l} b_t = M - \sum_{s=i}^{j} a_s$ in $O(\log n + q)$ time using binary search, where $q$ is the number of solutions. If $p$ is the total number of solutions, finding all of them takes $O(n^2 \log n + p)$ time and $O(n^2 + p)$ space.
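The two procedures above (the linear scan of Algorithm Find-A and the sort-then-search idea in the proof of Lemma 3) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `find_a` and `ssp_sorted` are hypothetical, indices are 0-based and inclusive, and masses are assumed to be positive integers (for example, masses scaled to a fixed precision) so that exact equality is meaningful. Real measurements would need a tolerance instead.

```python
from bisect import bisect_left, bisect_right


def find_a(a, M):
    """Sliding-window scan in the spirit of Algorithm Find-A.

    Returns every (i, j), 0-based inclusive, with sum(a[i..j]) == M.
    Assumes all entries of `a` are positive and M > 0.
    """
    out = []
    i, total = 0, 0
    for j, x in enumerate(a):
        total += x                      # extend the window to include a[j]
        while total > M and i <= j:     # shrink from the left while too heavy
            total -= a[i]
            i += 1
        if total == M and i <= j:
            out.append((i, j))
    return out


def ssp_sorted(a, b, M):
    """Lemma 3 style solver: sort all O(n^2) sums of B, then binary search.

    Returns every (i, j, k, l) with sum(a[i..j]) + sum(b[k..l]) == M.
    """
    sums_b = []
    for k in range(len(b)):
        s = 0
        for l in range(k, len(b)):
            s += b[l]
            sums_b.append((s, k, l))
    sums_b.sort()
    keys = [s for s, _, _ in sums_b]    # sorted sums, used as search keys

    out = []
    for i in range(len(a)):
        s = 0
        for j in range(i, len(a)):
            s += a[j]
            lo = bisect_left(keys, M - s)
            hi = bisect_right(keys, M - s)
            out.extend((i, j, k, l) for _, k, l in sums_b[lo:hi])
    return out
```

The second function trades memory for simplicity, which is the same trade-off the text notes for Lemma 3: the table of sums of $B$ occupies $O(n^2)$ space.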
Algorithm Find-AB($A$, $B$, $M$)
1. $S_A = \{(a_i, \ldots, a_j) \mid (\sum_{s=i}^{j} a_s < M \le \sum_{s=i}^{j+1} a_s,\ j < n)$ or $(\sum_{s=i}^{n} a_s < M)\}$;
2. $S_B = \{(b_k),\ k = 1, \ldots, n\}$;
3. While ($S_A \ne \emptyset$)
4.   Find $(a_i, \ldots, a_j) \in S_A$ where $v = \sum_{s=i}^{j} a_s$ is the maximum sum;
5.   For every $(b_k, \ldots, b_l) \in S_B$ and $\sum_{t=k}^{l} b_t < M - v$
6.     Delete $(b_k, \ldots, b_l)$ from $S_B$;
7.     If ($l < n$), then add $(b_k, \ldots, b_{l+1})$ into $S_B$;
8.   For every $(b_k, \ldots, b_l) \in S_B$ and $\sum_{t=k}^{l} b_t = M - v$
9.     Output $i, j, k, l$;
10.  Delete $(a_i, \ldots, a_j)$ from $S_A$;
11.  If $i < j$, then add $(a_i, \ldots, a_{j-1})$ into $S_A$.

For every $a_i$, Step 1 finds the longest subsequence $(a_i, \ldots, a_j)$ that satisfies $\sum_{s=i}^{j} a_s < M$. Any subsequence of $A$ to be in a solution must have a sum less than $M$ and be a prefix subsequence of some element in $S_A$. Then, Step 2 finds the shortest subsequences of $B$. Any subsequence of $B$ to be in a solution must be a prefix supersequence of some element in $S_B$. Steps 3-11 check every subsequence of $A$ with sum less than $M$ in decreasing order and search its corresponding solutions in $B$. Step 4 finds $(a_i, \ldots, a_j) \in S_A$ with the maximum sum $v$. Steps 5-7 delete every element $(b_k, \ldots, b_l) \in S_B$ such that $\sum_{t=k}^{l} b_t < M - v$ and replace it by its immediate prefix supersequence $(b_k, \ldots, b_{l+1})$. Steps 8-9 find every element $(b_k, \ldots, b_l) \in S_B$ such that $\sum_{t=k}^{l} b_t = M - v$ and report the solutions $(a_i, \ldots, a_j)$ and $(b_k, \ldots, b_l)$. After all solutions for $(a_i, \ldots, a_j)$ are found, Steps 10-11 replace it by the immediate prefix subsequence $(a_i, \ldots, a_{j-1})$ in $S_A$.

Theorem 4. Algorithm Find-AB solves SSP in $O(n^2 \log n + p \log n)$ time and $O(n + p)$ space, where $p$ is the number of solutions.

Proof. Obviously, every output of Algorithm Find-AB is correct and unique. Assume that $(a_i, \ldots, a_j)$ and $(b_k, \ldots, b_l)$ satisfy $\sum_{s=i}^{j} a_s + \sum_{t=k}^{l} b_t = M$. We show that the algorithm finds this solution.
In Step 1, let $(a_i, \ldots, a_{j'}) \in S_A$, which is the longest subsequence of $A$ that starts with $a_i$ and has a sum less than $M$. Because $\sum_{s=i}^{j} a_s < M$, it is true that $j \le j'$. The algorithm will check $(a_i, \ldots, a_{j'})$ (Step 4) and all its prefix subsequences including $(a_i, \ldots, a_j)$ (Steps 10-11) in length-decreasing order. Similarly, every subsequence of $B$ starting with $b_k$, including $(b_k, \ldots, b_l)$, is checked in Steps 5, 6, and 7 in length-increasing order.

Let Step 4 generate a series of $v$-values: $v_1, v_2, \ldots$. This series is in decreasing order, because Step 4 always finds an element in $S_A$ with the maximum sum and Steps 10-11 replace it by a subsequence with a smaller sum. Step 5 deletes subsequences of $B$ that have a sum less than $M - v_1$, $M - v_2$, $\ldots$ at each iteration, respectively, and Steps 6-7 always replace them with subsequences of larger sums. Let $w = \sum_{s=i}^{j} a_s$. When the algorithm is deleting $(a_i, \ldots, a_j)$ in Step 10, all the subsequences of $B$ with a sum less than $M - w$ would have been deleted. Thus, $(b_k, \ldots, b_l)$ is present in $S_B$ because $\sum_{t=k}^{l} b_t = M - w$. The algorithm should find it and report the solution $(a_i, \ldots, a_j)$ and $(b_k, \ldots, b_l)$.

Both $S_A$ and $S_B$ maintain $O(n)$ elements throughout the algorithm. We store them in a data structure called a Red-Black Tree (Cormen et al., 1990). This is a binary tree that allows retrieval, addition, and deletion in $O(\log n)$ time while keeping the tree height $O(\log n)$ through efficient tree rotations. Thus, the algorithm requires only $O(n + p)$ space.

In Steps 3-11, every subsequence of $A$ is checked and deleted at most once. Every subsequence of $B$ is checked and deleted at most once in Steps 5-9 if it is not in any solution. If a subsequence is in $r \ge 1$ solutions, it will be checked exactly $r + 1$ times. It takes $O(\log n)$ time for every retrieval in Steps 4, 5, and 8, every deletion in Steps 6 and 10, and every addition in Steps 7 and 11. The total running time is $O(n^2 \log n + p \log n)$.
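As an illustration of Algorithm Find-AB, the sketch below follows the eleven steps but swaps the Red-Black Tree for two binary heaps from Python's standard `heapq` module, which support the same "take the maximum of $S_A$" and "take the minimum of $S_B$" operations in logarithmic time. The function name `find_ab` is hypothetical, indices are 0-based and inclusive, and masses are assumed to be positive integers so that the equality test in Step 8 is exact.

```python
import heapq


def find_ab(a, b, M):
    """Heap-based sketch of Algorithm Find-AB (assumes positive integer masses).

    Returns all (i, j, k, l) with sum(a[i..j]) + sum(b[k..l]) == M.
    """
    n_a, n_b = len(a), len(b)

    # Step 1: for each start i, the longest run a[i..j] with sum < M.
    # Stored with negated sums so that heapq acts as a max-heap.
    s_a = []
    j, total = 0, 0                 # current window is a[i:j] with sum `total`
    for i in range(n_a):
        if j < i:
            j, total = i, 0
        while j < n_a and total + a[j] < M:
            total += a[j]
            j += 1
        if j > i:                   # non-empty run found for this start
            heapq.heappush(s_a, (-total, i, j - 1))
            total -= a[i]           # slide the window start to i + 1

    # Step 2: the shortest subsequences of B, one per start k (min-heap).
    s_b = [(b[k], k, k) for k in range(n_b)]
    heapq.heapify(s_b)

    out = []
    while s_a:                                          # Step 3
        neg_v, i, j = heapq.heappop(s_a)                # Steps 4 and 10
        v = -neg_v
        target = M - v
        while s_b and s_b[0][0] < target:               # Steps 5-7
            s, k, l = heapq.heappop(s_b)
            if l + 1 < n_b:
                heapq.heappush(s_b, (s + b[l + 1], k, l + 1))
        matched = []
        while s_b and s_b[0][0] == target:              # Steps 8-9
            matched.append(heapq.heappop(s_b))
        for s, k, l in matched:
            out.append((i, j, k, l))
            heapq.heappush(s_b, (s, k, l))              # keep it for equal sums
        if i < j:                                       # Step 11
            heapq.heappush(s_a, (-(v - a[j]), i, j - 1))
    return out
```

Matched elements are pushed back because a later subsequence of $A$ may have the same sum $v$; this mirrors the statement in the proof that a subsequence of $B$ in $r$ solutions is checked $r + 1$ times.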
Algorithm Find-AB requires basically only $O(n)$ space if the solutions are not stored. Similar algorithms can be applied to solve the SSP problem with $k$ sequences:

Theorem 5. Given a number $M$ and $k$ positive $n$-number sequences $A_t = (a_{t1}, \ldots, a_{tn})$, $t = 1, \ldots, k$, finding all possible pairs of subsequences of $A_1$, $A_2$, $\ldots$, or $A_k$ such that their mass sum equals $M$ takes $O(kn^2 \log(kn) + p \log(kn))$ time and $O(kn + p)$ space, where $p$ is the number of solutions. The total number of solutions is at most $O(k^2 n^3)$.

In a protein database of size $k \times n$ where $k \gg n$, Theorem 5 implies that the solutions can be found in the scale of $O(k \log k)$ time. Lemma 3 can also be applied to solve the SSP problem with $k$ sequences, and it requires $O(kn^2 \log(kn) + p)$ time, similar to that in Theorem 5, but uses $O(kn^2)$ space. The $O(kn)$-space algorithm has an advantage in applications where $kn$ is huge, because an $O(kn^2)$-space algorithm could easily take up the whole virtual memory.

3. IDENTIFYING CROSS-LINKED AMINO ACIDS

3.1. Cross-linking structures

Algorithms in Section 2 produce a list of peptide pairs whose mass sum equals the parent ion mass $M$. Let a pair of peptides $P = (p_1, \ldots, p_m)$ and $Q = (q_1, \ldots, q_n)$ form an H-structure cross-link shown in Fig. 1(a). In Fig. 1(a), amino acids $p_i$ and $q_j$ are connected by a linker introduced by a cross-linking agent. A typical agent, disuccinimidyl glutarate (DSG), cross-links two lysines to form a Lys-linker-Lys structure. If each peptide sequence has $r$ ($r \le m$ and $r \le n$) such amino acids, there are $r^2$ possible cross-links.

In tandem mass spectrometry, CID fragments each cross-link molecule into one of the four patterns shown in Fig. 2. In Fig. 2(a), the peptide bond $p_k$-$p_{k+1}$, located on the N-terminal side (left) of the cross-linked amino acid ($p_i$), is broken, and the molecule is fragmented into an N-terminal ion, $p_1 \cdots p_k^+$, and a C-terminal ion, the H-structure ion $Q({}^+p_{k+1} \cdots p_m)$. In Fig. 2(b), the peptide bond is located on the C-terminal side (right) of $p_i$, and the molecule is fragmented into an N-terminal ion $Q(p_1 \cdots p_k^+)$ and a C-terminal ion ${}^+p_{k+1} \cdots p_m$. Similarly, if a peptide bond of $Q$ is broken, the molecule has two fragmentation patterns shown in Fig. 2(c) and 2(d).

Depending on the cross-linking agent used in the experiment, a linker may contain a peptide bond. If the linker is broken, the H-structure molecule is cleaved into two peptide ions, each of which has a decoration similar to the molecule shown in Fig. 1(c).

For simplicity, assume $m = n$ in the following description. Theoretically, there are $O(m)$ possible ions for each cross-linked $P$ and $Q$. A tandem mass spectrum is a collection of mass peaks, ideally, each of which corresponds to some ion in Fig. 2.

3.2. Algorithms for identifying cross-linked amino acids

Definition 2. Cross-Linked Amino Acids Problem (CLAA). Given an $h$-mass peak tandem mass spectrum $T = (t_1, \ldots, t_h)$ and two cross-linked $m$-amino acid peptides $P = (p_1, \ldots, p_m)$ and $Q = (q_1, \ldots, q_m)$, each having $r$ ($r \le m$) amino acids that can be cross-linked, find the cross-linked amino acids that maximize the value of a given scoring function $F(P, Q, T)$.

There are $r^2$ cross-linked amino acid pairs corresponding to $r^2$ possible cross-link structures. For every such structure, we can generate a hypothetical tandem mass spectrum, $S = (s_1, \ldots, s_{4(m-1)})$, from the masses of all the possible fragmented ions in Fig. 2. A scoring function $F$ compares $S$ with $T$ and gives a score based on how well they are correlated. Generally, $F$ considers a match between two mass peaks $t_i$ of $T$ and $s_j$ of $S$ if $|t_i - s_j| \le \varepsilon$, where $\varepsilon$ is the maximum measurement error allowed in an experiment. The higher the number of matches is, the higher the $F$-score is. For simplicity, consider $F$ counting the number of matches. For every mass peak of $S$, finding its match in $T$ takes $O(\log h)$ time by binary search. The total number of matches can be determined in $O(m \log h)$ time.
Thus, finding the most likely cross-linked amino acid pair takes a total of $O(r^2 m \log h)$ time. However, there is a better way to do it.
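The match-counting score described here, with one binary search per hypothetical peak, can be sketched as below. The function name `count_matches` and the example masses are illustrative assumptions, not values from the experiment; the tolerance argument `eps` plays the role of $\varepsilon$.

```python
from bisect import bisect_left


def count_matches(hypothetical, observed, eps):
    """Count hypothetical peaks s that have an observed peak t with |t - s| <= eps.

    Each lookup is one binary search in the sorted observed spectrum,
    so the total cost is O(len(hypothetical) * log(len(observed))).
    """
    peaks = sorted(observed)
    hits = 0
    for s in hypothetical:
        k = bisect_left(peaks, s - eps)     # first observed peak >= s - eps
        if k < len(peaks) and peaks[k] <= s + eps:
            hits += 1
    return hits


# Made-up masses: two of the three hypothetical peaks fall within 0.2 of a peak.
print(count_matches([100.0, 200.0, 300.0], [99.9, 250.0, 300.15], 0.2))
```

Running this once per candidate structure gives the $O(r^2 m \log h)$ total quoted above, since each of the $r^2$ structures has its own hypothetical spectrum of $4(m - 1)$ ions.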
FIG. 2. Four fragmentation patterns for a cross-linked molecule in tandem mass spectrometry.

Theorem 6. The CLAA problem can be solved in $O(m + h)$ time and $O(m + h)$ space if the scoring function $F$ counts the number of matches.

Proof. Figures 2(a)-(d) show $4(m - 1)$ ions for one cross-link, and thus $r^2$ cross-links have a total of $4(m - 1)r^2$ ions. However, there are only $8(m - 1)$ different masses for these ions. Figure 3 shows two types of N- and C-terminal ions after breaking the peptide bond $p_i$-$p_{i+1}$. In Fig. 3(a), the cross-linked amino acid of $P$ is located after $p_i$, and in Fig. 3(b), the cross-linked amino acid of $P$ is located before $p_i$. The C-terminal ion in Fig. 3(a) has a fixed mass no matter where the cross-link may be located, as long as it is after $p_i$. So does the N-terminal ion in Fig. 3(b). Since $P$ and $Q$ together have $2(m - 1)$ peptide bonds, there are only $8(m - 1)$ different masses.

FIG. 3. Two types of N- and C-terminal ions after breaking the $p_i$-$p_{i+1}$ peptide bond.

Let $M(\cdot, T) = 1$ indicate the match of a mass with a spectrum $T$, and $M(\cdot, T) = 0$ otherwise. In the following steps, we can find the best cross-link:

1. Calculate ion masses of $P$: $(L^N_{p_1}, L^C_{p_1}), \ldots, (L^N_{p_{m-1}}, L^C_{p_{m-1}})$ for the N- and C-terminal ions in Fig. 3(a) corresponding to the breaking of peptide bonds $(p_1$-$p_2), \ldots, (p_{m-1}$-$p_m)$, respectively, and similarly, $(R^N_{p_1}, R^C_{p_1}), \ldots, (R^N_{p_{m-1}}, R^C_{p_{m-1}})$ for the N- and C-terminal ions in Fig. 3(b).
2. Look up these masses in $T$ for matches: $M(L^N_{p_i}, T)$, $M(L^C_{p_i}, T)$, $M(R^N_{p_i}, T)$, and $M(R^C_{p_i}, T)$, for $i = 1, \ldots, m - 1$.
3. For every possible cross-linked amino acid $p_i$, calculate

$$score_i = \sum_{j=1}^{i-1} \left( M(L^N_{p_j}, T) + M(L^C_{p_j}, T) \right) + \sum_{j=i}^{m-1} \left( M(R^N_{p_j}, T) + M(R^C_{p_j}, T) \right).$$

   Find the maximum score, $score_a$, which indicates the most likely cross-linked amino acid $p_a$.
4. Repeat Steps 1, 2, and 3 and find the most likely cross-linked amino acid $q_b$ for $Q$.
5. Report the cross-linked amino acids $p_a$ and $q_b$.

In Step 1, it takes $O(m)$ time to calculate $(L^N_{p_1}, L^C_{p_1})$, and going from $(L^N_{p_1}, L^C_{p_1})$ to $(L^N_{p_2}, L^C_{p_2})$ takes only $O(1)$ time for adding and subtracting $p_2$. Thus, Step 1 takes $O(m)$ time. The matching in Step 2 takes $O(m + h)$ time by merging four sorted lists, $\{L^N_{p_i}\}$, $\{L^C_{p_i}\}$, $\{R^N_{p_i}\}$, and $\{R^C_{p_i}\}$, into one sorted list and matching this list with the sorted list of $T$. Similarly, it takes $O(m)$ time to calculate $score_1$ and only additional $O(1)$ time for $score_2$, $score_3$, and so on. Thus, Step 3 takes $O(m)$ time. Step 4 has similar time complexity. Therefore, finding the cross-linked amino acids takes a total of $O(m + h)$ time.

Although Theorem 6 is for the special $F$-function that counts the number of matches, this theorem can be generalized for many other functions, such as the correlation function, the convolution function, and so on.

4. IMPLEMENTATION AND PROBABILITY

4.1. Speed up searching

Lemma 2 indicates that there may be as many as $O(n^3)$ different solutions for the Subsequence Sum Problem (SSP), given a pair of protein sequences. For a database of $k$ protein sequences, Theorem 5 tells us that the number could be as large as $O(k^2 n^3)$. In fact, it is possible that there are too many solutions for SSP, especially when the database is very large and the number of subsequence pairs grows quadratically with the database size $k$. On the other hand, if the mass/charge error is large, many peptide pairs could fall into this error range.
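Returning to the five-step procedure in the proof of Theorem 6, Steps 1-3 for a single peptide can be sketched as follows. Several things here are assumptions made for illustration rather than details given in the text: the name `best_crosslink_site`, the parameter `attached` (the extra mass carried by the fragment holding the cross-linked residue, i.e., the linker plus the partner peptide), the default ion offsets (standard monoisotopic values for singly charged b- and y-type ions), and the use of one binary search per ion instead of the list merge, which makes this sketch $O(m \log h)$ rather than $O(m + h)$.

```python
from bisect import bisect_left
from itertools import accumulate


def has_match(peaks, mass, eps):
    """True if a sorted list of peaks contains a value within eps of mass."""
    k = bisect_left(peaks, mass - eps)
    return k < len(peaks) and peaks[k] <= mass + eps


def best_crosslink_site(residues, attached, peaks, sites, eps=0.2,
                        n_offset=1.00728, c_offset=19.01784):
    """Return (site, score) maximizing score_i over the 1-based `sites`.

    residues : residue masses p_1..p_m of one peptide
    attached : assumed extra mass on the fragment holding the cross-link
    peaks    : sorted observed spectrum T
    """
    m = len(residues)
    prefix = [0.0] + list(accumulate(residues))   # prefix[j] = mass of p_1..p_j
    total = prefix[m]

    # left[j]  = matches of Fig. 3(a)-type ions over bonds 1..j (link after p_j)
    # right[j] = matches of Fig. 3(b)-type ions over bonds 1..j (link at/before p_j)
    left = [0] * (m + 1)
    right = [0] * (m + 1)
    for j in range(1, m):
        n_ion = prefix[j] + n_offset
        c_ion = total - prefix[j] + c_offset
        a_hits = has_match(peaks, n_ion, eps) + has_match(peaks, c_ion + attached, eps)
        b_hits = has_match(peaks, n_ion + attached, eps) + has_match(peaks, c_ion, eps)
        left[j] = left[j - 1] + a_hits
        right[j] = right[j - 1] + b_hits

    best = None
    for i in sites:
        # score_i = sum_{j<i} type-(a) matches + sum_{j>=i} type-(b) matches
        score = left[i - 1] + (right[m - 1] - right[i - 1])
        if best is None or score > best[1]:
            best = (i, score)
    return best
```

The running prefix counts are what make each $score_i$ an $O(1)$ lookup, which is the observation the proof relies on. Calling the function a second time with the roles of $P$ and $Q$ swapped corresponds to Step 4.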
In both cases (a very large database, or a wide mass tolerance), checking all cross-linked peptides using the algorithm in Theorem 6 is very time consuming. To speed up the SSP search, we propose to examine a peptide and determine whether it could belong to a solution by comparing its hypothetical cross-linked tandem mass spectrum with the experimental spectrum. Let $M$ be the parent ion mass, $T$ be an experimental spectrum, and $F$ be the scoring function that counts the number of matches between a peptide $P$ and a spectrum $T$. Then $F(P, T)$ can be computed by Steps 1, 2, and 3 in the algorithm in Theorem 6. Let $\delta$ be the minimum matching score (of $F(P, T)$) for accepting $P$ as one of the peptides of the cross-link. We can speed up the database search by using the following steps:

1. For every peptide subsequence $P$ in the database, compute $F(P, T)$ and save $P$ to a candidate list if $F(P, T) \ge \delta$.
2. From the candidate list, find every pair of $P$ and $Q$ satisfying $mass(P) + mass(Q) + mass(linker) = M$, and let $F(P, Q, T) = F(P, T) + F(Q, T)$.
3. Report the peptide pair with the maximum score.

This process could eliminate a large number of peptide pairs giving low matching scores.

4.2. Matching probability

Probabilistic models (Qin et al., 1997; Dancik et al., 2000; Bafna and Edwards, 2001) have been proposed for scoring tandem mass spectra against a peptide database. Environmental parameters such as the probability of losing a peak and the probability of random noise have to be determined for each experiment in order to derive a good scoring function. In our case, we already know that the experimental spectra came from some peptide in the human hemoglobin protein. In this study, we use a simple scoring function.

Let $T$ be an experimental spectrum of $t$ mass peaks, and $H$ be a hypothetical spectrum of $h$ ions. Assume the abundance levels are ignored; otherwise we can set an abundance level threshold to filter out noise. Let the matching range be $[-\delta, \delta]$ and let the machine detect ions up to $r$ m/z ratio.
In our ESI-MS-MS experiments, $\delta$ is at most 0.2, $h$ is generally between 20 and 50 ions, $r$ is about 2,000 m/z, and depending on the experiment, $t$ ranges from tens to hundreds of mass peaks. Assume that matching ranges between peaks in $H$ do not overlap. Then the expected (mean) number of matches $\mu$ between $T$ and $H$ is

$$\mu = t \cdot \frac{2\delta h}{r},$$

where $2\delta h / r$ is the probability that a random mass peak hits an ion of $H$. If matching ranges overlap, we can replace $2\delta h$ by the sum of matching ranges. The number of matches between $T$ and $H$ has a binomial distribution. Since $2\delta h / r$ is small ($< 0.1$), the binomial distribution can be approximated by the Poisson distribution

$$P(x) = e^{-\mu} \frac{\mu^x}{x!},$$

and the probability of at least $m$ matches can be computed by

$$P(x \ge m) = 1 - \sum_{i=0}^{m-1} P(x = i).$$

5. EXPERIMENTS AND DATA ANALYSIS

5.1. Cross-linking experiments

Chemical modifications and cross-links were introduced into the human blood protein hemoglobin (Hb; Sigma, St. Louis, MO) by reaction with disuccinimidyl glutarate (DSG; Pierce, Rockford, IL). Hemoglobin is a four-subunit protein consisting of two subunits each of two unique polypeptide chains, α and β. The approximate molecular weight of each chain is 15,000 daltons. DSG is a homobifunctional cross-linker which creates a link between two separate lysine residues by inserting a 5-carbon chain between them. Briefly, 50 μM Hb was reacted with 5 mM DSG for one hour at room temperature in the presence of 90 mM sodium phosphate pH 7.5 and 10% DMSO. The reaction was stopped by addition of an equal volume of 1.0 M Tris pH 7.5. An aliquot of this reaction was mixed with 2 volumes of 6 M urea, and TPCK-treated trypsin (Sigma, St. Louis, MO) was added at a ratio of 1:125 w/w trypsin to Hb. This mixture was allowed to react at 37 °C overnight.

One hundred μl of the digestion mixture was subjected to HPLC-MS-MS analysis. The sample was loaded onto a YMC ODS-AQ S3μ 120 Å mm column (Waters, Milford, MA) and the peptides were separated by a gradient of 5-80% acetonitrile (0.03% trifluoroacetic acid and 0.10% acetic acid as ion pairing reagents) at a flow rate of 0.1 ml/min. The eluent was directed to a Finnigan LCQ (Thermoquest, San Jose, CA) ion trap mass spectrometer fitted with an ESI source and operated in positive ion mode. Significantly ionized peptides were automatically measured for m/z and subjected to CID. Data were automatically collected.

5.2. Data analysis

The data were analyzed visually and with several in-house algorithms, as well as the commercial package SEQUEST (Eng et al., 1994). Every tandem mass spectrum is preprocessed by the following steps: 1) close mass peaks that are candidates of isotopic ions are merged and replaced by a new mass peak, and the abundance levels are replaced by their sum; 2) mass peaks with low relative abundance levels are cut off. Then, we implemented the algorithm in Lemma 3 and computed the score between a pair of sequences $S$ and a tandem mass spectrum $T$ using the convolution function. The probability of each top score solution was calculated from the Poisson distribution.
Several types of modification events were recognized: interpeptide cross-links (Fig. 1(a)), decoration (Fig. 1(b)), and internal cross-links (Fig. 1(c)). Of specific interest to this work, a cross-link between lysine 82 of both β subunits of Hb was detected, as shown in Fig. 4. Structurally, this would correspond to bridging the central channel of the Hb protein. The interresidue distance between the two lysines was measured to be 8.18 Å (derived from the PDB file 1A3N), which is comparable with the 7.7 Å spacer length of DSG (Tame and Vallone, 2000).

FIG. 4. 3D structure of two identical peptides (67-VLGAFSDGLAHLDNLK-82) from two subunits cross-linked in a hemoglobin protein, and the cross-linked lysines (Ks).

FIG. 5. The tandem mass spectrum of two identical peptides (67-VLGAFSDGLAHLDNLK-82) from two subunits cross-linked in a hemoglobin protein where four b-ions and three y-ions containing the cross-link were identified.

This cross-link was found in two tandem mass spectra, shown in Figs. 5 and 6, and both parent ions have $z = 3$ charges. The masses are equivalent to two molecules of the tryptic peptide 67-VLGAFSDGLAHLDNLK-82 of Hb after being carbamylated (an N-terminal modification of 43 daltons due to urea) and cross-linked by the glutarate moiety of the DSG (96 daltons). Fragmented daughter ions from this peptide were observed in the spectrum shown in Fig. 5: four b-ions ($z = 1$), b7 − H2O, b9 − H2O, b11 − H2O, and b14 + H2O, and three cross-linked y-ions ($z = 2$), L y2, L y3, and L y15. Using the error range of $[-0.2, 0.2]$, we compute the probability from the Poisson distribution: $P(x \ge 7)$ = 2.42E−05. Figure 6 shows four b-ions, b7 − H2O, b9 − H2O, b11 − H2O, and b14 − H2O, and two cross-linked y-ions ($z = 2$), L y3 and L y15. The probability computed from the Poisson distribution is $P(x \ge 6)$ = 6.47E

FIG. 6. The tandem mass spectrum of two identical peptides (67-VLGAFSDGLAHLDNLK-82) from two subunits cross-linked in a hemoglobin protein where four b-ions and two y-ions containing the cross-link were identified.

6. DISCUSSION

The interchain cross-link is confirmed in two tandem mass spectra, both with $< 10^{-4}$ matching probabilities. The length of the cross-linker (7.7 Å) is very close to the actual distance (8.18 Å) obtained from the molecular structure in PDB. Our experiment strongly suggests that protein structure information can be derived via protein cross-linking and tandem mass spectrometry.

It should be pointed out that the algorithms suggested here do not consider the case of amino acid modifications. However, if the modification, such as phosphorylation, is specified, the algorithms still work.

The underlying chemistry in MS-MS is still not completely understood. There are many mass peaks shown in both Figs. 5 and 6 which have not been interpreted. Some of them may be internal ions of the cross-link, and to verify this, further experiments on MS3 are probably necessary. There are many other complications in the experimental data because of noise, multiply charged ions, mass measurement errors, calibration errors, internal ions from double fragmentation, other types of ions such as a- and z-ions, and so on. All these can affect the interpretation; some of them are machine-dependent while others are experiment-dependent. Also, a good scoring function is critical to judge what is the best interpretation for a tandem mass spectrum. Agreement on the best scoring function and how to use the abundance information remains unresolved.

REFERENCES

Bafna, V., and Edwards, N. 2001. A probabilistic model for scoring tandem mass spectra against a peptide database. To appear in Bioinformatics.
Chen, T., Kao, M., Tepel, M., Rush, J., and Church, G.M. 2001. A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry. J. Comp. Biol. 8(3).
Cormen, T.H., Leiserson, C.E., and Rivest, R.L. 1990. Introduction to Algorithms, MIT Press, Boston.
Dancik, V., Addona, T.A., Clauser, K.R., Vath, J.E., and Pevzner, P.A. 2000. De novo peptide sequencing via tandem mass spectrometry: A graph-theoretical approach. J. Comp. Biol. 6.
Eng, J.K., McCormack, A.L., and Yates, J.R. 1994. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrometry 5.
Kellogg, C.B., Kelley, J.J., Stein, C., and Donald, B.R. 2001. Reducing mass degeneracy in SAR by MS by stable isotopic labeling. J. Comp. Biol. 8(1).
Pevzner, P.A., Dancik, V., and Tang, C.L. 2000. Mutation-tolerant protein identification by mass spectrometry. J. Comp. Biol. 7(6).
Qin, J., Fenyo, D., Zhao, Y., Hall, W.W., Chao, D.M., Wilson, C.J., Young, R.A., and Chait, B.T. 1997. A strategy for rapid, high-confidence protein identification. Analytical Chemistry 69.
13 IDENTIFYING PROTEIN CROSS-LINKS 583 Tame, J.R., and Vallone, B The structures of deoxy human haemoglobin and the mutant Hb Tyralpha42His at 120 K. Acta Crystallogr. D. Biol. Crystallogr. 56(7), Taylor, J.A., and Johnson, R.S Sequence database searches via de novo peptide sequencing by tandem mass spectral data. Biomed. Environ. Mass Spectrum. 11, Young, M.M., Tang, N., Hempel, J.C., Oshiro, C.M., Taylor, E.W., Kuntz, I.D., Gibson, B.W., and Dollinger, G High throughput protein fold identi cation by using experimental constraints derived from intramolecular cross-links and mass spectrometry. Proc. Natl. Acad. Sci. USA 97(11), Address correspondence to: Ting Chen Department of Molecular Biology University of Southern California Los Angeles, CA