Algorithms for Identifying Protein Cross-Links via Tandem Mass Spectrometry ABSTRACT

Size: px
Start display at page:

Download "Algorithms for Identifying Protein Cross-Links via Tandem Mass Spectrometry ABSTRACT"

Transcription

1 JOURNAL OF COMPUTATIONAL BIOLOGY Volume 8, Number 6, 2001 Mary Ann Liebert, Inc. Pp Algorithms for Identifying Protein Cross-Links via Tandem Mass Spectrometry TING CHEN, 1 JACOB D. JAFFE, 2 and GEORGE M. CHURCH 3 ABSTRACT Cross-linking technology combined with tandem mass spectrometry (MS-MS) is a powerful method that provides a rapid solution to the discovery of protein protein interactions and protein structures. We studied the problem of detecting cross-linked peptides and crosslinked amino acids from tandem mass spectral data. Our method consists of two steps: the rst step nds two protein subsequences whose mass sum equals a given mass measured from the mass spectrometry; and the second step nds the best cross-linked amino acids in these two peptide sequences that are optimally correlated to a given tandem mass spectrum. We designed fast and space-ef cient algorithms for these two steps and implemented and tested them on experimental data of cross-linked hemoglobin proteins. An interchain crosslink between two subunits was found in two tandem mass spectra. The length of the cross-linker (7.7 Å ) is very close to the actual distance (8.18 Å ) obtained from the molecular structure in PDB. Key words: proteomics, mass spectrometry, algorithms, protein cross-linking, protein-protein interactions, protein folding. 1. INTRODUCTION In recent years, more and more genomes of model organisms have been sequenced. Using these genomic sequences, researchers have focused on the identi cation of genes on the genome, the study of gene regulation and gene regulatory networks, the discovery of signal transduction pathways, the determination of protein structures, the detection of protein protein, protein DNA, and protein metabolite interactions, and the elucidation of functions of genes and their protein products. A method which combines chemical cross-linking of proteins with mass spectrometry or tandem mass spectrometry may be useful in discovering protein complexes, determining their structure (Young et al., 2000) and/or quantitating in vivo concentrations. This paper focuses on new algorithms for interpretation of complex experimental data generated by protein cross-linking and tandem mass spectrometry. Traditionally, three-dimensional structures of proteins are solved by x-ray crystallography and NMR. However, generating an accurate structure that satis es constraints of experimental data can be extremely 1 Department of Molecular Biology, University of Southern California, Los Angeles, CA Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA Department of Genetics, Harvard Medical School, Boston, MA

2 572 CHEN ET AL. dif cult. There has been some success in other computational methods that predict structures from energy functions, multiple alignments, and threading. However, the accuracy and general applicability of these methods lags far behind the rates at which new protein sequences are being identi ed. Classical methods of studying protein protein interactions often scale up poorly, make assumptions about number of interacting proteins or their modi cation state, or lose details about environmental effects on complex formation. Cross-linking technology combined with mass spectrometry provides an alternative approach to detecting protein protein interactions and adding reliable distance constraints to protein structures. Previous studies of cross-linking have been able to produce low resolution interatomic distance constraints, which in conjunction with threading, has led to the determination of the three-dimensional structure of a model protein (Young et al., 2000). A technique called isotopic labeling (Kellogg et al., 2001) has been proposed to separate peptides and cross-links in mass spectrometry. Through a proper design of labeling, the crosslinks can be easily identi ed by checking their masses in the spectra. Similar techniques can be applied to identify the docking structure of two interacting proteins. However, these ideas mostly focus on the individual protein or interaction, and it is not clear how to scale them up to the whole proteome level. In this paper, we applied tandem mass spectrometry to detect cross-linked peptides and cross-linked amino acids, which has the potential for large scale applications. Tandem mass spectrometry plays a powerful role in the identi cation of cross-linking sites. Combined with high performance liquid chromatography (HPLC), tandem mass spectrometry has been widely used to identify peptides and analyze protein sequences. Peptides, i.e., of an amino acid sequence HNHCHR 1 CO- NHCHR 2 CO- -NHCHR n COOH, resulting from proteolytic digestion of proteins are separated by HPLC and then analyzed in a mass spectrometer. The latter is a two step process which consists of measuring the massto-charge (m/z) ratio of an ionized peptide (the parent ion) and then measuring the m/z of fragmentation products (the daughter ions) of the peptide after collision-induced dissociation (CID). For each molecule of this peptide, a randomly selected peptide bond is broken, and as a result, CID produces a ladder of ions, typically N-terminal b ions (HNHCHR 1 CO- -NHCHR i CO C ) and C-terminal y ions (HNHCHR i+1 CO- -NHCHR n COOH C H C ), where two consecutive b-ions (or y-ions) differ by exactly one amino acid. These ions display a characteristic pattern for this peptide on the tandem mass spectrum. Computer programs such as SEQUEST (Eng et al., 1994) correlate peptide sequences in a protein database with the tandem mass spectrum and report all peptides with signi cant correlation scores. Some programs (Pevzner et al., 2000) allow searching the database with amino acids modi cations. An alternative approach (Taylor and Johnson, 1997; Dancik et al., 2000; Chen et al., 2001), called de novo peptide sequencing, extracts candidate peptide sequences from the spectral data before they are validated and/or available in a database. We performed an experiment that rst chemically cross-linked interacting proteins, then digested them proteolytically (using trypsin), and nally separated and identi ed cross-linking sites through HPLC-tandem mass spectrometry. This paper focuses on the algorithmic solutions to the identi cation of cross-linked peptides from the tandem mass spectral data. Cross-link structures are shown in Fig. 1. Figure 1(a) shows an H-structure cross-link where two amino acids in two peptides are connected to a linker by covalent bonds, Fig. 1(b) shows that the linker may simply decorate some amino acid of a peptide, Fig. 1(c) shows a self-cross-linked peptide where two amino acids on this peptide are connected by a linker. Figure 1(d) shows two doubly cross-linked peptides. Our main interest is to identify an H-structure cross-link (Fig. 1(a)). Our approach to nding this crosslink from a tandem mass spectrum uses the following two steps: ² Step 1: Given the mass of the parent cross-link molecules measured in mass spectrometry, nd every pair of peptides whose mass sum (plus the mass of the linker) equals this mass. ² Step 2: Given a pair of peptides, nd the cross-linked amino acid pair that is optimally correlated to the tandem mass spectrum. Finally, the algorithm reports the pair of peptides with the maximum correlation score. For simplicity, assuming that we are given k protein sequences each with n proteolytic amino acids (the targets for proteolytic digestion, i.e., for trypsin, they are Ks and Rs), we can ² Solve Step 1 in O.kn 2 log.kn// time and O.kn/ space.

3 IDENTIFYING PROTEIN CROSS-LINKS 573 FIG. 1. Four cross-link structures: (a) two cross-linked peptides, (b) one decorated peptide, (c) one self cross-linked peptide, (d) two doubly cross-linked peptides. If two peptide sequences of m amino acids are given and the tandem mass spectrum has h mass peaks, we can ² Solve Step 2 in O.m C h/ time and O.m C h/ space. The algorithm does not depend on the number of possible cross-linked amino acid pairs. Our paper is organized as follows. Section 2 describes the algorithm to nd two protein subsequences whose mass sum equals a given mass. Section 3 describes the algorithm of nding the best cross-linking site between two peptides. Section 4 discusses implementation issues and the probabilities of solutions. Section 5 reports the implementation of our algorithms and the test on hemoglobin cross-linking experimental data. 2. IDENTIFYING TWO SUBSEQUENCES FOR A MASS SUM There are n.n C 1/=2 possible subsequences corresponding to a protein with n proteolytic amino acids. For example, trypsin cuts a protein sequence right after a lysine(k) or an arginine(r). The exact place of each cut depends on the distribution of Ks and Rs: if two Ks or Rs are very close, the rst one may not be cut. Moreover, local sequence in uences and modi cations may protect some Ks and Rs from being cleaved by trypsin. Therefore, two protein sequences have O.n 4 / possible pairs of subsequences. To speed up the computation, we translate a protein sequence P into an array A of nc1 masses, each of which corresponds to a unique subsequence between two adjacent proteolytic amino acids. For example, P =NRDNKT, when trypsin is used for digestion, is translated into an array A D.A 1 ; A 2 ; A 3 / where A 1 D the mass for NR, A 2 D the mass for DNK, and A 3 D the mass for T. Thus, any proteolytic subsequence

4 574 CHEN ET AL. of P has the mass sum equal to the sum of the elements in the corresponding interval of A. For example, the subsequence DNKT has the mass of A 2 C A 3. Since the mass of the linker is xed, we focus on nding two protein subsequences for a given mass M, which is equivalent to the Subsequence Sum Problem (SSP) de ned below. De nition 1. Subsequence Sum Problem (SSP). Given two positive n-number sequences A D.a 1 ; : : : ; a n / and B D.b 1 ; : : : ; b n /, and a number M, nd all possible pairs of subsequences,.a i ; : : : ; a j /, 1 i j n, and.b k ; : : : ; b l /, 1 k l n, such that jx a s C sdi lx b t D M: tdk First, we consider an n-number sequence A and a mass M, and we are asked to nd a subsequence of A such that its sum equals M. The following Algorithm Find-A solves this problem. Algorithm Find-A.A; M/ 1. i D 1, j D 1; # 1st seq:.a i ; ; a j / D.a 1 / 2. sum D a 1 ; # sum is the sum of.a 1 / 3. While j < n or sum M 4. If sum D M # Solution: sum D M 5 then output.i; j/; 6. If sum < M # Add a jc1 to.a i ; ; a j / 7. then j D min.j C 1; n/, sum D sum C a j ; 8. Else # Delete a i from.a i ; ; a j / 9. sum D sum a i, i D i C 1. Theorem 1. Given a number M and a positive number sequence A D.a 1 ; ; a n /, Algorithm Find-A nds all the subsequences.a i ; ; a j /; 1 i j n, satisfying P j sdi a s D M, in O.n/ time and O.n/ space. Proof. Straightforward. Following Algorithm Find-A and Theorem 1, we have Lemma 2. The SSP can be solved in O.n 3 / time and O.n C p/ space, where p is the number of solutions, and the number of solutions for SSP is at most O.n 3 /. Proof. For every subsequence a i ; : : : ; a j of A, Algorithm Find-A nds all subsequences of B that satisfy P l tdk b t D M P j sdi a s in O.n/ time. There are O.n 2 / subsequences of A, and thus SSP can be solved in O.n 3 / time. This algorithm stores all the solutions in O.p/ space and requires O.n/ space for A and B, a total of O.n C p/ space. Since there are at most O.n/ solutions (subsequences of B ) for every subsequence of A, The number of solutions for SSP is at most O.n 3 /. If O.n 2 / space is allowed, then the following holds. Lemma 3. The SSP can be solved in O.n 2 log n C p/ time and O.n 2 C p/ space, where p is the number of solutions. Proof. We compute the sums of all subsequences of B in O.n 2 / time and store them into an array of O.n 2 / space. Then, the sums are sorted in O.n 2 log n/ time. For every subsequence a i ; : : : ; a j of A, we can nd all the subsequences of B that satisfy P l tdk b t D M P j sdi a s in O.log n C q/ time using binary search, where q is the number of solutions. If p is the total number of solutions, nding all of them takes O.n 2 log n C p/ time and O.n 2 C p/ space.

5 IDENTIFYING PROTEIN CROSS-LINKS 575 Algorithm Find-AB.A; B; M/ 1. S A D f.a i ; ; a j / j. P j sdi a s < M PjC1 sdi a s; j < n/ or. P n sdi a s < M/g; 2. S B D f.b k /; k D 1; ; ng; 3. While (S A 6D ;) 4. Find.a i ; ; a j / 2 S A where v D P j sdi a s is the maximum sum; 5. For every.b k ; ; b l / 2 S B and P l tdk b t < M v 6. Delete.b k ; ; b l / from S B ; 7. If.l < n/, then add.b k ; ; b lc1 / into S B ; 8. For every.b k ; ; b l / 2 S B and P l tdk b t D M v 9. Output i; j; k; l; 10. Delete.a i ; ; a j / from S A ; 11. If i < j, then add.a i ; ; a j 1 / into S A. For every a i, Step 1 nds the longest subsequence.a i ; ; a j / that satis es P j sdi a s < M for every i. Any subsequence of A to be in a solution must have the sum less than M and be a pre x subsequence of some element in S A. Then, Step 2 nds the shortest subsequences of B. Any subsequence of B to be in a solution must be a pre x supersequence of some element in S B. Steps 3 11 check every subsequence of A with sum less than M in decreasing order and search its corresponding solutions in B. Step 4 nds.a i ; : : : ; a j / 2 S A with the maximum sum v. Steps 5 7 delete every element.b k ; : : : ; b l / 2 S B such that P l tdk b t < M v and replace it by its immediate pre x supersequence.b k ; : : : ; b lc1 /. Steps 8 9 nd every element.b k ; : : : ; b l / 2 S B such that P l tdk b t D M v in S B and report the solutions.a i ; : : : ; a j / and.b k ; : : : ; b l /. After all solutions for.a i ; : : : ; a j / are found, Steps replace them by the immediate pre x subsequence.a i ; : : : ; a j 1 / in S A. Theorem 4. Algorithm Find-AB solves SSP in O.n 2 logn C p log n/ time and O.n C p/ space, where p is the number of solutions. Proof. Obviously, every output of Algorithm Find-AB is correct and unique. Assume that.a i ; ; a j / and.b k ; ; b l / satisfy P j sdi a s C P l tdk b t D M. We show that the algorithm nds this solution. In Step 1, let.a i ; ; a j 0/ 2 S A, which is the longest subsequence of A that starts with a i and has a sum less than M. Because P j sdi a s < M, it is true that j j 0. The algorithm will check.a i ; ; a j 0/ (Step 4) and all its pre x subsequences including.a i ; ; a j / (Steps 10 11) in length-decreasing order. Similarly, every subsequence of B starting with b k, including.b k ; ; b l /, is checked in Steps 5, 6, and 7 in length-increasing order. Let Step 4 generate a series of v-values: v 1 ; v 2 ;. This series is in decreasing order, because Step 4 always nds an element in S A with the maximum sum and Steps 8 9 replace it by a subsequence with a smaller sum. Step 5 deletes subsequences of B that have a sum less than M v 1, M v 2, at each iteration, respectively, and Steps 6 7 always replace them with subsequences of larger sums. Let w D P j sdi a s. When the algorithm is deleting.a i ; ; a j / in Step 10, all the subsequences of B with a sum less than M w would have been deleted. Thus,.b k ; ; b l / is present in S B because P l tdk b t D M w. The algorithm should nd it and report the solution.a i ; ; a j / and.b k ; ; b l /. Both S A and S B maintain O.n/ elements throughout the algorithm. We store them in a data structure called a Red-Black Tree (Cormen et al., 1990). This is a binary tree that allows retrieval, addition, and deletion in O.log n/ time while keeping the tree height O.log n/ through ef cient tree rotations. Thus, the algorithm requires only O.n C p/ space. In Steps 3 11, every subsequence of A is checked and deleted at most once. Every subsequence of B is checked and deleted at most once in Steps 5 9 if it is not in any solution. If a subsequence is in r 1 solutions, it will be checked exactly r C 1 times. It takes O.log n/ time for every retrieval in Steps 4, 5, and 8, every deletion in Steps 6 and 10, and every addition in Steps 7 and 11. The total running time is O.n 2 log n C p log n/.

6 576 CHEN ET AL. Algorithm Find-AB requires basically only O.n/ space if the solutions are not stored. Similar algorithms can be applied to solve the SSP problem with k sequences: Theorem 5. Given a number M and k positive n-number sequences A t D.a t1 ; ; a tn /, t D 1; ; k, nding all possible pairs of subsequences of A 1, A 2,, or A k such that their mass sum equals M takes O.kn 2 log.kn/ C p log.kn// time and O.kn C p/ space, where p is the number of solutions. The total number of solutions is at most O.k 2 n 3 /. In a protein database of k n size where k À n, Theorem 5 implies that the solutions can be found in the scale of O.k log k/ time. Lemma 3 can also be applied to solve the SSP problem with k sequences, and it requires O.kn 2 log.kn/ C p/ time, similar to that in Theorem 5, but uses O.kn 2 / space. The O.kn/-space algorithm has an advantage in applications where kn is huge, because an O.kn 2 /-space algorithm could easily take up the whole virtual memory Cross-linking structures 3. IDENTIFYING CROSS-LINKED AMINO ACIDS Algorithms in Section 2 produce a list of peptide pairs whose mass sum equals the parent ion mass M. Let a pair of peptides P D.p 1 ; ; p m / and Q D.q 1 ; ; q n / form an H-structure cross-link shown in Fig. 1(a). In Figure 1(a), amino acids p i and q j are connected by a linker introduced by a cross-linking agent. A typical agent, disuccinimidyl glutarate (DSG), cross-links two lysines to form a Lys linker Lys structure. If each peptide sequence has r (r m and r n) such amino acids, there are r 2 possible cross-links. In tandem mass spectrometry, CID fragments each cross-link molecule into one of the four patterns shown in Fig. 2. In Fig. 2(a), the peptide bond p k p kc1, located on the N-terminal side (left) of the cross-linked amino acid (p i ), is broken, and the molecule is fragmented into an N-terminal ion, p 1 p k C, and a C-terminal ion, the H-structure ion Q. C p kc1 p m /. In Fig. 2(b), the peptide bond is located on the C-terminal side (right) of p i, and the molecule is fragmented into an N-terminal ion Q.p 1 p k C / and a C-terminal ion p C kc1 p m. Similarly, if a peptide bond of Q is broken, the molecule has two fragmentation patterns shown in Fig. 2(c) and 2(d). Depending on the cross-linking agent used in the experiment, a linker may contain a peptide bond. If the linker is broken, the H-structure molecule is cleaved into two peptide ions, each of which has a decoration similar to the molecule shown in Fig. 1(c). For simplicity, assume m D n in the following description. Theoretically, there are O.m/ possible ions for each cross-linked P and Q. A tandem mass spectrum is a collection of mass peaks, ideally, each of which corresponds to some ion in Fig Algorithms for identifying cross-linked amino acids De nition 2. Cross-Linked Amino Acids Problem (CLAA). Given an h-mass peak tandem mass spectrum T D.t 1 ; ; t h / and two cross-linked m-amino acid peptides P D.p 1 ; ; p m / and Q D.q 1 ; ; q m /, each having r (r m) amino acids that can be cross-linked, nd the cross-linked amino acids that maximize the value of a given scoring function F.P; Q; T /. There are r 2 cross-linked amino acid pairs corresponding to r 2 possible cross-link structures. For every such structure, we can generate a hypothetical tandem mass spectrum, S D.s 1 ; : : : ; s 4.m 1/ /, from the masses of all the possible fragmented ions in Fig. 2. A scoring function F compares S with T and gives a score based on how well they are correlated. Generally, F considers a match between two mass peaks t i of T and s j of S, if jt i s j j ", where " is the maximum measurement error allowed in an experiment. The higher the number of matches is, the higher the F-score is. For simplicity, consider F counting the number of matches. For every mass peak of S, nding its match in T takes O.log h/ time by binary search. The total number of matches can be determined in O.m log h/ time. Thus, nding the most likely cross-linked amino acid pair takes a total of O.r 2 m log h/ time. However, there is better way to do it.

7 IDENTIFYING PROTEIN CROSS-LINKS 577 FIG. 2. Four fragmentation patterns for a cross-linked molecule in tandem mass spectrometry. Theorem 6. The CLAA problem can be solved in O.m C h/ time and O.m C h/ space if the scoring function F counts the number of matches. Proof. Figures 2(a) (d) shows 4.m 1/ ions for one cross-link, and thus r 2 cross-links have a total of 4.m 1/r 2 ions. However, there are only 8.m 1/ different masses for these ions. Figure 3 shows two types of N- and C-terminal ions after breaking the peptide bond p i p ic1. In Fig. 3(a), the cross-linked amino acid of P locates after p i, and in Fig. 3(b), the cross-linked amino acid of P locates before p i. The C-terminal ion in Fig. 3(a) has a xed mass no matter where the cross-link may be located, as long as it is after p i. So does the N-terminal ion in Figure 3(b). Since P and Q together have 2.m 1/ peptide bonds, there are only 8.m 1/ different masses. Let M. ; T / D 1 indicate the match of a mass with a spectrum T, and M. ; T / D 0 otherwise. In the following steps, we can nd the best cross-link: 1. Calculate ion masses of P :.L N p 1 ; L C p 1 /,,.L N p m 1 ; L C p m 1 / for the N- and C-terminal ions in Fig. 3(a) corresponding to the breaking of peptide bonds.p 1 p 2 /,,.p m 1 p m /, respectively, and similarly,.r N p 1 ; R C p 1 /,,.R N p m 1 ; R C p m 1 / for the N- and C-terminal ions in Figure 3(b). 2. Look up these masses in T for matches: M.L N p i ; T /, M.L C p i ; T /, M.R N p i ; T /, and M.R C p i ; T /, for i D 1; ; m 1.

8 578 CHEN ET AL. FIG. 3. Two types of N- and C-terminal ions after breaking the p i p ic1 peptide bond. 3. For every possible cross-linked amino acid p i, calculate Xi 1 score i D.M.L N p j ; T / C M.L C p j ; T // j D1 C mx.m.rp N j ; T / C M.Rp C j ; T //: jdi Find the maximum score, score a, which indicates the most likely cross-linked amino acid p a. 4. Repeat Steps 1, 2, and 3 and nd the most likely cross-linked amino acid q b for Q. 5. Report the cross-linked amino acids p a and q b. In Step 1, it takes O.m/ time to calculate.l N p 1 ; L C p 1 / and from.l N p 1 ; L C p 1 / to.l N p 2 ; L C p 2 / takes only O.1/ time for adding and subtracting p 2. Thus, Step 1 takes O.m/ time. The matching in Step 2 takes O.m Ch/ time by merging four sorted lists, fl N p i g, fl C p i g, fr N p i g, and fr C p i g, into one sorted list and matching this list with the sorted list of T. Similarly, it takes O.m/ time to calculate score 1 and only additional O.1/ time for score 2, score 3, and so on. Thus, Step 3 takes O.m/ time. Step 4 has similar time complexity. Therefore, nding the cross-linked amino acids takes a total of O.m C h/ time. Although Theorem 6 is for the special F-function that counts the number of matches, this theorem can be generalized for many other functions, such as the correlation function, the convolution function, and so on Speed up searching 4. IMPLEMENTATION AND PROBABILITY Lemma 2 indicates that there may be as many as O.n 3 / different solutions for the Subsequence Sum Problem (SSP), given a pair of protein sequences. For a database of k protein sequences, Theorem 5 tells us that the number could be as large as O.k 2 n 3 /. In fact, it is possible that there are too many solutions for SSP, especially when the database is very large and the number of subsequence pairs grows quadratically to the database size k. On the other hand, if the mass/charge error is large, many peptide pairs could fall into this error range. In both cases, checking all cross-linked peptides using the algorithm in Theorem 6 is very time consuming. To speed up the SSP search, we propose to examine a peptide and determine

9 IDENTIFYING PROTEIN CROSS-LINKS 579 whether it could belong to a solution by comparing its hypothetical cross-linked tandem mass spectrum with the experimental spectrum. Let M be the parent ion mass, T be an experimental spectrum, and F be the scoring function that counts the number of matches between a peptide P and a spectrum T. Then F.P; T / can be computed by Steps 1, 2, and 3 in the algorithm in Theorem 6. Let ± be the minimum matching score (of F.P; T /) for accepting P as one of the peptides of the cross-link. We can speed up the database search by using the following steps: 1. For every peptide subsequence P in the database, compute F.P; T / and save P to a candidate list if F.P ; T / ±. 2. From the candidate list, nd every pair of P and Q satisfying mass.p / C mass.q/ C mass.linker/ D M, and let F.P; Q; T / DF.P ; T /CF.Q; T /. 3. Report the peptide pair with the maximum score. This process could eliminate a large number of peptide pairs giving low matching scores Matching probability Probabilistic models (Qin et al., 1997; Dancik et al., 2000; Bafna and Edwards, 2001) have been proposed for scoring tandem mass spectra against a peptide database. Environmental parameters such as the probability of losing a peak and the probability of a random noise have to be determined for each experiment in order to derive a good scoring function. In our case, we have already known that the experimental spectra came from some peptide in the human hemoglobin protein. In this study, we use a simple scoring function. Let T be an experimental spectrum of t mass peaks, and H be a hypothetical spectrum of h ions. Assume the abundance levels are ignored; otherwise we can set an abundance level threshold to lter out noise. Let the matching range be [ ±; ±] and let the machine detect ions up to r m/z ratio. In our ESI-MS-MS experiments, ± is at most 0.2, h is generally between 20 and 50 ions, r is about 2,000 m/z, and depending on the experiment, t ranges from tens to hundreds of mass peaks. Assume that matching ranges between peaks in H do not overlap. Then the expected (mean) number of matches ¹ between T and H is ¹ D t 2±h r where 2±h=r is the probability that a random mass peak hits an ion of H. If matching ranges overlap, we can replace 2±h by the sum of matching ranges. The number of matches between T and H has a binomial distribution. Since 2±h=r is small (< 0:1), the binomial distribution can be can be approximated by the Poisson distribution ¹ ¹x P.x/ D e x! and the probability of at least m matches can be computed by m 1 X P.x m/ D 1 P.x D i/: id Cross-linking experiments 5. EXPERIMENTS AND DATA ANALYSIS Chemical modi cations and cross-links were introduced into the human blood protein hemoglobin (Hb; Sigma, St. Louis, MO) by reaction with disuccinimidyl glutarate (DSG; Pierce, Rockford, IL). Hemoglobin is a four-subunit protein consisting of two subunits each of two unique polypeptide chains, and. The

10 580 CHEN ET AL. approximate molecular weight of each chain is 15,000 daltons. DSG is a homobifunctional cross-linker which creates a link between two separate lysine residues by inserting a 5-carbon chain between them. Brie y, 50 mm Hb was reacted with 5 mm DSG for one hour at room temperature in the presence of 90 mm sodium phosphate ph 7.5 and 10% DMSO. The reaction was stopped by addition of an equal volume of 1.0 M Tris ph 7.5. An aliquot of this reaction was mixed with 2 volumes of 6M urea and TPCK-treated trypsin (Sigma, St. Louis, MO) was added at a ratio of 1:125 w/w trypsin to Hb. This mixture was allowed to react at 37 ± C overnight. One hundred ¹l of the digestion mixture was subject to HPLC-MS-MS analysis. The sample was loaded onto a YMC ODS-AQ S3¹ 120 Å mm column (Waters, Milford, MA) and the peptides were separated by a gradient of 5 80% acetonitrile (0.03% tri uoroacetic acid and 0.10% acetic acid as ion pairing reagents) at a ow rate of 0.1 ml/min. The eluent was directed to a Finnigan LCQ (Thermoquest, San Jose, CA) ion trap mass spectrometer tted with an ESI source and operated in positive ion mode. Signi cantly ionized peptides were automatically measured for m/z and subjected to CID. Data were automatically collected Data analysis The data were analyzed visually and with several in-house algorithms, as well as the commercial package SEQUEST (Eng et al., 1994). Every tandem mass spectrum is preprocessed by the following steps: 1) close mass peaks that are candidates of isotopic ions are merged and replaced by a new mass peak, and the abundance levels are replaced by their sum; 2) mass peaks with low relative abundance levels are cut off. Then, we implemented the algorithm in Lemma 3 and computed the score between a pair of sequences S and a tandem mass spectrum T using the convolution function. The probability of each top score solution was calculated from the Poisson distribution. Several types of modi cation events were recognized: interpeptide cross-links (Fig. 1(a)), decoration (Fig. 1(b)), and internal cross-links (Fig. 1(c)). Of speci c interest to this work, a cross-link between lysine 82 of both subunits of Hb was detected, as shown in Fig. 4. Structurally, this would correspond to bridging the central channel of the Hb protein. The interresidue distance between the two lysines was FIG. 4. 3D structure of two identical peptides (67-VLGAFSDGLAHLDNLK-82) from two subunits cross-linked in a hemoglobin protein, and the cross-linked lysines (Ks).

11 IDENTIFYING PROTEIN CROSS-LINKS 581 FIG. 5. The tandem mass spectrum of two identical peptides (67-VLGAFSDGLAHLDNLK-82) from two subunits cross-linked in a hemoglobin protein where four b-ions and three y-ions containing the cross-link were identi ed. measured to be 8.18 Å (derived from the PDB le 1A3N), which is comparable with the 7.7 Å spacer length of DSG (Tame and Vallone, 2000). This cross-link was found in two tandem mass spectra, shown in Figs. 5 and 6, and both parent ions have zd3 charges. The masses are equivalent to two molecules of the tryptic peptide 67-VLGAFSDGLAHLDNLK-82 of Hb after being carbamylated (an N-terminal modi cation of 43 daltons due to urea) and cross-linked by the glutarate moiety of the DSG (96 daltons). Fragmented daughter ions from this peptide were observed in the spectrum shown in Fig. 5: four b-ions (ZD1), b7- H 2 O, b9 H 2 O, b11 H 2 O, and b14ch 2 O and three cross-linked y-ions (Z=2), L y2, L y3, and L y15. Using the error range of [ 0:2; 0:2], we compute the probability from the Poisson distribution: P.x 7/ D 2:42E 05. Figure 6 shows four b-ions, b7 H 2 O, b9 H 2 O, b11 H 2 O, and b14 H 2 O and two cross-linked y-ions (Z=2), L y3 and L y15. The probability computed from the Poisson distribution is P.x 6/ D 6:47E DISCUSSION The interchain cross-link is con rmed in two tandem mass spectra both with < 10 4 matching probabilities. The length of the cross-linker (7.7 Å ) is very close to the actual distance (8.18 Å ) obtained from the molecular structure in PDB. Our experiment strongly suggests that protein structure information can be derived via protein cross-linking and tandem mass spectrometry. It should be pointed out that the algorithms suggested here do not consider the case of amino acid modi cations. However, if the modi cation such as phosphorylation is speci ed, the algorithms still work. The underlying chemistry in MS-MS is still not completely understood. There are many mass peaks shown in both Figs. 5 and 6 which have not been interpreted. Some of them may be internal ions of the cross-link, and to verify this, further experiments on MS3 are probably necessary. There are many other complications

12 582 CHEN ET AL. FIG. 6. The tandem mass spectrum of two identical peptides (67-VLGAFSDGLAHLDNLK-82) from two subunits cross-linked in a Hemoglobin protein where four b-ions and two y-ions containing the cross-link were identi ed. in the experimental data because of noise, multiply charged ions, mass measurement errors, calibration errors, internal ions from double fragmentation, other types of ions such as a- and z-ions, and so on. All these can affect the interpretation and some of them are machine-dependent while others are experimentdependent. Also, a good scoring function is critical to judge what is the best interpretation for a tandem mass spectrum. Agreement on the best scoring function and how to use the abundance of information remain unresolved. REFERENCES Bafna, V., and Edwards, N A probablistic model for scoring tandem mass spectra against a peptide database. To appear in Bioinformatics. Chen, T., Kao, M., Tepel, M., Rush, J., and Church, G.M A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry. J. Comp. Biol. 8(3), Cormen, T.H., Leiserson, C.E., and Rivest, R.L Introduction to Algorithms, MIT Press, Boston. Dancik, V., Addona, T.A., Clauser, K.R., Vath, J.E., and Pevzner, P.A De novo peptide sequencing via tandem mass spectrometry: A graph-theoretical approach. J. Comp. Biol. 6, Eng, J.K., McCormack, A.L., and Yates, J.R An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrometry 5, Kellogg, C.B., Kelley, J.J., Stein, C., and Donald, B.R Reducing mass degeneracy in SAR by MS by stable isotopic labeling. J. Comp. Biol. 8(1), Pevzner, P.A., Dancik, V., and Tang, C.L Mutation-tolerantprotein identi cation by mass spectrometry. J. Comp. Biol. 7(6), Qin, J., Fenyo, D., Zhao, Y., Hall, W.W., Chao, D.M., Wilson, C.J., Young, R.A., and Chait, B.T A strategy for rapid, high-con dence protein identi cation. Analytical Chemistry 69,

13 IDENTIFYING PROTEIN CROSS-LINKS 583 Tame, J.R., and Vallone, B The structures of deoxy human haemoglobin and the mutant Hb Tyralpha42His at 120 K. Acta Crystallogr. D. Biol. Crystallogr. 56(7), Taylor, J.A., and Johnson, R.S Sequence database searches via de novo peptide sequencing by tandem mass spectral data. Biomed. Environ. Mass Spectrum. 11, Young, M.M., Tang, N., Hempel, J.C., Oshiro, C.M., Taylor, E.W., Kuntz, I.D., Gibson, B.W., and Dollinger, G High throughput protein fold identi cation by using experimental constraints derived from intramolecular cross-links and mass spectrometry. Proc. Natl. Acad. Sci. USA 97(11), Address correspondence to: Ting Chen Department of Molecular Biology University of Southern California Los Angeles, CA

A Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry

A Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry A Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry Ting Chen Department of Genetics arvard Medical School Boston, MA 02115, USA Ming-Yang Kao Department of Computer

More information

A Suboptimal Algorithm for De Novo Peptide Sequencing via Tandem Mass Spectrometry. BINGWEN LU and TING CHEN ABSTRACT

A Suboptimal Algorithm for De Novo Peptide Sequencing via Tandem Mass Spectrometry. BINGWEN LU and TING CHEN ABSTRACT JOURNAL OF COMPUTATIONAL BIOLOGY Volume 10, Number 1, 2003 Mary Ann Liebert, Inc. Pp. 1 12 A Suboptimal Algorithm for De Novo Peptide Sequencing via Tandem Mass Spectrometry BINGWEN LU and TING CHEN ABSTRACT

More information

Modeling Mass Spectrometry-Based Protein Analysis

Modeling Mass Spectrometry-Based Protein Analysis Chapter 8 Jan Eriksson and David Fenyö Abstract The success of mass spectrometry based proteomics depends on efficient methods for data analysis. These methods require a detailed understanding of the information

More information

De Novo Peptide Identification Via Mixed-Integer Linear Optimization And Tandem Mass Spectrometry

De Novo Peptide Identification Via Mixed-Integer Linear Optimization And Tandem Mass Spectrometry 17 th European Symposium on Computer Aided Process Engineering ESCAPE17 V. Plesu and P.S. Agachi (Editors) 2007 Elsevier B.V. All rights reserved. 1 De Novo Peptide Identification Via Mixed-Integer Linear

More information

via Tandem Mass Spectrometry and Propositional Satisfiability De Novo Peptide Sequencing Renato Bruni University of Perugia

via Tandem Mass Spectrometry and Propositional Satisfiability De Novo Peptide Sequencing Renato Bruni University of Perugia De Novo Peptide Sequencing via Tandem Mass Spectrometry and Propositional Satisfiability Renato Bruni bruni@diei.unipg.it or bruni@dis.uniroma1.it University of Perugia I FIMA International Conference

More information

Mass spectrometry has been used a lot in biology since the late 1950 s. However it really came into play in the late 1980 s once methods were

Mass spectrometry has been used a lot in biology since the late 1950 s. However it really came into play in the late 1980 s once methods were Mass spectrometry has been used a lot in biology since the late 1950 s. However it really came into play in the late 1980 s once methods were developed to allow the analysis of large intact (bigger than

More information

Protein Sequencing and Identification by Mass Spectrometry

Protein Sequencing and Identification by Mass Spectrometry Protein Sequencing and Identification by Mass Spectrometry Tandem Mass Spectrometry De Novo Peptide Sequencing Spectrum Graph Protein Identification via Database Search Identifying Post Translationally

More information

Key questions of proteomics. Bioinformatics 2. Proteomics. Foundation of proteomics. What proteins are there? Protein digestion

Key questions of proteomics. Bioinformatics 2. Proteomics. Foundation of proteomics. What proteins are there? Protein digestion s s Key questions of proteomics What proteins are there? Bioinformatics 2 Lecture 2 roteomics How much is there of each of the proteins? - Absolute quantitation - Stoichiometry What (modification/splice)

More information

Lecture 15: Realities of Genome Assembly Protein Sequencing

Lecture 15: Realities of Genome Assembly Protein Sequencing Lecture 15: Realities of Genome Assembly Protein Sequencing Study Chapter 8.10-8.15 1 Euler s Theorems A graph is balanced if for every vertex the number of incoming edges equals to the number of outgoing

More information

Computational Methods for Mass Spectrometry Proteomics

Computational Methods for Mass Spectrometry Proteomics Computational Methods for Mass Spectrometry Proteomics Eidhammer, Ingvar ISBN-13: 9780470512975 Table of Contents Preface. Acknowledgements. 1 Protein, Proteome, and Proteomics. 1.1 Primary goals for studying

More information

CSE182-L8. Mass Spectrometry

CSE182-L8. Mass Spectrometry CSE182-L8 Mass Spectrometry Project Notes Implement a few tools for proteomics C1:11/2/04 Answer MS questions to get started, select project partner, select a project. C2:11/15/04 (All but web-team) Plan

More information

De Novo Peptide Sequencing: Informatics and Pattern Recognition applied to Proteomics

De Novo Peptide Sequencing: Informatics and Pattern Recognition applied to Proteomics De Novo Peptide Sequencing: Informatics and Pattern Recognition applied to Proteomics John R. Rose Computer Science and Engineering University of South Carolina 1 Overview Background Information Theoretic

More information

Efficiency of Database Search for Identification of Mutated and Modified Proteins via Mass Spectrometry

Efficiency of Database Search for Identification of Mutated and Modified Proteins via Mass Spectrometry Methods Efficiency of Database Search for Identification of Mutated and Modified Proteins via Mass Spectrometry Pavel A. Pevzner, 1,3 Zufar Mulyukov, 1 Vlado Dancik, 2 and Chris L Tang 2 Department of

More information

Identification of Human Hemoglobin Protein Variants Using Electrospray Ionization-Electron Transfer Dissociation Mass Spectrometry

Identification of Human Hemoglobin Protein Variants Using Electrospray Ionization-Electron Transfer Dissociation Mass Spectrometry Identification of Human Hemoglobin Protein Variants Using Electrospray Ionization-Electron Transfer Dissociation Mass Spectrometry Jonathan Williams Waters Corporation, Milford, MA, USA A P P L I C AT

More information

De novo Protein Sequencing by Combining Top-Down and Bottom-Up Tandem Mass Spectra. Xiaowen Liu

De novo Protein Sequencing by Combining Top-Down and Bottom-Up Tandem Mass Spectra. Xiaowen Liu De novo Protein Sequencing by Combining Top-Down and Bottom-Up Tandem Mass Spectra Xiaowen Liu Department of BioHealth Informatics, Department of Computer and Information Sciences, Indiana University-Purdue

More information

Tandem Mass Spectrometry: Generating function, alignment and assembly

Tandem Mass Spectrometry: Generating function, alignment and assembly Tandem Mass Spectrometry: Generating function, alignment and assembly With slides from Sangtae Kim and from Jones & Pevzner 2004 Determining reliability of identifications Can we use Target/Decoy to estimate

More information

Identification of proteins by enzyme digestion, mass

Identification of proteins by enzyme digestion, mass Method for Screening Peptide Fragment Ion Mass Spectra Prior to Database Searching Roger E. Moore, Mary K. Young, and Terry D. Lee Beckman Research Institute of the City of Hope, Duarte, California, USA

More information

Protein Structure Analysis and Verification. Course S Basics for Biosystems of the Cell exercise work. Maija Nevala, BIO, 67485U 16.1.

Protein Structure Analysis and Verification. Course S Basics for Biosystems of the Cell exercise work. Maija Nevala, BIO, 67485U 16.1. Protein Structure Analysis and Verification Course S-114.2500 Basics for Biosystems of the Cell exercise work Maija Nevala, BIO, 67485U 16.1.2008 1. Preface When faced with an unknown protein, scientists

More information

Proteomics. November 13, 2007

Proteomics. November 13, 2007 Proteomics November 13, 2007 Acknowledgement Slides presented here have been borrowed from presentations by : Dr. Mark A. Knepper (LKEM, NHLBI, NIH) Dr. Nathan Edwards (Center for Bioinformatics and Computational

More information

Background: Imagine it is time for your lunch break, you take your sandwich outside and you sit down to enjoy your lunch with a beautiful view of

Background: Imagine it is time for your lunch break, you take your sandwich outside and you sit down to enjoy your lunch with a beautiful view of Background: Imagine it is time for your lunch break, you take your sandwich outside and you sit down to enjoy your lunch with a beautiful view of Montana s Rocky Mountains. As you look up, you see what

More information

Improved 6- Plex TMT Quantification Throughput Using a Linear Ion Trap HCD MS 3 Scan Jane M. Liu, 1,2 * Michael J. Sweredoski, 2 Sonja Hess 2 *

Improved 6- Plex TMT Quantification Throughput Using a Linear Ion Trap HCD MS 3 Scan Jane M. Liu, 1,2 * Michael J. Sweredoski, 2 Sonja Hess 2 * Improved 6- Plex TMT Quantification Throughput Using a Linear Ion Trap HCD MS 3 Scan Jane M. Liu, 1,2 * Michael J. Sweredoski, 2 Sonja Hess 2 * 1 Department of Chemistry, Pomona College, Claremont, California

More information

Background: Comment [1]: Comment [2]: Comment [3]: Comment [4]: mass spectrometry

Background: Comment [1]: Comment [2]: Comment [3]: Comment [4]: mass spectrometry Background: Imagine it is time for your lunch break, you take your sandwich outside and you sit down to enjoy your lunch with a beautiful view of Montana s Rocky Mountains. As you look up, you see what

More information

BST 226 Statistical Methods for Bioinformatics David M. Rocke. January 22, 2014 BST 226 Statistical Methods for Bioinformatics 1

BST 226 Statistical Methods for Bioinformatics David M. Rocke. January 22, 2014 BST 226 Statistical Methods for Bioinformatics 1 BST 226 Statistical Methods for Bioinformatics David M. Rocke January 22, 2014 BST 226 Statistical Methods for Bioinformatics 1 Mass Spectrometry Mass spectrometry (mass spec, MS) comprises a set of instrumental

More information

Biological Mass Spectrometry

Biological Mass Spectrometry Biochemistry 412 Biological Mass Spectrometry February 13 th, 2007 Proteomics The study of the complete complement of proteins found in an organism Degrees of Freedom for Protein Variability Covalent Modifications

More information

Structure of the α-helix

Structure of the α-helix Structure of the α-helix Structure of the β Sheet Protein Dynamics Basics of Quenching HDX Hydrogen exchange of amide protons is catalyzed by H 2 O, OH -, and H 3 O +, but it s most dominated by base

More information

NPTEL VIDEO COURSE PROTEOMICS PROF. SANJEEVA SRIVASTAVA

NPTEL VIDEO COURSE PROTEOMICS PROF. SANJEEVA SRIVASTAVA LECTURE-25 Quantitative proteomics: itraq and TMT TRANSCRIPT Welcome to the proteomics course. Today we will talk about quantitative proteomics and discuss about itraq and TMT techniques. The quantitative

More information

Atomic masses. Atomic masses of elements. Atomic masses of isotopes. Nominal and exact atomic masses. Example: CO, N 2 ja C 2 H 4

Atomic masses. Atomic masses of elements. Atomic masses of isotopes. Nominal and exact atomic masses. Example: CO, N 2 ja C 2 H 4 High-Resolution Mass spectrometry (HR-MS, HRAM-MS) (FT mass spectrometry) MS that enables identifying elemental compositions (empirical formulas) from accurate m/z data 9.05.2017 1 Atomic masses (atomic

More information

Proteome Informatics. Brian C. Searle Creative Commons Attribution

Proteome Informatics. Brian C. Searle Creative Commons Attribution Proteome Informatics Brian C. Searle searleb@uw.edu Creative Commons Attribution Section structure Class 1 Class 2 Homework 1 Mass spectrometry and de novo sequencing Database searching and E-value estimation

More information

DE NOVO PEPTIDE SEQUENCING FOR MASS SPECTRA BASED ON MULTI-CHARGE STRONG TAGS

DE NOVO PEPTIDE SEQUENCING FOR MASS SPECTRA BASED ON MULTI-CHARGE STRONG TAGS DE NOVO PEPTIDE SEQUENCING FO MASS SPECTA BASED ON MULTI-CHAGE STONG TAGS KANG NING, KET FAH CHONG, HON WAI LEONG Department of Computer Science, National University of Singapore, 3 Science Drive 2, Singapore

More information

Tandem mass spectra were extracted from the Xcalibur data system format. (.RAW) and charge state assignment was performed using in house software

Tandem mass spectra were extracted from the Xcalibur data system format. (.RAW) and charge state assignment was performed using in house software Supplementary Methods Software Interpretation of Tandem mass spectra Tandem mass spectra were extracted from the Xcalibur data system format (.RAW) and charge state assignment was performed using in house

More information

Chapter 5. Complexation of Tholins by 18-crown-6:

Chapter 5. Complexation of Tholins by 18-crown-6: 5-1 Chapter 5. Complexation of Tholins by 18-crown-6: Identification of Primary Amines 5.1. Introduction Electrospray ionization (ESI) is an excellent technique for the ionization of complex mixtures,

More information

MS-based proteomics to investigate proteins and their modifications

MS-based proteomics to investigate proteins and their modifications MS-based proteomics to investigate proteins and their modifications Francis Impens VIB Proteomics Core October th 217 Overview Mass spectrometry-based proteomics: general workflow Identification of protein

More information

SRM assay generation and data analysis in Skyline

SRM assay generation and data analysis in Skyline in Skyline Preparation 1. Download the example data from www.srmcourse.ch/eupa.html (3 raw files, 1 csv file, 1 sptxt file). 2. The number formats of your computer have to be set to English (United States).

More information

Mass Spectrometry. Hyphenated Techniques GC-MS LC-MS and MS-MS

Mass Spectrometry. Hyphenated Techniques GC-MS LC-MS and MS-MS Mass Spectrometry Hyphenated Techniques GC-MS LC-MS and MS-MS Reasons for Using Chromatography with MS Mixture analysis by MS alone is difficult Fragmentation from ionization (EI or CI) Fragments from

More information

Tutorial 1: Setting up your Skyline document

Tutorial 1: Setting up your Skyline document Tutorial 1: Setting up your Skyline document Caution! For using Skyline the number formats of your computer have to be set to English (United States). Open the Control Panel Clock, Language, and Region

More information

Quantitation of a target protein in crude samples using targeted peptide quantification by Mass Spectrometry

Quantitation of a target protein in crude samples using targeted peptide quantification by Mass Spectrometry Quantitation of a target protein in crude samples using targeted peptide quantification by Mass Spectrometry Jon Hao, Rong Ye, and Mason Tao Poochon Scientific, Frederick, Maryland 21701 Abstract Background:

More information

Protein Quantitation II: Multiple Reaction Monitoring. Kelly Ruggles New York University

Protein Quantitation II: Multiple Reaction Monitoring. Kelly Ruggles New York University Protein Quantitation II: Multiple Reaction Monitoring Kelly Ruggles kelly@fenyolab.org New York University Traditional Affinity-based proteomics Use antibodies to quantify proteins Western Blot Immunohistochemistry

More information

Protein Quantitation II: Multiple Reaction Monitoring. Kelly Ruggles New York University

Protein Quantitation II: Multiple Reaction Monitoring. Kelly Ruggles New York University Protein Quantitation II: Multiple Reaction Monitoring Kelly Ruggles kelly@fenyolab.org New York University Traditional Affinity-based proteomics Use antibodies to quantify proteins Western Blot RPPA Immunohistochemistry

More information

Mass spectrometry in proteomics

Mass spectrometry in proteomics I519 Introduction to Bioinformatics, Fall, 2013 Mass spectrometry in proteomics Haixu Tang School of Informatics and Computing Indiana University, Bloomington Modified from: www.bioalgorithms.info Outline

More information

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely JOURNAL OF COMPUTATIONAL BIOLOGY Volume 8, Number 1, 2001 Mary Ann Liebert, Inc. Pp. 69 78 Perfect Phylogenetic Networks with Recombination LUSHENG WANG, 1 KAIZHONG ZHANG, 2 and LOUXIN ZHANG 3 ABSTRACT

More information

TANDEM MASS SPECTROSCOPY

TANDEM MASS SPECTROSCOPY TANDEM MASS SPECTROSCOPY 1 MASS SPECTROMETER TYPES OF MASS SPECTROMETER PRINCIPLE TANDEM MASS SPECTROMETER INSTRUMENTATION QUADRAPOLE MASS ANALYZER TRIPLE QUADRAPOLE MASS ANALYZER TIME OF FLIGHT MASS ANALYSER

More information

Was T. rex Just a Big Chicken? Computational Proteomics

Was T. rex Just a Big Chicken? Computational Proteomics Was T. rex Just a Big Chicken? Computational Proteomics Phillip Compeau and Pavel Pevzner adjusted by Jovana Kovačević Bioinformatics Algorithms: an Active Learning Approach 215 by Compeau and Pevzner.

More information

Parallel Algorithms For Real-Time Peptide-Spectrum Matching

Parallel Algorithms For Real-Time Peptide-Spectrum Matching Parallel Algorithms For Real-Time Peptide-Spectrum Matching A Thesis Submitted to the College of Graduate Studies and Research in Partial Fulfillment of the Requirements for the degree of Master of Science

More information

ZAHID IQBAL WARRAICH

ZAHID IQBAL WARRAICH Q1 Chromatography is an important analytical technique in chemistry. There is a number of techniques under the general heading of chromatography. (a) Paper and gas chromatography rely on partition to separate

More information

SPECTRA LIBRARY ASSISTED DE NOVO PEPTIDE SEQUENCING FOR HCD AND ETD SPECTRA PAIRS

SPECTRA LIBRARY ASSISTED DE NOVO PEPTIDE SEQUENCING FOR HCD AND ETD SPECTRA PAIRS SPECTRA LIBRARY ASSISTED DE NOVO PEPTIDE SEQUENCING FOR HCD AND ETD SPECTRA PAIRS 1 Yan Yan Department of Computer Science University of Western Ontario, Canada OUTLINE Background Tandem mass spectrometry

More information

ABSTRACT 1. INTRODUCTION

ABSTRACT 1. INTRODUCTION JOURNAL OF COMPUTATIONAL BIOLOGY Volume 6, Numbers 3/4, 1999 Mary Ann Liebert, Inc. Pp. 327 342 De Novo Peptide Sequencing via Tandem Mass Spectrometry VLADO DAN Ï CÍK, 1, 2 THERESA A. ADDONA, 1 KARL R.

More information

Data pre-processing in liquid chromatography mass spectrometry-based proteomics

Data pre-processing in liquid chromatography mass spectrometry-based proteomics BIOINFORMATICS ORIGINAL PAPER Vol. 21 no. 21 25, pages 454 459 doi:1.193/bioinformatics/bti66 Data and text mining Data pre-processing in liquid chromatography mass spectrometry-based proteomics Xiang

More information

Supporting Information. Labeled Ligand Displacement: Extending NMR-based Screening of Protein Targets

Supporting Information. Labeled Ligand Displacement: Extending NMR-based Screening of Protein Targets Supporting Information Labeled Ligand Displacement: Extending NMR-based Screening of Protein Targets Steven L. Swann, Danying Song, Chaohong Sun, Philip J. Hajduk, and Andrew M. Petros Global Pharmaceutical

More information

Yifei Bao. Beatrix. Manor Askenazi

Yifei Bao. Beatrix. Manor Askenazi Detection and Correction of Interference in MS1 Quantitation of Peptides Using their Isotope Distributions Yifei Bao Department of Computer Science Stevens Institute of Technology Beatrix Ueberheide Department

More information

BIOINF 4120 Bioinformatics 2 - Structures and Systems - Oliver Kohlbacher Summer Systems Biology Exp. Methods

BIOINF 4120 Bioinformatics 2 - Structures and Systems - Oliver Kohlbacher Summer Systems Biology Exp. Methods BIOINF 4120 Bioinformatics 2 - Structures and Systems - Oliver Kohlbacher Summer 2013 14. Systems Biology Exp. Methods Overview Transcriptomics Basics of microarrays Comparative analysis Interactomics:

More information

Protein Identification Using Tandem Mass Spectrometry. Nathan Edwards Informatics Research Applied Biosystems

Protein Identification Using Tandem Mass Spectrometry. Nathan Edwards Informatics Research Applied Biosystems Protein Identification Using Tandem Mass Spectrometry Nathan Edwards Informatics Research Applied Biosystems Outline Proteomics context Tandem mass spectrometry Peptide fragmentation Peptide identification

More information

LC-MS Based Metabolomics

LC-MS Based Metabolomics LC-MS Based Metabolomics Analysing the METABOLOME 1. Metabolite Extraction 2. Metabolite detection (with or without separation) 3. Data analysis Metabolite Detection GC-MS: Naturally volatile or made volatile

More information

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche The molecular structure of a protein can be broken down hierarchically. The primary structure of a protein is simply its

More information

Effective Strategies for Improving Peptide Identification with Tandem Mass Spectrometry

Effective Strategies for Improving Peptide Identification with Tandem Mass Spectrometry Effective Strategies for Improving Peptide Identification with Tandem Mass Spectrometry by Xi Han A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree

More information

METABOLIC PATHWAY PREDICTION/ALIGNMENT

METABOLIC PATHWAY PREDICTION/ALIGNMENT COMPUTATIONAL SYSTEMIC BIOLOGY METABOLIC PATHWAY PREDICTION/ALIGNMENT Hofestaedt R*, Chen M Bioinformatics / Medical Informatics, Technische Fakultaet, Universitaet Bielefeld Postfach 10 01 31, D-33501

More information

LECTURE-13. Peptide Mass Fingerprinting HANDOUT. Mass spectrometry is an indispensable tool for qualitative and quantitative analysis of

LECTURE-13. Peptide Mass Fingerprinting HANDOUT. Mass spectrometry is an indispensable tool for qualitative and quantitative analysis of LECTURE-13 Peptide Mass Fingerprinting HANDOUT PREAMBLE Mass spectrometry is an indispensable tool for qualitative and quantitative analysis of proteins, drugs and many biological moieties to elucidate

More information

Supplementary Material for: Clustering Millions of Tandem Mass Spectra

Supplementary Material for: Clustering Millions of Tandem Mass Spectra Supplementary Material for: Clustering Millions of Tandem Mass Spectra Ari M. Frank 1 Nuno Bandeira 1 Zhouxin Shen 2 Stephen Tanner 3 Steven P. Briggs 2 Richard D. Smith 4 Pavel A. Pevzner 1 October 4,

More information

On Optimizing the Non-metric Similarity Search in Tandem Mass Spectra by Clustering

On Optimizing the Non-metric Similarity Search in Tandem Mass Spectra by Clustering On Optimizing the Non-metric Similarity Search in Tandem Mass Spectra by Clustering Jiří Novák, David Hoksza, Jakub Lokoč, and Tomáš Skopal Siret Research Group, Faculty of Mathematics and Physics, Charles

More information

Purdue-UAB Botanicals Center for Age- Related Disease

Purdue-UAB Botanicals Center for Age- Related Disease Purdue-UAB Botanicals Center for Age- Related Disease MALDI-TOF Mass Spectrometry Fingerprinting Technique Landon Wilson MALDI-TOF mass spectrometry is an advanced technique for rapid protein identification

More information

Electron Transfer Dissociation of N-linked Glycopeptides from a Recombinant mab Using SYNAPT G2-S HDMS

Electron Transfer Dissociation of N-linked Glycopeptides from a Recombinant mab Using SYNAPT G2-S HDMS Electron Transfer Dissociation of N-linked Glycopeptides from a Recombinant mab Using SYNAPT G2-S HDMS Jonathan P. Williams, Jeffery M. Brown, Stephane Houel, Ying Qing Yu, and Weibin Chen Waters Corporation,

More information

TUTORIAL EXERCISES WITH ANSWERS

TUTORIAL EXERCISES WITH ANSWERS TUTORIAL EXERCISES WITH ANSWERS Tutorial 1 Settings 1. What is the exact monoisotopic mass difference for peptides carrying a 13 C (and NO additional 15 N) labelled C-terminal lysine residue? a. 6.020129

More information

Methods for proteome analysis of obesity (Adipose tissue)

Methods for proteome analysis of obesity (Adipose tissue) Methods for proteome analysis of obesity (Adipose tissue) I. Sample preparation and liquid chromatography-tandem mass spectrometric analysis Instruments, softwares, and materials AB SCIEX Triple TOF 5600

More information

BIOINFORMATICS. Bingwen Lu and Ting Chen INTRODUCTION

BIOINFORMATICS. Bingwen Lu and Ting Chen INTRODUCTION BIOINFORMATICS Vol. 19 Suppl. 2 2003, pages ii113 ii121 DOI: 10.1093/bioinformatics/btg1068 A suffix tree approach to the interpretation of tandem mass spectra: applications to peptides of non-specific

More information

Peptide Isolation Using the Prep 150 LC System

Peptide Isolation Using the Prep 150 LC System Jo-Ann M. Jablonski and Andrew J. Aubin Waters Corporation, Milford, MA, USA APPLICATION BENEFITS The Prep 150 LC System, an affordable, highly reliable system for preparative chromatography, is suitable

More information

High-Throughput Protein Quantitation Using Multiple Reaction Monitoring

High-Throughput Protein Quantitation Using Multiple Reaction Monitoring High-Throughput Protein Quantitation Using Multiple Reaction Monitoring Application Note Authors Ning Tang, Christine Miller, Joe Roark, Norton Kitagawa and Keith Waddell Agilent Technologies, Inc. Santa

More information

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it? Proteomics What is it? Reveal protein interactions Protein profiling in a sample Yeast two hybrid screening High throughput 2D PAGE Automatic analysis of 2D Page Yeast two hybrid Use two mating strains

More information

Hemoglobin (Hb) exists in the blood cells of

Hemoglobin (Hb) exists in the blood cells of Prediction of Product Ion Isotope Ratios in the Tandem Electrospray Ionization Mass Spectra from the Second Isotope of Tryptic Peptides: Identification of the Variant 131 Gln Glu, Hemoglobin Camden Brian

More information

PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search

PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search Yunhu Wan, Austin Yang, and Ting Chen*, Department of Mathematics, Department of Pharmaceutical Sciences, and

More information

Electrospray ionization mass spectrometry (ESI-

Electrospray ionization mass spectrometry (ESI- Automated Charge State Determination of Complex Isotope-Resolved Mass Spectra by Peak-Target Fourier Transform Li Chen a and Yee Leng Yap b a Bioinformatics Institute, 30 Biopolis Street, Singapore b Davos

More information

Propose a structure for an alcohol, C4H10O, that has the following

Propose a structure for an alcohol, C4H10O, that has the following Propose a structure for an alcohol, C4H10O, that has the following 13CNMR spectral data: Broadband _ decoupled 13CNMR: 19.0, 31.7, 69.5 б DEPT _90: 31.7 б DEPT _ 135: positive peak at 19.0 & 31.7 б, negative

More information

Application Note LCMS-112 A Fully Automated Two-Step Procedure for Quality Control of Synthetic Peptides

Application Note LCMS-112 A Fully Automated Two-Step Procedure for Quality Control of Synthetic Peptides Application Note LCMS-112 A Fully Automated Two-Step Procedure for Quality Control of Synthetic Peptides Abstract Here we describe a two-step QC procedure for synthetic peptides. In the first step, the

More information

MS2DB: An Algorithmic Approach to Determine Disulfide Linkage Patterns in Proteins by Utilizing Tandem Mass Spectrometric Data

MS2DB: An Algorithmic Approach to Determine Disulfide Linkage Patterns in Proteins by Utilizing Tandem Mass Spectrometric Data MS2DB: An Algorithmic Approach to Determine Disulfide Linkage Patterns in Proteins by Utilizing Tandem Mass Spectrometric Data Timothy Lee 1, Rahul Singh 1, Ten-Yang Yen 2, and Bruce Macher 2 1 Department

More information

A Statistical Model of Proteolytic Digestion

A Statistical Model of Proteolytic Digestion A Statistical Model of Proteolytic Digestion I-Jeng Wang, Christopher P. Diehl Research and Technology Development Center Johns Hopkins University Applied Physics Laboratory Laurel, MD 20723 6099 Email:

More information

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION AND CALIBRATION Calculation of turn and beta intrinsic propensities. A statistical analysis of a protein structure

More information

Qualitative Proteomics (how to obtain high-confidence high-throughput protein identification!)

Qualitative Proteomics (how to obtain high-confidence high-throughput protein identification!) Qualitative Proteomics (how to obtain high-confidence high-throughput protein identification!) James A. Mobley, Ph.D. Director of Research in Urology Associate Director of Mass Spectrometry (contact: mobleyja@uab.edu)

More information

ADVANCEMENT IN PROTEIN INFERENCE FROM SHOTGUN PROTEOMICS USING PEPTIDE DETECTABILITY

ADVANCEMENT IN PROTEIN INFERENCE FROM SHOTGUN PROTEOMICS USING PEPTIDE DETECTABILITY ADVANCEMENT IN PROTEIN INFERENCE FROM SHOTGUN PROTEOMICS USING PEPTIDE DETECTABILITY PEDRO ALVES, 1 RANDY J. ARNOLD, 2 MILOS V. NOVOTNY, 2 PREDRAG RADIVOJAC, 1 JAMES P. REILLY, 2 HAIXU TANG 1, 3* 1) School

More information

Tandem MS = MS / MS. ESI-MS give information on the mass of a molecule but none on the structure

Tandem MS = MS / MS. ESI-MS give information on the mass of a molecule but none on the structure Tandem MS = MS / MS ESI-MS give information on the mass of a molecule but none on the structure In tandem MS (MSMS) (pseudo-)molecular ions are selected in MS1 and fragmented by collision with gas. collision

More information

Applications of LC/MS in Qualitative Drug Metabolite Studies

Applications of LC/MS in Qualitative Drug Metabolite Studies Applications of LC/M in Qualitative Drug Metabolite tudies Case tudy: Identifying in vivo metabolites in human urine - sample cleanup and LC/M/M strategies Application detection of drug (Vanlev) metabolites

More information

Reagents. Affinity Tag (Biotin) Acid Cleavage Site. Figure 1. Cleavable ICAT Reagent Structure.

Reagents. Affinity Tag (Biotin) Acid Cleavage Site. Figure 1. Cleavable ICAT Reagent Structure. DATA SHEET Protein Expression Analysis Reagents Background The ultimate goal of proteomics is to identify and quantify proteins that are relevant to a given biological state; and to unearth networks of

More information

Lecture Notes for Fall Network Modeling. Ernest Fraenkel

Lecture Notes for Fall Network Modeling. Ernest Fraenkel Lecture Notes for 20.320 Fall 2012 Network Modeling Ernest Fraenkel In this lecture we will explore ways in which network models can help us to understand better biological data. We will explore how networks

More information

WADA Technical Document TD2015IDCR

WADA Technical Document TD2015IDCR MINIMUM CRITERIA FOR CHROMATOGRAPHIC-MASS SPECTROMETRIC CONFIRMATION OF THE IDENTITY OF ANALYTES FOR DOPING CONTROL PURPOSES. The ability of a method to identify an analyte is a function of the entire

More information

A rapid and highly selective colorimetric method for direct detection of tryptophan in proteins via DMSO acceleration

A rapid and highly selective colorimetric method for direct detection of tryptophan in proteins via DMSO acceleration A rapid and highly selective colorimetric method for direct detection of tryptophan in proteins via DMSO acceleration Yanyan Huang, Shaoxiang Xiong, Guoquan Liu, Rui Zhao Beijing National Laboratory for

More information

Nature Methods: doi: /nmeth Supplementary Figure 1. Fragment indexing allows efficient spectra similarity comparisons.

Nature Methods: doi: /nmeth Supplementary Figure 1. Fragment indexing allows efficient spectra similarity comparisons. Supplementary Figure 1 Fragment indexing allows efficient spectra similarity comparisons. The cost and efficiency of spectra similarity calculations can be approximated by the number of fragment comparisons

More information

Motif Prediction in Amino Acid Interaction Networks

Motif Prediction in Amino Acid Interaction Networks Motif Prediction in Amino Acid Interaction Networks Omar GACI and Stefan BALEV Abstract In this paper we represent a protein as a graph where the vertices are amino acids and the edges are interactions

More information

Rapid Distinction of Leucine and Isoleucine in Monoclonal Antibodies Using Nanoflow. LCMS n. Discovery Attribute Sciences

Rapid Distinction of Leucine and Isoleucine in Monoclonal Antibodies Using Nanoflow. LCMS n. Discovery Attribute Sciences Rapid Distinction of Leucine and Isoleucine in Monoclonal Antibodies Using Nanoflow LCMS n Dhanashri Bagal *, Eddie Kast, Ping Cao Discovery Attribute Sciences Amgen, South San Francisco, California, United

More information

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics GENOME Bioinformatics 2 Proteomics protein-gene PROTEOME protein-protein METABOLISM Slide from http://www.nd.edu/~networks/ Citrate Cycle Bio-chemical reactions What is it? Proteomics Reveal protein Protein

More information

Isotopic-Labeling and Mass Spectrometry-Based Quantitative Proteomics

Isotopic-Labeling and Mass Spectrometry-Based Quantitative Proteomics Isotopic-Labeling and Mass Spectrometry-Based Quantitative Proteomics Xiao-jun Li, Ph.D. Current address: Homestead Clinical Day 4 October 19, 2006 Protein Quantification LC-MS/MS Data XLink mzxml file

More information

A Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry

A Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry 389 A Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry Ting Chen *t Ming-Yang Kao $ Matthew Tepel* George M. Church* John Rush* Abstract The tandem mass spectrometry

More information

LECTURE-11. Hybrid MS Configurations HANDOUT. As discussed in our previous lecture, mass spectrometry is by far the most versatile

LECTURE-11. Hybrid MS Configurations HANDOUT. As discussed in our previous lecture, mass spectrometry is by far the most versatile LECTURE-11 Hybrid MS Configurations HANDOUT PREAMBLE As discussed in our previous lecture, mass spectrometry is by far the most versatile technique used in proteomics. We had also discussed some of the

More information

De Novo Peptide Sequencing

De Novo Peptide Sequencing De Novo Peptide Sequencing Outline A simple de novo sequencing algorithm PTM Other ion types Mass segment error De Novo Peptide Sequencing b 1 b 2 b 3 b 4 b 5 b 6 b 7 b 8 A NELLLNVK AN ELLLNVK ANE LLLNVK

More information

Application Note. Edgar Naegele. Abstract

Application Note. Edgar Naegele. Abstract Fast identification of main drug metabolites by quadrupole time-of-flight LC/MS Measuring accurate MS and MS/MS data with the Agilent 651 Q-TOF LC/MS and identification of main meta-bolites by comparison

More information

Developing Algorithms for the Determination of Relative Abundances of Peptides from LC/MS Data

Developing Algorithms for the Determination of Relative Abundances of Peptides from LC/MS Data Developing Algorithms for the Determination of Relative Abundances of Peptides from LC/MS Data RIPS Team Jake Marcus (Project Manager) Anne Eaton Melanie Kanter Aru Ray Faculty Mentors Shawn Cokus Matteo

More information

Towards the Prediction of Protein Abundance from Tandem Mass Spectrometry Data

Towards the Prediction of Protein Abundance from Tandem Mass Spectrometry Data Towards the Prediction of Protein Abundance from Tandem Mass Spectrometry Data Anthony J Bonner Han Liu Abstract This paper addresses a central problem of Proteomics: estimating the amounts of each of

More information

1. Prepare the MALDI sample plate by spotting an angiotensin standard and the test sample(s).

1. Prepare the MALDI sample plate by spotting an angiotensin standard and the test sample(s). Analysis of a Peptide Sequence from a Proteolytic Digest by MALDI-TOF Post-Source Decay (PSD) and Collision-Induced Dissociation (CID) Standard Operating Procedure Purpose: The following procedure may

More information

Profiling of Diferulates (Plant Cell Wall Cross- Linkers) Using Ultrahigh-performance Liquid. Chromatography-Tandem Mass Spectrometry

Profiling of Diferulates (Plant Cell Wall Cross- Linkers) Using Ultrahigh-performance Liquid. Chromatography-Tandem Mass Spectrometry Supporting Information for: Profiling of Diferulates (Plant Cell Wall Cross- Linkers) Using Ultrahigh-performance Liquid Chromatography-Tandem Mass Spectrometry Ramin Vismeh a,b, Fachuang Lu c,d, Shishir

More information

A Workflow Approach for the Identification and Structural Elucidation of Impurities of Quetiapine Hemifumarate Drug Substance

A Workflow Approach for the Identification and Structural Elucidation of Impurities of Quetiapine Hemifumarate Drug Substance A Workflow Approach for the Identification and Structural Elucidation of Impurities of Quetiapine Hemifumarate Drug Substance Michael D. Jones, Marian Twohig, Karen Haas, and Robert S. Plumb Waters Corporation,

More information

Supporting Information. Copyright Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, 2007

Supporting Information. Copyright Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, 2007 Supporting Information Copyright Wiley-VCH Verlag GmbH & Co. KGaA, 69451 Weinheim, 2007 Copyright Wiley-VCH Verlag GmbH & Co. KGaA, 69451 Weinheim, 2007 Supporting Information for Head-to-Tail Cyclized

More information

NovoHMM: A Hidden Markov Model for de Novo Peptide Sequencing

NovoHMM: A Hidden Markov Model for de Novo Peptide Sequencing Anal. Chem. 2005, 77, 7265-7273 NovoHMM: A Hidden Markov Model for de Novo Peptide Sequencing Bernd Fischer, Volker Roth, Franz Roos, Jonas Grossmann, Sacha Baginsky, Peter Widmayer, Wilhelm Gruissem,

More information

WADA Technical Document TD2003IDCR

WADA Technical Document TD2003IDCR IDENTIFICATION CRITERIA FOR QUALITATIVE ASSAYS INCORPORATING CHROMATOGRAPHY AND MASS SPECTROMETRY The appropriate analytical characteristics must be documented for a particular assay. The Laboratory must

More information

Study of Non-Covalent Complexes by ESI-MS. By Quinn Tays

Study of Non-Covalent Complexes by ESI-MS. By Quinn Tays Study of Non-Covalent Complexes by ESI-MS By Quinn Tays History Overview Background Electrospray Ionization How it is used in study of noncovalent interactions Uses of the Technique Types of molecules

More information