A Statistical Model of Proteolytic Digestion
|
|
- Hugh Flynn
- 5 years ago
- Views:
Transcription
1 A Statistical Model of Proteolytic Digestion I-Jeng Wang, Christopher P. Diehl Research and Technology Development Center Johns Hopkins University Applied Physics Laboratory Laurel, MD Fernando J. Pineda Dept. of Molecular Microbiology and Immunology Johns Hopkins Bloomberg School of Public Health Baltimore, MD Abstract We present a stochastic model of proteolytic digestion of a proteome, assuming the distribution of parent protein lengths in the proteome, the relative abundances of the 20 amino acids in the proteome, and the digestion rules of the enzyme used in the digestion. We derived a closed form expression for the fragment mass distribution for a large class of enzymes including the widely used Trypsin. The expression uses the distribution of lengths in a mixture of proteins taken from a proteome, as well as the relative abundances of the 20 amino acids in the proteome. The agreement between theory and the in silico digest is excellent. I. INTRODUCTION We present a rigorous statistical model of proteolytic digestion of a proteome, assuming (1) the distribution of parent protein lengths in the proteome, (2) the relative abundances of the 20 amino acids in the proteome, and (3) the digestion rules of the enzyme used in the digestion. Such a statistical model can serve as a model of the null hypothesis to calculate p-values in hypothesis testing for microorganism identification by proteolytic digestion. Current protein identification software (e.g., MOWSE [1, MassSearch [2, and Mascot [3) is based on histograms of digested proteomes, or on models derived from unrealistic assumptions (e.g., a uniform distribution of proteolytic fragments). The proposed model accounts for the fact that digestion products come from the mixture of proteins that constitutes a microorganism s proteome. To facilitate our discussion, we organize digestion models into a taxonomy according to (1) the class of digestion rules obeyed by the enzyme and (2) the order of the Markov model used to model the amino acid sequence. We denote by C(m) the class of enzymes whose digestion rules depend on m adjacent residues. Table I shows a selection of enzymes and their corresponding cleavage rules. In this paper, we present the result assuming that the cleavage rules are deterministic. We also introduce a technique to extend the result to address simple probabilistic digestion rules that account for possible (probabilistic) partial digestions. We classify amino acid sequence models according to the order of the Markov chain used to model the sequence. For example, M(0) denotes the simplest sequence models assuming that the adjacent amino acid residues are independent; M(1) denotes the next most complex models assuming that the next residue depends on the previous amino acid residue; and so forth. Our methodology views the enzymatic digestion problem as a regenerative process to which we can apply Wald s Digesting Reagent Class C(1) Cleavage Rules Armillaria Before {C K } AspN Before D Thermolysin Before {F I L M V W} Clostripain After R LysC After K Pancreatic Elastase After {A G S V} Digesting Reagent Class C(2) Cleavage Rules Chymotrypsin After {F L M W Y} if not followed by P Post Proline After P if not followed by P Trypsin After {K R} if not followed by P V8 Ammonium Acetate After E if not followed by P V8 Phosphate Buffer After {D E} if not followed by P CNBr Cys { Before C After M } Hydroxylamine After N if followed by G Mild Acid Hydrolysis After D if followed by P TABLE I CLEAVAGE RULES FOR SELECTED REAGENTS equation and its generalizations to simplify the computation of the fragment mass distribution. In particular, when overlaid with a cleavage process (either C(1) or C(2)), a protein sequence with models M(0) can be equivalently modeled as a regenerative process with the cleavage sites as the regeneration points. Hence the regenerative cycles (the fragments) are i.i.d.. Using Wald s first lemma and its extensions [4, [5, the fragment mass distribution can be conveniently partitioned into two terms: the length distribution and the conditional mass probability density. In this paper, we present results for {C(1), M(0)} and a subset of {C(2), M(0)} with the class of enzymes C P (2) that exhibit Proline blocking. The class C P (2) includes the widely used enzyme, Trypsin, which cleaves after Lysine or Arginine unless followed by Proline. The same technique can be applied to derive expressions for general models in {C(2), M(0)}. The agreement between theory and the in silico digest is excellent as illustrated by the examples given in Section 4. Note that a summary of part of the results given here was presented in [6 as a poster. II. MODELS AND DECOMPOSITION Our objective is to analytically compute the expected number of proteolytic fragments with mass in an arbitrary mass interval (m m, m + m), denoted by H(m, m). We assume that H(m, m) depends only on the parent protein length distribution in the database of interest, a probabilistic
2 protein sequence model derived from the database, and the chosen enzyme digestion rule. Since the parent proteins are finite in length, application of digestions rules to a finite parent protein can lead to two distinct classes of fragments: fragments ending at a cleavage site and fragments not ending at a cleavage site. Here the cleavage site is referred to a specific amino acid that defines a cleavage following a enzyme digestion rule. For example, amino acids C and K are both cleavage sites for fragments from digesting enzyme Armillaria (see Table I). In this paper, we only focus on the derivation of H(m, m) contributed by fragments ending at a cleavage site. The contribution by the other class of fragments is less significant and can be more easily derived since there can be at most one of such fragment from every single parent protein. A. Sequence and fragment models Throughout the paper, we assume that the protein sequence is i.i.d.. That is, we only consider class M(0) models. We use S = X 1 X 2 X 3 to denote an infinite random protein sequence, where the X i s are random variables taking value from a finite alphabet A (the set of 20 amino acids). We use p a, to denote P {X n = a}, and p C, C A, to denote P {X n C}. We assume that p a s are estimated from the relative amino acid abundance in the database of interest. We consider C(1) and C P (2) enzyme classes. For C(1) enzymes, cleavage occurs at sites adjacent to amino acids in the set C. For C P (2) enzymes, cleavage occurs after any amino acid in C if the following amino acid is not Proline. To facilitate our discussion, we will use {F n } to denote a generic random fragment sequence resulting from the application of a cleavage rule to a random protein sequence S. Here, we provide an informal discussion on why {F n } is i.i.d. except for the first and possibly the last fragments (a rigorous proof can be easily constructed following the discussion ). We first address digestion rules in C(1). Given that amino acids in M(0) protein models are i.i.d., fragment mass distributions do not depend on which end of the protein we choose as the N-terminus. Therefore, without loss of generality, we can assume that for class C(1) enzymes, cleavage sites always occur after (instead of before as indicated in Table 1 for some C(1) enzymes) an amino acid in C. For models in {C(1), M(0)}, it is clear that the cleavage event {X n C} is a regenerative point for the protein sequence S. Hence the process between cleavage (that is, the fragments {F n }) is i.i.d.. For models in {C(2), M(0)}, a similar conclusion can be drawn by constructing an equivalent two-step model for the protein sequence. Figure 1 illustrates such a model for digestion with Trypsin for which a cleavage occurs after K or R unless followed by P. In step one, we first determine whether the amino acid is P or not (occurs with probability p P ). In step two, we determine the specific amino acid conditioned on the outcome of the step one. It is obvious that the protein sequence generated by this two-step model is statistically equivalent to a sequence S generated by a model in M(0). With this equivalent protein model, the digestion rule (with Proline blocking) can be applied right after the step one to determine cleavage Fig. 1. A two-step process to generating an i.i.d protein sequence and fragmentation by Trypsin. sites. It can be seen that the cleavage sites are regenerative points and an i.i.d. fragment process {F n } can be defined from the regenerative cycles. Note that this leads to a very minor approximation since the first fragment F 1 has a different distribution at the first amino acid (the last fragment could have a different distribution as well if it ends exactly at the last amino acid in the parent protein). Given that {F n } is i.i.d. (or approximately i.i.d. as for the C P (2) models), we will use F to denote a generic random fragment in the sequel. B. Decomposition of mass distribution Based on the i.i.d. property of fragments established in the last section, we can establish a decomposition that enable us to treat the fragment length distribution and mass probability density separately in the derivation of H(m, m). This is accomplished by applying Wald s first lemma and its extensions [4, [5. Here we first state the decomposition: where H(m, m) N = g(k)p { M(F ) m < m L(F )=k}, k=1 g(k) = N N=k n(n)l N,k, and n(n) is the number of proteins of length N in the database of interest; L N,k is the expected number of fragments of length k from a parent protein sequence of length N; M(F ) denotes the mass of F ; and L(F ) denotes the length of F. This decomposition described by (1) significantly simplifies the computation of H(m, m) in that it separates the calculation of fragment length distribution (g(k)), from the mass distribution of a random fragment (P { M(F ) m < m L(F )=k}). To prove the decomposition (1), we first observe that H(m, m) can be derived by breaking it into terms based on fragment and parent protein lengths. Define H N,k (m, m) as the expected number of fragments in the mass range that come from parent proteins with length N and has length k. Then (1)
3 we can write H(m, m) as H(m, m) = N N=1 n(n) N H N,k (m, m). (2) k=1 Hence we can focus our discussion on the decomposition of H N,k (m, m). We first provide an expression for L N,k (expected number of fragments with length k from protein with length N) by applying the Wald s equation. Define a stopping time associated with the fragment process {F n } by τ n min{k : k L(F i ) n}. Using the indicator function we can write L N,k as [ τn k+1 L N,k = E I{L(F i )=k}, (3) where I{} denote the indicator function. Note that the proper choice of stopping time (τ N k+1 ) is important to avoid overor under-counting of fragments. Since {F n } is i.i.d., we can apply the Wald s equation to (3) and obtain L N,k = E(τ N k+1 )P {L(F )=k}. (4) Following the same argument, we can also obtain H N,k (m, m) = E(τ N k+1 )P { M(F ) m < m, L(F )=k} = E(τ N k+1 )P {L(F )=k} P { M(F ) m < m L(F )=k}. By substituting (4) into the above equation we obtain the decomposition for H N,k (m, m): H N,k (m, m) =L N,k P { M(F ) m < m L(F )=k}. (5) The desired decomposition (1) can be easily derived by substituting (5) into (2). III. FRAGMENT MASS DISTRIBUTIONS Based on the decomposition (1) we can obtain the closed form expression for the fragment mass distribution H(m), for models in both {C(1), M(0)} and {C P (2), M(0)}, by deriving expressions for L N,k and P { M(F ) m < m L(F )=k}. A. Digestion model {C(1), M(0)} Given a random protein sequence of length N, S N = X 1 X 2 X N, L N,k for models in {C(1), M(0)} can be derived as [ N k+1 L N,k = E I {X i...x i+k 1 is a fragment } = P {X 1,...,X k 1 C, X k C}+ N k+1 i=2 P {X i 1 C,X i,...,x i+k 2 C,X i+k 1 C}. By deriving the needed probabilities based on the M(0) sequence model, we can obtain L N,k =[(N k)p C +1(1 p C ) k 1 p C. (6) To derive the conditional mass probability P { M(F ) m < m L(F ) = k}, we assume that the mass of a fragment depends only on its amino acid composition. We represent the composition of a fragment F by a 20-dimensional vector of nonnegative integers, denoted by C(F )=[n 1,...,n 20, where n i N {0} is the number of occurrences of the i-th amino acid (assuming that a fixed ordering is defined on A) inf. We will use n =[n 1,...,n 20 to denote a generic composition vector. We assume that the mass of a composition (and hence the mass of the fragment with the particular composition) is determined by 20 M(n) =m H2O + m i n i, where m i is the mass of the i-th amino acid. Note that we can also generalize this model to incorporate uncertainty in amino acid masses modeled by appropriate distributions. Following the above assumptions, we can write the desired conditional mass distribution as I{ M(n) m < m}p {C(F )=n L(F)=k}, n =k i C n (7) where n i n i. Note that the indicator function in (7) is deterministic and should be replaced by a probability if a probabilistic model is used to define the mass of a composition. Based on (7) the closed-form expression for the conditional mass probability can be derived as n =k i C n I{ M(n) m < m}(k 1)! where ˆp i is defined by ˆp i = p i z i = z i { p C if i C, 1 p C otherwise. ( 20 ) ˆp ni i, (8) n i! And again the indicator function can be replaced by a probability to incorporate uncertainty in amino acid masses. B. Digestion model {C P (2), M(0)} To derive L N,k, we again use indicator functions to obtain [ N k+1 L N,k = E I {X i...x i+k 1 is a fragment } = N k+1 P {X i...x i+k 1 is a fragment}. However, the derivation of the probabilities are not as straightforward due to the blockings of cleavage by proline. Let us
4 denote the set of amino acid pairs that represent unblocked cleavage sites by C NB. That is C NB {XX : X C,X P}. Then we can derive the needed probabilities using Markovian arguments. Since the protein sequence is i.i.d., we have P {X 1...X k is a fragment} = P {X 1 X 2 C NB } k 1 j=2 P {X k X k+1 C NB X k 1 X k C NB } ; for i =2,...,N k, wehave P {X i...x i+k 1 is a fragment} = P {X i 1 X i C NB } P {X i X i+1 C NB X i 1 X i C NB } and i+k 2 j=i+1 P {X i+k 1 X i+k C NB X i+k 2 X i+k 1 C NB } ; P {X N k+1...x N is a fragment} = P {X N k X N k+1 C NB } P {X N k+1 X N k C NB X N k X N k+1 C NB } N 1 j=n k+2 P {X N C X N 1 X N C NB }. Deriving the needed conditional probabilities and substituting them into the above equations we can obtain [ ( L N,k = α 1+ N k + p ) P p C (1 p 1 p C) k 1 p C, P (9) where p P p C = p C [1 1 p C (1 p P ) [ α = (1 p C {1 ) 1 p C, 1 p C 1 p C (1 p P ) }. Note that p C <p C as long as p P > 0. Furthermore, when p P is small, (9) can be approximated quite accurately by L N,k [1 + (N k) p C(1 p C) k 1 p C, which is identical to the length distribution for C(1) enzymes given in (6) but with scaled cleavage probability p C. To compute the conditional mass probability, we first decompose the probability based on the composition of the fragment composition as in (7): I{ M(n) m < m}p {C(F )=n L(F)=k}. n =k 1 i C ni<np (10) (theory- in_silico)/sqrt(theory) fragments per bin (1Da bins) mass (Da) 800 in silico theory Fig. 2. Comparison between analytical expression and in silico digestion for E. Coli proteome over a 100 Da window. Note that the number of proline occurrences, n P, needs to be larger than or equal to ( i C n i) 1 in order to block the cleavages in the interior of a fragment. The conditional composition probability P {C(F ) = n L(F ) = k} can be derived by treating each case with a specific number of blocked cleavages separately. C. Accounting for partial digestions The presented results for deterministic digestion can be easily extended to account for partial digestions modelled with a simple probabilistic digestion model. Consider the case for C(1) models under which cleavages occurs after every amino acid in C. Assume now that each cleavage after an amino acid a C occurs only with probability q a (q a < 1 implies partial digestions). The presented results can be applied by expanding the set A to include one additional fictitious amino acid a + for each a C.Fora C, the probability is re-defined as q a p a. For the additional amino acid a +, the probability is (1 q a )p a. The mass of the additional amino acid a + is the same as the mass of its associated original amino acid a. With this expanded alphabet and modified protein sequence model, the presented result can be applied directly as if the digestions are deterministic (with the same C). Note that the same technique applies to models in C P (2) as well. IV. NUMERICAL RESULTS A comparison of analytic and in-silico results is shown in Figure 2. We use an E. coli proteome taken from the SWISSPROT database. The proteins were shuffled to destroy sequential correlations. The analytic calculation is faster than 1000
5 the corresponding in silico digestion. Except for unexplained excess dispersion at 450 Da, the residuals between the in-silco and analytic results are consistent with Poisson statistics. It is clear from this figure that the mass distribution can vary by several orders of magnitude over just few Daltons. This result leads us to believe that heuristically derived mass distributions are likely to result in biased p-values in hypothesis tests. This is especially true for small peptides. It becomes a minor effect in heavier peptides. It is typically recommended that protein identification be performed using peptides heavier than 500 Da due to lack of selectivity of the lighter peptides. Our results suggest that, a more careful treatment of the mass distribution below 500Da, could make this mass range more informative. ACKNOWLEDGMENT This work was supported by the JHU/APL Independent Research and Development Program. REFERENCES [1 D. J. C. Pappin, P. Hojrup, and A. J. Bleasby, Rapid Identification of Proteins by Peptide-mass Fingerprinting, Current Biology, vol. 3, no. 6, 1993, pp [2 P. James, M. Quadroni, E. Carafoli, and G. Gonnet, Protein Identification by Mass Profile Fingerprinting, Biochemical and Biophysical Research Communications, 195(1), August 1993, pp [3 D. N. Perkins, D. J. C. Pappin, D. M. Creasy, and J. S. Cottrell, Probability-based Protein Identification by Searching Sequence Databases Using Mass Spectrometry Data, Electrophoresis, 20, 1999, pp [4 A. Wald, Sequential Tests of Statistical Hypotheses, Annals of Mathematical Statistics, 16, 1945, pp [5 G.V. Moustakides, Extensions of Wald s first lemma to Markov processes, Journal of Applied Probability, 36(1), 1999, pp [6 I-J. Wang, C. P. Diehl, and F. J. Pineda, A statistical model of proteolytic digestion, conference poster, IEEE Computer Society Bioinformatics Conference, Stanford, CA, August 2003.
Identification of proteins by enzyme digestion, mass
Method for Screening Peptide Fragment Ion Mass Spectra Prior to Database Searching Roger E. Moore, Mary K. Young, and Terry D. Lee Beckman Research Institute of the City of Hope, Duarte, California, USA
More informationNPTEL VIDEO COURSE PROTEOMICS PROF. SANJEEVA SRIVASTAVA
LECTURE-25 Quantitative proteomics: itraq and TMT TRANSCRIPT Welcome to the proteomics course. Today we will talk about quantitative proteomics and discuss about itraq and TMT techniques. The quantitative
More informationLECTURE-13. Peptide Mass Fingerprinting HANDOUT. Mass spectrometry is an indispensable tool for qualitative and quantitative analysis of
LECTURE-13 Peptide Mass Fingerprinting HANDOUT PREAMBLE Mass spectrometry is an indispensable tool for qualitative and quantitative analysis of proteins, drugs and many biological moieties to elucidate
More informationMass spectrometry has been used a lot in biology since the late 1950 s. However it really came into play in the late 1980 s once methods were
Mass spectrometry has been used a lot in biology since the late 1950 s. However it really came into play in the late 1980 s once methods were developed to allow the analysis of large intact (bigger than
More informationPackage cleaver. August 2, 2013
Package cleaver August 2, 2013 Version 0.99.5 Date 2013-07-24 Title Cleavage of polypeptide sequences Maintainer Sebastian Gibb Depends R (>= 3.0.0), methods, Biostrings (>= 1.29.8)
More informationTutorial 1: Setting up your Skyline document
Tutorial 1: Setting up your Skyline document Caution! For using Skyline the number formats of your computer have to be set to English (United States). Open the Control Panel Clock, Language, and Region
More informationMS&E 321 Spring Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10
MS&E 321 Spring 12-13 Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10 Section 3: Regenerative Processes Contents 3.1 Regeneration: The Basic Idea............................... 1 3.2
More informationModeling Mass Spectrometry-Based Protein Analysis
Chapter 8 Jan Eriksson and David Fenyö Abstract The success of mass spectrometry based proteomics depends on efficient methods for data analysis. These methods require a detailed understanding of the information
More informationPackage cleaver. December 13, 2018
Version 1.20.0 Date 2017-06-08 Title Cleavage of Polypeptide Sequences Package cleaver December 13, 2018 Maintainer Sebastian Gibb Depends R (>= 3.0.0), methods, Biostrings (>=
More information4 Examples of enzymes
Catalysis 1 4 Examples of enzymes Adding water to a substrate: Serine proteases. Carbonic anhydrase. Restrictions Endonuclease. Transfer of a Phosphoryl group from ATP to a nucleotide. Nucleoside monophosphate
More informationAtomic masses. Atomic masses of elements. Atomic masses of isotopes. Nominal and exact atomic masses. Example: CO, N 2 ja C 2 H 4
High-Resolution Mass spectrometry (HR-MS, HRAM-MS) (FT mass spectrometry) MS that enables identifying elemental compositions (empirical formulas) from accurate m/z data 9.05.2017 1 Atomic masses (atomic
More informationProtein Identification Using Tandem Mass Spectrometry. Nathan Edwards Informatics Research Applied Biosystems
Protein Identification Using Tandem Mass Spectrometry Nathan Edwards Informatics Research Applied Biosystems Outline Proteomics context Tandem mass spectrometry Peptide fragmentation Peptide identification
More informationProteomics. November 13, 2007
Proteomics November 13, 2007 Acknowledgement Slides presented here have been borrowed from presentations by : Dr. Mark A. Knepper (LKEM, NHLBI, NIH) Dr. Nathan Edwards (Center for Bioinformatics and Computational
More informationProtein Structure Analysis and Verification. Course S Basics for Biosystems of the Cell exercise work. Maija Nevala, BIO, 67485U 16.1.
Protein Structure Analysis and Verification Course S-114.2500 Basics for Biosystems of the Cell exercise work Maija Nevala, BIO, 67485U 16.1.2008 1. Preface When faced with an unknown protein, scientists
More informationComputational Methods for Mass Spectrometry Proteomics
Computational Methods for Mass Spectrometry Proteomics Eidhammer, Ingvar ISBN-13: 9780470512975 Table of Contents Preface. Acknowledgements. 1 Protein, Proteome, and Proteomics. 1.1 Primary goals for studying
More informationBENG 183 Trey Ideker. Protein Sequencing
BENG 183 Trey Ideker Protein Sequencing The following slides borrowed from Hong Li s Biochemistry Course: www.sb.fsu.edu/~hongli/4053notes Introduction to Proteins Proteins are of vital importance to biological
More informationSERVA ICPL Kit (Cat.-No )
INSTRUCTION MANUAL SERVA ICPL Kit (Cat.-No. 39230.01) SERVA Electrophoresis GmbH Carl-Benz-Str. 7 D-69115 Heidelberg Phone +49-6221-138400, Fax +49-6221-1384010 e-mail: info@serva.de http://www.serva.de
More informationProf. Jason Kahn Your Signature: Exams written in pencil or erasable ink will not be re-graded under any circumstances.
1 Biochemistry 461 February 16, 1995 Exam #1 Prof. Jason Kahn Your Printed Name: Your SS#: Your Signature: You have 75 minutes for this exam. Exams written in pencil or erasable ink will not be re-graded
More informationKey questions of proteomics. Bioinformatics 2. Proteomics. Foundation of proteomics. What proteins are there? Protein digestion
s s Key questions of proteomics What proteins are there? Bioinformatics 2 Lecture 2 roteomics How much is there of each of the proteins? - Absolute quantitation - Stoichiometry What (modification/splice)
More informationFigure S1. Interaction of PcTS with αsyn. (a) 1 H- 15 N HSQC NMR spectra of 100 µm αsyn in the absence (0:1, black) and increasing equivalent
Figure S1. Interaction of PcTS with αsyn. (a) 1 H- 15 N HSQC NMR spectra of 100 µm αsyn in the absence (0:1, black) and increasing equivalent concentrations of PcTS (100 µm, blue; 500 µm, green; 1.5 mm,
More informationA Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry
A Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry Ting Chen Department of Genetics arvard Medical School Boston, MA 02115, USA Ming-Yang Kao Department of Computer
More informationLecture 15: Realities of Genome Assembly Protein Sequencing
Lecture 15: Realities of Genome Assembly Protein Sequencing Study Chapter 8.10-8.15 1 Euler s Theorems A graph is balanced if for every vertex the number of incoming edges equals to the number of outgoing
More informationA Kernel-Based Case Retrieval Algorithm with Application to Bioinformatics
A Kernel-Based Case Retrieval Algorithm with Application to Bioinformatics Yan Fu,2, Qiang Yang 3, Charles X. Ling 4, Haipeng Wang, Dequan Li, Ruixiang Sun 2, Hu Zhou 5, Rong Zeng 5, Yiqiang Chen, Simin
More informationProbabilistic Arithmetic Automata
Probabilistic Arithmetic Automata Applications of a Stochastic Computational Framework in Biological Sequence Analysis Inke Herms PhD thesis defense Overview 1 Probabilistic Arithmetic Automata 2 Application
More informationHigh-Accuracy Peptide Mass Fingerprinting Using Peak Intensity Data with Machine Learning
High-Accuracy Peptide Mass Fingerprinting Using Peak Intensity Data with Machine Learning Dongmei Yang, Kevin Ramkissoon, Eric Hamlett, and Morgan C. Giddings*,, Department of Microbiology and Immunology,
More informationDe Novo Peptide Identification Via Mixed-Integer Linear Optimization And Tandem Mass Spectrometry
17 th European Symposium on Computer Aided Process Engineering ESCAPE17 V. Plesu and P.S. Agachi (Editors) 2007 Elsevier B.V. All rights reserved. 1 De Novo Peptide Identification Via Mixed-Integer Linear
More informationChapter 2 What are the Common Mass Spectrometry-Based Analyses Used in Biology?
Chapter 2 What are the Common Mass Spectrometry-Based Analyses Used in Biology? Abstract Mass spectrometry is used in many field of research, such as biology, chemistry, geology, etc. The focus of this
More informationBioSystems 109 (2012) Contents lists available at SciVerse ScienceDirect. BioSystems
BioSystems 9 (22) 79 85 Contents lists available at SciVerse ScienceDirect BioSystems journa l h o me pa g e: www.elsevier.com/locate/biosystems Periodic patterns in distributions of peptide masses Shane
More informationAn Analysis of Top Trading Cycles in Two-Sided Matching Markets
An Analysis of Top Trading Cycles in Two-Sided Matching Markets Yeon-Koo Che Olivier Tercieux July 30, 2015 Preliminary and incomplete. Abstract We study top trading cycles in a two-sided matching environment
More informationPC235: 2008 Lecture 5: Quantitation. Arnold Falick
PC235: 2008 Lecture 5: Quantitation Arnold Falick falickam@berkeley.edu Summary What you will learn from this lecture: There are many methods to perform quantitation using mass spectrometry (any method
More informationMethods for proteome analysis of obesity (Adipose tissue)
Methods for proteome analysis of obesity (Adipose tissue) I. Sample preparation and liquid chromatography-tandem mass spectrometric analysis Instruments, softwares, and materials AB SCIEX Triple TOF 5600
More informationYifei Bao. Beatrix. Manor Askenazi
Detection and Correction of Interference in MS1 Quantitation of Peptides Using their Isotope Distributions Yifei Bao Department of Computer Science Stevens Institute of Technology Beatrix Ueberheide Department
More informationMULTI-RESTRAINED STIRLING NUMBERS
MULTI-RESTRAINED STIRLING NUMBERS JI YOUNG CHOI DEPARTMENT OF MATHEMATICS SHIPPENSBURG UNIVERSITY SHIPPENSBURG, PA 17257, U.S.A. Abstract. Given positive integers n, k, and m, the (n, k)-th m- restrained
More informationChange-point models and performance measures for sequential change detection
Change-point models and performance measures for sequential change detection Department of Electrical and Computer Engineering, University of Patras, 26500 Rion, Greece moustaki@upatras.gr George V. Moustakides
More informationDe novo Protein Sequencing by Combining Top-Down and Bottom-Up Tandem Mass Spectra. Xiaowen Liu
De novo Protein Sequencing by Combining Top-Down and Bottom-Up Tandem Mass Spectra Xiaowen Liu Department of BioHealth Informatics, Department of Computer and Information Sciences, Indiana University-Purdue
More informationRapid and Sensitive Fluorescent Peptide Quantification Using LavaPep
Rapid and Sensitive Fluorescent Peptide Quantification Using LavaPep Purpose To develop a fast, sensitive and robust fluorescent-based assay, to quantify peptides that is compatible with downstream proteomics
More informationChapter 4. strategies for protein quantitation Ⅱ
Proteomics Chapter 4. strategies for protein quantitation Ⅱ 1 Multiplexed proteomics Multiplexed proteomics is the use of fluorescent stains or probes with different excitation and emission spectra to
More informationMS-MS Analysis Programs
MS-MS Analysis Programs Basic Process Genome - Gives AA sequences of proteins Use this to predict spectra Compare data to prediction Determine degree of correctness Make assignment Did we see the protein?
More informationIdentifying the proteome: software tools David Fenyö
391 Identifying the proteome: software tools David Fenyö The interest in proteomics has recently increased dramatically and proteomic methods are now applied to many problems in cell biology. The method
More informationProteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?
Proteomics What is it? Reveal protein interactions Protein profiling in a sample Yeast two hybrid screening High throughput 2D PAGE Automatic analysis of 2D Page Yeast two hybrid Use two mating strains
More informationLecture 8 (9/29/17) Lecture 8 (9/29/17)
Reading: Ch3; 97-102 Lecture 8 (9/29/17) Problems: Ch3 (text); 18, 19, 20, 23 Ch3 (Study guide); 9 Ch4 (Study guide); 3 NEXT Reading: Ch4; 119-122, 125-126, 131-133 Ch4; 123-124, 130-131, 133, 137-138
More informationBioinformatics 2. Yeast two hybrid. Proteomics. Proteomics
GENOME Bioinformatics 2 Proteomics protein-gene PROTEOME protein-protein METABOLISM Slide from http://www.nd.edu/~networks/ Citrate Cycle Bio-chemical reactions What is it? Proteomics Reveal protein Protein
More informationI will post Homework 1 soon, probably over the weekend, due Friday, September 30.
Random Variables Friday, September 09, 2011 2:02 PM I will post Homework 1 soon, probably over the weekend, due Friday, September 30. No class or office hours next week. Next class is on Tuesday, September
More informationAlternative Characterization of Ergodicity for Doubly Stochastic Chains
Alternative Characterization of Ergodicity for Doubly Stochastic Chains Behrouz Touri and Angelia Nedić Abstract In this paper we discuss the ergodicity of stochastic and doubly stochastic chains. We define
More information14 Random Variables and Simulation
14 Random Variables and Simulation In this lecture note we consider the relationship between random variables and simulation models. Random variables play two important roles in simulation models. We assume
More informationRandomized Algorithms
Randomized Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A new 4 credit unit course Part of Theoretical Computer Science courses at the Department of Mathematics There will be 4 hours
More informationA Markov chain approach to quality control
A Markov chain approach to quality control F. Bartolucci Istituto di Scienze Economiche Università di Urbino www.stat.unipg.it/ bart Preliminaries We use the finite Markov chain imbedding approach of Alexandrou,
More informationDevelopment and Evaluation of Methods for Predicting Protein Levels from Tandem Mass Spectrometry Data. Han Liu
Development and Evaluation of Methods for Predicting Protein Levels from Tandem Mass Spectrometry Data by Han Liu A thesis submitted in conformity with the requirements for the degree of Master of Science
More informationEnzymes I. Dr. Mamoun Ahram Summer semester,
Enzymes I Dr. Mamoun Ahram Summer semester, 2017-2018 Resources Mark's Basic Medical Biochemistry Other resources NCBI Bookshelf: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=books The Medical Biochemistry
More informationCSCE 478/878 Lecture 9: Hidden. Markov. Models. Stephen Scott. Introduction. Outline. Markov. Chains. Hidden Markov Models. CSCE 478/878 Lecture 9:
Useful for modeling/making predictions on sequential data E.g., biological sequences, text, series of sounds/spoken words Will return to graphical models that are generative sscott@cse.unl.edu 1 / 27 2
More informationChem 250 Unit 1 Proteomics by Mass Spectrometry
Chem 250 Unit 1 Proteomics by Mass Spectrometry Article #1 Quantitative MS for proteomics: teaching a new dog old tricks. MacCoss MJ, Matthews DE., Anal Chem. 2005 Aug 1;77(15):294A-302A. 1. Synopsis 1.1.
More informationStephen Scott.
1 / 27 sscott@cse.unl.edu 2 / 27 Useful for modeling/making predictions on sequential data E.g., biological sequences, text, series of sounds/spoken words Will return to graphical models that are generative
More informationEffective Strategies for Improving Peptide Identification with Tandem Mass Spectrometry
Effective Strategies for Improving Peptide Identification with Tandem Mass Spectrometry by Xi Han A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree
More informationMixture Mode for Peptide Mass Fingerprinting ASMS 2003
Mixture Mode for Peptide Mass Fingerprinting ASMS 2003 1 Mixture Mode: New in Mascot 1.9 All peptide mass fingerprint searches now test for the possibility that the sample is a mixture of proteins. Mascot
More informationGeneral Theory of Large Deviations
Chapter 30 General Theory of Large Deviations A family of random variables follows the large deviations principle if the probability of the variables falling into bad sets, representing large deviations
More informationBasic Definitions: Indexed Collections and Random Functions
Chapter 1 Basic Definitions: Indexed Collections and Random Functions Section 1.1 introduces stochastic processes as indexed collections of random variables. Section 1.2 builds the necessary machinery
More informationAdventures in Stochastic Processes
Sidney Resnick Adventures in Stochastic Processes with Illustrations Birkhäuser Boston Basel Berlin Table of Contents Preface ix CHAPTER 1. PRELIMINARIES: DISCRETE INDEX SETS AND/OR DISCRETE STATE SPACES
More informationA Criterion for the Stochasticity of Matrices with Specified Order Relations
Rend. Istit. Mat. Univ. Trieste Vol. XL, 55 64 (2009) A Criterion for the Stochasticity of Matrices with Specified Order Relations Luca Bortolussi and Andrea Sgarro Abstract. We tackle the following problem:
More informationLecture 11: Introduction to Markov Chains. Copyright G. Caire (Sample Lectures) 321
Lecture 11: Introduction to Markov Chains Copyright G. Caire (Sample Lectures) 321 Discrete-time random processes A sequence of RVs indexed by a variable n 2 {0, 1, 2,...} forms a discretetime random process
More informationHigh-Throughput Protein Quantitation Using Multiple Reaction Monitoring
High-Throughput Protein Quantitation Using Multiple Reaction Monitoring Application Note Authors Ning Tang, Christine Miller, Joe Roark, Norton Kitagawa and Keith Waddell Agilent Technologies, Inc. Santa
More informationHOWTO, example workflow and data files. (Version )
HOWTO, example workflow and data files. (Version 20 09 2017) 1 Introduction: SugarQb is a collection of software tools (Nodes) which enable the automated identification of intact glycopeptides from HCD
More informationTUTORIAL EXERCISES WITH ANSWERS
TUTORIAL EXERCISES WITH ANSWERS Tutorial 1 Settings 1. What is the exact monoisotopic mass difference for peptides carrying a 13 C (and NO additional 15 N) labelled C-terminal lysine residue? a. 6.020129
More informationNotes on Paths, Trees and Lagrange Inversion
Notes on Paths, Trees and Lagrange Inversion Today we are going to start with a problem that may seem somewhat unmotivated, and solve it in two ways. From there, we will proceed to discuss applications
More informationMS2DB: An Algorithmic Approach to Determine Disulfide Linkage Patterns in Proteins by Utilizing Tandem Mass Spectrometric Data
MS2DB: An Algorithmic Approach to Determine Disulfide Linkage Patterns in Proteins by Utilizing Tandem Mass Spectrometric Data Timothy Lee 1, Rahul Singh 1, Ten-Yang Yen 2, and Bruce Macher 2 1 Department
More informationLecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.
1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if
More informationJiří Novák and David Hoksza
ParametrisedHausdorff HausdorffDistance Distanceas asa Non-Metric a Non-Metric Similarity Similarity Model Model for Tandem for Tandem Mass Mass Spectrometry Spectrometry Jiří Novák and David Hoksza Jiří
More informationC a h p a t p e t r e r 6 E z n y z m y e m s
Chapter 6 Enzymes 1. An Introduction to Enzymes Enzymes are catalytically active biological macromolecules Enzymes are catalysts of biological systems Almost every biochemical reaction is catalyzed by
More informationComprehensive support for quantitation
Comprehensive support for quantitation One of the major new features in the current release of Mascot is support for quantitation. This is still work in progress. Our goal is to support all of the popular
More informationarxiv: v1 [stat.ml] 23 Oct 2016
Formulas for counting the sizes of Markov Equivalence Classes Formulas for Counting the Sizes of Markov Equivalence Classes of Directed Acyclic Graphs arxiv:1610.07921v1 [stat.ml] 23 Oct 2016 Yangbo He
More informationSpectrum-to-Spectrum Searching Using a. Proteome-wide Spectral Library
MCP Papers in Press. Published on April 30, 2011 as Manuscript M111.007666 Spectrum-to-Spectrum Searching Using a Proteome-wide Spectral Library Chia-Yu Yen, Stephane Houel, Natalie G. Ahn, and William
More informationParameter Discovery for Stochastic Biological Models against Temporal Behavioral Specifications using an SPRT based Metric for Simulated Annealing
Parameter Discovery for Stochastic Biological Models against Temporal Behavioral Specifications using an SPRT based Metric for Simulated Annealing Faraz Hussain, Raj Gautam Dutta, Sumit Kumar Jha Electrical
More informationQualitative Proteomics (how to obtain high-confidence high-throughput protein identification!)
Qualitative Proteomics (how to obtain high-confidence high-throughput protein identification!) James A. Mobley, Ph.D. Director of Research in Urology Associate Director of Mass Spectrometry (contact: mobleyja@uab.edu)
More informationUnder strongly acidic conditions at ph = 1 every functional group in phosphoserine that can pick up a proton, does.
1. (48 pts) a. L-Phosphoserine is a modified amino acid that is generated in proteins by phosphorylation of serine residues. The amino acid side chain has two acidic protons, which exhibit different pk
More informationDiscrete Probability Refresher
ECE 1502 Information Theory Discrete Probability Refresher F. R. Kschischang Dept. of Electrical and Computer Engineering University of Toronto January 13, 1999 revised January 11, 2006 Probability theory
More informationLecture 8: Information Theory and Statistics
Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and Estimation I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 22, 2015
More informationPeptides And Proteins
Kevin Burgess, May 3, 2017 1 Peptides And Proteins from chapter(s) in the recommended text A. Introduction B. omenclature And Conventions by amide bonds. on the left, right. 2 -terminal C-terminal triglycine
More informationSolutions to Homework Discrete Stochastic Processes MIT, Spring 2011
Exercise 1 Solutions to Homework 6 6.262 Discrete Stochastic Processes MIT, Spring 2011 Let {Y n ; n 1} be a sequence of rv s and assume that lim n E[ Y n ] = 0. Show that {Y n ; n 1} converges to 0 in
More informationAdvanced Computer Networks Lecture 2. Markov Processes
Advanced Computer Networks Lecture 2. Markov Processes Husheng Li Min Kao Department of Electrical Engineering and Computer Science University of Tennessee, Knoxville Spring, 2016 1/28 Outline 2/28 1 Definition
More informationA NEW PROOF OF THE WIENER HOPF FACTORIZATION VIA BASU S THEOREM
J. Appl. Prob. 49, 876 882 (2012 Printed in England Applied Probability Trust 2012 A NEW PROOF OF THE WIENER HOPF FACTORIZATION VIA BASU S THEOREM BRIAN FRALIX and COLIN GALLAGHER, Clemson University Abstract
More informationMASS SPECTROMETRY. Topics
MASS SPECTROMETRY MALDI-TOF AND ESI-MS Topics Principle of Mass Spectrometry MALDI-TOF Determination of Mw of Proteins Structural Information by MS: Primary Sequence of a Protein 1 A. Principles Ionization:
More informationQuantitative Proteomics
Quantitative Proteomics Quantitation AND Mass Spectrometry Condition A Condition B Identify and quantify differently expressed proteins resulting from a change in the environment (stimulus, disease) Lyse
More informationP i [B k ] = lim. n=1 p(n) ii <. n=1. V i :=
2.7. Recurrence and transience Consider a Markov chain {X n : n N 0 } on state space E with transition matrix P. Definition 2.7.1. A state i E is called recurrent if P i [X n = i for infinitely many n]
More informationMass Spectrometry and Proteomics - Lecture 5 - Matthias Trost Newcastle University
Mass Spectrometry and Proteomics - Lecture 5 - Matthias Trost Newcastle University matthias.trost@ncl.ac.uk Previously Proteomics Sample prep 144 Lecture 5 Quantitation techniques Search Algorithms Proteomics
More informationFirst Problem Set, Due 30 Sep. (1st midterm is 7 Oct.) If you want it graded by 6 Oct, turn in on or before 30 Sep. at 5 pm.
First Problem Set, Due 30 Sep. (1st midterm is 7 ct.) If you want it graded by 6 ct, turn in on or before 30 Sep. at 5 pm. Show your work calculations / steps!! Problems from Chapter 1 1 EM of a eukaryotic
More informationQuantitation of a target protein in crude samples using targeted peptide quantification by Mass Spectrometry
Quantitation of a target protein in crude samples using targeted peptide quantification by Mass Spectrometry Jon Hao, Rong Ye, and Mason Tao Poochon Scientific, Frederick, Maryland 21701 Abstract Background:
More informationWhen do diffusion-limited trajectories become memoryless?
When do diffusion-limited trajectories become memoryless? Maciej Dobrzyński CWI (Center for Mathematics and Computer Science) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands Abstract Stochastic description
More informationExercise Solutions to Functional Analysis
Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n
More informationProtein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.
Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein
More informationInformation Theory and Statistics Lecture 3: Stationary ergodic processes
Information Theory and Statistics Lecture 3: Stationary ergodic processes Łukasz Dębowski ldebowsk@ipipan.waw.pl Ph. D. Programme 2013/2014 Measurable space Definition (measurable space) Measurable space
More informationBayesian Modeling and Classification of Neural Signals
Bayesian Modeling and Classification of Neural Signals Michael S. Lewicki Computation and Neural Systems Program California Institute of Technology 216-76 Pasadena, CA 91125 lewickiocns.caltech.edu Abstract
More informationMinimum Hellinger Distance Estimation in a. Semiparametric Mixture Model
Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.
More informationImproved 6- Plex TMT Quantification Throughput Using a Linear Ion Trap HCD MS 3 Scan Jane M. Liu, 1,2 * Michael J. Sweredoski, 2 Sonja Hess 2 *
Improved 6- Plex TMT Quantification Throughput Using a Linear Ion Trap HCD MS 3 Scan Jane M. Liu, 1,2 * Michael J. Sweredoski, 2 Sonja Hess 2 * 1 Department of Chemistry, Pomona College, Claremont, California
More informationQR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS
QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS DIANNE P. O LEARY AND STEPHEN S. BULLOCK Dedicated to Alan George on the occasion of his 60th birthday Abstract. Any matrix A of dimension m n (m n)
More informationDistance-Based Probability Distribution for Set Partitions with Applications to Bayesian Nonparametrics
Distance-Based Probability Distribution for Set Partitions with Applications to Bayesian Nonparametrics David B. Dahl August 5, 2008 Abstract Integration of several types of data is a burgeoning field.
More informationBAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS
BAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS Andrew A. Neath 1 and Joseph E. Cavanaugh 1 Department of Mathematics and Statistics, Southern Illinois University, Edwardsville, Illinois 606, USA
More informationStochastic Simulation
Stochastic Simulation Ulm University Institute of Stochastics Lecture Notes Dr. Tim Brereton Summer Term 2015 Ulm, 2015 2 Contents 1 Discrete-Time Markov Chains 5 1.1 Discrete-Time Markov Chains.....................
More informationBLAST: Target frequencies and information content Dannie Durand
Computational Genomics and Molecular Biology, Fall 2016 1 BLAST: Target frequencies and information content Dannie Durand BLAST has two components: a fast heuristic for searching for similar sequences
More informationModel Complexity of Pseudo-independent Models
Model Complexity of Pseudo-independent Models Jae-Hyuck Lee and Yang Xiang Department of Computing and Information Science University of Guelph, Guelph, Canada {jaehyuck, yxiang}@cis.uoguelph,ca Abstract
More informationComputing and Communicating Functions over Sensor Networks
Computing and Communicating Functions over Sensor Networks Solmaz Torabi Dept. of Electrical and Computer Engineering Drexel University solmaz.t@drexel.edu Advisor: Dr. John M. Walsh 1/35 1 Refrences [1]
More informationADVANCEMENT IN PROTEIN INFERENCE FROM SHOTGUN PROTEOMICS USING PEPTIDE DETECTABILITY
ADVANCEMENT IN PROTEIN INFERENCE FROM SHOTGUN PROTEOMICS USING PEPTIDE DETECTABILITY PEDRO ALVES, 1 RANDY J. ARNOLD, 2 MILOS V. NOVOTNY, 2 PREDRAG RADIVOJAC, 1 JAMES P. REILLY, 2 HAIXU TANG 1, 3* 1) School
More information