A Statistical Model of Proteolytic Digestion

Size: px
Start display at page:

Download "A Statistical Model of Proteolytic Digestion"

Transcription

1 A Statistical Model of Proteolytic Digestion I-Jeng Wang, Christopher P. Diehl Research and Technology Development Center Johns Hopkins University Applied Physics Laboratory Laurel, MD Fernando J. Pineda Dept. of Molecular Microbiology and Immunology Johns Hopkins Bloomberg School of Public Health Baltimore, MD Abstract We present a stochastic model of proteolytic digestion of a proteome, assuming the distribution of parent protein lengths in the proteome, the relative abundances of the 20 amino acids in the proteome, and the digestion rules of the enzyme used in the digestion. We derived a closed form expression for the fragment mass distribution for a large class of enzymes including the widely used Trypsin. The expression uses the distribution of lengths in a mixture of proteins taken from a proteome, as well as the relative abundances of the 20 amino acids in the proteome. The agreement between theory and the in silico digest is excellent. I. INTRODUCTION We present a rigorous statistical model of proteolytic digestion of a proteome, assuming (1) the distribution of parent protein lengths in the proteome, (2) the relative abundances of the 20 amino acids in the proteome, and (3) the digestion rules of the enzyme used in the digestion. Such a statistical model can serve as a model of the null hypothesis to calculate p-values in hypothesis testing for microorganism identification by proteolytic digestion. Current protein identification software (e.g., MOWSE [1, MassSearch [2, and Mascot [3) is based on histograms of digested proteomes, or on models derived from unrealistic assumptions (e.g., a uniform distribution of proteolytic fragments). The proposed model accounts for the fact that digestion products come from the mixture of proteins that constitutes a microorganism s proteome. To facilitate our discussion, we organize digestion models into a taxonomy according to (1) the class of digestion rules obeyed by the enzyme and (2) the order of the Markov model used to model the amino acid sequence. We denote by C(m) the class of enzymes whose digestion rules depend on m adjacent residues. Table I shows a selection of enzymes and their corresponding cleavage rules. In this paper, we present the result assuming that the cleavage rules are deterministic. We also introduce a technique to extend the result to address simple probabilistic digestion rules that account for possible (probabilistic) partial digestions. We classify amino acid sequence models according to the order of the Markov chain used to model the sequence. For example, M(0) denotes the simplest sequence models assuming that the adjacent amino acid residues are independent; M(1) denotes the next most complex models assuming that the next residue depends on the previous amino acid residue; and so forth. Our methodology views the enzymatic digestion problem as a regenerative process to which we can apply Wald s Digesting Reagent Class C(1) Cleavage Rules Armillaria Before {C K } AspN Before D Thermolysin Before {F I L M V W} Clostripain After R LysC After K Pancreatic Elastase After {A G S V} Digesting Reagent Class C(2) Cleavage Rules Chymotrypsin After {F L M W Y} if not followed by P Post Proline After P if not followed by P Trypsin After {K R} if not followed by P V8 Ammonium Acetate After E if not followed by P V8 Phosphate Buffer After {D E} if not followed by P CNBr Cys { Before C After M } Hydroxylamine After N if followed by G Mild Acid Hydrolysis After D if followed by P TABLE I CLEAVAGE RULES FOR SELECTED REAGENTS equation and its generalizations to simplify the computation of the fragment mass distribution. In particular, when overlaid with a cleavage process (either C(1) or C(2)), a protein sequence with models M(0) can be equivalently modeled as a regenerative process with the cleavage sites as the regeneration points. Hence the regenerative cycles (the fragments) are i.i.d.. Using Wald s first lemma and its extensions [4, [5, the fragment mass distribution can be conveniently partitioned into two terms: the length distribution and the conditional mass probability density. In this paper, we present results for {C(1), M(0)} and a subset of {C(2), M(0)} with the class of enzymes C P (2) that exhibit Proline blocking. The class C P (2) includes the widely used enzyme, Trypsin, which cleaves after Lysine or Arginine unless followed by Proline. The same technique can be applied to derive expressions for general models in {C(2), M(0)}. The agreement between theory and the in silico digest is excellent as illustrated by the examples given in Section 4. Note that a summary of part of the results given here was presented in [6 as a poster. II. MODELS AND DECOMPOSITION Our objective is to analytically compute the expected number of proteolytic fragments with mass in an arbitrary mass interval (m m, m + m), denoted by H(m, m). We assume that H(m, m) depends only on the parent protein length distribution in the database of interest, a probabilistic

2 protein sequence model derived from the database, and the chosen enzyme digestion rule. Since the parent proteins are finite in length, application of digestions rules to a finite parent protein can lead to two distinct classes of fragments: fragments ending at a cleavage site and fragments not ending at a cleavage site. Here the cleavage site is referred to a specific amino acid that defines a cleavage following a enzyme digestion rule. For example, amino acids C and K are both cleavage sites for fragments from digesting enzyme Armillaria (see Table I). In this paper, we only focus on the derivation of H(m, m) contributed by fragments ending at a cleavage site. The contribution by the other class of fragments is less significant and can be more easily derived since there can be at most one of such fragment from every single parent protein. A. Sequence and fragment models Throughout the paper, we assume that the protein sequence is i.i.d.. That is, we only consider class M(0) models. We use S = X 1 X 2 X 3 to denote an infinite random protein sequence, where the X i s are random variables taking value from a finite alphabet A (the set of 20 amino acids). We use p a, to denote P {X n = a}, and p C, C A, to denote P {X n C}. We assume that p a s are estimated from the relative amino acid abundance in the database of interest. We consider C(1) and C P (2) enzyme classes. For C(1) enzymes, cleavage occurs at sites adjacent to amino acids in the set C. For C P (2) enzymes, cleavage occurs after any amino acid in C if the following amino acid is not Proline. To facilitate our discussion, we will use {F n } to denote a generic random fragment sequence resulting from the application of a cleavage rule to a random protein sequence S. Here, we provide an informal discussion on why {F n } is i.i.d. except for the first and possibly the last fragments (a rigorous proof can be easily constructed following the discussion ). We first address digestion rules in C(1). Given that amino acids in M(0) protein models are i.i.d., fragment mass distributions do not depend on which end of the protein we choose as the N-terminus. Therefore, without loss of generality, we can assume that for class C(1) enzymes, cleavage sites always occur after (instead of before as indicated in Table 1 for some C(1) enzymes) an amino acid in C. For models in {C(1), M(0)}, it is clear that the cleavage event {X n C} is a regenerative point for the protein sequence S. Hence the process between cleavage (that is, the fragments {F n }) is i.i.d.. For models in {C(2), M(0)}, a similar conclusion can be drawn by constructing an equivalent two-step model for the protein sequence. Figure 1 illustrates such a model for digestion with Trypsin for which a cleavage occurs after K or R unless followed by P. In step one, we first determine whether the amino acid is P or not (occurs with probability p P ). In step two, we determine the specific amino acid conditioned on the outcome of the step one. It is obvious that the protein sequence generated by this two-step model is statistically equivalent to a sequence S generated by a model in M(0). With this equivalent protein model, the digestion rule (with Proline blocking) can be applied right after the step one to determine cleavage Fig. 1. A two-step process to generating an i.i.d protein sequence and fragmentation by Trypsin. sites. It can be seen that the cleavage sites are regenerative points and an i.i.d. fragment process {F n } can be defined from the regenerative cycles. Note that this leads to a very minor approximation since the first fragment F 1 has a different distribution at the first amino acid (the last fragment could have a different distribution as well if it ends exactly at the last amino acid in the parent protein). Given that {F n } is i.i.d. (or approximately i.i.d. as for the C P (2) models), we will use F to denote a generic random fragment in the sequel. B. Decomposition of mass distribution Based on the i.i.d. property of fragments established in the last section, we can establish a decomposition that enable us to treat the fragment length distribution and mass probability density separately in the derivation of H(m, m). This is accomplished by applying Wald s first lemma and its extensions [4, [5. Here we first state the decomposition: where H(m, m) N = g(k)p { M(F ) m < m L(F )=k}, k=1 g(k) = N N=k n(n)l N,k, and n(n) is the number of proteins of length N in the database of interest; L N,k is the expected number of fragments of length k from a parent protein sequence of length N; M(F ) denotes the mass of F ; and L(F ) denotes the length of F. This decomposition described by (1) significantly simplifies the computation of H(m, m) in that it separates the calculation of fragment length distribution (g(k)), from the mass distribution of a random fragment (P { M(F ) m < m L(F )=k}). To prove the decomposition (1), we first observe that H(m, m) can be derived by breaking it into terms based on fragment and parent protein lengths. Define H N,k (m, m) as the expected number of fragments in the mass range that come from parent proteins with length N and has length k. Then (1)

3 we can write H(m, m) as H(m, m) = N N=1 n(n) N H N,k (m, m). (2) k=1 Hence we can focus our discussion on the decomposition of H N,k (m, m). We first provide an expression for L N,k (expected number of fragments with length k from protein with length N) by applying the Wald s equation. Define a stopping time associated with the fragment process {F n } by τ n min{k : k L(F i ) n}. Using the indicator function we can write L N,k as [ τn k+1 L N,k = E I{L(F i )=k}, (3) where I{} denote the indicator function. Note that the proper choice of stopping time (τ N k+1 ) is important to avoid overor under-counting of fragments. Since {F n } is i.i.d., we can apply the Wald s equation to (3) and obtain L N,k = E(τ N k+1 )P {L(F )=k}. (4) Following the same argument, we can also obtain H N,k (m, m) = E(τ N k+1 )P { M(F ) m < m, L(F )=k} = E(τ N k+1 )P {L(F )=k} P { M(F ) m < m L(F )=k}. By substituting (4) into the above equation we obtain the decomposition for H N,k (m, m): H N,k (m, m) =L N,k P { M(F ) m < m L(F )=k}. (5) The desired decomposition (1) can be easily derived by substituting (5) into (2). III. FRAGMENT MASS DISTRIBUTIONS Based on the decomposition (1) we can obtain the closed form expression for the fragment mass distribution H(m), for models in both {C(1), M(0)} and {C P (2), M(0)}, by deriving expressions for L N,k and P { M(F ) m < m L(F )=k}. A. Digestion model {C(1), M(0)} Given a random protein sequence of length N, S N = X 1 X 2 X N, L N,k for models in {C(1), M(0)} can be derived as [ N k+1 L N,k = E I {X i...x i+k 1 is a fragment } = P {X 1,...,X k 1 C, X k C}+ N k+1 i=2 P {X i 1 C,X i,...,x i+k 2 C,X i+k 1 C}. By deriving the needed probabilities based on the M(0) sequence model, we can obtain L N,k =[(N k)p C +1(1 p C ) k 1 p C. (6) To derive the conditional mass probability P { M(F ) m < m L(F ) = k}, we assume that the mass of a fragment depends only on its amino acid composition. We represent the composition of a fragment F by a 20-dimensional vector of nonnegative integers, denoted by C(F )=[n 1,...,n 20, where n i N {0} is the number of occurrences of the i-th amino acid (assuming that a fixed ordering is defined on A) inf. We will use n =[n 1,...,n 20 to denote a generic composition vector. We assume that the mass of a composition (and hence the mass of the fragment with the particular composition) is determined by 20 M(n) =m H2O + m i n i, where m i is the mass of the i-th amino acid. Note that we can also generalize this model to incorporate uncertainty in amino acid masses modeled by appropriate distributions. Following the above assumptions, we can write the desired conditional mass distribution as I{ M(n) m < m}p {C(F )=n L(F)=k}, n =k i C n (7) where n i n i. Note that the indicator function in (7) is deterministic and should be replaced by a probability if a probabilistic model is used to define the mass of a composition. Based on (7) the closed-form expression for the conditional mass probability can be derived as n =k i C n I{ M(n) m < m}(k 1)! where ˆp i is defined by ˆp i = p i z i = z i { p C if i C, 1 p C otherwise. ( 20 ) ˆp ni i, (8) n i! And again the indicator function can be replaced by a probability to incorporate uncertainty in amino acid masses. B. Digestion model {C P (2), M(0)} To derive L N,k, we again use indicator functions to obtain [ N k+1 L N,k = E I {X i...x i+k 1 is a fragment } = N k+1 P {X i...x i+k 1 is a fragment}. However, the derivation of the probabilities are not as straightforward due to the blockings of cleavage by proline. Let us

4 denote the set of amino acid pairs that represent unblocked cleavage sites by C NB. That is C NB {XX : X C,X P}. Then we can derive the needed probabilities using Markovian arguments. Since the protein sequence is i.i.d., we have P {X 1...X k is a fragment} = P {X 1 X 2 C NB } k 1 j=2 P {X k X k+1 C NB X k 1 X k C NB } ; for i =2,...,N k, wehave P {X i...x i+k 1 is a fragment} = P {X i 1 X i C NB } P {X i X i+1 C NB X i 1 X i C NB } and i+k 2 j=i+1 P {X i+k 1 X i+k C NB X i+k 2 X i+k 1 C NB } ; P {X N k+1...x N is a fragment} = P {X N k X N k+1 C NB } P {X N k+1 X N k C NB X N k X N k+1 C NB } N 1 j=n k+2 P {X N C X N 1 X N C NB }. Deriving the needed conditional probabilities and substituting them into the above equations we can obtain [ ( L N,k = α 1+ N k + p ) P p C (1 p 1 p C) k 1 p C, P (9) where p P p C = p C [1 1 p C (1 p P ) [ α = (1 p C {1 ) 1 p C, 1 p C 1 p C (1 p P ) }. Note that p C <p C as long as p P > 0. Furthermore, when p P is small, (9) can be approximated quite accurately by L N,k [1 + (N k) p C(1 p C) k 1 p C, which is identical to the length distribution for C(1) enzymes given in (6) but with scaled cleavage probability p C. To compute the conditional mass probability, we first decompose the probability based on the composition of the fragment composition as in (7): I{ M(n) m < m}p {C(F )=n L(F)=k}. n =k 1 i C ni<np (10) (theory- in_silico)/sqrt(theory) fragments per bin (1Da bins) mass (Da) 800 in silico theory Fig. 2. Comparison between analytical expression and in silico digestion for E. Coli proteome over a 100 Da window. Note that the number of proline occurrences, n P, needs to be larger than or equal to ( i C n i) 1 in order to block the cleavages in the interior of a fragment. The conditional composition probability P {C(F ) = n L(F ) = k} can be derived by treating each case with a specific number of blocked cleavages separately. C. Accounting for partial digestions The presented results for deterministic digestion can be easily extended to account for partial digestions modelled with a simple probabilistic digestion model. Consider the case for C(1) models under which cleavages occurs after every amino acid in C. Assume now that each cleavage after an amino acid a C occurs only with probability q a (q a < 1 implies partial digestions). The presented results can be applied by expanding the set A to include one additional fictitious amino acid a + for each a C.Fora C, the probability is re-defined as q a p a. For the additional amino acid a +, the probability is (1 q a )p a. The mass of the additional amino acid a + is the same as the mass of its associated original amino acid a. With this expanded alphabet and modified protein sequence model, the presented result can be applied directly as if the digestions are deterministic (with the same C). Note that the same technique applies to models in C P (2) as well. IV. NUMERICAL RESULTS A comparison of analytic and in-silico results is shown in Figure 2. We use an E. coli proteome taken from the SWISSPROT database. The proteins were shuffled to destroy sequential correlations. The analytic calculation is faster than 1000

5 the corresponding in silico digestion. Except for unexplained excess dispersion at 450 Da, the residuals between the in-silco and analytic results are consistent with Poisson statistics. It is clear from this figure that the mass distribution can vary by several orders of magnitude over just few Daltons. This result leads us to believe that heuristically derived mass distributions are likely to result in biased p-values in hypothesis tests. This is especially true for small peptides. It becomes a minor effect in heavier peptides. It is typically recommended that protein identification be performed using peptides heavier than 500 Da due to lack of selectivity of the lighter peptides. Our results suggest that, a more careful treatment of the mass distribution below 500Da, could make this mass range more informative. ACKNOWLEDGMENT This work was supported by the JHU/APL Independent Research and Development Program. REFERENCES [1 D. J. C. Pappin, P. Hojrup, and A. J. Bleasby, Rapid Identification of Proteins by Peptide-mass Fingerprinting, Current Biology, vol. 3, no. 6, 1993, pp [2 P. James, M. Quadroni, E. Carafoli, and G. Gonnet, Protein Identification by Mass Profile Fingerprinting, Biochemical and Biophysical Research Communications, 195(1), August 1993, pp [3 D. N. Perkins, D. J. C. Pappin, D. M. Creasy, and J. S. Cottrell, Probability-based Protein Identification by Searching Sequence Databases Using Mass Spectrometry Data, Electrophoresis, 20, 1999, pp [4 A. Wald, Sequential Tests of Statistical Hypotheses, Annals of Mathematical Statistics, 16, 1945, pp [5 G.V. Moustakides, Extensions of Wald s first lemma to Markov processes, Journal of Applied Probability, 36(1), 1999, pp [6 I-J. Wang, C. P. Diehl, and F. J. Pineda, A statistical model of proteolytic digestion, conference poster, IEEE Computer Society Bioinformatics Conference, Stanford, CA, August 2003.

Identification of proteins by enzyme digestion, mass

Identification of proteins by enzyme digestion, mass Method for Screening Peptide Fragment Ion Mass Spectra Prior to Database Searching Roger E. Moore, Mary K. Young, and Terry D. Lee Beckman Research Institute of the City of Hope, Duarte, California, USA

More information

NPTEL VIDEO COURSE PROTEOMICS PROF. SANJEEVA SRIVASTAVA

NPTEL VIDEO COURSE PROTEOMICS PROF. SANJEEVA SRIVASTAVA LECTURE-25 Quantitative proteomics: itraq and TMT TRANSCRIPT Welcome to the proteomics course. Today we will talk about quantitative proteomics and discuss about itraq and TMT techniques. The quantitative

More information

LECTURE-13. Peptide Mass Fingerprinting HANDOUT. Mass spectrometry is an indispensable tool for qualitative and quantitative analysis of

LECTURE-13. Peptide Mass Fingerprinting HANDOUT. Mass spectrometry is an indispensable tool for qualitative and quantitative analysis of LECTURE-13 Peptide Mass Fingerprinting HANDOUT PREAMBLE Mass spectrometry is an indispensable tool for qualitative and quantitative analysis of proteins, drugs and many biological moieties to elucidate

More information

Mass spectrometry has been used a lot in biology since the late 1950 s. However it really came into play in the late 1980 s once methods were

Mass spectrometry has been used a lot in biology since the late 1950 s. However it really came into play in the late 1980 s once methods were Mass spectrometry has been used a lot in biology since the late 1950 s. However it really came into play in the late 1980 s once methods were developed to allow the analysis of large intact (bigger than

More information

Package cleaver. August 2, 2013

Package cleaver. August 2, 2013 Package cleaver August 2, 2013 Version 0.99.5 Date 2013-07-24 Title Cleavage of polypeptide sequences Maintainer Sebastian Gibb Depends R (>= 3.0.0), methods, Biostrings (>= 1.29.8)

More information

Tutorial 1: Setting up your Skyline document

Tutorial 1: Setting up your Skyline document Tutorial 1: Setting up your Skyline document Caution! For using Skyline the number formats of your computer have to be set to English (United States). Open the Control Panel Clock, Language, and Region

More information

MS&E 321 Spring Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10

MS&E 321 Spring Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10 MS&E 321 Spring 12-13 Stochastic Systems June 1, 2013 Prof. Peter W. Glynn Page 1 of 10 Section 3: Regenerative Processes Contents 3.1 Regeneration: The Basic Idea............................... 1 3.2

More information

Modeling Mass Spectrometry-Based Protein Analysis

Modeling Mass Spectrometry-Based Protein Analysis Chapter 8 Jan Eriksson and David Fenyö Abstract The success of mass spectrometry based proteomics depends on efficient methods for data analysis. These methods require a detailed understanding of the information

More information

Package cleaver. December 13, 2018

Package cleaver. December 13, 2018 Version 1.20.0 Date 2017-06-08 Title Cleavage of Polypeptide Sequences Package cleaver December 13, 2018 Maintainer Sebastian Gibb Depends R (>= 3.0.0), methods, Biostrings (>=

More information

4 Examples of enzymes

4 Examples of enzymes Catalysis 1 4 Examples of enzymes Adding water to a substrate: Serine proteases. Carbonic anhydrase. Restrictions Endonuclease. Transfer of a Phosphoryl group from ATP to a nucleotide. Nucleoside monophosphate

More information

Atomic masses. Atomic masses of elements. Atomic masses of isotopes. Nominal and exact atomic masses. Example: CO, N 2 ja C 2 H 4

Atomic masses. Atomic masses of elements. Atomic masses of isotopes. Nominal and exact atomic masses. Example: CO, N 2 ja C 2 H 4 High-Resolution Mass spectrometry (HR-MS, HRAM-MS) (FT mass spectrometry) MS that enables identifying elemental compositions (empirical formulas) from accurate m/z data 9.05.2017 1 Atomic masses (atomic

More information

Protein Identification Using Tandem Mass Spectrometry. Nathan Edwards Informatics Research Applied Biosystems

Protein Identification Using Tandem Mass Spectrometry. Nathan Edwards Informatics Research Applied Biosystems Protein Identification Using Tandem Mass Spectrometry Nathan Edwards Informatics Research Applied Biosystems Outline Proteomics context Tandem mass spectrometry Peptide fragmentation Peptide identification

More information

Proteomics. November 13, 2007

Proteomics. November 13, 2007 Proteomics November 13, 2007 Acknowledgement Slides presented here have been borrowed from presentations by : Dr. Mark A. Knepper (LKEM, NHLBI, NIH) Dr. Nathan Edwards (Center for Bioinformatics and Computational

More information

Protein Structure Analysis and Verification. Course S Basics for Biosystems of the Cell exercise work. Maija Nevala, BIO, 67485U 16.1.

Protein Structure Analysis and Verification. Course S Basics for Biosystems of the Cell exercise work. Maija Nevala, BIO, 67485U 16.1. Protein Structure Analysis and Verification Course S-114.2500 Basics for Biosystems of the Cell exercise work Maija Nevala, BIO, 67485U 16.1.2008 1. Preface When faced with an unknown protein, scientists

More information

Computational Methods for Mass Spectrometry Proteomics

Computational Methods for Mass Spectrometry Proteomics Computational Methods for Mass Spectrometry Proteomics Eidhammer, Ingvar ISBN-13: 9780470512975 Table of Contents Preface. Acknowledgements. 1 Protein, Proteome, and Proteomics. 1.1 Primary goals for studying

More information

BENG 183 Trey Ideker. Protein Sequencing

BENG 183 Trey Ideker. Protein Sequencing BENG 183 Trey Ideker Protein Sequencing The following slides borrowed from Hong Li s Biochemistry Course: www.sb.fsu.edu/~hongli/4053notes Introduction to Proteins Proteins are of vital importance to biological

More information

SERVA ICPL Kit (Cat.-No )

SERVA ICPL Kit (Cat.-No ) INSTRUCTION MANUAL SERVA ICPL Kit (Cat.-No. 39230.01) SERVA Electrophoresis GmbH Carl-Benz-Str. 7 D-69115 Heidelberg Phone +49-6221-138400, Fax +49-6221-1384010 e-mail: info@serva.de http://www.serva.de

More information

Prof. Jason Kahn Your Signature: Exams written in pencil or erasable ink will not be re-graded under any circumstances.

Prof. Jason Kahn Your Signature: Exams written in pencil or erasable ink will not be re-graded under any circumstances. 1 Biochemistry 461 February 16, 1995 Exam #1 Prof. Jason Kahn Your Printed Name: Your SS#: Your Signature: You have 75 minutes for this exam. Exams written in pencil or erasable ink will not be re-graded

More information

Key questions of proteomics. Bioinformatics 2. Proteomics. Foundation of proteomics. What proteins are there? Protein digestion

Key questions of proteomics. Bioinformatics 2. Proteomics. Foundation of proteomics. What proteins are there? Protein digestion s s Key questions of proteomics What proteins are there? Bioinformatics 2 Lecture 2 roteomics How much is there of each of the proteins? - Absolute quantitation - Stoichiometry What (modification/splice)

More information

Figure S1. Interaction of PcTS with αsyn. (a) 1 H- 15 N HSQC NMR spectra of 100 µm αsyn in the absence (0:1, black) and increasing equivalent

Figure S1. Interaction of PcTS with αsyn. (a) 1 H- 15 N HSQC NMR spectra of 100 µm αsyn in the absence (0:1, black) and increasing equivalent Figure S1. Interaction of PcTS with αsyn. (a) 1 H- 15 N HSQC NMR spectra of 100 µm αsyn in the absence (0:1, black) and increasing equivalent concentrations of PcTS (100 µm, blue; 500 µm, green; 1.5 mm,

More information

A Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry

A Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry A Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry Ting Chen Department of Genetics arvard Medical School Boston, MA 02115, USA Ming-Yang Kao Department of Computer

More information

Lecture 15: Realities of Genome Assembly Protein Sequencing

Lecture 15: Realities of Genome Assembly Protein Sequencing Lecture 15: Realities of Genome Assembly Protein Sequencing Study Chapter 8.10-8.15 1 Euler s Theorems A graph is balanced if for every vertex the number of incoming edges equals to the number of outgoing

More information

A Kernel-Based Case Retrieval Algorithm with Application to Bioinformatics

A Kernel-Based Case Retrieval Algorithm with Application to Bioinformatics A Kernel-Based Case Retrieval Algorithm with Application to Bioinformatics Yan Fu,2, Qiang Yang 3, Charles X. Ling 4, Haipeng Wang, Dequan Li, Ruixiang Sun 2, Hu Zhou 5, Rong Zeng 5, Yiqiang Chen, Simin

More information

Probabilistic Arithmetic Automata

Probabilistic Arithmetic Automata Probabilistic Arithmetic Automata Applications of a Stochastic Computational Framework in Biological Sequence Analysis Inke Herms PhD thesis defense Overview 1 Probabilistic Arithmetic Automata 2 Application

More information

High-Accuracy Peptide Mass Fingerprinting Using Peak Intensity Data with Machine Learning

High-Accuracy Peptide Mass Fingerprinting Using Peak Intensity Data with Machine Learning High-Accuracy Peptide Mass Fingerprinting Using Peak Intensity Data with Machine Learning Dongmei Yang, Kevin Ramkissoon, Eric Hamlett, and Morgan C. Giddings*,, Department of Microbiology and Immunology,

More information

De Novo Peptide Identification Via Mixed-Integer Linear Optimization And Tandem Mass Spectrometry

De Novo Peptide Identification Via Mixed-Integer Linear Optimization And Tandem Mass Spectrometry 17 th European Symposium on Computer Aided Process Engineering ESCAPE17 V. Plesu and P.S. Agachi (Editors) 2007 Elsevier B.V. All rights reserved. 1 De Novo Peptide Identification Via Mixed-Integer Linear

More information

Chapter 2 What are the Common Mass Spectrometry-Based Analyses Used in Biology?

Chapter 2 What are the Common Mass Spectrometry-Based Analyses Used in Biology? Chapter 2 What are the Common Mass Spectrometry-Based Analyses Used in Biology? Abstract Mass spectrometry is used in many field of research, such as biology, chemistry, geology, etc. The focus of this

More information

BioSystems 109 (2012) Contents lists available at SciVerse ScienceDirect. BioSystems

BioSystems 109 (2012) Contents lists available at SciVerse ScienceDirect. BioSystems BioSystems 9 (22) 79 85 Contents lists available at SciVerse ScienceDirect BioSystems journa l h o me pa g e: www.elsevier.com/locate/biosystems Periodic patterns in distributions of peptide masses Shane

More information

An Analysis of Top Trading Cycles in Two-Sided Matching Markets

An Analysis of Top Trading Cycles in Two-Sided Matching Markets An Analysis of Top Trading Cycles in Two-Sided Matching Markets Yeon-Koo Che Olivier Tercieux July 30, 2015 Preliminary and incomplete. Abstract We study top trading cycles in a two-sided matching environment

More information

PC235: 2008 Lecture 5: Quantitation. Arnold Falick

PC235: 2008 Lecture 5: Quantitation. Arnold Falick PC235: 2008 Lecture 5: Quantitation Arnold Falick falickam@berkeley.edu Summary What you will learn from this lecture: There are many methods to perform quantitation using mass spectrometry (any method

More information

Methods for proteome analysis of obesity (Adipose tissue)

Methods for proteome analysis of obesity (Adipose tissue) Methods for proteome analysis of obesity (Adipose tissue) I. Sample preparation and liquid chromatography-tandem mass spectrometric analysis Instruments, softwares, and materials AB SCIEX Triple TOF 5600

More information

Yifei Bao. Beatrix. Manor Askenazi

Yifei Bao. Beatrix. Manor Askenazi Detection and Correction of Interference in MS1 Quantitation of Peptides Using their Isotope Distributions Yifei Bao Department of Computer Science Stevens Institute of Technology Beatrix Ueberheide Department

More information

MULTI-RESTRAINED STIRLING NUMBERS

MULTI-RESTRAINED STIRLING NUMBERS MULTI-RESTRAINED STIRLING NUMBERS JI YOUNG CHOI DEPARTMENT OF MATHEMATICS SHIPPENSBURG UNIVERSITY SHIPPENSBURG, PA 17257, U.S.A. Abstract. Given positive integers n, k, and m, the (n, k)-th m- restrained

More information

Change-point models and performance measures for sequential change detection

Change-point models and performance measures for sequential change detection Change-point models and performance measures for sequential change detection Department of Electrical and Computer Engineering, University of Patras, 26500 Rion, Greece moustaki@upatras.gr George V. Moustakides

More information

De novo Protein Sequencing by Combining Top-Down and Bottom-Up Tandem Mass Spectra. Xiaowen Liu

De novo Protein Sequencing by Combining Top-Down and Bottom-Up Tandem Mass Spectra. Xiaowen Liu De novo Protein Sequencing by Combining Top-Down and Bottom-Up Tandem Mass Spectra Xiaowen Liu Department of BioHealth Informatics, Department of Computer and Information Sciences, Indiana University-Purdue

More information

Rapid and Sensitive Fluorescent Peptide Quantification Using LavaPep

Rapid and Sensitive Fluorescent Peptide Quantification Using LavaPep Rapid and Sensitive Fluorescent Peptide Quantification Using LavaPep Purpose To develop a fast, sensitive and robust fluorescent-based assay, to quantify peptides that is compatible with downstream proteomics

More information

Chapter 4. strategies for protein quantitation Ⅱ

Chapter 4. strategies for protein quantitation Ⅱ Proteomics Chapter 4. strategies for protein quantitation Ⅱ 1 Multiplexed proteomics Multiplexed proteomics is the use of fluorescent stains or probes with different excitation and emission spectra to

More information

MS-MS Analysis Programs

MS-MS Analysis Programs MS-MS Analysis Programs Basic Process Genome - Gives AA sequences of proteins Use this to predict spectra Compare data to prediction Determine degree of correctness Make assignment Did we see the protein?

More information

Identifying the proteome: software tools David Fenyö

Identifying the proteome: software tools David Fenyö 391 Identifying the proteome: software tools David Fenyö The interest in proteomics has recently increased dramatically and proteomic methods are now applied to many problems in cell biology. The method

More information

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it? Proteomics What is it? Reveal protein interactions Protein profiling in a sample Yeast two hybrid screening High throughput 2D PAGE Automatic analysis of 2D Page Yeast two hybrid Use two mating strains

More information

Lecture 8 (9/29/17) Lecture 8 (9/29/17)

Lecture 8 (9/29/17) Lecture 8 (9/29/17) Reading: Ch3; 97-102 Lecture 8 (9/29/17) Problems: Ch3 (text); 18, 19, 20, 23 Ch3 (Study guide); 9 Ch4 (Study guide); 3 NEXT Reading: Ch4; 119-122, 125-126, 131-133 Ch4; 123-124, 130-131, 133, 137-138

More information

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics GENOME Bioinformatics 2 Proteomics protein-gene PROTEOME protein-protein METABOLISM Slide from http://www.nd.edu/~networks/ Citrate Cycle Bio-chemical reactions What is it? Proteomics Reveal protein Protein

More information

I will post Homework 1 soon, probably over the weekend, due Friday, September 30.

I will post Homework 1 soon, probably over the weekend, due Friday, September 30. Random Variables Friday, September 09, 2011 2:02 PM I will post Homework 1 soon, probably over the weekend, due Friday, September 30. No class or office hours next week. Next class is on Tuesday, September

More information

Alternative Characterization of Ergodicity for Doubly Stochastic Chains

Alternative Characterization of Ergodicity for Doubly Stochastic Chains Alternative Characterization of Ergodicity for Doubly Stochastic Chains Behrouz Touri and Angelia Nedić Abstract In this paper we discuss the ergodicity of stochastic and doubly stochastic chains. We define

More information

14 Random Variables and Simulation

14 Random Variables and Simulation 14 Random Variables and Simulation In this lecture note we consider the relationship between random variables and simulation models. Random variables play two important roles in simulation models. We assume

More information

Randomized Algorithms

Randomized Algorithms Randomized Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A new 4 credit unit course Part of Theoretical Computer Science courses at the Department of Mathematics There will be 4 hours

More information

A Markov chain approach to quality control

A Markov chain approach to quality control A Markov chain approach to quality control F. Bartolucci Istituto di Scienze Economiche Università di Urbino www.stat.unipg.it/ bart Preliminaries We use the finite Markov chain imbedding approach of Alexandrou,

More information

Development and Evaluation of Methods for Predicting Protein Levels from Tandem Mass Spectrometry Data. Han Liu

Development and Evaluation of Methods for Predicting Protein Levels from Tandem Mass Spectrometry Data. Han Liu Development and Evaluation of Methods for Predicting Protein Levels from Tandem Mass Spectrometry Data by Han Liu A thesis submitted in conformity with the requirements for the degree of Master of Science

More information

Enzymes I. Dr. Mamoun Ahram Summer semester,

Enzymes I. Dr. Mamoun Ahram Summer semester, Enzymes I Dr. Mamoun Ahram Summer semester, 2017-2018 Resources Mark's Basic Medical Biochemistry Other resources NCBI Bookshelf: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=books The Medical Biochemistry

More information

CSCE 478/878 Lecture 9: Hidden. Markov. Models. Stephen Scott. Introduction. Outline. Markov. Chains. Hidden Markov Models. CSCE 478/878 Lecture 9:

CSCE 478/878 Lecture 9: Hidden. Markov. Models. Stephen Scott. Introduction. Outline. Markov. Chains. Hidden Markov Models. CSCE 478/878 Lecture 9: Useful for modeling/making predictions on sequential data E.g., biological sequences, text, series of sounds/spoken words Will return to graphical models that are generative sscott@cse.unl.edu 1 / 27 2

More information

Chem 250 Unit 1 Proteomics by Mass Spectrometry

Chem 250 Unit 1 Proteomics by Mass Spectrometry Chem 250 Unit 1 Proteomics by Mass Spectrometry Article #1 Quantitative MS for proteomics: teaching a new dog old tricks. MacCoss MJ, Matthews DE., Anal Chem. 2005 Aug 1;77(15):294A-302A. 1. Synopsis 1.1.

More information

Stephen Scott.

Stephen Scott. 1 / 27 sscott@cse.unl.edu 2 / 27 Useful for modeling/making predictions on sequential data E.g., biological sequences, text, series of sounds/spoken words Will return to graphical models that are generative

More information

Effective Strategies for Improving Peptide Identification with Tandem Mass Spectrometry

Effective Strategies for Improving Peptide Identification with Tandem Mass Spectrometry Effective Strategies for Improving Peptide Identification with Tandem Mass Spectrometry by Xi Han A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree

More information

Mixture Mode for Peptide Mass Fingerprinting ASMS 2003

Mixture Mode for Peptide Mass Fingerprinting ASMS 2003 Mixture Mode for Peptide Mass Fingerprinting ASMS 2003 1 Mixture Mode: New in Mascot 1.9 All peptide mass fingerprint searches now test for the possibility that the sample is a mixture of proteins. Mascot

More information

General Theory of Large Deviations

General Theory of Large Deviations Chapter 30 General Theory of Large Deviations A family of random variables follows the large deviations principle if the probability of the variables falling into bad sets, representing large deviations

More information

Basic Definitions: Indexed Collections and Random Functions

Basic Definitions: Indexed Collections and Random Functions Chapter 1 Basic Definitions: Indexed Collections and Random Functions Section 1.1 introduces stochastic processes as indexed collections of random variables. Section 1.2 builds the necessary machinery

More information

Adventures in Stochastic Processes

Adventures in Stochastic Processes Sidney Resnick Adventures in Stochastic Processes with Illustrations Birkhäuser Boston Basel Berlin Table of Contents Preface ix CHAPTER 1. PRELIMINARIES: DISCRETE INDEX SETS AND/OR DISCRETE STATE SPACES

More information

A Criterion for the Stochasticity of Matrices with Specified Order Relations

A Criterion for the Stochasticity of Matrices with Specified Order Relations Rend. Istit. Mat. Univ. Trieste Vol. XL, 55 64 (2009) A Criterion for the Stochasticity of Matrices with Specified Order Relations Luca Bortolussi and Andrea Sgarro Abstract. We tackle the following problem:

More information

Lecture 11: Introduction to Markov Chains. Copyright G. Caire (Sample Lectures) 321

Lecture 11: Introduction to Markov Chains. Copyright G. Caire (Sample Lectures) 321 Lecture 11: Introduction to Markov Chains Copyright G. Caire (Sample Lectures) 321 Discrete-time random processes A sequence of RVs indexed by a variable n 2 {0, 1, 2,...} forms a discretetime random process

More information

High-Throughput Protein Quantitation Using Multiple Reaction Monitoring

High-Throughput Protein Quantitation Using Multiple Reaction Monitoring High-Throughput Protein Quantitation Using Multiple Reaction Monitoring Application Note Authors Ning Tang, Christine Miller, Joe Roark, Norton Kitagawa and Keith Waddell Agilent Technologies, Inc. Santa

More information

HOWTO, example workflow and data files. (Version )

HOWTO, example workflow and data files. (Version ) HOWTO, example workflow and data files. (Version 20 09 2017) 1 Introduction: SugarQb is a collection of software tools (Nodes) which enable the automated identification of intact glycopeptides from HCD

More information

TUTORIAL EXERCISES WITH ANSWERS

TUTORIAL EXERCISES WITH ANSWERS TUTORIAL EXERCISES WITH ANSWERS Tutorial 1 Settings 1. What is the exact monoisotopic mass difference for peptides carrying a 13 C (and NO additional 15 N) labelled C-terminal lysine residue? a. 6.020129

More information

Notes on Paths, Trees and Lagrange Inversion

Notes on Paths, Trees and Lagrange Inversion Notes on Paths, Trees and Lagrange Inversion Today we are going to start with a problem that may seem somewhat unmotivated, and solve it in two ways. From there, we will proceed to discuss applications

More information

MS2DB: An Algorithmic Approach to Determine Disulfide Linkage Patterns in Proteins by Utilizing Tandem Mass Spectrometric Data

MS2DB: An Algorithmic Approach to Determine Disulfide Linkage Patterns in Proteins by Utilizing Tandem Mass Spectrometric Data MS2DB: An Algorithmic Approach to Determine Disulfide Linkage Patterns in Proteins by Utilizing Tandem Mass Spectrometric Data Timothy Lee 1, Rahul Singh 1, Ten-Yang Yen 2, and Bruce Macher 2 1 Department

More information

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past. 1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if

More information

Jiří Novák and David Hoksza

Jiří Novák and David Hoksza ParametrisedHausdorff HausdorffDistance Distanceas asa Non-Metric a Non-Metric Similarity Similarity Model Model for Tandem for Tandem Mass Mass Spectrometry Spectrometry Jiří Novák and David Hoksza Jiří

More information

C a h p a t p e t r e r 6 E z n y z m y e m s

C a h p a t p e t r e r 6 E z n y z m y e m s Chapter 6 Enzymes 1. An Introduction to Enzymes Enzymes are catalytically active biological macromolecules Enzymes are catalysts of biological systems Almost every biochemical reaction is catalyzed by

More information

Comprehensive support for quantitation

Comprehensive support for quantitation Comprehensive support for quantitation One of the major new features in the current release of Mascot is support for quantitation. This is still work in progress. Our goal is to support all of the popular

More information

arxiv: v1 [stat.ml] 23 Oct 2016

arxiv: v1 [stat.ml] 23 Oct 2016 Formulas for counting the sizes of Markov Equivalence Classes Formulas for Counting the Sizes of Markov Equivalence Classes of Directed Acyclic Graphs arxiv:1610.07921v1 [stat.ml] 23 Oct 2016 Yangbo He

More information

Spectrum-to-Spectrum Searching Using a. Proteome-wide Spectral Library

Spectrum-to-Spectrum Searching Using a. Proteome-wide Spectral Library MCP Papers in Press. Published on April 30, 2011 as Manuscript M111.007666 Spectrum-to-Spectrum Searching Using a Proteome-wide Spectral Library Chia-Yu Yen, Stephane Houel, Natalie G. Ahn, and William

More information

Parameter Discovery for Stochastic Biological Models against Temporal Behavioral Specifications using an SPRT based Metric for Simulated Annealing

Parameter Discovery for Stochastic Biological Models against Temporal Behavioral Specifications using an SPRT based Metric for Simulated Annealing Parameter Discovery for Stochastic Biological Models against Temporal Behavioral Specifications using an SPRT based Metric for Simulated Annealing Faraz Hussain, Raj Gautam Dutta, Sumit Kumar Jha Electrical

More information

Qualitative Proteomics (how to obtain high-confidence high-throughput protein identification!)

Qualitative Proteomics (how to obtain high-confidence high-throughput protein identification!) Qualitative Proteomics (how to obtain high-confidence high-throughput protein identification!) James A. Mobley, Ph.D. Director of Research in Urology Associate Director of Mass Spectrometry (contact: mobleyja@uab.edu)

More information

Under strongly acidic conditions at ph = 1 every functional group in phosphoserine that can pick up a proton, does.

Under strongly acidic conditions at ph = 1 every functional group in phosphoserine that can pick up a proton, does. 1. (48 pts) a. L-Phosphoserine is a modified amino acid that is generated in proteins by phosphorylation of serine residues. The amino acid side chain has two acidic protons, which exhibit different pk

More information

Discrete Probability Refresher

Discrete Probability Refresher ECE 1502 Information Theory Discrete Probability Refresher F. R. Kschischang Dept. of Electrical and Computer Engineering University of Toronto January 13, 1999 revised January 11, 2006 Probability theory

More information

Lecture 8: Information Theory and Statistics

Lecture 8: Information Theory and Statistics Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and Estimation I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 22, 2015

More information

Peptides And Proteins

Peptides And Proteins Kevin Burgess, May 3, 2017 1 Peptides And Proteins from chapter(s) in the recommended text A. Introduction B. omenclature And Conventions by amide bonds. on the left, right. 2 -terminal C-terminal triglycine

More information

Solutions to Homework Discrete Stochastic Processes MIT, Spring 2011

Solutions to Homework Discrete Stochastic Processes MIT, Spring 2011 Exercise 1 Solutions to Homework 6 6.262 Discrete Stochastic Processes MIT, Spring 2011 Let {Y n ; n 1} be a sequence of rv s and assume that lim n E[ Y n ] = 0. Show that {Y n ; n 1} converges to 0 in

More information

Advanced Computer Networks Lecture 2. Markov Processes

Advanced Computer Networks Lecture 2. Markov Processes Advanced Computer Networks Lecture 2. Markov Processes Husheng Li Min Kao Department of Electrical Engineering and Computer Science University of Tennessee, Knoxville Spring, 2016 1/28 Outline 2/28 1 Definition

More information

A NEW PROOF OF THE WIENER HOPF FACTORIZATION VIA BASU S THEOREM

A NEW PROOF OF THE WIENER HOPF FACTORIZATION VIA BASU S THEOREM J. Appl. Prob. 49, 876 882 (2012 Printed in England Applied Probability Trust 2012 A NEW PROOF OF THE WIENER HOPF FACTORIZATION VIA BASU S THEOREM BRIAN FRALIX and COLIN GALLAGHER, Clemson University Abstract

More information

MASS SPECTROMETRY. Topics

MASS SPECTROMETRY. Topics MASS SPECTROMETRY MALDI-TOF AND ESI-MS Topics Principle of Mass Spectrometry MALDI-TOF Determination of Mw of Proteins Structural Information by MS: Primary Sequence of a Protein 1 A. Principles Ionization:

More information

Quantitative Proteomics

Quantitative Proteomics Quantitative Proteomics Quantitation AND Mass Spectrometry Condition A Condition B Identify and quantify differently expressed proteins resulting from a change in the environment (stimulus, disease) Lyse

More information

P i [B k ] = lim. n=1 p(n) ii <. n=1. V i :=

P i [B k ] = lim. n=1 p(n) ii <. n=1. V i := 2.7. Recurrence and transience Consider a Markov chain {X n : n N 0 } on state space E with transition matrix P. Definition 2.7.1. A state i E is called recurrent if P i [X n = i for infinitely many n]

More information

Mass Spectrometry and Proteomics - Lecture 5 - Matthias Trost Newcastle University

Mass Spectrometry and Proteomics - Lecture 5 - Matthias Trost Newcastle University Mass Spectrometry and Proteomics - Lecture 5 - Matthias Trost Newcastle University matthias.trost@ncl.ac.uk Previously Proteomics Sample prep 144 Lecture 5 Quantitation techniques Search Algorithms Proteomics

More information

First Problem Set, Due 30 Sep. (1st midterm is 7 Oct.) If you want it graded by 6 Oct, turn in on or before 30 Sep. at 5 pm.

First Problem Set, Due 30 Sep. (1st midterm is 7 Oct.) If you want it graded by 6 Oct, turn in on or before 30 Sep. at 5 pm. First Problem Set, Due 30 Sep. (1st midterm is 7 ct.) If you want it graded by 6 ct, turn in on or before 30 Sep. at 5 pm. Show your work calculations / steps!! Problems from Chapter 1 1 EM of a eukaryotic

More information

Quantitation of a target protein in crude samples using targeted peptide quantification by Mass Spectrometry

Quantitation of a target protein in crude samples using targeted peptide quantification by Mass Spectrometry Quantitation of a target protein in crude samples using targeted peptide quantification by Mass Spectrometry Jon Hao, Rong Ye, and Mason Tao Poochon Scientific, Frederick, Maryland 21701 Abstract Background:

More information

When do diffusion-limited trajectories become memoryless?

When do diffusion-limited trajectories become memoryless? When do diffusion-limited trajectories become memoryless? Maciej Dobrzyński CWI (Center for Mathematics and Computer Science) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands Abstract Stochastic description

More information

Exercise Solutions to Functional Analysis

Exercise Solutions to Functional Analysis Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n

More information

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki. Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein

More information

Information Theory and Statistics Lecture 3: Stationary ergodic processes

Information Theory and Statistics Lecture 3: Stationary ergodic processes Information Theory and Statistics Lecture 3: Stationary ergodic processes Łukasz Dębowski ldebowsk@ipipan.waw.pl Ph. D. Programme 2013/2014 Measurable space Definition (measurable space) Measurable space

More information

Bayesian Modeling and Classification of Neural Signals

Bayesian Modeling and Classification of Neural Signals Bayesian Modeling and Classification of Neural Signals Michael S. Lewicki Computation and Neural Systems Program California Institute of Technology 216-76 Pasadena, CA 91125 lewickiocns.caltech.edu Abstract

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

Improved 6- Plex TMT Quantification Throughput Using a Linear Ion Trap HCD MS 3 Scan Jane M. Liu, 1,2 * Michael J. Sweredoski, 2 Sonja Hess 2 *

Improved 6- Plex TMT Quantification Throughput Using a Linear Ion Trap HCD MS 3 Scan Jane M. Liu, 1,2 * Michael J. Sweredoski, 2 Sonja Hess 2 * Improved 6- Plex TMT Quantification Throughput Using a Linear Ion Trap HCD MS 3 Scan Jane M. Liu, 1,2 * Michael J. Sweredoski, 2 Sonja Hess 2 * 1 Department of Chemistry, Pomona College, Claremont, California

More information

QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS

QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS QR FACTORIZATIONS USING A RESTRICTED SET OF ROTATIONS DIANNE P. O LEARY AND STEPHEN S. BULLOCK Dedicated to Alan George on the occasion of his 60th birthday Abstract. Any matrix A of dimension m n (m n)

More information

Distance-Based Probability Distribution for Set Partitions with Applications to Bayesian Nonparametrics

Distance-Based Probability Distribution for Set Partitions with Applications to Bayesian Nonparametrics Distance-Based Probability Distribution for Set Partitions with Applications to Bayesian Nonparametrics David B. Dahl August 5, 2008 Abstract Integration of several types of data is a burgeoning field.

More information

BAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS

BAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS BAYESIAN ESTIMATION OF LINEAR STATISTICAL MODEL BIAS Andrew A. Neath 1 and Joseph E. Cavanaugh 1 Department of Mathematics and Statistics, Southern Illinois University, Edwardsville, Illinois 606, USA

More information

Stochastic Simulation

Stochastic Simulation Stochastic Simulation Ulm University Institute of Stochastics Lecture Notes Dr. Tim Brereton Summer Term 2015 Ulm, 2015 2 Contents 1 Discrete-Time Markov Chains 5 1.1 Discrete-Time Markov Chains.....................

More information

BLAST: Target frequencies and information content Dannie Durand

BLAST: Target frequencies and information content Dannie Durand Computational Genomics and Molecular Biology, Fall 2016 1 BLAST: Target frequencies and information content Dannie Durand BLAST has two components: a fast heuristic for searching for similar sequences

More information

Model Complexity of Pseudo-independent Models

Model Complexity of Pseudo-independent Models Model Complexity of Pseudo-independent Models Jae-Hyuck Lee and Yang Xiang Department of Computing and Information Science University of Guelph, Guelph, Canada {jaehyuck, yxiang}@cis.uoguelph,ca Abstract

More information

Computing and Communicating Functions over Sensor Networks

Computing and Communicating Functions over Sensor Networks Computing and Communicating Functions over Sensor Networks Solmaz Torabi Dept. of Electrical and Computer Engineering Drexel University solmaz.t@drexel.edu Advisor: Dr. John M. Walsh 1/35 1 Refrences [1]

More information

ADVANCEMENT IN PROTEIN INFERENCE FROM SHOTGUN PROTEOMICS USING PEPTIDE DETECTABILITY

ADVANCEMENT IN PROTEIN INFERENCE FROM SHOTGUN PROTEOMICS USING PEPTIDE DETECTABILITY ADVANCEMENT IN PROTEIN INFERENCE FROM SHOTGUN PROTEOMICS USING PEPTIDE DETECTABILITY PEDRO ALVES, 1 RANDY J. ARNOLD, 2 MILOS V. NOVOTNY, 2 PREDRAG RADIVOJAC, 1 JAMES P. REILLY, 2 HAIXU TANG 1, 3* 1) School

More information