MS-MS Analysis Programs

Basic Process
- The genome gives the amino acid (AA) sequences of proteins
- Use these sequences to predict spectra
- Compare the data to the prediction
- Determine the degree of correctness
- Make the assignment

Did we see the protein?
- Correctness
- Number of sequences
- Observed in different experiments
  - Repeats
  - Replicates
- Quantity and quality of sequences

Data
- Listing of ion masses
- Amplitude
- Precursor ion mass
- Knowledge of modifications
- Knowledge of ionization probabilities
  - y ions will be most obvious
  - b and a ions can be missing
  - Some amino acids disrupt cleavage
  - Addition or removal of water or OH
  - Etc.

Parameters that affect searches
Several parameters are important when searching databases for peptide matches:
- Data quality
  - Signal to noise
  - Spurious peaks
- Mass tolerance: how much can the mass vary?
  - Usually 0.3-5 amu; <= 1 is most common
  - Both parent and fragment mass tolerances are important
- Modifications allowed
- Missed cleavages
- Database size
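The b and y fragment ions mentioned above can be predicted directly from a peptide sequence. The sketch below is illustrative only: the function name `fragment_ions` is hypothetical, the residue table is deliberately partial, and only singly charged ions without modifications or neutral losses are considered.

```python
# Monoisotopic residue masses in Da (partial table, enough for the example).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "D": 115.02694, "E": 129.04259, "K": 128.09496, "R": 156.10111,
}
WATER = 18.01056   # mass of H2O
PROTON = 1.00728   # mass of a proton


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[0] is b1 (N-terminal residue); y_ions[0] is y1 (C-terminal residue).
    Assumption: no modifications, charge 1+, no neutral losses.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        # b ion: first i residues plus one proton
        b_ions.append(sum(masses[:i]) + PROTON)
        # y ion: last i residues plus water plus one proton
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b, y = fragment_ions("GAS")
print([round(m, 4) for m in b])
print([round(m, 4) for m in y])
```

A loss of water from any of these ions would appear 18.01056 lower in m/z, which is how the neutral loss peaks listed above arise.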

Mass Tolerance / Error

ppm (parts per million) formula:
$\text{ppm error} = \dfrac{\Delta}{\text{theoretical value}} \times 10^{6}$

OR % formula:
$\%\ \text{error} = \dfrac{\Delta}{\text{theoretical value}} \times 100$

where $\Delta = |\text{theoretical } m/z - \text{experimental } m/z|$

Example:
$\Delta = |1479.63\ m/z\ \text{(theoretical)} - 1480.10\ m/z\ \text{(experimental)}| = 0.47$
ppm error: $\dfrac{0.47}{1479.63} \times 10^{6} \approx 317$ ppm
OR % error: $\approx 0.0317\ \%$

Slide provided by the U of Minn Mass Spectrometry Center

Absolute Values of Mass Tolerance as a Function of Mass for Two Different % Tolerance Settings

Mass (Da)    % Tolerance    Error (+/- Da)
1200         0.03%          0.4
3500         0.03%          1
12000        0.03%          4
25000        0.1%           25
60000        0.1%           60
80000        0.1%           80

Reminder: 0.03% = 300 ppm
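As a quick check of the arithmetic, the error calculation can be written as a small function. This is a minimal sketch; the names `mass_error` and `within_tolerance` are made up for illustration and do not come from any search program.

```python
def mass_error(theoretical_mz, experimental_mz):
    """Return (delta, ppm_error, percent_error) for a pair of m/z values."""
    delta = abs(theoretical_mz - experimental_mz)
    ppm = delta / theoretical_mz * 1e6
    percent = delta / theoretical_mz * 100
    return delta, ppm, percent


def within_tolerance(theoretical_mz, experimental_mz, tol_ppm):
    """True if the experimental value lies within tol_ppm of the theoretical one."""
    _, ppm, _ = mass_error(theoretical_mz, experimental_mz)
    return ppm <= tol_ppm


delta, ppm, percent = mass_error(1479.63, 1480.10)
print(round(delta, 2), round(ppm), round(percent, 4))
```

With the slide's example values this gives an error of roughly 317 ppm, so the match would be rejected at a 300 ppm (0.03%) tolerance setting.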

Peptide Identification Programs

Categories
- Precursor approaches (most often used)
  - Sequest
  - Mascot
  - X!Tandem
- Sequence approaches (newer)
  - MS-Tag*
  - Protein Pilot

Sequest
- From a database, it chooses peptide candidates based on PRECURSOR MASS. This is the mass of the ion before fragmentation.
  - Takes known sequences and predicts the precursor mass
  - Chooses all the sequences that fit the mass
- Creates a model spectrum for each theoretical candidate.
- Determines and sums the peaks that overlap between the theoretical and experimental data: the Sp score.
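The candidate selection step can be sketched as a simple mass filter. The code below is not Sequest's implementation; `peptide_mass` and `select_candidates` are hypothetical names, the residue table is partial, and the tolerance is an absolute window in Da.

```python
# Monoisotopic residue masses in Da (partial table, enough for the example).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "I": 113.08406, "L": 113.08406,
    "D": 115.02694, "E": 129.04259,
}
WATER = 18.01056


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def select_candidates(precursor_mass, peptides, tol_da=1.0):
    """Keep every database peptide whose mass fits the precursor mass."""
    return [p for p in peptides
            if abs(peptide_mass(p) - precursor_mass) <= tol_da]


database = ["GAS", "GAV", "PEPTIDE"]
print(select_candidates(799.4, database, tol_da=0.5))
```

Widening `tol_da` or enlarging the database increases the number of candidates that have to be scored, which is why both appear in the list of search parameters.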

Sequest
- The Sp score is based on the number of ions that fit
- The value is dependent on the peptide size
- Spectra are shifted to get different Sp scores

$S_p = \dfrac{\left(\sum I_m\right)\, n_m\, (1 + \beta)(1 + \rho)}{n_t}$

where
- $\sum I_m$ = sum of intensities of matching fragments
- $n_m$ = number of matching fragments
- $\beta$ = bonus for consecutive fragmentation ions = 0.075
- $\rho$ = bonus for presence of immonium ions = 0.15
- $n_t$ = total number of predicted sequence ions

Sequest
- The Sp scores are tallied
- The Sp score is dependent on the size of the peptide
  - For 20 aa, a score of 1000 is good
  - For 6 aa, a score of 500 is good
- The best 500 Sp score sequences are used to calculate the XCorr value
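A preliminary score built from the quantities listed above can be sketched as follows. The way the terms are combined here, (sum of matched intensities) x (number of matches) x (1 + beta) x (1 + rho) / (number of predicted ions), is an assumption about the formula on the slide, as is applying the 0.075 bonus once per consecutive match. The function name `sp_score` is hypothetical.

```python
def sp_score(matched_intensities, n_predicted, n_consecutive=0,
             immonium_present=False, beta=0.075, rho=0.15):
    """Preliminary score from matched fragment ions (illustrative only).

    matched_intensities: intensities of experimental peaks that match
                         a predicted fragment ion
    n_predicted:         total number of predicted sequence ions
    n_consecutive:       number of consecutive ion matches (assumption:
                         the bonus beta is applied once per such match)
    immonium_present:    whether supporting immonium ions were observed
    """
    if n_predicted <= 0:
        raise ValueError("n_predicted must be positive")
    intensity_sum = sum(matched_intensities)
    n_matched = len(matched_intensities)
    consecutive_bonus = 1 + beta * n_consecutive
    immonium_bonus = 1 + rho if immonium_present else 1
    return intensity_sum * n_matched * consecutive_bonus * immonium_bonus / n_predicted


print(sp_score([10, 20, 30], n_predicted=10))
```

Because both the intensity sum and the match count grow with peptide length, the score scales with peptide size, which matches the length-dependent thresholds quoted above.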

Sequest
- Shift the experimental spectrum back and forth and sum the overlapping peaks
- This generates the auto-correlation information (background noise calculation)
- Calculate the cross-correlation score (XCorr)

$C_{xy}(\tau) = \sum_{t} x(t)\, y(t + \tau)$

where
- $C_{xy}$ is the correlation score
- $\tau$ is the displacement between the spectra
- $x(t)$ and $y(t)$ are the original and predicted spectra

Sequest
- This is a common procedure
- It is used as a measure of correctness
- The larger the value, the better
- The value is dependent on the peptide size: the bigger the peptide, the larger the value
- The value is dependent on the charge
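The shift-and-sum procedure can be illustrated on two binned spectra of equal length. This is a simplified sketch: subtracting the mean correlation over the nonzero shifts from the zero-shift correlation, and the default window of 75 bins, are assumptions rather than details stated on the slides, and `correlation` and `xcorr` are hypothetical names.

```python
def correlation(x, y, tau):
    """Sum of x[t] * y[t + tau] over all bins where both spectra are defined."""
    n = len(x)
    return sum(x[t] * y[t + tau] for t in range(n) if 0 <= t + tau < n)


def xcorr(observed, predicted, max_shift=75):
    """Zero-shift correlation minus the mean correlation at nonzero shifts.

    observed, predicted: intensity lists binned on the same m/z grid.
    The shifted correlations play the role of the background estimate.
    """
    if len(observed) != len(predicted):
        raise ValueError("spectra must be binned to the same length")
    direct = correlation(observed, predicted, 0)
    shifts = [t for t in range(-max_shift, max_shift + 1) if t != 0]
    background = sum(correlation(observed, predicted, t) for t in shifts) / len(shifts)
    return direct - background


spectrum = [0, 1, 0, 0, 1, 0]
shifted = [0, 0, 1, 0, 0, 1]   # same peaks, displaced by one bin
print(xcorr(spectrum, spectrum, max_shift=2))
print(xcorr(spectrum, shifted, max_shift=2))
```

A spectrum that lines up with the prediction only after shifting contributes to the background term instead of the direct term, so its score drops.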

SEQUEST Scoring: XCorr
[Figure: correlation score versus offset (AMU), showing the cross correlation (direct comparison) and the auto correlation (background) from which XCorr is obtained.]
http://www.proteomesoftware.com/proteome_software_proteomics.html
Adapted from: Interpreting_MSMS_results_cartoon.ppt (Brian Searle)
Gentzel M. et al. Proteomics 3 (2003) 1597-1610

Sequest
- Other measure of fit: ion score
  - Number or percent of ions that match
  - Values of 70 to 80% are good

X!Tandem
- Uses the precursor ion to pick candidates
- Compares the spectrum to the predicted spectrum
- Uses only peaks that show up in both
- Calculates a y/b value for the preliminary assignment:

$\text{score} = \sum_{i} I_i P_i$

where
- $I$ is the ion intensity
- $P$ is whether the theoretical peak matches (1 or 0)

Fragment ion mass tolerance affects identification
- P is a vector of n elements that represents the masses calculated for all potential sequence-specific ions (a, b, y, and the -17 (OH) and -18 (H2O) neutral loss products). This is the theoretical spectrum.
- n is the parent ion mass divided by the mass accuracy.
- $P_i$ has a value of 1 for present and 0 for absent.
- Vector $I_i$ represents the actual spectrum: the intensity of each fragment, normalized to 100, or 0 for a missing fragment.

X!Tandem
- Calculates the hyperscore:

$\text{hyperscore} = \left(\sum_{i=1}^{n} I_i P_i\right) \times N_b! \times N_y!$

where
- $N_y!$ is the factorial of the number of y ions identified
- $N_b!$ is the factorial of the number of b ions identified

- Assumes that the highest hyperscore is the correct assignment
- Makes a histogram of the hyperscores from the candidate peptides
- The histogram is used for the fitness calculation
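The hyperscore described above, the dot product of the intensity vector I and the presence vector P multiplied by the factorials of the matched b and y ion counts, can be sketched in a few lines. The function name `hyperscore` and the example values are hypothetical.

```python
from math import factorial


def hyperscore(intensities, predicted, n_b, n_y):
    """Dot product of I and P times N_b! times N_y!.

    intensities: observed intensities I_i (normalized to 100, 0 if missing)
    predicted:   theoretical vector P_i (1 if an ion is predicted, else 0)
    n_b, n_y:    numbers of b and y ions identified
    """
    if len(intensities) != len(predicted):
        raise ValueError("I and P must have the same length")
    dot = sum(i * p for i, p in zip(intensities, predicted))
    return dot * factorial(n_b) * factorial(n_y)


# Four bins; the last observed peak has no predicted counterpart.
print(hyperscore([100, 0, 50, 25], [1, 1, 1, 0], n_b=2, n_y=1))
```

The factorial terms reward assignments that explain many ions of each series far more strongly than the intensity sum alone would.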

X!Tandem
- Takes the log of the histogram values greater than the peak value and plots the log of the number versus the hyperscore. A straight line indicates that these assignments are random.
- Takes a linear regression of the log plot of all the random data and extends it out to the value of the chosen assignment. This is the E value.

http://www.proteomesoftware.com/pdf_files/xtandem_edited.pdf
http://h.thegpm.org/tandem/thegpm_tandem.html

[Figure: two panels. Left: histogram of hyperscores (# results versus hyperscore) with the significant match marked. Right: log # results versus hyperscore.]

E-Value
[Figure: log # results versus hyperscore, with the fitted line extended to the hyperscore of the best assignment.]
- The E value is determined by taking the slope due to random assignments and extrapolating to the hyperscore of the best assignment.
- In the example shown, E-value = $e^{-9}$

Which Program is Better?
- How closely does the spectrum match the model?
- What is the distance of the top match from the other possibilities?

X!Tandem
- Hyperscore
- E-value < 0.01 (better measure)

Sequest
- XCorr > 2.5
- ΔCn > 0.1 (better measure)

Neither program has all the best features.
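The extrapolation can be sketched with an ordinary least-squares fit. This follows the description on the slides rather than X!Tandem's source code; the function name `e_value`, the use of natural logarithms, and the example histogram are assumptions for illustration.

```python
import math


def e_value(scores, counts, best_score):
    """Extrapolate the log-linear tail of a hyperscore histogram.

    scores: bin centers of the hyperscore histogram (ascending)
    counts: number of candidate peptides in each bin
    best_score: hyperscore of the top-ranked assignment
    """
    peak = counts.index(max(counts))
    # Use only bins to the right of the histogram peak, skipping empty bins.
    points = [(s, math.log(c))
              for s, c in zip(scores[peak + 1:], counts[peak + 1:]) if c > 0]
    if len(points) < 2:
        raise ValueError("need at least two non-empty bins past the peak")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Expected number of random matches at the best score.
    return math.exp(slope * best_score + intercept)


scores = [10, 15, 20, 25, 30, 35, 40]
counts = [50, 200, 55, 33, 20, 12, 7]
print(e_value(scores, counts, best_score=100))
```

A best hyperscore far to the right of the random distribution yields a very small expected count, which is what the E-value < 0.01 criterion tests.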

Viewers and additional analysis
- Scaffold
- Protein Prophet
- Mass Sieve
- Progroup
- Protein Pilot
- Etc.

Scaffold
- Scaffold uses both X!Tandem and Sequest
- Takes the identified spectra and adds additional analysis to assign a probability of correct protein identification

Protein Prophet
Discriminant function: discriminant score for Sequest
- The first segment corrects for peptide length
- Accounts for the distance from the next closest assignment; the farther, the better
- Includes the top-ranked sequence
- Penalizes poor mass accuracy

Protein Prophet
- Makes a histogram of the discriminant scores
- Curve-fits the histogram for correct and incorrect assignments
- Uses Bayesian statistics to compute the probability of a correct match
- This gives better sensitivity and fewer errors in the assignments
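The Bayesian step can be sketched once the two fitted curves are available. In the example below both the correct and the incorrect score distributions are modeled as Gaussians, which is a simplifying assumption, and all parameter values and function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def prob_correct(score, prior_correct, correct=(2.0, 1.0), incorrect=(0.0, 1.0)):
    """Posterior probability that an assignment with this score is correct.

    prior_correct: fraction of all assignments assumed to be correct
    correct, incorrect: (mean, sd) of the fitted score distributions
                        (hypothetical values, Gaussian shape assumed)
    """
    pos = normal_pdf(score, *correct) * prior_correct
    neg = normal_pdf(score, *incorrect) * (1 - prior_correct)
    if pos + neg == 0:
        return 0.0
    return pos / (pos + neg)


for s in (0.0, 1.0, 2.0, 3.0):
    print(s, round(prob_correct(s, prior_correct=0.5), 3))
```

A score that falls where the two fitted curves cross gets a probability near the prior, while scores deep in the "correct" distribution approach 1.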

http://www.proteomesoftware.com/proteome_software_prod_scaffold_tour.html
http://appliedbiosystems.cnpg.com/lsca/webinar/proteinpilot/20060516/