The Pitfalls of Peaklist Generation Software Performance on Database Searches

Similar documents
HOWTO, example workflow and data files. (Version )

Protein Identification Using Tandem Mass Spectrometry. Nathan Edwards Informatics Research Applied Biosystems

Nature Methods: doi: /nmeth Supplementary Figure 1. Fragment indexing allows efficient spectra similarity comparisons.

Overview - MS Proteomics in One Slide. MS masses of peptides. MS/MS fragments of a peptide. Results! Match to sequence database

Tandem mass spectra were extracted from the Xcalibur data system format. (.RAW) and charge state assignment was performed using in house software

MassHunter Software Overview

MS-MS Analysis Programs

Last updated: Copyright

Mixture Mode for Peptide Mass Fingerprinting ASMS 2003

Computational Methods for Mass Spectrometry Proteomics

Tutorial 1: Setting up your Skyline document

PeptideProphet: Validation of Peptide Assignments to MS/MS Spectra. Andrew Keller

Improved Validation of Peptide MS/MS Assignments. Using Spectral Intensity Prediction

Workflow concept. Data goes through the workflow. A Node contains an operation An edge represents data flow The results are brought together in tables

All Ions MS/MS: Targeted Screening and Quantitation Using Agilent TOF and Q-TOF LC/MS Systems

1. Prepare the MALDI sample plate by spotting an angiotensin standard and the test sample(s).

Cerno Application Note Extending the Limits of Mass Spectrometry

DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics

Effective desalting and concentration of in-gel digest samples with Vivapure C18 Micro spin columns prior to MALDI-TOF analysis.

PeptideProphet: Validation of Peptide Assignments to MS/MS Spectra

Comprehensive support for quantitation

PC 235: Mass Spectrometry and Proteomics

Proteomics. November 13, 2007

Agilent MassHunter Profinder: Solving the Challenge of Isotopologue Extraction for Qualitative Flux Analysis

SRM assay generation and data analysis in Skyline

Effective Strategies for Improving Peptide Identification with Tandem Mass Spectrometry

Mass Spectrometry and Proteomics - Lecture 5 - Matthias Trost Newcastle University

MassHunter TOF/QTOF Users Meeting

Modeling Mass Spectrometry-Based Protein Analysis

Mass spectrometry has been used a lot in biology since the late 1950 s. However it really came into play in the late 1980 s once methods were

WADA Technical Document TD2015IDCR

Isotopic-Labeling and Mass Spectrometry-Based Quantitative Proteomics

A Description of the CPTAC Common Data Analysis Pipeline (CDAP)

Information Dependent Acquisition (IDA) 1

Mass spectrometry-based proteomics has become

TOMAHAQ Method Construction

Supplementary Figure 1

Analysis of Peptide MS/MS Spectra from Large-Scale Proteomics Experiments Using Spectrum Libraries

High-Throughput Protein Quantitation Using Multiple Reaction Monitoring

Designed for Accuracy. Innovation with Integrity. High resolution quantitative proteomics LC-MS

MALDI-HDMS E : A Novel Data Independent Acquisition Method for the Enhanced Analysis of 2D-Gel Tryptic Peptide Digests

A Workflow Approach for the Identification and Structural Elucidation of Impurities of Quetiapine Hemifumarate Drug Substance

Figure S1. Interaction of PcTS with αsyn. (a) 1 H- 15 N HSQC NMR spectra of 100 µm αsyn in the absence (0:1, black) and increasing equivalent

Improved 6- Plex TMT Quantification Throughput Using a Linear Ion Trap HCD MS 3 Scan Jane M. Liu, 1,2 * Michael J. Sweredoski, 2 Sonja Hess 2 *

High-Field Orbitrap Creating new possibilities

Identification and Characterization of an Isolated Impurity Fraction: Analysis of an Unknown Degradant Found in Quetiapine Fumarate

Proteome-wide label-free quantification with MaxQuant. Jürgen Cox Max Planck Institute of Biochemistry July 2011

Rapid and Accurate Forensics Analysis using High Resolution All Ions MS/MS

Tandem MS = MS / MS. ESI-MS give information on the mass of a molecule but none on the structure

NEW TOOLS FOR FINDING AND IDENTIFYING METABOLITES IN A METABOLOMICS WORKFLOW

TUTORIAL EXERCISES WITH ANSWERS

Self-assembling covalent organic frameworks functionalized. magnetic graphene hydrophilic biocomposite as an ultrasensitive

Making Sense of Differences in LCMS Data: Integrated Tools

LECTURE-13. Peptide Mass Fingerprinting HANDOUT. Mass spectrometry is an indispensable tool for qualitative and quantitative analysis of

Tutorial 2: Analysis of DIA data in Skyline

An Effective Workflow for Impurity Analysis Incorporating High Quality HRAM LCMS & MSMS with Intelligent Automated Data Mining

BST 226 Statistical Methods for Bioinformatics David M. Rocke. January 22, 2014 BST 226 Statistical Methods for Bioinformatics 1

Qualitative Proteomics (how to obtain high-confidence high-throughput protein identification!)

Identification of proteins by enzyme digestion, mass

Protein Quantitation II: Multiple Reaction Monitoring. Kelly Ruggles New York University

PC235: 2008 Lecture 5: Quantitation. Arnold Falick

Protein Quantitation II: Multiple Reaction Monitoring. Kelly Ruggles New York University

CycloBranch. Tutorials

NPTEL VIDEO COURSE PROTEOMICS PROF. SANJEEVA SRIVASTAVA

Site-specific Identification of Lysine Acetylation Stoichiometries in Mammalian Cells

UCD Conway Institute of Biomolecular & Biomedical Research Graduate Education 2009/2010

X!TandemPipeline (Myosine Anabolisée) validating, filtering and grouping MSMS identifications

Spectronaut Pulsar. User Manual

Agilent All Ions MS/MS

Andromeda: A Peptide Search Engine Integrated into the MaxQuant Environment

Correction of Errors in Tandem Mass Spectrum Extraction Enhances Phosphopeptide Identification

CSE182-L8. Mass Spectrometry

Thermo Scientific LTQ Orbitrap Velos Hybrid FT Mass Spectrometer

Properties of Average Score Distributions of SEQUEST

pparse: A method for accurate determination of monoisotopic peaks in high-resolution mass spectra

Key Words Q Exactive, Accela, MetQuest, Mass Frontier, Drug Discovery

Mass spectroscopy ( Mass spec )

De novo Protein Sequencing by Combining Top-Down and Bottom-Up Tandem Mass Spectra. Xiaowen Liu

PosterREPRINT COMPARISON OF PEAK PARKING VERSUS AUTOMATED FRACTION ANALYSIS OF A COMPLEX PROTEIN MIXTURE. Introduction

MS/MS of Peptides Manual Sequencing of Protonated Peptides

Accurate, High-Throughput Protein Identification Using the Q TRAP LC/MS/MS System and Pro ID Software

Analyst Software. Peptide and Protein Quantitation Tutorial

Purdue-UAB Botanicals Center for Age- Related Disease

A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: support vector machine classification of peptide MS/MS spectra

Key questions of proteomics. Bioinformatics 2. Proteomics. Foundation of proteomics. What proteins are there? Protein digestion

De Novo Peptide Sequencing

MasSPIKE (Mass SPectrum Interpretation and Kernel Extraction) for Biological Samples p.1

Tutorial 1: Library Generation from DDA data

Spectrum-to-Spectrum Searching Using a. Proteome-wide Spectral Library

LogViewer: A Software Tool to Visualize Quality Control Parameters to Optimize Proteomics Experiments using Orbitrap and LTQ-FT Mass Spectrometers

Targeted Proteomics Environment

Quantitation of a target protein in crude samples using targeted peptide quantification by Mass Spectrometry

Peter A. DiMaggio, Jr., Nicolas L. Young, Richard C. Baliban, Benjamin A. Garcia, and Christodoulos A. Floudas. Research

WADA Technical Document TD2003IDCR

Application Note LCMS-116 What are we eating? MetaboScape Software; Enabling the De-replication and Identification of Unknowns in Food Metabolomics

Metabolic Phenotyping Using Atmospheric Pressure Gas Chromatography-MS

What s New in NIST11 (April 3, 2011)

The new Water Screening PCDL

Introduction to the Q Trap LC/MS/MS System

A Strategy for an Unknown Screening Approach on Environmental Samples using HRAM Mass Spectrometry

Transcription:

Proceedings of the 56th ASMS Conference on Mass Spectrometry and Allied Topics, Denver, CO, June 1-5, 2008 The Pitfalls of Peaklist Generation Software Performance on Database Searches Aenoch J. Lynn, Robert J. Chalkley, Peter R. Baker, Katalin F. Medzihradszky, Shenheng Guan, and A.L. Burlingame Department of Pharmaceutical Chemistry University of California, San Francisco San Francisco, California There are Many Steps Where Peaklist Generation Can Go Wrong Determination of precursor ion monoisotopic mass and charge Determination of fragment ion monoisotopic mass and charge Removal of the isotope peaks (deisotoping) Adjusting monoisotopic intensities for the removal of isotope peaks Summation of spectra with the same precursor m/z Duplication of peaklists and assignment of charge states when the precursor charge is unknown Here we show some of the possible errors in generated peaklists and compare several methods to generating peaklists. 1

An Example of an Incorrect Precursor Monoisotopic Peak Selection Precursor scan from a QStar wiff file. relative intensity m/z 695.3415 +2 695.8645 +2 Centroiding program (Mascot.dll in AnalystQS 2.0) assigned 695.8645 as the precursor m/z, but this is the wrong precursor mass. 695.3415 +2 Correct monoisotopic m/z as determined from a mass modification search in Protein Prospector v5.0.1. Using this precursor mass resulted in a Protein Prospector score of 42.5 for the MSMS from this precursor. 695.8645 +2 Peaklist monoisotopic m/z. Incorrect isotope distribution as extracted from the QStar file during peaklist generation. Database search using this precursor mass resulted in a Protein Prospector score of 11.4 for the best matching sequence. An Example of Co-eluting Precursor Ions Precursor scan from a QStar wiff file. Relative intensity 585.2941 +3 m/z 586.6250 +3 Centroiding program (Mascot.dll in AnalystQS 2.0) failed to assign precursor mass of a coeluting ion with a strong signal. 585.2941 +3 Precursor m/z from generated peaklist. Probably the m/z that was targeted by the instrument for MSMS. Protein Prospector score of 1 using this precursor m/z. 586.6250 +3 Precursor m/z of a co-eluting ion with an overwhelming ion signal. In a Protein Prospector mass modification search this resulted in a -4 neutral loss on 585.2941 +3. The peptide sequence had a Protein Prospector score of 44.8. 2

Which Peak is the Monoisotopic Peak? It s not easy determining the monoisotopic peak. A precursor scan taken by an LTQOrbitrap shows a complicated series of peaks. Any software is going to have a hard time determining which is the monoiostopic peak. precursor mass in peaklist When the MSMS peaklist is searched using the mass modification search in Protein Prospector, the best scoring peptide has a precursor mass of 927.93 m/z whereas the peaklist had 927.49. Red peaks are matched sequence or loss ions, black are unmatched. Theoretical isotope distribution for the ion 927.93 +2 aligned with the experimental data. They are dissimilar. Comparison of Software Approaches To illustrate problems and mistakes peaklist generation software makes: Use of three different software for analysis of LTQ MSMS data: (Matrix Science) Attempts peak detection from profile raw data, then de-isotopes data. Pava (UCSF in-house software) Streamlined peak detection, de-isotoping ReAdW (opensource program from Seattle Proteome Center) Raw data with no post-processing Peaklists were searched using Protein Prospector New version 5.0 with batch MSMS searching released. Available for public use. http://prospector2.ucsf.edu/ 3

If Data are Acquired in Centroid Mode then the Generates Poor Peaklists The converts centroid data into profile data, then centroids the profile data. Peaklists were generated by UCSF s Pava software and Mascot Distiller from LTQOrbitrap RAW data files and searched with Protein Prospector v5.0.1. For data saved in centroid mode, a plot of peptide expectation values shows that peptide assignments for the peaklist is significantly worse than the scores for the peaklist generated by UCSF s Pava. For data saved in profile mode, a plot of peptide expectation values shows similar scores for UCSF s Pava and the Mascot Distiler, but a larger variance. The has many parameters that can be changed; for this study the default LCQ parameters were used. Relative Performance for Centroid and Raw Data for UCSF s Pava and the RawDistiller Pava 1.00E-01 Raw Pava Distiller 1.00E-01 Data saved in centroid mode Data saved in profile mode On centroided data, the Mascot Distiller generated peaklists have peptide scores systematically worse than UCSF s Pava generated peaklists. On profile data, the on average performs as well as UCSF s Pava, but the scores have a larger variance. 4

Relative Performance for Centroid and Raw Data for UCSF s Pava and the RawDistiller Pava 1.00E-01 Raw Pava Distiller 1.00E-01 Effect of Peak Detection and De-isotoping Comparing the expectation values from database searches with peaklists generated by UCSF s Pava and ReAdW. Raw Pava Distiller 1.00E-01 1.00E-01 ReAdW Pava (with peak detection and de-isotoping) gives better expectation values than ReAdW peaklist (raw centroided data). 5

Unique Peptides Found with Each Software Protein Prospector search results peptides found using peaklists generated by the and UCSF s Pava can differ. generated peaklist UCSF s Pava generated peaklist generated peaklist UCSF s Pava generated peaklist Conclusions Some level of processing of the data (as in UCSF s Pava) is beneficial for improving data analysis. The is much more effective when operating on profile data rather than centroid data. Pava is effective on both profile and centroided data. The various centroiding programs each have their own strengths and weaknesses. Using peaklists generated by different programs allows the identification of more peptides. A manuscript on UCSF s Pava is in preparation. This work was supported by NIH NCRR 01614 and the Vincent J. Coates Foundation. 6