De novo Protein Sequencing by Combining Top-Down and Bottom-Up Tandem Mass Spectra. Xiaowen Liu

Similar documents
SPECTRA LIBRARY ASSISTED DE NOVO PEPTIDE SEQUENCING FOR HCD AND ETD SPECTRA PAIRS

Modeling Mass Spectrometry-Based Protein Analysis

Computational Methods for Mass Spectrometry Proteomics

De novo peptide sequencing methods for tandem mass. spectra

Protein Sequencing and Identification by Mass Spectrometry

CSE182-L8. Mass Spectrometry

Mass spectrometry has been used a lot in biology since the late 1950 s. However it really came into play in the late 1980 s once methods were

De Novo Sequencing of Top-Down Tandem Mass Spectra: A Next Step towards Retrieving a Complete Protein Sequence

High-Field Orbitrap Creating new possibilities

Figure S1. Interaction of PcTS with αsyn. (a) 1 H- 15 N HSQC NMR spectra of 100 µm αsyn in the absence (0:1, black) and increasing equivalent

Effective Strategies for Improving Peptide Identification with Tandem Mass Spectrometry

Tandem Mass Spectrometry: Generating function, alignment and assembly

De Novo Peptide Sequencing: Informatics and Pattern Recognition applied to Proteomics

Protein Identification Using Tandem Mass Spectrometry. Nathan Edwards Informatics Research Applied Biosystems

Mass Spectrometry Based De Novo Peptide Sequencing Error Correction

Thermo Scientific LTQ Orbitrap Velos Hybrid FT Mass Spectrometer

Electron Transfer Dissociation of N-linked Glycopeptides from a Recombinant mab Using SYNAPT G2-S HDMS

Proteomics. November 13, 2007

Advanced Fragmentation Techniques for BioPharma Characterization

HOWTO, example workflow and data files. (Version )

DE NOVO PEPTIDE SEQUENCING FOR MASS SPECTRA BASED ON MULTI-CHARGE STRONG TAGS

NPTEL VIDEO COURSE PROTEOMICS PROF. SANJEEVA SRIVASTAVA

Atomic masses. Atomic masses of elements. Atomic masses of isotopes. Nominal and exact atomic masses. Example: CO, N 2 ja C 2 H 4

De Novo Peptide Sequencing

MS-based proteomics to investigate proteins and their modifications

De Novo Peptide Identification Via Mixed-Integer Linear Optimization And Tandem Mass Spectrometry

Workflow concept. Data goes through the workflow. A Node contains an operation An edge represents data flow The results are brought together in tables

Improved 6- Plex TMT Quantification Throughput Using a Linear Ion Trap HCD MS 3 Scan Jane M. Liu, 1,2 * Michael J. Sweredoski, 2 Sonja Hess 2 *

A graph-based filtering method for top-down mass spectral identification

Rapid Distinction of Leucine and Isoleucine in Monoclonal Antibodies Using Nanoflow. LCMS n. Discovery Attribute Sciences

MS Based Proteomics: Recent Case Studies Using Advanced Instrumentation

Supplementary Figure 1

Nature Methods: doi: /nmeth Supplementary Figure 1. Fragment indexing allows efficient spectra similarity comparisons.

Mass spectrometry in proteomics

Tutorial 1: Setting up your Skyline document

Introduction to spectral alignment

Protein Quantitation II: Multiple Reaction Monitoring. Kelly Ruggles New York University

MS-MS Analysis Programs

A New Hybrid De Novo Sequencing Method For Protein Identification

QuasiNovo: Algorithms for De Novo Peptide Sequencing

pnovo: De novo Peptide Sequencing and Identification Using HCD Spectra

Identification of Human Hemoglobin Protein Variants Using Electrospray Ionization-Electron Transfer Dissociation Mass Spectrometry

Yifei Bao. Beatrix. Manor Askenazi

Lecture 15: Realities of Genome Assembly Protein Sequencing

Data pre-processing in liquid chromatography mass spectrometry-based proteomics

PeptideProphet: Validation of Peptide Assignments to MS/MS Spectra. Andrew Keller

Was T. rex Just a Big Chicken? Computational Proteomics

Qualitative Proteomics (how to obtain high-confidence high-throughput protein identification!)

Spectrum-to-Spectrum Searching Using a. Proteome-wide Spectral Library

Protein analysis using mass spectrometry

Proteome-wide label-free quantification with MaxQuant. Jürgen Cox Max Planck Institute of Biochemistry July 2011

MaSS-Simulator: A highly configurable MS/MS simulator for generating test datasets for big data algorithms.

Research Article High Mass Accuracy Phosphopeptide Identification Using Tandem Mass Spectra

Protein Quantitation II: Multiple Reaction Monitoring. Kelly Ruggles New York University

Improving Intact Antibody Characterization by Orbitrap Mass Spectrometry

PeptideProphet: Validation of Peptide Assignments to MS/MS Spectra

Reagents. Affinity Tag (Biotin) Acid Cleavage Site. Figure 1. Cleavable ICAT Reagent Structure.

UCD Conway Institute of Biomolecular & Biomedical Research Graduate Education 2009/2010

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Learning Score Function Parameters for Improved Spectrum Identification in Tandem Mass Spectrometry Experiments

Self-assembling covalent organic frameworks functionalized. magnetic graphene hydrophilic biocomposite as an ultrasensitive

Tandem mass spectra were extracted from the Xcalibur data system format. (.RAW) and charge state assignment was performed using in house software

LECTURE-13. Peptide Mass Fingerprinting HANDOUT. Mass spectrometry is an indispensable tool for qualitative and quantitative analysis of

BST 226 Statistical Methods for Bioinformatics David M. Rocke. January 22, 2014 BST 226 Statistical Methods for Bioinformatics 1

Properties of Average Score Distributions of SEQUEST

UC San Diego UC San Diego Electronic Theses and Dissertations

The Power of LC MALDI: Identification of Proteins by LC MALDI MS/MS Using the Applied Biosystems 4700 Proteomics Analyzer with TOF/TOF Optics

Improved Peptide Identification for Proteomic Analysis Based on Comprehensive Characterization of Electron Transfer Dissociation Spectra

Quantitative Proteomics

6 x 5 Ways to Ensure Your LC-MS/MS is Healthy

DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics

Quantitation of TMT-Labeled Peptides Using Higher-Energy Collisional Dissociation on the Velos Pro Ion Trap Mass Spectrometer

Methods for proteome analysis of obesity (Adipose tissue)

Top-down proteomics: applications, recent developments and perspectives

Interpreting raw biological mass spectra using isotopic mass-to-charge ratio and envelope fingerprinting

Computationally Analyzing Mass Spectra of Hydrogen Deuterium Exchange Experiments Kevin S. Drew University of Chicago May 21, 2005

A Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry

On Optimizing the Non-metric Similarity Search in Tandem Mass Spectra by Clustering

Translational Biomarker Core

LECTURE-11. Hybrid MS Configurations HANDOUT. As discussed in our previous lecture, mass spectrometry is by far the most versatile

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

Supplementary Material for: Clustering Millions of Tandem Mass Spectra

Other Methods for Generating Ions 1. MALDI matrix assisted laser desorption ionization MS 2. Spray ionization techniques 3. Fast atom bombardment 4.

An SVM Scorer for More Sensitive and Reliable Peptide Identification via Tandem Mass Spectrometry

Quantitation of a target protein in crude samples using targeted peptide quantification by Mass Spectrometry

Key questions of proteomics. Bioinformatics 2. Proteomics. Foundation of proteomics. What proteins are there? Protein digestion

A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: support vector machine classification of peptide MS/MS spectra

Relative Quantitation of TMT-Labeled Proteomes Focus on Sensitivity and Precision

Jiří Novák and David Hoksza

X!TandemPipeline (Myosine Anabolisée) validating, filtering and grouping MSMS identifications

Mass Spectrometry and Proteomics - Lecture 2 - Matthias Trost Newcastle University

PC 235: Mass Spectrometry and Proteomics

MSc Chemistry Analytical Sciences. Advances in Data Dependent and Data Independent Acquisition for data analysis in proteomic research

SERVA ICPL Kit (Cat.-No )

(Refer Slide Time 00:09) (Refer Slide Time 00:13)

Electronic Supplementary Information

Increasing the Multiplexing of Protein Quantitation from 6- to 10-Plex with Reporter Ion Isotopologues

Computational Analysis of Mass Spectrometric Data for Whole Organism Proteomic Studies

via Tandem Mass Spectrometry and Propositional Satisfiability De Novo Peptide Sequencing Renato Bruni University of Perugia

SQID: An Intensity-Incorporated Protein Identification Algorithm for Tandem Mass Spectrometry

Transcription:

De novo Protein Sequencing by Combining Top-Down and Bottom-Up Tandem Mass Spectra Xiaowen Liu Department of BioHealth Informatics, Department of Computer and Information Sciences, Indiana University-Purdue University Indianapolis Center for Computational Biology and Bioinformatics, Indiana University School of Medicine 1

Top-Down Proteomics Becomes Reality Early proteomics methods used enzymes to digest proteins into pieces that could be easily analyzed by mass spectrometry. Those methods are now mature and routinely detect peptides from thousands of proteins in a single run But the great strength of those methods is also their greatest weakness. What s being analyzed is no longer the actual biological actors but the pieces left after they ve been broken apart By starting with intact proteins, rather than their pieces, top-down analysis more accurately reflects the structure and properties of actual biological systems than does bottomup proteomics. 2

Top-Down vs. Bottom-Up MS Bottom-up: Tryptic Digestion MS/MS Protein Peptides 5-30aa Top-down: No digestion and MS/MS Protein 3K-50K Da, 30-400aa 3

Top-Down vs Bottom-Up MS Measurable m/z values Commercial iontrap/ Orbitrap mass spectrometers: up to 4000 m/z Bottom-up mass spectra Small masses: 500 Da 4000 Da Low charge Top-down mass spectra Large masses, i.e., 20k Da High charge ions 4

Large Masses Make Spectra Complex Isotopes C 12 : Mass: 12.000, frequency: 98.93% C 13 : Mass: 13.003, frequency: 1.07% 100-carbon molecules The proportion of molecules is with all 100 carbons being C 12 is 0.9893 100 31.54% Theoretical isotopomer envelope for Lysozyme (14303.88 Da) 100 carbons 40 35 30 31.54 34.10 25 20 18.24 15 10 5 0 6.41 1.71 0.34 0.07 1 2 3 4 5 6 7 5

An Example of Top-Down Mass Spectra 6756.58Da (Charge +4) 6771.58Da (Charge +4) Top-down spectra usually have order(s) of magnitude more peaks and complex pattern of isotopomer envelopes. 6

Why Top-Down Proteomics Becomes Reality? High accuracy, high resolution, and high-throughput mass spectrometers: Orbitrap, FTICR. Kellie et al. Molecular BioSystems 2010 7

Protein Sequencing: Database Bottom-up MS Search Database search Match Protein sequence Peptide Database Top-down MS Peptide sequence Protein Database Top-down spectrum Deconvolution Mass list Database Search Protein sequence Protein Database 8

Antibodies An antibody is a large Y-shaped protein produced by B-cells. The antibody recognizes and binds to an antigen, a unique part of the foreign target. The variable domains of antibodies are highly mutated. Indispensable reagents for biomedical research and as diagnostic and therapeutic agents. The sequences of most antibodies are unknown. 9

De Novo Peptide Sequencing Bottom-up MS Tryptic Digestion MS/MS De novo sequencing D I Q M R P D S L S K Protein Peptides 5-30aa The order of peptides is missing. Which sequence is correct? Candidate 1: D I Q M R P D S L S K MCDSEFK V T I T C K R MCDSEFK V T I T C K R Peptide sequences Candidate 2: P D S L S K D I Q M R V T I T C K R MCDSEFK Available tools PEAKS Ma et. al. RCMS 2003 PepNovo Frank et al. JPR 2005 pnovo Chi et. al. JPR 2010 10

De Novo Protein Sequencing by Bottom-Up MS Multiple enzyme digestion Trypsin: after residues R and K GluC: after residues D and E Example Target protein (unknown): D I Q M R Q K P S D L S K S V G D R V T I T C K R S Q Bottom-up spectra (trypsin): Bottom-up spectra (GluC): De novo result: D I Q M R P S D L S K S V G D V T I T C K R Challenges Overlaps may be short Very short peptides Bandeira et al. Nature Biotechnology 2008 11

De Novo Protein Sequencing by Top-Down MS Top-down tandem mass spectra cover whole proteins. Example Target protein (unknown): D I Q M R Q K P S D L S K S V G D R V T I T C K R S Q Top-down spectra: De novo result: [ ]Q M R Q K [ ] D L S K S V[ ] V T I T C[ ]S Q Missing peaks Resulting sequences contain gaps 12

De Novo Protein Sequencing by Combining Top-Down and Bottom-Up MS (TBNovo) Complementary information Use bottom-up spectra to fill gaps in top-down spectra Use top-down spectra to find the order of bottom-up spectra Target protein (unknown): D I Q M R Q K P S D L S K S V G D R V T I T C K R S Q Bottom-up spectra: Top-down spectra: De novo result: D I Q M R Q R P S D L S K S V G D R V T I T C K R S Q 13

Data Sets Light chain of alemtuzumab (MabCampath) Top-down Thermo LTQ Orbitrap Velos and Thermo Q-Exactive ETD: 12134 spectra; CID: 7686; and HCD: 4931 Bottom-up Thermo LTQ Orbitrap XL HCD spectra Trypsin: 2716 spectra, chymotrypsin: 4328, proteinase K: 1616 and pepsin: 1910 Carbonic anhydrase 2 (CAH2 BOVIN) Top-down ETD: 3045; CID: 3363; HCD: 3437. Bottom-up Trypsin: 47536 spectra 14

Preprocessing Prefix residue masses corresponds to neutral b-ion masses. Convert all spectra to lists of candidate prefix residue masses. Bottom-up spectra De novo peptide sequencing (PEAKS) Represented by prefix residue masses of the peptides Top-down spectra Spectral deconvolution (MS-Deconv) Convert neutral masses to candidate prefix residue masses. Merge multiple top-down spectra to one. Ma et al. RCMS 2003, Liu et al. MCP 2010 15

Candidate Prefix Residue Masses 1. Preprocessing: parent mass: M=1111 Da Tandem mass spectrum from peptide PRTEINSTRING PRTEINSTRING Neutral mass list: 253 Da 457 Da 483 Da 569 Da Add complementary masses PR PRTE PRTEINS RING M-253 Da M-457 Da M-483 Da M-596 Da TEINSTRING PRTEINST INSTRING TRING Prefix residue masses: 253 483, M-457 Candidate prefix residue masses: 253, 457, 483,, 569, M-253,, M-59616

Spectral Mapping Mass count score: number of prefix residue masses shared by a top-down spectrum and a bottom-up spectrum, Shifted bottom-up spectra: adding a shift value to each prefix residue mass Optimal shift: the shift that maximizes the mass count score between a top-down spectrum and a bottom-up spectrum. Shifted mass count score: the best mass count score between a topdown spectrum and a shift bottom-up spectrum. 17

Spectral Mapping Keep only bottom-up spectra with a shifted mass count score >= 7. Keep only prefix residue masses supported by two bottom-up spectra or the top-down spectrum + a bottomup spectrum Result: combined prefix residue mass list 18

Gap Filling Shift bottom-up spectra to possible cleavage sites. Map bottom-up spectra to the combined prefix residue mass list. Combined prefix residue mass list Shifted bottom-up spectrum Masses used to fill the gap 19

Gap Filling Compute possible peptide masses Find bottom-up spectra with similar precursor masses Combined prefix residue mass list bottom-up spectrum Masses used to fill the gap 20

Spectral Graph Compute best shift for mapping bottom-up spectrum to the combined prefix residue masses. Update the list of combined prefix residue masses. Convert the list of prefix residue masses to a spectral graph. Find a heaviest path corresponding a protein sequence that best explains the experimental spectra. A C L V A G P C W W E T L L W R V T G E P K WD T L D T R K AVGELTK 21

Results Light chain of alemtuzumab (MabCampath) 214 amino acids Tbnovo reported 188 prefix residue masses, 184 were correct. Coverage 86.9%, accuracy 97.8% Carbonic anhydrase 2 (CAH2 BOVIN) 258 amino acids Tbnovo reported 229 prefix residue masses, 194 were correct. Coverage 75.2%, accuracy 84.7% 22

De novo sequencing result of MabCampath light chain 23

Software Tools TBNovo: Protein sequencing by combing top-down and bottom-up tandem mass spectra. MS-Deconv: Top-down spectral deconvolution. MS-Align+/TopPIC: Protein identification by top-down tandem mass spectra. http://mypage.iu.edu/~xwliu/ 24

Acknowledgements UCSD Pavel A. Pevzner Erasmus MC, Netherlands Lennard J. M. Dekker, Martijn M. Vanduijn Theo M. Luider PNNL Si Wu Ljiljana Paša-Tolić Saint Petersburg Academic University Mikhail Dvorkin Sonya Alexandrova Kira Vyatkina Nikola Tolić 25