Proteomics. November 13, 2007


Acknowledgement
Slides presented here have been borrowed from presentations by:
  Dr. Mark A. Knepper (LKEM, NHLBI, NIH)
  Dr. Nathan Edwards (Center for Bioinformatics and Computational Biology, University of Maryland, College Park)
  Dr. Catherine Fenselau (Dept. of Chemistry, University of Maryland, College Park)

What Is Proteomics? Simultaneous analysis of all members of a defined set of proteins.

Identification & Analysis of Proteins
  Sample preparation and handling
  Determination of amino acid sequence
  Protein identification and quantification
  Cell mapping

What is Mass Spectrometry?
  Measures masses of individual molecules
  Molecules are converted into ions in the gas phase
  Molecules must be charged (+ or -)
  The mass-to-charge ratio (m/z) of each ion and of its charged fragments is measured and detected

Mass Spectrometry Components
  Inlet
  Ion Source
  Mass Analyzer
  Detector
  Vacuum
  Electronics

Example Mass Spectrum: H2O (figure)

Ionization Methods
  Matrix-assisted laser desorption/ionization (MALDI)
  Electrospray ionization (ESI)

MALDI
  Creates ions by laser excitation of a sample that is mixed with an excess of a matrix component
  Singly charged ions are guided to the mass analyzer and detector
  Used with a time-of-flight (TOF) analyzer

Protein Identification by MALDI-TOF Mass Spectrometry

ESI
  Creates gas-phase ions by applying a potential to a flowing liquid that contains the analyte and solvent molecules
  Yields multiply charged ions
  Used with quadrupole or ion-trap mass analyzers in tandem mass spectrometry

Sample Preparation for Mass Spectrometry
  Enzymatic digest and fractionation
  Trypsin cleaves on the C-terminal side of Arg and Lys

Single-Stage MS (figure)

Tandem Mass Spectrometry (MS/MS) (figure)

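Peptide Fragmentation
  Peptides consist of amino acids arranged in a linear backbone, written from the N-terminus (left) to the C-terminus (right):
  H-...-HN-CH(R_{i-1})-CO-NH-CH(R_i)-CO-NH-CH(R_{i+1})-CO-...-OH
  Each -HN-CH(R)-CO- unit is one amino-acid residue; residues i-1, i, and i+1 are shown.

Peptide Fragmentation
  Breaking the peptide bond after residue i gives the N-terminal fragment b_i and the C-terminal fragment y_{n-i}.
  Breaking the bond after residue i+1 gives b_{i+1} and y_{n-i-1} (n is the number of residues).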

Peptide Fragmentation
Peptide: S-G-F-L-E-E-D-E-L-K

  b ion   MW     N-terminal fragment   C-terminal fragment   y ion   MW
  b1        88   S                     GFLEEDELK             y9      1080
  b2       145   SG                    FLEEDELK              y8      1022
  b3       292   SGF                   LEEDELK               y7       875
  b4       405   SGFL                  EEDELK                y6       762
  b5       534   SGFLE                 EDELK                 y5       633
  b6       663   SGFLEE                DELK                  y4       504
  b7       778   SGFLEED               ELK                   y3       389
  b8       907   SGFLEEDE              LK                    y2       260
  b9      1020   SGFLEEDEL             K                     y1       147
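The fragment ladder for SGFLEEDELK can be recomputed with a short sketch. It assumes singly charged ions, the monoisotopic residue masses from the amino-acid mass table later in these slides, and two constants that are not in the slides (a proton mass of 1.00728 and a water mass of 18.01056); the function names are illustrative. The slide lists whole-number values, so the computed masses agree with it only to within about one mass unit.

```python
# Monoisotopic residue masses (Da) for the residues that occur in SGFLEEDELK.
RESIDUE_MASS = {
    "S": 87.03203, "G": 57.02147, "F": 147.06842, "L": 113.08407,
    "E": 129.04260, "D": 115.02695, "K": 128.09497,
}
PROTON = 1.00728   # assumed constant: mass of a proton (Da)
WATER = 18.01056   # assumed constant: mass of H2O (Da)


def b_ions(peptide):
    """Singly charged b-ion m/z values b1..b(n-1), built from N-terminal prefixes."""
    masses, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total + PROTON)
    return masses


def y_ions(peptide):
    """Singly charged y-ion m/z values y1..y(n-1), built from C-terminal suffixes."""
    masses, total = [], 0.0
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        masses.append(total + WATER + PROTON)
    return masses


if __name__ == "__main__":
    peptide = "SGFLEEDELK"
    pairs = zip(b_ions(peptide), y_ions(peptide))
    for i, (b, y) in enumerate(pairs, start=1):
        print(f"b{i} = {b:8.2f}    y{i} = {y:8.2f}")
```

A b ion carries only the extra proton, while a y ion also keeps the water that completes the C-terminus, which is why the two ladders are offset.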

Peptide Fragmentation (figure: MS/MS spectrum of SGFLEEDELK, % intensity from 0 to 100 against m/z from 0 to 1000, with peaks y2 to y9 and b3 to b9 labeled)
  Sequence:  S G F L E E D E L K
  b ions:    88 145 292 405 534 663 778 907 1020 1166
  y ions:    1166 1080 1022 875 762 633 504 389 260 147

Peptide Identification
  Given:
    The mass of the parent ion, and
    The MS/MS spectrum
  Output:
    The amino-acid sequence of the peptide

Peptide Identification
  Two methods:
    De novo interpretation
    Sequence database search

De Novo Interpretation (figure: MS/MS spectrum, % intensity from 0 to 100 against m/z from 0 to 1000; the mass differences between neighboring peaks are labeled with amino-acid letters, first E and L, then the remaining residues of the peptide read in both directions)

De Novo Interpretation

  Amino acid      Symbol   Residual MW (Da)
  Tyrosine        Y        163.06333
  Leucine         L        113.08407
  Tryptophan      W        186.07932
  Lysine          K        128.09497
  Valine          V         99.06842
  Isoleucine      I        113.08407
  Threonine       T        101.04768
  Histidine       H        137.05891
  Serine          S         87.03203
  Glycine         G         57.02147
  Arginine        R        156.10112
  Phenylalanine   F        147.06842
  Glutamine       Q        128.05858
  Glutamic acid   E        129.04260
  Proline         P         97.05277
  Aspartic acid   D        115.02695
  Asparagine      N        114.04293
  Cysteine        C        103.00919
  Methionine      M        131.04049
  Alanine         A         71.03712
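As a rough illustration of how this table is used in de novo interpretation, the sketch below labels the gap between each pair of neighboring peaks with the residues whose mass matches it. The function names, the example peak list, and the 0.02 tolerance are assumptions made for the example, not part of the slides.

```python
# Residue masses (Da) copied from the table above.
RESIDUE_MASS = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277,
    "V": 99.06842, "T": 101.04768, "C": 103.00919, "L": 113.08407,
    "I": 113.08407, "N": 114.04293, "D": 115.02695, "Q": 128.05858,
    "K": 128.09497, "E": 129.04260, "M": 131.04049, "H": 137.05891,
    "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}


def residues_for_gap(gap, tol=0.02):
    """Return every residue whose mass lies within tol of the given mass gap."""
    return sorted(aa for aa, mass in RESIDUE_MASS.items() if abs(mass - gap) <= tol)


def read_ladder(peaks, tol=0.02):
    """Label each gap between neighboring peaks; '?' marks an unexplained gap."""
    peaks = sorted(peaks)
    calls = []
    for low, high in zip(peaks, peaks[1:]):
        matches = residues_for_gap(high - low, tol)
        calls.append("/".join(matches) if matches else "?")
    return calls


if __name__ == "__main__":
    # Hypothetical b-ion peaks b1..b5 of SGFLEEDELK, rounded to two decimals.
    print(read_ladder([88.04, 145.06, 292.13, 405.21, 534.26]))
```

Leucine and isoleucine always come back together because their masses are identical, and lysine and glutamine separate only when the tolerance is tighter than their difference of about 0.036.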

De Novo Interpretation (figures from Lu and Chen (2003), JCB 10:1)

De Novo Interpretation
  Find good paths in spectrum graph
    Can't use same peak twice
      Forbidden pairs
      Nested forbidden pairs
  Simple peptide fragmentation model
  Usually many apparently good solutions
    Needs better fragmentation model
    Needs better path scoring
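A toy version of the spectrum graph shows why several paths can look equally good. In this sketch peaks are nodes, an edge joins two peaks whose mass difference matches a residue, and every path between a chosen start and end peak spells a candidate sequence. The peak list and tolerance are invented for the example, and the sketch leaves out forbidden pairs and path scoring entirely.

```python
# Residue masses (Da); isoleucine is left out because it is isobaric with leucine.
RESIDUE_MASS = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277,
    "V": 99.06842, "T": 101.04768, "C": 103.00919, "L": 113.08407,
    "N": 114.04293, "D": 115.02695, "Q": 128.05858, "K": 128.09497,
    "E": 129.04260, "M": 131.04049, "H": 137.05891, "F": 147.06842,
    "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}


def build_spectrum_graph(peaks, tol=0.02):
    """Map each peak to the (heavier peak, residue) pairs reachable in one step."""
    peaks = sorted(peaks)
    graph = {peak: [] for peak in peaks}
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            for aa, mass in RESIDUE_MASS.items():
                if abs((high - low) - mass) <= tol:
                    graph[low].append((high, aa))
    return graph


def all_paths(graph, start, end):
    """Enumerate the residue strings spelled by every path from start to end."""
    if start == end:
        return [""]
    paths = []
    for next_peak, aa in graph[start]:
        for tail in all_paths(graph, next_peak, end):
            paths.append(aa + tail)
    return paths


if __name__ == "__main__":
    # Hypothetical prefix-ion peaks; the 114.04 gap fits both N and G+G.
    peaks = [88.04, 145.06, 202.08, 349.15]
    graph = build_spectrum_graph(peaks)
    print(all_paths(graph, peaks[0], peaks[-1]))
```

Asparagine and two glycines differ by about 0.00001, so both readings survive any realistic tolerance; a better fragmentation model or path score is needed to choose between them.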

De Novo Interpretation
  Amino acids have duplicate masses! (e.g., leucine and isoleucine are both 113.08407)
  Incomplete ladders create ambiguity
  Noise peaks and unmodeled fragments create ambiguity
  Best de novo interpretation may have no biological relevance
  Current algorithms cannot model many aspects of peptide fragmentation
  Identifies relatively few peptides in high-throughput workflows

Sequence Database Search
  Compares peptides from a protein sequence database with spectra
  Filter peptide candidates by
    Parent mass
    Digest motif
  Score each peptide against spectrum
    Generate all possible peptide fragments
    Match putative fragments with peaks
    Score and rank
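The filter-then-score loop above can be sketched as follows. The score used here, a simple count of theoretical b and y ions that land on an observed peak, is an assumption chosen for brevity; real search engines use more elaborate fragmentation models and scores. The candidate list, tolerances, and function names are likewise illustrative.

```python
RESIDUE_MASS = {
    "G": 57.02147, "A": 71.03712, "S": 87.03203, "P": 97.05277,
    "V": 99.06842, "T": 101.04768, "C": 103.00919, "L": 113.08407,
    "I": 113.08407, "N": 114.04293, "D": 115.02695, "Q": 128.05858,
    "K": 128.09497, "E": 129.04260, "M": 131.04049, "H": 137.05891,
    "F": 147.06842, "R": 156.10112, "Y": 163.06333, "W": 186.07932,
}
PROTON = 1.00728   # assumed constant (Da)
WATER = 18.01056   # assumed constant (Da)


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_mzs(peptide):
    """Singly charged b and y ion m/z values for every backbone cleavage."""
    mzs = []
    for i in range(1, len(peptide)):
        prefix = sum(RESIDUE_MASS[aa] for aa in peptide[:i])
        suffix = sum(RESIDUE_MASS[aa] for aa in peptide[i:])
        mzs.append(prefix + PROTON)          # b ion
        mzs.append(suffix + WATER + PROTON)  # y ion
    return mzs


def shared_peak_count(peptide, peaks, tol):
    """Number of theoretical fragments that fall within tol of some observed peak."""
    return sum(any(abs(f - p) <= tol for p in peaks) for f in fragment_mzs(peptide))


def search(peaks, parent_mass, candidates, parent_tol=1.0, frag_tol=0.6):
    """Filter candidates by parent mass, score the survivors, rank best first."""
    hits = [(shared_peak_count(pep, peaks, frag_tol), pep)
            for pep in candidates
            if abs(peptide_mass(pep) - parent_mass) <= parent_tol]
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    # Peaks labeled in the example spectrum (y2..y9 and b3..b9, whole numbers).
    peaks = [260, 389, 504, 633, 762, 875, 1022, 1080,
             292, 405, 534, 663, 778, 907, 1020]
    candidates = ["SGFLEEDELK", "GSFLEEDELK", "LVNEVTEFAK", "DAHK"]
    for score, peptide in search(peaks, 1165.55, candidates):
        print(score, peptide)
```

GSFLEEDELK is a made-up permutation with the same parent mass; it passes the mass filter and is separated from the correct peptide only by the fragment matches.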

Sequence Database Search (figure: the candidate peptide SGFLEEDELK overlaid on the MS/MS spectrum, % intensity from 0 to 100 against m/z from 0 to 1000; predicted fragments matching peaks y2 to y9 and b3 to b9 are labeled)
  Sequence:  S G F L E E D E L K
  b ions:    88 145 292 405 534 663 778 907 1020 1166
  y ions:    1166 1080 1022 875 762 633 504 389 260 147

Sequence Database Search
  No need for complete ladders
  Possible to model all known peptide fragments
  Sequence permutations eliminated
  All candidates have some biological relevance
  Practical for high-throughput peptide identification
  Correct peptide might be missing from database!

Peptide Candidate Filtering
>ALBU_HUMAN
MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAK

No missed cleavage sites:
  MK
  WVTFISLLFLFSSAYSR
  GVFR
  R
  DAHK
  SEVAHR
  FK
  DLGEENFK
  ALVLIAFAQYLQQCPFEDHVK
  LVNEVTEFAK

Peptide Candidate Filtering
>ALBU_HUMAN
MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAK

One missed cleavage site:
  MKWVTFISLLFLFSSAYSR
  WVTFISLLFLFSSAYSRGVFR
  GVFRR
  RDAHK
  DAHKSEVAHR
  SEVAHRFK
  FKDLGEENFK
  DLGEENFKALVLIAFAQYLQQCPFEDHVK
  ALVLIAFAQYLQQCPFEDHVKLVNEVTEFAK
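Both peptide lists can be generated by an in-silico tryptic digest. The sketch below assumes the simplest rule, cleavage after every K or R, and deliberately ignores the common exception for a following proline, which does not arise in this fragment. The function and constant names are illustrative.

```python
ALBU_FRAGMENT = (
    "MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFK"
    "ALVLIAFAQYLQQCPFEDHVKLVNEVTEFAK"
)


def tryptic_peptides(protein, missed_cleavages=0):
    """Peptides containing exactly `missed_cleavages` uncut K/R sites."""
    # Step 1: cut after every K or R to get the fully digested pieces.
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])  # C-terminal piece without K/R

    # Step 2: join runs of neighboring pieces to model missed cleavages.
    width = missed_cleavages + 1
    return ["".join(pieces[i:i + width]) for i in range(len(pieces) - width + 1)]


if __name__ == "__main__":
    for n in (0, 1):
        print(n, "missed:", tryptic_peptides(ALBU_FRAGMENT, n))
```

A search that allows up to one missed cleavage would consider the union of the two lists, which is why this option enlarges the candidate set.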

Peptide Candidate Filtering
  Peptide molecular weight
    Only have m/z value
      Need to determine charge state
    Ion selection tolerance
  Mass for each amino-acid symbol?
    Monoisotopic vs. average
    Default residual mass
    Depends on sample preparation protocol
    Cysteine almost always modified
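Because the instrument reports only m/z, the peptide mass depends on the assumed charge state. A minimal sketch, assuming positive ions protonated once per charge and a proton mass of 1.00728 (a constant not given in the slides); the helper names and the example values are illustrative.

```python
PROTON = 1.00728  # assumed constant: mass of a proton (Da)


def neutral_mass(mz, charge):
    """Neutral peptide mass implied by an m/z value at a given charge state."""
    return charge * (mz - PROTON)


def candidate_masses(mz, max_charge=3):
    """Neutral mass for each charge state that might explain the m/z value."""
    return {z: neutral_mass(mz, z) for z in range(1, max_charge + 1)}


def ppm_error(observed_mass, theoretical_mass):
    """Mass error in parts per million, a common way to state the tolerance."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


if __name__ == "__main__":
    # 583.78245 is the m/z expected for doubly charged SGFLEEDELK.
    for z, mass in candidate_masses(583.78245).items():
        print(f"z = {z}: neutral mass {mass:.4f}")
```

Each charge state gives a different mass, so a wrong charge assignment sends the parent-mass filter to an unrelated set of candidates.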

Peptide Scoring
  Peptide fragments vary based on
    The instrument
    The peptide's amino-acid sequence
    The peptide's charge state
    Etc.
  Search engines model peptide fragmentation to various degrees
    Speed vs. sensitivity tradeoff
    y-ions & b-ions occur most frequently

Sequence Database Search Traps and Pitfalls
  Search options may eliminate the correct peptide
    Parent mass tolerance too small
    Fragment m/z tolerance too small
    Incorrect parent ion charge state
    Non-tryptic or semi-tryptic peptide
    Incorrect or unexpected modification
    Sequence database too conservative

Sequence Database Search Traps and Pitfalls
  Search options can make search times impractically long
    Variable modifications increase search times exponentially
    Non-tryptic search increases search time by two orders of magnitude
    Large sequence databases contain many irrelevant peptide candidates
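The exponential cost of variable modifications follows from counting: each modifiable residue may or may not carry the modification, so a peptide with k such sites has 2^k forms to score. The sketch below enumerates them; the lowercase-letter notation for a modified residue and the choice of modifiable residues are assumptions for the example.

```python
from itertools import product


def modified_forms(peptide, variable_sites="M"):
    """All forms of a peptide when each listed residue is optionally modified.

    A modified residue is written in lowercase (illustrative notation only).
    """
    options = [(aa, aa.lower()) if aa in variable_sites else (aa,)
               for aa in peptide]
    return ["".join(form) for form in product(*options)]


if __name__ == "__main__":
    peptide = "MKWVTFISLLFLFSSAYSR"
    print(len(modified_forms(peptide, "M")))     # one modifiable site
    print(len(modified_forms(peptide, "MSTY")))  # seven modifiable sites
```

Every extra modifiable residue doubles the number of candidates for that peptide, and the effect multiplies across the whole database.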

Sequence Database Search Traps and Pitfalls
  Best available peptide isn't necessarily correct!
  Score statistics are essential
    What is the chance a peptide could score this well by chance alone?
    The wrong peptide can look correct if the right peptide is missing!
  Need scores that are invariant to spectrum quality and peptide properties
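One simple way to ask how often a peptide could score this well by chance is an empirical p-value: score the same spectrum against peptides known to be wrong and count how many do at least as well. This is only one possible estimator, shown under that assumption; the slides do not prescribe a method, and the scores below are invented.

```python
def empirical_p_value(observed_score, random_scores):
    """Fraction of random-match scores at least as high as the observed score.

    The +1 in numerator and denominator keeps the estimate above zero when
    no random score reaches the observed one.
    """
    at_least = sum(1 for s in random_scores if s >= observed_score)
    return (at_least + 1) / (len(random_scores) + 1)


if __name__ == "__main__":
    # Hypothetical scores of one spectrum against nine incorrect peptides.
    random_scores = [3, 5, 4, 6, 2, 7, 5, 4, 14]
    print(empirical_p_value(15, random_scores))
    print(empirical_p_value(5, random_scores))
```

With so few random scores the smallest reportable value is 1/(n+1), so a convincing estimate needs a large set of incorrect matches.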

Sequence Database Search Traps and Pitfalls
  Search engines often make incorrect assumptions about sample prep
    Proteins with lots of identified peptides are not more likely to be present
    Peptide identifications do not represent independent observations
    Not all proteins are equally interesting to report

Sequence Database Search Traps and Pitfalls
  Good spectral processing can make a big difference
    Poorly calibrated spectra require large m/z tolerances
    Poorly baselined spectra make small peaks hard to believe
    Poorly de-isotoped spectra have extra peaks and misleading charge state assignments
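The link between isotope peaks and charge state can be made concrete: neighboring isotope peaks of one ion are separated by roughly 1.00335/z in m/z, so the spacing reveals the charge. A minimal sketch, assuming the peaks passed in already belong to a single isotope cluster; the spacing constant, function name, and example values are not from the slides.

```python
ISOTOPE_SPACING = 1.00335  # assumed constant: 13C minus 12C mass difference (Da)


def charge_from_isotope_peaks(mzs, max_charge=6):
    """Estimate charge from the mean m/z spacing of one isotope cluster.

    Returns None when there are fewer than two peaks or the spacing implies
    a charge outside 1..max_charge.
    """
    mzs = sorted(mzs)
    if len(mzs) < 2:
        return None
    gaps = [high - low for low, high in zip(mzs, mzs[1:])]
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap <= 0:
        return None
    charge = round(ISOTOPE_SPACING / mean_gap)
    return charge if 1 <= charge <= max_charge else None


if __name__ == "__main__":
    print(charge_from_isotope_peaks([583.78, 584.28, 584.78]))  # spacing 0.5
    print(charge_from_isotope_peaks([1166.56, 1167.56]))        # spacing 1.0
```

If isotope peaks are left in the spectrum and treated as separate fragments, they add spurious matches and the charge inferred for the parent ion can be wrong.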