Key questions of proteomics. Bioinformatics 2. Proteomics. Foundation of proteomics. What proteins are there? Protein digestion


Bioinformatics 2, Lecture 2: Proteomics
Juri Rappsilber, Wellcome Trust Centre for Cell Biology, UoE
http://rappsilber.bio.ed.ac.uk/

Key questions of proteomics
- What proteins are there?
- How much is there of each of the proteins? (Absolute quantitation, stoichiometry)
- What (modification/splice) state are the proteins in?
- Which proteins interact with each other or with other molecules (DNA, RNA)?
- How does all of the above change with time, stimulation, or mutation of a key protein?

Foundation of proteomics
- Mass spectrometry
- Algorithms
- DNA sequencing

What proteins are there?
Protein identification is achieved by:
- Proteolysis of the proteins into peptides
- Mass spectrometric detection of the peptides (shortcut to protein identification: peptide mass fingerprinting)
- Mass spectrometric fragmentation of the peptides
- Database search to identify the peptides

Protein digestion
1. Reduction of cysteines by DTT
2. Alkylation of cysteines by iodoacetamide (IAA)
3. Digestion of the protein by trypsin (cleaves after lysine and arginine, K/R)

Peptide mass fingerprinting
The isolated protein is reduced, alkylated, and digested with trypsin. MALDI MS of the digest gives a list of peptide masses (for example 999.99, 1111.11, 1222.22, 1333.33, 1444.44, 1555.55, 1666.66). This list is used as a database query and compared with the lists of in-silico digests of the database proteins.
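The fingerprinting idea can be illustrated with a small sketch. It assumes monoisotopic residue masses, full tryptic cleavage after K/R with no missed cleavages, carbamidomethylated cysteines (from the IAA step), and singly protonated peptides as typically seen in MALDI. The function names and the tolerance are illustrative choices, not part of the lecture.

```python
import re

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
CARBAMIDOMETHYL = 57.02146  # mass added to each Cys by IAA alkylation


def tryptic_digest(protein):
    """Cleave after every K or R (assumption: no missed cleavages)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+$", protein)


def mh_plus(peptide):
    """Mass of the singly protonated peptide, [M+H]+."""
    mass = sum(RESIDUE_MASS[a] for a in peptide) + WATER + PROTON
    return mass + CARBAMIDOMETHYL * peptide.count("C")


def fingerprint_score(observed, protein, tol=0.2):
    """Count observed masses that match an in-silico tryptic peptide."""
    theoretical = [mh_plus(p) for p in tryptic_digest(protein)]
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)


# Hypothetical two-protein "database" used only for illustration.
database = {"protA": "MKTAYIAKQRQISFVK", "protB": "GSHMLEDPCRAAGK"}
observed = [mh_plus(p) for p in tryptic_digest(database["protA"])]
ranking = sorted(database, key=lambda k: fingerprint_score(observed, database[k]),
                 reverse=True)
```

Here `ranking` orders the database entries by the number of matched masses; real tools use more elaborate scoring than a plain match count.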

Peptide fragmentation (low-energy collision-induced fragmentation)
- Peptides fragment preferentially between amino acids.
- The chemical bond that cleaves depends on the fragmentation method.
- Low-energy collision-induced dissociation (CID) is most common. It leads to b and y ions.
- Electron transfer dissociation (ETD) is up and coming. It leads to c and z ions.

[Figures: MS of a peptide mixture; MS/MS of a peptide (low collision energy); MS/MS of a peptide (high collision energy)]

LC-MS interface
For the analysis of complex mixtures, peptides are separated by liquid chromatography that is coupled on-line to a mass spectrometer.
=> Big datasets (thousands of spectra in a 2 h analysis; 1,000,000 for an entire experiment is possible).

[Figure: LC-MS interface. HPLC, solvent split to waste, column (75 µm) with spray tip (8 µm), high voltage (HV), MS/MS]

- Many programs are available for matching fragmentation spectra with peptide sequences from databases (Mascot, Sequest, OMSSA, X!Tandem).
- Each program has its own score.
- None of the scores is truly statistical.
- Results for the same dataset vary (overlap between any two ca. 50-60%).
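As a rough illustration of what a search program compares a spectrum against, the following sketch computes the singly charged b and y ion series of a peptide. It assumes monoisotopic residue masses and unmodified residues; the function name is illustrative.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b, y) lists of singly charged fragment m/z values.

    b[k] is the N-terminal fragment and y[k] the complementary
    C-terminal fragment for the cleavage after residue k+1, so the
    y list runs from the largest y ion down to y1.
    """
    masses = [RESIDUE_MASS[a] for a in peptide]
    b, y = [], []
    for i in range(1, len(peptide)):
        b.append(sum(masses[:i]) + PROTON)          # N-terminal piece
        y.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal piece
    return b, y


b_ions, y_ions = fragment_ions("GAK")
```

For every cleavage site, the b ion and its complementary y ion add up to the peptide mass plus two protons, which is a handy consistency check.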

How to find the rate of incorrect assignments => confidence?
[Figure: number of spectra versus database search score for targets and decoys, searched against an E. coli/H. sapiens target-decoy DB. Decoy entries are reversed sequences, written on the slide as reversed names (iloc.e, namuh).]

FPR calculation methods
False positive rate = decoy count / target count
The counts are taken either locally (within a window around a given score) or cumulatively (everything above a given score).
[Figure: peptide count versus Mascot score (score - threshold), with target (Tb) and decoy (Db) counts]

Test the impact of FPR calculation
- Two methods for counting the false positives: cumulative 5% FPR and local 5% FPR.
- E. coli target matches are possibly correct; human target matches are definitely wrong.
- The addition of the human sequences allows us to check whether our decoy-based approach correctly models our incorrectly identified target peptides.
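The two ways of counting can be sketched as follows. This is a minimal illustration, assuming the rate is estimated as decoy count divided by target count as on the slide; the window width, the function names, and the toy scores are invented for the example.

```python
def cumulative_rate(target_scores, decoy_scores, threshold):
    """Decoy/target ratio over everything at or above the threshold."""
    t = sum(s >= threshold for s in target_scores)
    d = sum(s >= threshold for s in decoy_scores)
    return d / t if t else float("nan")


def local_rate(target_scores, decoy_scores, score, half_window=2.5):
    """Decoy/target ratio within a window centred on the given score."""
    lo, hi = score - half_window, score + half_window
    t = sum(lo <= s < hi for s in target_scores)
    d = sum(lo <= s < hi for s in decoy_scores)
    return d / t if t else float("nan")


# Toy scores (hypothetical, for illustration only).
targets = [5, 8, 12, 15, 22, 30, 35, 41, 50, 60]
decoys = [4, 6, 9, 14]

cum = cumulative_rate(targets, decoys, threshold=10)  # 1 decoy / 8 targets
loc = local_rate(targets, decoys, score=14)           # 1 decoy / 2 targets
```

In this toy data the cumulative estimate at the cutoff looks much better than the local one, because the many high-scoring targets dilute the decoys that sit close to the threshold.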

Peptides accepted: cumulative and local methods
[Figure: peptide counts for cumulative and local methods. Number of peptides accepted by method used (cumul, local); series d, ht, et]
More peptides are identified by the cumulative method, but more false peptides are included as well.

Single and multiple peptide hits
[Figure: protein sequence with a single peptide hit (SPH) and with multiple peptide hits (MPH)]

Significance of SPH and MPH
- MPHs confer additional corroboration to each other.
- SPHs are often disregarded in practice.
- What if we treat MPH and SPH separately?

Improved confidence by SPH/MPH decoy/target curves
[Figure: decoy count / target count versus score (Mascot: score - homol) for the cumulative, local, local MPH, and local SPH methods]
- MPH and SPH show very different curves.
- The local MPH cutoff is similar to the cumulative cutoff.
- The local SPH cutoff is much higher than the cumulative cutoff => the cumulative method overestimates confidence in SPHs, leading to high false discovery rates for SPH proteins and their rejection.

Final comparison: peptide counts
[Figure: peptide counts for cumulative (cum), local (loc), and split (spl) methods; series d_s, d_m, ht_s, ht_m, et_s, et_m]
The split method optimizes the number of correct peptides while minimizing the number of incorrect identifications.

Final comparison: protein counts
[Figure: protein counts for cumulative (cum), local (loc), and split (spl) methods; series d_s, d_m, ht_s, ht_m, et_s, et_m]
The impact of the FPR method is MUCH more severe at the protein level, because most peptides match to proteins together with other peptides. Few peptides match to a protein alone. However, essentially all incorrectly identified peptides match alone to a protein. Hence the protein list grows by one protein per false peptide. Therefore, with the current approach (cum), proteins cannot be identified reliably with a single peptide. This is possible using the local split method (spl).
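Treating the two classes separately first requires sorting accepted peptides into single peptide hits and multiple peptide hits. A minimal sketch, assuming each peptide maps to exactly one protein (shared peptides are ignored) and using invented peptide and protein names:

```python
from collections import Counter


def split_hits(peptide_to_protein):
    """Split peptides into single and multiple peptide hits.

    A peptide is a single peptide hit if it is the only peptide
    matched to its protein; otherwise it is a multiple peptide hit.
    """
    per_protein = Counter(peptide_to_protein.values())
    single = {pep for pep, prot in peptide_to_protein.items()
              if per_protein[prot] == 1}
    multiple = set(peptide_to_protein) - single
    return single, multiple


# Hypothetical identifications, for illustration only.
hits = {"PEPA": "P1", "PEPB": "P1", "PEPC": "P2"}
single, multiple = split_hits(hits)
```

Each class can then be given its own score cutoff, with the decoy/target ratio evaluated on that class alone.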

Non-statistical component of peptide-spectra matches
E.g.: Observed fragments do not scatter randomly among the calculated fragments.

SVM approach
- Collect a long list of features characterizing the peptide-spectrum match (this includes the score but also other parameters).
- Use decoy matches as false positives.
- Train the SVM anew with each dataset.
- Gives significant improvement (2-4%) over the search program alone or alternative procedures.
Käll L, Canterbury JD, Weston J, Noble WS, MacCoss MJ. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat Methods. 2007 Nov;4(11):923-5. Epub 2007 Oct 21.

Sequence space in the cell
What does the peptide-based analysis mean for identifying proteins?
- Organism
- Single gene (part of the genome): introns and exons
- Gene expression
- Proteins derived from this gene locus (part of the proteome): alternative transcripts/translations, modification of amino acids, proteolytic processing
What does it mean to identify a protein in proteomics? Rappsilber J, Mann M. Trends Biochem Sci. 2002 Feb;27(2):74-8.

Scientific approach
- Laboratory: purification based on a specific function/property, then peptides sequenced by mass spectrometry.
- Computer: identification against one of the following databases.
  - Protein sequence database (sequencing errors, mispredicted ORFs)
  - EST database, including alternative transcripts and sequencing errors
  - Genomic sequence database, including SNPs and sequencing errors
Steen H, Mann M. The ABC's (and XYZ's) of peptide sequencing. Nat Rev Mol Cell Biol. 2004 Sep;5(9):699-711. Review.

Modified peptides
- Include the modification as a possibility in the database search.
- For informatics, this is the same problem as peptide identification.

Quantitation in MS
- Absolute quantitation is possible by using a labelled peptide as a reference standard.
- Differential analysis is possible by labelling one sample and not labelling the other. Both can then be mixed and analyzed together.
- Stable isotopes: D, 13C
[Figure: intensity versus m/z for a labelled and an unlabelled peptide, separated by the mass difference]

In vivo labeling with SILAC
Analysis of proteins from stressed cells: cell cultures (State A, State B) are grown in media containing stable isotopes, then combined and digested with trypsin, followed by quantitation by MS.
[Figure: intensity (counts) versus m/z, showing peptide pairs with no/little change in protein abundance upon stress and pairs with a stress-induced change in protein abundance]

Stoichiometry
- All peptides of a protein are stoichiometric but are not observed with identical intensity.
- Intensity in the mass spectrum is not a direct consequence of abundance but is influenced by many molecule-specific factors => apple-orange problem.
- An approximation is possible by summing up the mass spectrometric evidence gathered for a protein and normalizing this by the expected volume of evidence.
- Example: number of observed peptides / number of observable peptides

Protein-protein interactions
- Can be analyzed using the same tools as for protein identification (mass spectrometry and database searching).
- Need to cross-link proteins to maintain their proximity also after proteolysis.
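The observed/observable ratio from the stoichiometry example can be sketched as below. The lecture does not define "observable" here, so this sketch makes a loud assumption: a tryptic peptide counts as observable if its length lies within a chosen window (a stand-in for the mass range of the instrument). Function names and the example sequence are hypothetical.

```python
import re


def observable_peptides(protein, min_len=6, max_len=30):
    """Fully tryptic peptides within an assumed observable length window."""
    peptides = re.findall(r"[^KR]*[KR]|[^KR]+$", protein)
    return {p for p in peptides if min_len <= len(p) <= max_len}


def evidence_index(observed, protein, min_len=6, max_len=30):
    """Observed peptides divided by observable peptides for one protein."""
    observable = observable_peptides(protein, min_len, max_len)
    if not observable:
        return 0.0
    return len(observable & set(observed)) / len(observable)


# Hypothetical protein and identifications, for illustration only.
protein = "MKTAYIAKQRQISFVKSHFSR"
index = evidence_index(["TAYIAK", "SHFSR", "XXXXXX"], protein, min_len=5)
```

Normalizing by the observable count keeps large proteins, which simply offer more peptides, from looking more abundant than small ones.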

Normal peptide identification
I. MS: mass match
II. MS/MS: fragment match
III. Computes THE SCORE

Normal versus cross-linked peptide identification (example cross-linked pair: VCLLINKLLR and GSTKDVK)
I. MS: mass match. Cross-linked: (n^2 + n)/2 times peptides, 6 x 10^6 -> 6 x 10^12
II. MS/MS: fragment match. Cross-linked: more theoretical fragments, 1 -> 5.4 x 10^8
III. Computes THE SCORE. Cross-linked: usually low quality spectra

Maiolica A, Cittaro D, Borsotti D, Sennels L, Ciferri C, Tarricone C, Musacchio A, Rappsilber J. Structural analysis of multiprotein complexes by cross-linking, mass spectrometry, and database searching. Mol Cell Proteomics. 2007 Dec;6(12):2200-11. Epub 2007 Oct 5.
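The quadratic growth of the search space can be made concrete with a short sketch. It counts unordered peptide pairs, including a peptide linked to a second copy of itself, which gives (n^2 + n)/2, and then does a brute-force precursor mass match over all pairs. The linker mass, tolerance, and peptide masses are placeholder values chosen for the example, and the function names are illustrative.

```python
from itertools import combinations_with_replacement


def pair_count(n):
    """Number of unordered peptide pairs, self-pairs included."""
    return (n * n + n) // 2


def matching_pairs(masses, precursor, linker, tol=0.01):
    """Index pairs whose summed mass plus linker matches the precursor."""
    hits = []
    for (i, a), (j, b) in combinations_with_replacement(enumerate(masses), 2):
        if abs(a + b + linker - precursor) <= tol:
            hits.append((i, j))
    return hits


# Placeholder masses, for illustration only.
masses = [500.0, 700.0, 900.0]
linker = 138.0
hits = matching_pairs(masses, precursor=500.0 + 900.0 + linker, linker=linker)
```

Even in this three-peptide example two different pairs explain the same precursor mass, which hints at why the fragment match in step II has to carry more of the discrimination.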