Statistical analysis of isobaric-labeled mass spectrometry data


Statistical analysis of isobaric-labeled mass spectrometry data
Farhad Shakeri
July 3, 2018
Core Unit for Bioinformatics Analyses, Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn

Outline
- Proteomics, general overview
- Mass spectrometry-based quantitative proteomics
- Stable isotope labeling mass spectrometry
- Peptide identification
- Data structure
- Data analysis workflow

What is proteomics?
Omics technologies:
- Genomics: what can the cell potentially do?
- Transcriptomics: what is currently being turned on?
- Proteomics: which enzymes are currently active? Which signals are being transduced?
- Metabolomics: what is being produced or consumed?
Definition: The proteome is the entire set of proteins in a given cell, tissue or biological sample at a precise developmental or cellular phase.
Figure source: http://en.wikipedia.org/wiki/file:metabolomics_schema.png, accessed 2014-03-10, 11:42:00 UTC

What is proteomics? Transcriptomics v. MS-based proteomics
Transcriptomics:
- Most features known
- Most features measured
- Signal correlates with abundance
MS-based proteomics (LC-MS):
- Not all possible features are known
- Sample is dynamic during analysis
- 20-50% of features measured
- An undetected signal means either that the feature is not present, or that it is present but was not detected
Source: Science Learning Hub, University of Waikato. Credit: Steve Carr, Broad Institute of MIT and Harvard

Mass spectrometry-based quantitative proteomics
LC/MSMS: Liquid Chromatography Tandem Mass Spectrometry
Source: Emmanuel Barillot, Laurence Calzone, Philippe Hupé, Jean-Philippe Vert, Andrei Zinovyev, Computational Systems Biology of Cancer, Chapman & Hall/CRC Mathematical & Computational Biology, 2012

Mass spectrometry-based quantitative proteomics
LC/MSMS: Liquid Chromatography Tandem Mass Spectrometry
Two main mass-spec categories:
1. Data-Dependent Acquisition (DDA)
2. Selected Reaction Monitoring (SRM); Data-Independent Acquisition (DIA)
Specific to DDA: allows relative quantification of peptides through chemical labeling.
- isotopic (stable isotope labeling by amino acids in cell culture, or SILAC)
- isobaric (isobaric tags for relative and absolute quantitation, or iTRAQ)
- isobaric (Tandem Mass Tags, or TMT)

Stable isotope labeling mass spectrometry (LC/MSMS, DDA)
Multiple samples can be quantitated simultaneously using isobaric tags, here Tandem Mass Tags (TMT).
1. Isobaric labeling: peptides are tagged
2. Pooling (mixture)
3. Fractionation into MS runs
4. Mass spectrometry
Source: Roberto Romero et al. 2010

Stable isotope labeling mass spectrometry (iTRAQ example)
Multiple samples can be quantitated simultaneously using isobaric tags.
- Samples: DMSO, Kinase Inhib 1, Kinase Inhib 2, Kinase Inhib 3
- Steps: lyse and digest, label, pool
- Reporter channels: 114, 115, 116, 117
- Tags consist of reporter, balance, and reactive regions.
- Lighter reporter regions are paired with heavier balance regions.
- The entire tag attached to the peptide adds the same mass shift (MS1).
All figures from: Steve Carr, Broad Institute of MIT and Harvard

Stable isotope labeling mass spectrometry (continued)
Multiple samples can be quantitated simultaneously using isobaric tags.
- The entire tag attached to the peptide adds the same mass shift, so in MS1 the peptides from the pooled samples appear as a single, shifted precursor.
- Fragmentation in MS2: reporter regions dissociate to produce reporter-ion signals.
- Reporter-ion intensities give quantitative information on the relative amount of the peptide in the samples.
[Figure: MS2 spectrum, relative abundance v. m/z (100-950), with insets of the reporter-ion region (m/z 112-118) for two peptides. Peptide #1: no effect. Peptide #2: sensitive to all inhibitors.]
All figures from: Steve Carr, Broad Institute of MIT and Harvard
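The reporter-ion readout described on this slide can be sketched in a few lines. This is only an illustration: the intensities are made-up numbers, and treating channel 114 as the reference (control) channel is an assumption, not something the slide states.

```python
def relative_abundance(reporters, reference="114"):
    """Return each channel's reporter intensity as a ratio to the reference channel."""
    ref = reporters[reference]
    if ref <= 0:
        raise ValueError("reference channel has no signal")
    return {channel: intensity / ref for channel, intensity in reporters.items()}

# Hypothetical reporter-ion intensities (arbitrary units) for two peptides.
peptide_no_effect = {"114": 1000.0, "115": 980.0, "116": 1020.0, "117": 1010.0}
peptide_sensitive = {"114": 1000.0, "115": 250.0, "116": 200.0, "117": 300.0}

ratios_no_effect = relative_abundance(peptide_no_effect)
ratios_sensitive = relative_abundance(peptide_sensitive)
```

Ratios close to 1 in every channel correspond to the "no effect" pattern; ratios well below 1 in the treated channels correspond to the "sensitive to all inhibitors" pattern.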

Peptide structure & MS spectra
Peptide sequence: CLCYDGFMASEDMK
- b ions: fragments containing the amino terminus (N-terminus)
- y ions: fragments containing the carboxyl terminus (C-terminus)
[Figure: corresponding (theoretical) ion spectrum of CLCYDGFMASEDMK, intensity v. m/z, with b- and y-ion peak labels (y1_2 to y1_12, y2_12, y2_14, b1_2 to b1_7, b2_10 to b2_14) and peaks between m/z 278.15 and 1467.56.]
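A theoretical b/y ion series such as the one on this slide can be computed from residue masses. The sketch below assumes unmodified residues, standard monoisotopic masses, and singly charged fragments by default; the function name `fragment_mz` is made up for illustration. Under these assumptions the y2, y3 and y4 values come out at about 278.15, 393.18 and 522.22, which are among the m/z values listed on the slide.

```python
# Monoisotopic residue masses (Da) for the amino acids in the example peptide.
RESIDUE_MASS = {
    "A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259,
    "F": 147.06841, "G": 57.02146, "K": 128.09496, "L": 113.08406,
    "M": 131.04049, "S": 87.03203, "Y": 163.06333,
}
WATER = 18.01056   # added to y ions (C-terminal fragments)
PROTON = 1.00728   # charge carrier

def fragment_mz(peptide, charge=1):
    """Return (b_ions, y_ions): dicts mapping fragment length to m/z."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = {}, {}
    for n in range(1, len(peptide)):
        b_neutral = sum(masses[:n])            # first n residues from the N-terminus
        y_neutral = sum(masses[-n:]) + WATER   # last n residues from the C-terminus
        b_ions[n] = (b_neutral + charge * PROTON) / charge
        y_ions[n] = (y_neutral + charge * PROTON) / charge
    return b_ions, y_ions

b_ions, y_ions = fragment_mz("CLCYDGFMASEDMK")
```

Chemical modifications (for example a TMT tag on lysine or a modified cysteine) would shift the affected fragments and are deliberately left out here.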

Peptide structure & MS spectra
[Figure: example peptide spectrum from an MS experiment. Source: Tobias Kind/FiehnLab]
A complete MS map is a jungle of hundreds of thousands of peptide spectra.

Peptide identification
- Observed spectrum: from the mass spectrometer.
- Theoretical peptide spectra: either stored in spectral libraries or calculated for each candidate peptide.
- The two are matched by maximum likelihood algorithms.
- SCORE how well each observed spectrum matches a number of candidate peptides.
- Estimate the likelihood (E-value) of the best hit. The result is a Peptide Spectrum Match (PSM).
[Figure: log(# of matches) v. hyperscore, annotated with the expected number of random matches and the best hit.]
Source: Tobias Kind/FiehnLab. Credit: Brian Searls
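The scoring step can be illustrated with a deliberately simple stand-in: counting how many theoretical peaks of each candidate have an observed peak within a mass tolerance. This is not the hyperscore or E-value calculation named on the slide; the tolerance, the candidate names and the peak lists are hypothetical.

```python
def count_matches(observed_mz, theoretical_mz, tol=0.02):
    """Count theoretical peaks with at least one observed peak within tol (m/z units)."""
    return sum(
        any(abs(obs - theo) <= tol for obs in observed_mz)
        for theo in theoretical_mz
    )

def best_hit(observed_mz, candidates, tol=0.02):
    """Return (candidate, score) for the candidate with the most matched peaks."""
    scores = {
        name: count_matches(observed_mz, peaks, tol)
        for name, peaks in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical observed peaks and two candidate peak lists.
observed = [278.15, 393.18, 522.22, 680.29]
candidates = {
    "CLCYDGFMASEDMK": [278.153, 393.180, 522.223, 680.292],
    "decoy_candidate": [147.113, 260.197, 393.500],
}
hit = best_hit(observed, candidates)
```

Real search engines additionally weight matches by intensity and calibrate the score against the distribution of random matches, which is what the E-value on the slide summarizes.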

Mass spec data output: matrix of PSM intensities
- Columns: Mixture (1, 2, 3) > Run (1, 2, ..., 12) > Channel (1-6)
- Rows: Protein > Peptide > PSM, for example Protein 2 > Peptide 2 > PSM 1, PSM 1*, PSM 3, PSM 4
- Feature (every single row in the matrix): Protein + PSM
- PSM (Peptide Spectrum Match): Peptide + Charge

Experimental design:
Channel  Condition  Mixture  Bio-Replicate
126      WT.0h      2        3
127      WT.24h     2        3
128      WT.48h     2        3
129      Cblb.0h    2        3
130      Cblb.24h   2        3
131      Cblb.48h   2        3
126      WT.0h      3        7
127      Cblb.0h    3        7
128      WT.24h     3        7
129      WT.0h      3        8
130      Cblb.0h    3        8
131      WT.24h     3        8
126      Cblb.24h   4        7
127      WT.48h     4        7
128      Cblb.48h   4        7
129      Cblb.24h   4        8
130      WT.48h     4        8
131      Cblb.48h   4        8

Data analysis workflow
1. Pre-processing
- Non-complete channel-sets
- Repeated PSM measurements within fractions
- Repeated PSM measurements across fractions
- Unique peptides per protein
2. Transformation/normalization
- Log2 + median/quantile
- Variance Stabilisation Normalisation (VSN)
3. Missing data imputation
- Not applicable to TMT (yet!)
4. Summarization
- Tukey's median polish
5. Significance inference
- Moderated t-test (limma)
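As an illustration of the "Log2 + median" option in the transformation/normalization step, the sketch below log2-transforms each channel and shifts it so that all channel medians coincide. The function name, the input layout (a dict of per-channel intensity lists with `None` for missing values) and the choice of the median of channel medians as the common target are assumptions made for this example.

```python
import math
from statistics import median

def log2_median_normalize(intensities):
    """Log2-transform and median-center each channel.

    intensities: dict mapping channel -> list of raw intensities (None = missing).
    """
    logged = {
        ch: [math.log2(v) if v is not None and v > 0 else None for v in vals]
        for ch, vals in intensities.items()
    }
    channel_medians = {
        ch: median(v for v in vals if v is not None)
        for ch, vals in logged.items()
    }
    # Common target: the median of the per-channel medians (an assumed choice).
    target = median(channel_medians.values())
    return {
        ch: [None if v is None else v - channel_medians[ch] + target for v in vals]
        for ch, vals in logged.items()
    }

# Hypothetical raw intensities: channel 127 is systematically twice as intense.
raw = {"126": [1024.0, 2048.0, 4096.0], "127": [2048.0, 4096.0, 8192.0]}
normalized = log2_median_normalize(raw)
```

After normalization the constant two-fold offset between the channels is gone, while differences between features within a channel are preserved.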

Data analysis workflow: transformation/normalization
[Figure: intensity boxplot before normalization, fractions combined. log2(intensity) per channel (126-131); panels by mixture (2, 3, 4) and replicate (Rep.3, Rep.7, Rep.8).]

Data analysis workflow: Variance Stabilisation Normalisation (VSN)
[Figure: intensity boxplot after VSN normalization, fractions combined. Abundance and Abundance.Norm per channel (126-131); panels by mixture (2, 3, 4).]
Check density for normality.
[Figure: density of Abundance and Abundance.Norm per channel (126-131) after VSN normalization, fractions combined; panels by mixture (2, 3, 4).]

Data analysis workflow: summarization
Tukey's median polish:
$$Y_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}$$
log-intensity = grand median + row median + column median + residuals
Use cases:
- Combine features, rolling up from peptide to protein level. Overall intensity of protein i for sample j: $Y_i = \mu + \alpha_i$, $Y_i = \mu + \mathrm{median}(\epsilon_{ij})$
- Impute missing data: $Y_{ij}^{\mathrm{miss}} = \mu + \alpha_i + \beta_j$
[Figure: feature matrix for Protein 5 (Peptide 1: PSM 1, PSM 3; Peptide 2: PSM 1; Peptide 3: PSM 1) across Mixtures 1-3, Runs 1, 2, ..., 12 and channels 1-6.]
What else!?
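The decomposition above can be sketched with a plain-Python median polish that alternately sweeps row and column medians out of the table. This is a minimal version for complete tables only (no missing values). It takes rows to be PSMs and columns to be channels, which is an assumed orientation since the slide does not define which index is the sample; the example numbers are made up.

```python
from statistics import median

def median_polish(table, max_iter=10, tol=1e-6):
    """Tukey's median polish of a complete rows x columns table (list of lists).

    Returns (overall, row_effects, col_effects, residuals) such that
    table[i][j] == overall + row_effects[i] + col_effects[j] + residuals[i][j].
    """
    resid = [row[:] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep the row medians out of the residuals into the row effects.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        # Move the median of the column effects into the overall term.
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Sweep the column medians out of the residuals into the column effects.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        # Move the median of the row effects into the overall term.
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid

# Hypothetical log2 intensities: 3 PSMs (rows) of one protein in 3 channels (columns).
psm_table = [[10.0, 12.0, 11.0], [11.0, 13.0, 12.0], [9.0, 11.0, 10.0]]
overall, row_eff, col_eff, resid = median_polish(psm_table)
protein_summary = [overall + c for c in col_eff]  # one value per channel
```

With this orientation the per-channel protein summary is the overall term plus the column effect, and the PSM-specific offsets end up in the row effects.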

Profile plots & summarization
[Figure: profile plots for protein P17918 (# peptide: 20). Log2 intensities v. MS runs, grouped by condition (Cblb.0h, Cblb.24h, Cblb.48h, WT.0h, WT.24h, WT.48h) within mixtures 2, 3, 4. Panels: processed feature-level data (one line per PSM, labeled peptide_charge) and run summary.]
[Figure: feature matrix for Protein 5 (Peptide 1: PSM 1, PSM 3; Peptide 2: PSM 1; Peptide 3: PSM 1) across Mixtures 1-3, Runs 1, 2, ..., 12 and channels 1-6.]

Issue with normalization and summarization
Current approach:
1. Combine fractions
2. Summarize for every PROTEIN, WITHIN MIXTURE
3. Between-channel normalization (VSN)
Alternative:
1. Within-MIXTURE and between-FRACTION normalization
2. Fraction combination within mixture
3. Between-channel normalization (VSN)
4. Protein summarization, within mixture

Statistical inference
Traditional approach, for one protein:
$$t = \frac{\text{difference of group means}}{\text{estimate of variation}} = \frac{\bar{Y}_1 - \bar{Y}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim \mathrm{Student}(d)$$
- $s^2$: protein-specific variance
- $d$: protein-specific degrees of freedom, reflecting the number of replicates
Solution by linear models for microarrays (limma; Smyth, 2005): BORROW INFO ACROSS PROTEINS
$$\tilde{s}^2 = \frac{d_0 s_0^2 + d\, s^2}{d_0 + d}, \qquad \tilde{d} = d_0 + d$$
$$\tilde{t} = \frac{\bar{Y}_1 - \bar{Y}_2}{\tilde{s}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim \mathrm{Student}(\tilde{d})$$
- $s_0^2$: consensus variance over all proteins
- $d_0$: consensus degrees of freedom over all proteins
[Figure: volcano plot, WT-24h v. KO-24h. x-axis: log2 fold change; y-axis: -log10 adjusted P. Legend: NS; |LogFC| > 1; P.adj < 0.05 & |LogFC| > 1. Significant proteins are labeled with their accession numbers.]
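The shrinkage formula can be checked numerically with a small sketch. It is not limma itself: limma estimates the consensus values d0 and s0^2 from all proteins by empirical Bayes, whereas here they are passed in as hypothetical numbers, and the two-group pooled variance is an assumed stand-in for the residual variance of a linear model.

```python
import math
from statistics import mean, variance

def moderated_t(group1, group2, d0, s0_sq):
    """Return (ordinary t, moderated t, moderated degrees of freedom).

    d0 and s0_sq are the consensus degrees of freedom and variance, which a
    real analysis would estimate from all proteins; here they are inputs.
    """
    n1, n2 = len(group1), len(group2)
    d = n1 + n2 - 2                                  # protein-specific df
    s_sq = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / d
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)  # shrunken variance
    diff = mean(group1) - mean(group2)
    scale = math.sqrt(1 / n1 + 1 / n2)
    t_ordinary = diff / (math.sqrt(s_sq) * scale)
    t_moderated = diff / (math.sqrt(s_tilde_sq) * scale)
    return t_ordinary, t_moderated, d0 + d

# Hypothetical log2 abundances of one protein in two conditions.
wt_24h = [10.0, 10.2, 9.8]
ko_24h = [11.0, 11.1, 10.9]
t_ord, t_mod, df_mod = moderated_t(wt_24h, ko_24h, d0=4, s0_sq=0.1)
```

In this example the protein's own variance is smaller than the consensus variance, so shrinkage pulls the variance up and the moderated statistic is less extreme than the ordinary one, while the degrees of freedom increase from d to d0 + d.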

Outlook
- Between-channel v. between-run normalization (order!)
- Alternative methods for fraction combination
- Mixed-effect models instead of median polish and t-test
- Model missingness (metric for assessing data quality?)
- Imputing missing values prior to summarization

Acknowledgments
This work has been done in collaboration with:
- Dr. Marc Sylvester, Mass Spectrometry Core Facility, Institute for Biochemistry and Molecular Biology, University of Bonn
- Dr. Andreas Buness, Core Unit for Bioinformatics Analyses, Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn

Liquid Chromatography Tandem Mass Spectrometry
- 1D or 2D gel electrophoresis.
- Proteins are digested into peptides by means of chemical or enzymatic digestion.
- LC: separation of peptides in time, such that the mass spectrometer is provided with only a small portion at a time.
- Liquid from the LC undergoes electrospray ionisation to form molecular ions.
- The ion mixture is sorted according to m/z ratio.
- Survey scan (precursor ion selection).
- Fragmentation of the precursor ion into product ions.
- Sorting of product ions according to their m/z ratio in the second analyser.
Source: Emmanuel Barillot, Laurence Calzone, Philippe Hupé, Jean-Philippe Vert, Andrei Zinovyev, Computational Systems Biology of Cancer, Chapman & Hall/CRC Mathematical & Computational Biology, 2012

Mass spectrometry-based quantitative proteomics
LC/MSMS: Liquid Chromatography Tandem Mass Spectrometry
Two main categories: discovery proteomics (shotgun) and targeted proteomics.
In common:
- Proteins digested into peptides.
- Peptides separated by liquid chromatography (LC).
Discovery proteomics (shotgun):
- Operated in: Data-Dependent Acquisition (DDA)
- PROs: discovering the maximal number of proteins from one or a few samples.
- CONs: limited quantification capabilities on large sample sets.
Targeted proteomics:
- Operated in: Selected Reaction Monitoring (SRM)
- PROs: accurate quantification of sets of specific proteins in many samples.
- CONs: limited to measurements of a few thousand transitions.
Specific to DDA: allows relative quantification of peptides through chemical labeling.
- isotopic (stable isotope labeling by amino acids in cell culture, or SILAC)
- isobaric (isobaric tags for relative and absolute quantitation, or iTRAQ)
- isobaric (Tandem Mass Tags, or TMT)

Data analysis workflow: pre-processing
Repeated feature measurements within a fraction (run).
Example, Mixture 1, channels 1-6 per run (x = measured, - = not measured):
Feature  Run 1        Run 2        Run 12
PSM 1    x x x x x x  - - - - - -  - - - - - -
PSM 1*   x - x x - x  - - - - - -  - - - - - -
Row hierarchy on the slide: Protein 1 > Peptide 1 > PSM 1; Protein 2 > Peptide 1 > PSM 1, Peptide 2 > PSM 1, PSM 1*, PSM 3, PSM 4, Peptide 3 > PSM 1.

Data analysis workflow: pre-processing
Repeated feature measurements within fractions (runs), with ion scores.
Example, Mixture 1, channels 1-6 per run (x = measured, - = not measured):
Feature  Ion score  Run 1        Run 2        Run 12
PSM 1    46         x x x x x x  - - - - - -  - - - - - -
PSM 1*   53         x - x x - x  - - - - - -  - - - - - -

Data analysis workflow: pre-processing
Repeated feature measurements across fractions (runs).
Example, Mixture 1, channels 1-6 per run (x = measured, - = not measured):
Feature            Ion score  Run 1        Run 2        Run 12
PSM 1              46         x x x x x x  - - - - - -  - - - - - -
PSM 1*             53         x - x x - x  - - - - - -  - - - - - -
Peptide 3, PSM 1              x x x x x x  - - - - - -  x x x x x -

Data analysis workflow: pre-processing
Use features with a complete channel-set.
Example, Mixture 1, channels 1-6 per run (x = measured, - = not measured):
Feature            Ion score  Run 1        Run 2        Run 12
PSM 1              46         x x x x x x  - - - - - -  - - - - - -
PSM 1*             53         x - x x - x  - - - - - -  - - - - - -
PSM 3                         x x x - x x  - - - - - -  - - - - - -
Peptide 3, PSM 1              x x x x x x  - - - - - -  x x x x x -

Data analysis workflow: pre-processing
Remove single-shot proteins (at least 2 peptides per protein).
Example, Mixture 1, channels 1-6 per run (x = measured, - = not measured):
Feature            Ion score  Run 1        Run 2        Run 12
(unlabeled row)               x x x x x x
PSM 1              46         x x x x x x  - - - - - -  - - - - - -
PSM 1*             53         x - x x - x  - - - - - -  - - - - - -
Peptide 3, PSM 1              x x x x x x  - - - - - -  x x x x x -
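Two of the pre-processing filters from these slides, keeping only features with a complete channel-set and removing proteins with fewer than two peptides, can be sketched as follows. The record layout (dicts with `protein`, `peptide` and `intensity` keys), the example values and the order in which the filters are applied are assumptions made for illustration.

```python
CHANNELS = ["126", "127", "128", "129", "130", "131"]

def keep_complete_features(features, channels=CHANNELS):
    """Keep features with a measured (non-None) intensity in every channel."""
    return [
        f for f in features
        if all(f["intensity"].get(ch) is not None for ch in channels)
    ]

def drop_single_peptide_proteins(features, min_peptides=2):
    """Keep features of proteins supported by at least min_peptides distinct peptides."""
    peptides = {}
    for f in features:
        peptides.setdefault(f["protein"], set()).add(f["peptide"])
    return [f for f in features if len(peptides[f["protein"]]) >= min_peptides]

# Hypothetical feature records; one has a missing value in channel 127.
full = {ch: 1000.0 for ch in CHANNELS}
incomplete = dict(full, **{"127": None})
features = [
    {"protein": "P1", "peptide": "pepA", "intensity": full},
    {"protein": "P2", "peptide": "pepB", "intensity": full},
    {"protein": "P2", "peptide": "pepC", "intensity": incomplete},
    {"protein": "P2", "peptide": "pepD", "intensity": full},
]
filtered = drop_single_peptide_proteins(keep_complete_features(features))
kept = [(f["protein"], f["peptide"]) for f in filtered]
```

Applying the completeness filter first means a protein can lose peptides before the two-peptide rule is checked; reversing the order would be more permissive.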