
Proteome-wide label-free quantification with MaxQuant Jürgen Cox Max Planck Institute of Biochemistry July 2011

MaxQuant workflow: data acquisition, raw data, feature detection, initial Andromeda search, recalibration, main Andromeda search, consolidation / protein quantification, MQ output tables, statistics & systems biology (Perseus). Inspection of raw data: Viewer.

Supported input data
Labeling methods: SILAC, label free, di-methyl, 18O, ICAT, ICPL. Work in progress: iTRAQ.
Mass spectrometers: Thermo Fisher Orbitrap and FT. Work in progress: SCIEX Triple TOF.

Search engine
Within MaxQuant: feature detection, initial Andromeda search, recalibration, main Andromeda search, consolidation / protein quantification, visualization.
Standalone Andromeda: peak lists, parameters.
Andromeda web server: single spectrum, peak list.
Output: peptides, proteins.
Cox et al., Andromeda: a peptide search engine integrated into the MaxQuant environment. JPR (2011)

Mascot vs. Andromeda score [Figure: Mascot score (0 to 180) vs. Andromeda score (0 to 500), with levels labeled 95%, 90%, 75%, 50%, 0%.]

Identification of co-fragmented peptides

Absolute vs. relative quantification
Absolute quantification: copy numbers for each protein.
Relative quantification: compare the same protein in different samples.

XIC vs. spectral count [Figure: axis with ticks from 80.4 to 81.6 and m/z axis from 769 to 771.]

3D peak detection
2D peaks are assembled into 3D peaks. Two 2D peaks in adjacent scans are connected when Δm < 7 ppm. The next-to-nearest scan is also checked.
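The assembly rule on this slide can be sketched as follows. This is a minimal illustration, not MaxQuant's implementation: the function name `assemble_3d_peaks`, the data layout (one list of `(mz, intensity)` centroids per MS scan) and the "closest match wins" tie-break are assumptions made for the example. Only the 7 ppm threshold and the check of the next-to-nearest scan come from the slide.

```python
def assemble_3d_peaks(scans, ppm_tol=7.0, max_gap=2):
    """Chain 2D peaks (centroids) from consecutive scans into 3D peaks.

    scans: list of scans, each a list of (mz, intensity) tuples.
    max_gap=2 means the nearest and the next-to-nearest scan are checked.
    Returns a list of traces, each a list of (scan_index, mz, intensity).
    """
    traces = []
    for idx, peaks in enumerate(scans):
        for mz, intensity in peaks:
            best = None  # (ppm difference, trace index)
            for ti, trace in enumerate(traces):
                last_idx, last_mz, _ = trace[-1]
                # Only traces that ended one or two scans ago can be extended.
                if not 0 < idx - last_idx <= max_gap:
                    continue
                ppm = abs(mz - last_mz) / last_mz * 1e6
                if ppm < ppm_tol and (best is None or ppm < best[0]):
                    best = (ppm, ti)
            if best is None:
                traces.append([(idx, mz, intensity)])  # start a new 3D peak
            else:
                traces[best[1]].append((idx, mz, intensity))
    return traces


# Toy data: one peptide peak near m/z 500 with a missing scan, plus two strays.
scans = [
    [(500.0000, 10.0)],
    [(500.0010, 20.0), (600.0000, 5.0)],
    [],                      # peak missing in this scan
    [(500.0020, 15.0)],      # still connected via the next-to-nearest scan
    [(500.0120, 8.0)],       # 20 ppm away, so it starts a new trace
]
traces = assemble_3d_peaks(scans)
```

In the toy data the first trace spans scans 0, 1 and 3, because the gap of one empty scan is bridged, while the peak that is 20 ppm away is not attached to it.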

De-isotoping

Calculation of precise peptide masses
Calculate a precise mean and standard deviation for each peptide mass.
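As a small illustration of this step, the sketch below computes a mean and standard deviation over all mass measurements belonging to one peptide feature. Weighting each measurement by its intensity is an assumption made here for the example; the slide only states that a mean and a standard deviation are calculated. The function name `weighted_mass_stats` is hypothetical.

```python
import math


def weighted_mass_stats(masses, intensities):
    """Intensity-weighted mean and standard deviation of mass measurements."""
    total = sum(intensities)
    mean = sum(m * w for m, w in zip(masses, intensities)) / total
    variance = sum(w * (m - mean) ** 2 for m, w in zip(masses, intensities)) / total
    return mean, math.sqrt(variance)


# Two measurements of the same peptide mass with equal intensity.
mean_equal, sd_equal = weighted_mass_stats([1000.000, 1000.002], [1.0, 1.0])
# The same measurements, but the first one is three times more intense.
mean_skewed, _ = weighted_mass_stats([1000.000, 1000.002], [3.0, 1.0])
```

With equal weights the mean lies midway between the two measurements; with unequal weights it moves toward the more intense one.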

Nonlinear mass recalibration [Figure: mass error [ppm] vs. m/z [Th], with and without lock mass; histogram of relative frequency vs. mass error [ppm].]

Nonlinear mass recalibration [Figure: mass error [ppm] vs. retention time [min], 20 to 120 min.]

Nonlinear mass recalibration [Figure: mass error [ppm] vs. retention time [min], zoomed to 76 to 90 min.]

Nonlinear mass recalibration
First Andromeda search with 20 ppm mass tolerance and score threshold 80.
Represent Δm as functions of m/z and t.
Determine x positions for piecewise linear approximation. Initialize y values with 0.
Minimize residual error.
Subtract recalibration functions from all measured peptides.
Perform the actual Andromeda search with small individualized mass tolerances.
[Figures: ΔM [ppm] vs. m/z [Th] (300 to 1100) and vs. t [min] (76 to 90) at each step.]
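The fitting step can be illustrated with a small sketch. It fits a piecewise linear function of m/z only, whereas the slide uses functions of both m/z and t. The knot positions, the coordinate-wise least-squares minimization and all function names (`hat`, `fit_piecewise_linear`, `evaluate`) are assumptions made for this example, not a description of how MaxQuant minimizes the residual.

```python
def hat(x, knots, k):
    """Piecewise linear basis function: 1 at knots[k], 0 at all other knots."""
    xk = knots[k]
    if k > 0 and knots[k - 1] <= x <= xk:
        return (x - knots[k - 1]) / (xk - knots[k - 1])
    if k < len(knots) - 1 and xk <= x <= knots[k + 1]:
        return (knots[k + 1] - x) / (knots[k + 1] - xk)
    return 0.0


def evaluate(x, knots, y):
    """Value of the piecewise linear function with knot heights y at x."""
    return sum(yk * hat(x, knots, k) for k, yk in enumerate(y))


def fit_piecewise_linear(xs, errors, knots, sweeps=200):
    """Least-squares fit of knot heights by repeated coordinate-wise minimization."""
    y = [0.0] * len(knots)  # initialize y values with 0
    basis = [[hat(x, knots, k) for k in range(len(knots))] for x in xs]
    for _ in range(sweeps):
        for k in range(len(knots)):
            num = den = 0.0
            for row, err in zip(basis, errors):
                if row[k] == 0.0:
                    continue
                rest = sum(y[j] * row[j] for j in range(len(knots)) if j != k)
                num += row[k] * (err - rest)
                den += row[k] ** 2
            if den > 0.0:
                y[k] = num / den  # best height for knot k given the others
    return y


# Toy mass errors [ppm] of confidently identified peptides as a function of m/z.
knots = [300.0, 700.0, 1100.0]
true_heights = [2.0, 0.5, -1.0]
mz_values = [300.0 + 100.0 * i for i in range(9)]
mass_errors = [evaluate(mz, knots, true_heights) for mz in mz_values]

fitted = fit_piecewise_linear(mz_values, mass_errors, knots)
# Subtract the recalibration function from the measured mass errors.
residuals = [e - evaluate(mz, knots, fitted) for mz, e in zip(mz_values, mass_errors)]
```

After subtraction the residual mass errors of the toy data are close to zero, which is what allows the second search to use much narrower mass tolerances.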

Nonlinear mass recalibration [Figure, four panels: mass error [ppm] vs. m/z [Th] (two panels) and mass error [ppm] vs. retention time [min] (two panels).]

Nonlinear mass recalibration [Figure, panels a and b: mass error [ppm] vs. retention time [min], 20 to 120 min.]

Improvement in mass accuracy

Problems in label-free quantification
Irreproducibility of retention time.
Incompatibility with pre-fractionation.
Quantification in a sample relies on MS/MS identification.
Identified peptides can be different in different samples.

Retention time alignment (two LC-MS runs)
Peptides are matched by mass and retention time (only preliminary).
[Figure: retention time difference between second and first LC-MS run vs. retention time in first LC-MS run.]
Estimate of false positives from point densities in different regions.

Retention time alignment [Figure: retention time difference between second and first LC-MS run vs. retention time in first LC-MS run.]
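The alignment idea on these two slides can be sketched roughly as follows: pair features of two runs by mass and approximate retention time, look at the retention time difference as a function of retention time, and use a smoothed version of that curve as the correction. The tolerances, the binned median used for smoothing and the function names are assumptions for this example only; for simplicity the correction is looked up by the bin of the run-2 retention time.

```python
from statistics import median


def match_features(run1, run2, ppm_tol=5.0, rt_tol=2.0):
    """Preliminary matching of (mass, rt) features by mass and retention time.

    Returns (rt in run 1, rt difference run 2 minus run 1) for each match.
    """
    pairs = []
    for m1, t1 in run1:
        candidates = [
            (abs(t2 - t1), t2)
            for m2, t2 in run2
            if abs(m2 - m1) / m1 * 1e6 <= ppm_tol and abs(t2 - t1) <= rt_tol
        ]
        if candidates:
            pairs.append((t1, min(candidates)[1] - t1))
    return pairs


def shift_curve(pairs, bin_width=10.0):
    """Median retention time difference per retention time bin."""
    bins = {}
    for t1, dt in pairs:
        bins.setdefault(int(t1 // bin_width), []).append(dt)
    return {b: median(dts) for b, dts in bins.items()}


def align(rt2, curve, bin_width=10.0):
    """Map a run-2 retention time onto the run-1 time scale."""
    return rt2 - curve.get(int(rt2 // bin_width), 0.0)


# Toy runs: run 2 elutes 0.5 min later early on and 1.0 min later at the end.
run1 = [(800.40, 12.0), (900.45, 15.0), (1000.50, 18.0),
        (1100.55, 42.0), (1200.60, 45.0), (1300.65, 48.0)]
run2 = [(800.40, 12.5), (900.45, 15.5), (1000.50, 18.5),
        (1100.55, 43.0), (1200.60, 46.0), (1300.65, 49.0)]

curve = shift_curve(match_features(run1, run2))
aligned_early = align(15.5, curve)
aligned_late = align(46.0, curve)
```

A median per bin is used here because it tolerates a few false matches, which relates to the slide's remark that the initial matching is only preliminary.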

Matching between runs
Identification transfer only between the same or adjacent slices/fractions.
Transferring identifications after alignment increases the base for quantitation by >100%.
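A minimal sketch of the transfer rule, assuming retention times have already been aligned: an unidentified feature inherits a peptide sequence from another run only if the two runs come from the same or an adjacent fraction and the features agree in mass and retention time. The dictionary layout, the tolerance values and the function name `transfer_identifications` are hypothetical and chosen for illustration.

```python
def transfer_identifications(identified, unidentified, ppm_tol=5.0, rt_tol=0.5):
    """Copy sequences from identified features to matching unidentified ones.

    Each feature is a dict with keys "mass", "rt" (aligned, minutes) and
    "fraction" (integer index); identified features also carry "sequence".
    """
    transferred = []
    for feature in unidentified:
        best = None  # (retention time difference, sequence)
        for ident in identified:
            # Only the same or adjacent slices/fractions are considered.
            if abs(ident["fraction"] - feature["fraction"]) > 1:
                continue
            ppm = abs(ident["mass"] - feature["mass"]) / ident["mass"] * 1e6
            drt = abs(ident["rt"] - feature["rt"])
            if ppm <= ppm_tol and drt <= rt_tol and (best is None or drt < best[0]):
                best = (drt, ident["sequence"])
        if best is not None:
            transferred.append({**feature, "sequence": best[1]})
    return transferred


# Toy example with made-up masses and a placeholder sequence.
identified = [{"sequence": "PEPTIDEK", "mass": 1000.5000, "rt": 45.0, "fraction": 7}]
unidentified = [
    {"mass": 1000.5010, "rt": 45.2, "fraction": 8},   # adjacent fraction: accepted
    {"mass": 1000.5010, "rt": 45.2, "fraction": 10},  # distant fraction: rejected
    {"mass": 1000.5500, "rt": 45.2, "fraction": 7},   # about 50 ppm off: rejected
]
matched = transfer_identifications(identified, unidentified)
```

Restricting the transfer to neighbouring fractions keeps the number of candidate matches small, which limits false transfers.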

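Label-free quantification: normalization

Samples A to F are each measured in several fractions (shown: fractions 5 to 9, 13 to 16 and 19 to 22). Each run (sample, fraction) has a normalization factor N.

Peptide P:
\[
\begin{aligned}
I_{P,A}(N) &= N_{A,6}\,\mathrm{XIC}_{A,6} + N_{A,7}\,\mathrm{XIC}_{A,7} + N_{A,8}\,\mathrm{XIC}_{A,8} \\
I_{P,B}(N) &= N_{B,5}\,\mathrm{XIC}_{B,5} + N_{B,6}\,\mathrm{XIC}_{B,6} + N_{B,7}\,\mathrm{XIC}_{B,7} + N_{B,8}\,\mathrm{XIC}_{B,8} \\
I_{P,C}(N) &= N_{C,7}\,\mathrm{XIC}_{C,7} + N_{C,8}\,\mathrm{XIC}_{C,8} + N_{C,9}\,\mathrm{XIC}_{C,9} \\
I_{P,D}(N) &= N_{D,5}\,\mathrm{XIC}_{D,5} + N_{D,6}\,\mathrm{XIC}_{D,6} + N_{D,7}\,\mathrm{XIC}_{D,7} \\
I_{P,E}(N) &= N_{E,6}\,\mathrm{XIC}_{E,6} + N_{E,7}\,\mathrm{XIC}_{E,7} \\
I_{P,F}(N) &= N_{F,7}\,\mathrm{XIC}_{F,7} + N_{F,8}\,\mathrm{XIC}_{F,8}
\end{aligned}
\]

Peptide Q:
\[
\begin{aligned}
I_{Q,A}(N) &= N_{A,14}\,\mathrm{XIC}_{A,14} + N_{A,15}\,\mathrm{XIC}_{A,15} + N_{A,16}\,\mathrm{XIC}_{A,16} \\
I_{Q,B}(N) &= N_{B,13}\,\mathrm{XIC}_{B,13} + N_{B,14}\,\mathrm{XIC}_{B,14} + N_{B,15}\,\mathrm{XIC}_{B,15} + N_{B,16}\,\mathrm{XIC}_{B,16} \\
I_{Q,C}(N) &= N_{C,13}\,\mathrm{XIC}_{C,13} + N_{C,14}\,\mathrm{XIC}_{C,14} + N_{C,15}\,\mathrm{XIC}_{C,15} \\
I_{Q,D}(N) &= N_{D,14}\,\mathrm{XIC}_{D,14} + N_{D,15}\,\mathrm{XIC}_{D,15} \\
I_{Q,E}(N) &= N_{E,14}\,\mathrm{XIC}_{E,14} + N_{E,15}\,\mathrm{XIC}_{E,15} + N_{E,16}\,\mathrm{XIC}_{E,16} \\
I_{Q,F}(N) &= N_{F,14}\,\mathrm{XIC}_{F,14} + N_{F,15}\,\mathrm{XIC}_{F,15}
\end{aligned}
\]

Peptide R:
\[
\begin{aligned}
I_{R,A}(N) &= N_{A,21}\,\mathrm{XIC}_{A,21} + N_{A,22}\,\mathrm{XIC}_{A,22} \\
I_{R,B}(N) &= N_{B,19}\,\mathrm{XIC}_{B,19} + N_{B,20}\,\mathrm{XIC}_{B,20} + N_{B,21}\,\mathrm{XIC}_{B,21} \\
I_{R,C}(N) &= N_{C,20}\,\mathrm{XIC}_{C,20} + N_{C,21}\,\mathrm{XIC}_{C,21} + N_{C,22}\,\mathrm{XIC}_{C,22} \\
I_{R,D}(N) &= N_{D,20}\,\mathrm{XIC}_{D,20} + N_{D,21}\,\mathrm{XIC}_{D,21} \\
I_{R,E}(N) &= N_{E,19}\,\mathrm{XIC}_{E,19} + N_{E,20}\,\mathrm{XIC}_{E,20} + N_{E,21}\,\mathrm{XIC}_{E,21} \\
I_{R,F}(N) &= N_{F,20}\,\mathrm{XIC}_{F,20} + N_{F,21}\,\mathrm{XIC}_{F,21}
\end{aligned}
\]

Per-peptide and total objective:
\[
\begin{aligned}
H_P(N) &= \left(\log\frac{I_{P,A}(N)}{I_{P,B}(N)}\right)^2 + \left(\log\frac{I_{P,A}(N)}{I_{P,C}(N)}\right)^2 + \left(\log\frac{I_{P,A}(N)}{I_{P,D}(N)}\right)^2 + \text{other sample pairs} \\
H_Q(N) &= \left(\log\frac{I_{Q,A}(N)}{I_{Q,B}(N)}\right)^2 + \left(\log\frac{I_{Q,A}(N)}{I_{Q,C}(N)}\right)^2 + \left(\log\frac{I_{Q,A}(N)}{I_{Q,D}(N)}\right)^2 + \text{other sample pairs} \\
H_R(N) &= \left(\log\frac{I_{R,A}(N)}{I_{R,B}(N)}\right)^2 + \left(\log\frac{I_{R,A}(N)}{I_{R,C}(N)}\right)^2 + \left(\log\frac{I_{R,A}(N)}{I_{R,D}(N)}\right)^2 + \text{other sample pairs} \\
H(N) &= H_P(N) + H_Q(N) + H_R(N) + \text{other peptides}
\end{aligned}
\]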
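To make the quantities I(N) and H(N) concrete, the sketch below evaluates them for a toy data set with two samples. It only evaluates the objective; it does not optimize it. The reading that the factors N are chosen so that H(N) becomes small is an assumption here, as are the data layout and the function names `peptide_intensity` and `objective`.

```python
import math
from itertools import combinations


def peptide_intensity(xic, norm, peptide, sample):
    """I_{P,A}(N): sum over fractions k of N_{A,k} * XIC_{A,k} for one peptide."""
    return sum(norm[(sample, k)] * value for k, value in xic[peptide][sample].items())


def objective(xic, norm):
    """H(N): squared log ratios summed over all peptides and sample pairs."""
    total = 0.0
    for peptide, by_sample in xic.items():
        for a, b in combinations(sorted(by_sample), 2):
            i_a = peptide_intensity(xic, norm, peptide, a)
            i_b = peptide_intensity(xic, norm, peptide, b)
            total += math.log(i_a / i_b) ** 2
    return total


# Toy XICs: xic[peptide][sample][fraction]. Sample B was loaded at half the amount.
xic = {
    "P": {"A": {6: 1.0e6, 7: 4.0e6}, "B": {6: 0.5e6, 7: 2.0e6}},
    "Q": {"A": {14: 3.0e6}, "B": {14: 1.5e6}},
}
runs = {(s, k) for p in xic for s in xic[p] for k in xic[p][s]}

unit_factors = {run: 1.0 for run in runs}
corrected_factors = {run: (2.0 if run[0] == "B" else 1.0) for run in runs}

h_unit = objective(xic, unit_factors)
h_corrected = objective(xic, corrected_factors)
```

With all factors set to 1 each peptide contributes (log 2)^2 to H; doubling the factors of the sample B runs removes the systematic offset and H drops to zero for this toy data.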

Protein quantification

a. Protein sequence:
>P63208
MPSIKLQSSDGEIFEVDVEIAKQSVTIKTMLEDLGMKDEGDDDPVPLPNVNAAILKKVIQWCTHHKDDPPPPEDDENKEKRTDDIPVWDQEFLKVDQGTLFELILAANYLDIKGLLDVTCKTVANMIKGKTPEEIRKTFNIKNDFTEEEEAQVRKENQWCEEK

b. Peptides:
Peptide  Sequence
P1       LQSSDGEIFEVDVEIAK
P2       TMLEDLGMK
P3       VIQWCTHHK
P4       RTDDIPVWDQEFLK
P5       TVANMIK
P6       TPEEIRK
P7       NDFTEEEEAQVR

c. Peptide detection per sample (+ marks a detected peptide; the column positions were lost in extraction): A: 2 peptides, B: 3, C: 6, D: 5, E: 3, F: 2.

d. Matrix of pairwise sample ratios:
     A     B     C     D     E
B    r_BA
C    r_CA  r_CB
D    r_DA  r_DB  r_DC
E    r_EA  r_EB  r_EC  r_ED
F    r_FA  r_FB  r_FC  r_FD  r_FE

e. System of equations for the protein intensities:
r_BA = I_B / I_A
r_CA = I_C / I_A
r_CB = I_C / I_B
r_DA = I_D / I_A
r_DB = I_D / I_B
r_DC = I_D / I_C
r_EC = I_E / I_C
r_ED = I_E / I_D
I_F = 0

f. [Figure: resulting intensity profile over samples A to F.]
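The system in panel e can be solved in a least-squares sense in log space. The sketch below does this with simple repeated averaging over neighbouring samples, which is an illustrative choice and not necessarily how MaxQuant solves it. The profile is only determined up to a common factor, so the first sample with any ratio is anchored at 1 (an assumption), and the samples with ratios are assumed to form one connected group. Samples without any valid ratio get intensity 0, as for sample F on the slide.

```python
import math


def profile_from_ratios(samples, ratios, sweeps=200):
    """Least-squares intensity profile from pairwise ratios.

    ratios[(i, j)] = I_i / I_j. Minimizes the sum over pairs of
    (log r_ij - log I_i + log I_j)^2 by coordinate-wise updates.
    """
    neighbours = {s: [] for s in samples}
    for (i, j), r in ratios.items():
        neighbours[i].append((j, math.log(r)))   # log I_i = log I_j + log r_ij
        neighbours[j].append((i, -math.log(r)))  # log I_j = log I_i - log r_ij
    connected = [s for s in samples if neighbours[s]]
    anchor = connected[0]                        # fixes the overall scale
    log_i = {s: 0.0 for s in connected}
    for _ in range(sweeps):
        for s in connected:
            if s == anchor:
                continue
            # Optimal value for s given the current values of its neighbours.
            log_i[s] = sum(log_i[t] + d for t, d in neighbours[s]) / len(neighbours[s])
    return {s: (math.exp(log_i[s]) if s in log_i else 0.0) for s in samples}


# Toy ratios for the sample pairs listed in panel e, from made-up intensities.
true_i = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 8.0, "E": 4.0}
pairs = [("B", "A"), ("C", "A"), ("C", "B"), ("D", "A"),
         ("D", "B"), ("D", "C"), ("E", "C"), ("E", "D")]
ratios = {(i, j): true_i[i] / true_i[j] for i, j in pairs}

profile = profile_from_ratios(["A", "B", "C", "D", "E", "F"], ratios)
```

Because the toy ratios are mutually consistent, the recovered profile reproduces the made-up intensities relative to sample A; with noisy ratios the same procedure would return a compromise between them.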

Label-free quantification: benchmark dataset
HeLa and E. coli cell lysates were mixed. Proteins were digested with trypsin. In three replicates, peptides were separated by isoelectric focusing into 24 fractions. This was repeated with the same amount of HeLa lysate but three times the amount of E. coli lysate. This results in six samples in which all human proteins have constant protein profiles, while E. coli proteins have a ratio of three between replicate groups. LC-MS on an LTQ-Orbitrap mass spectrometer. Data: Christian Luber

Identification results
1,234,125 MS isotope patterns identified by MS/MS
1,852,556 MS isotope patterns identified by matching between runs
3,086,681 MS isotope patterns in total
6,577 proteins
5,161 proteins in at least 3/6 samples
4,589 proteins in 6/6 samples
46,839 peptide sequences

Label-free quantification results [Figure: axes log(intensity) and log(ratio).]

Dynamic range benchmark dataset

Comparison to SILAC [Figure: protein ratio (0.1 to 10) vs. summed peptide intensity (1e5 to 1e11).]

Precision vs. recall
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
[Figure: precision (0.7 to 1) vs. recall (0 to 1) for the t-test, Welch modified t-test, Wilcoxon-Mann-Whitney test and ratio; the values 0.98, 0.95, 0.88 and 0.72 are annotated on the plot.]
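The two definitions translate directly into code. In the sketch below the set of proteins called as changing and the set of truly changing proteins are made-up toy data; in the benchmark described earlier the truly changing proteins would be the E. coli proteins. The function name `precision_recall` is hypothetical.

```python
def precision_recall(called, truth):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN) for two sets."""
    tp = len(called & truth)   # called and truly changing
    fp = len(called - truth)   # called but not changing
    fn = len(truth - called)   # changing but missed
    precision = tp / (tp + fp) if called else float("nan")
    recall = tp / (tp + fn) if truth else float("nan")
    return precision, recall


# Toy example: three proteins called significant, four truly changing.
called = {"ecoli_1", "ecoli_2", "human_1"}
truth = {"ecoli_1", "ecoli_2", "ecoli_3", "ecoli_4"}
precision, recall = precision_recall(called, truth)
```

Sweeping the significance threshold of a test and recomputing both values at each threshold traces out a precision vs. recall curve like the one on this slide.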

Pulldowns

Imputation [Figure: axis log(intensity).]

www.maxquant.org

groups.google.com/group/maxquant-list

Usability, documentation, software quality

Acknowledgements
Matthias Mann, Nadin Neuhauser, Richard Scheltema, Christoph Schaab, Christian Luber, all Mann lab members

Thank you for your attention
http://www.maxquant.org
http://groups.google.com/group/maxquant-list
http://groups.google.com/group/andromeda-list
http://groups.google.com/group/perseus-list
cox@biochem.mpg.de
