Proteome-wide label-free quantification with MaxQuant
Jürgen Cox
Max Planck Institute of Biochemistry
July 2011
MaxQuant
Data acquisition → Raw data → MaxQuant → MQ output tables → Statistics & systems biology (Perseus)
MaxQuant steps: Feature detection, Initial Andromeda search, Recalibration, Main Andromeda search, Consolidation / protein quantification
Inspection of raw data: Viewer
Supported input data
Labeling methods: SILAC, label free, di-methyl, 18O, ICAT, ICPL. Work in progress: iTRAQ.
Mass spectrometers: Thermo Fisher Orbitrap and FT. Work in progress: SCIEX Triple TOF.
Search engine
MaxQuant: Feature detection, Initial Andromeda search, Recalibration, Main Andromeda search, Consolidation / protein quantification, Visualization
Standalone Andromeda: peak lists, parameters
Andromeda web server: single spectrum, peak list
Results: peptides, proteins
Cox et al., Andromeda: a peptide search engine integrated into the MaxQuant environment. JPR (2011)
Mascot vs. Andromeda score
[Figure: Mascot score (0 to 180) and Andromeda score (0 to 500); labels 95%, 90%, 75%, 50%, 0%]
Identification of co-fragmented peptides
Absolute vs. relative quantification
Absolute quantification: copy numbers for each protein
Relative quantification: compare the same protein across different samples
XIC vs. spectral count
[Figure: m/z axis from 769 to 771; second axis from 80.4 to 81.6]
3D peak detection
2D peaks are assembled into 3D peaks.
Two 2D peaks in adjacent scans are connected when Δm < 7 ppm.
The next-to-nearest scan is also checked.
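The linking rule on this slide can be sketched as follows. This is a minimal illustration and not MaxQuant's implementation: the function names, the input layout (one list of centroid m/z values per scan) and the greedy closest-match choice are assumptions. Only the 7 ppm threshold and the check of the next-to-nearest scan come from the slide.

```python
def ppm_diff(m1, m2):
    """Absolute mass difference in parts per million, relative to m2."""
    return abs(m1 - m2) / m2 * 1e6


def assemble_3d_peaks(scans, tol_ppm=7.0, max_gap=2):
    """Link 2D peaks (centroid m/z values) across scans into 3D peaks.

    scans: list ordered by scan index; each entry is a list of m/z values.
    max_gap=2 means the nearest and the next-to-nearest scan are checked.
    Returns a list of 3D peaks, each a list of (scan_index, mz) tuples.
    """
    open_peaks = []  # 3D peaks that may still be extended
    finished = []    # 3D peaks that can no longer be extended
    for i, mzs in enumerate(scans):
        still_open = []
        used = set()  # indices of 2D peaks already linked in this scan
        for peak in open_peaks:
            last_scan, last_mz = peak[-1]
            if i - last_scan > max_gap:
                finished.append(peak)
                continue
            best = None  # (ppm difference, index) of the closest candidate
            for j, mz in enumerate(mzs):
                if j in used:
                    continue
                d = ppm_diff(mz, last_mz)
                if d < tol_ppm and (best is None or d < best[0]):
                    best = (d, j)
            if best is not None:
                used.add(best[1])
                peak.append((i, mzs[best[1]]))
            still_open.append(peak)
        # every unlinked 2D peak starts a new 3D peak
        for j, mz in enumerate(mzs):
            if j not in used:
                still_open.append([(i, mz)])
        open_peaks = still_open
    return finished + open_peaks


scans = [[500.0000, 600.0], [500.0010], [500.0020, 600.0005], [700.0]]
peaks = assemble_3d_peaks(scans)
```

In this toy input the peak near m/z 600 is missing in the second scan and is still linked through the next-to-nearest scan.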
De-isotoping
Calculation of precise peptide masses
Calculate a precise mean and standard deviation for each peptide mass
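One way to obtain a mean and standard deviation per peptide mass is to average over the centroids that make up its 3D peak. The sketch below assumes intensity weighting, which the slide does not state; the function name and inputs are hypothetical.

```python
import math


def weighted_mass_stats(masses, intensities):
    """Intensity-weighted mean and standard deviation of measured masses.

    Assumption: each centroid is weighted by its intensity; the slide only
    says that a mean and a standard deviation are calculated.
    """
    total = sum(intensities)
    mean = sum(m * w for m, w in zip(masses, intensities)) / total
    variance = sum(w * (m - mean) ** 2 for m, w in zip(masses, intensities)) / total
    return mean, math.sqrt(variance)


mean, std = weighted_mass_stats([500.000, 500.002], [1.0, 1.0])
```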
Nonlinear mass recalibration
[Figure: mass error [ppm] vs. m/z [Th] (300 to 1300), without lock mass and with lock mass; relative frequency vs. mass error [ppm]]
Nonlinear mass recalibration
[Figure: mass error [ppm] vs. retention time [min], 20 to 120 min]
Nonlinear mass recalibration
[Figure: mass error [ppm] vs. retention time [min], 76 to 90 min]
Nonlinear mass recalibration
1. First Andromeda search with 20 ppm mass tolerance and score threshold 80.
2. Represent Δm as functions of m/z and t.
3. Determine x positions for piecewise linear approximation. Initialize y values with 0.
4. Minimize residual error.
5. Subtract recalibration functions from all measured peptides.
6. Perform the actual Andromeda search with small individualized mass tolerances.
[Figure: ΔM [ppm] vs. m/z [Th] (300 to 1100) and vs. t [min] (76 to 90), shown for each step]
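The fitting and subtraction steps can be illustrated with a small least-squares sketch. It is a simplification under stated assumptions: only one variable is fitted (the slide uses functions of both m/z and t), the knot positions are chosen by hand, and the residual is minimized with an ordinary linear least-squares solve. All function names are hypothetical.

```python
import numpy as np


def hat_basis(x, knots):
    """Design matrix with one piecewise linear 'hat' function per knot."""
    x = np.asarray(x, dtype=float)
    basis = np.zeros((x.size, len(knots)))
    for j in range(len(knots)):
        unit = np.zeros(len(knots))
        unit[j] = 1.0
        # interpolating a unit vector gives the j-th hat function
        basis[:, j] = np.interp(x, knots, unit)
    return basis


def fit_recalibration(x, delta_ppm, knots):
    """y values at the knots that minimize the squared residual error."""
    y, *_ = np.linalg.lstsq(hat_basis(x, knots), np.asarray(delta_ppm), rcond=None)
    return y


def recalibrate(x, delta_ppm, knots, y):
    """Subtract the fitted piecewise linear function from the mass errors."""
    return np.asarray(delta_ppm) - np.interp(x, knots, y)


knots = [300.0, 800.0, 1300.0]            # x positions (m/z), chosen by hand here
mz = np.linspace(300.0, 1300.0, 21)
delta = np.interp(mz, knots, [2.0, 0.0, 1.0])  # synthetic mass errors in ppm
y_fit = fit_recalibration(mz, delta, knots)
residual = recalibrate(mz, delta, knots, y_fit)
```

With both variables, one option would be to alternate such fits over m/z and t; that choice is also an assumption and is not taken from the slide.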
Nonlinear mass recalibration
[Figure: four panels of mass error [ppm]; two vs. m/z [Th] (300 to 1300) and two vs. retention time [min] (20 to 120)]
Nonlinear mass recalibration
[Figure: (a) mass error [ppm], -3 to 9, vs. retention time [min]; (b) mass error [ppm], -2 to 2, vs. retention time [min]]
Improvement in mass accuracy
Problems in label-free quantification
Irreproducibility of retention time
Incompatibility with pre-fractionation
Quantification in a sample relies on MS/MS identification
Identified peptides can differ between samples
Retention time alignment
Two LC-MS runs
Peptides are matched by mass and retention time (only preliminary)
Estimate of false positives from point densities in different regions
[Figure: retention time difference between second and first LC-MS run vs. retention time in first LC-MS run]
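The preliminary matching can be sketched as a simple pairing of features from two runs. The tolerances below (5 ppm, 5 min) and the function name are illustrative assumptions, not values from the talk. The output is the set of points plotted on this slide: retention time in the first run against the retention time difference.

```python
def match_features(run1, run2, tol_ppm=5.0, rt_window=5.0):
    """Pair features of two LC-MS runs by mass and retention time.

    run1, run2: lists of (mass, retention_time) tuples.
    Returns (rt_in_run1, rt_in_run2 - rt_in_run1) for every matched feature.
    Tolerances are hypothetical defaults chosen for illustration.
    """
    points = []
    for mass1, rt1 in run1:
        best_rt = None
        for mass2, rt2 in run2:
            if abs(mass2 - mass1) / mass1 * 1e6 > tol_ppm:
                continue  # mass does not agree
            if abs(rt2 - rt1) > rt_window:
                continue  # too far apart in retention time
            if best_rt is None or abs(rt2 - rt1) < abs(best_rt - rt1):
                best_rt = rt2
        if best_rt is not None:
            points.append((rt1, best_rt - rt1))
    return points


run1 = [(1000.0, 10.0), (1500.0, 50.0), (2000.0, 90.0)]
run2 = [(1000.001, 11.0), (1500.0, 52.0), (2500.0, 90.0)]
points = match_features(run1, run2)
```

A smooth curve through such points would then give the alignment function; how that curve is fitted is not specified on the slide.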
Retention time alignment
[Figure: retention time difference between second and first LC-MS run vs. retention time in first LC-MS run]
Matching between runs
Identification transfer only between the same or adjacent slices/fractions
Transferring identifications after alignment increases the base for quantitation by >100%
Label-free quantification: normalization
Samples A to F are each split into fractions (shown: 5 to 9, 13 to 16, 19 to 22).

Peptide P:
\[
\begin{aligned}
I_{P,A}(N) &= N_{A,6}\,XIC_{A,6} + N_{A,7}\,XIC_{A,7} + N_{A,8}\,XIC_{A,8} \\
I_{P,B}(N) &= N_{B,5}\,XIC_{B,5} + N_{B,6}\,XIC_{B,6} + N_{B,7}\,XIC_{B,7} + N_{B,8}\,XIC_{B,8} \\
I_{P,C}(N) &= N_{C,7}\,XIC_{C,7} + N_{C,8}\,XIC_{C,8} + N_{C,9}\,XIC_{C,9} \\
I_{P,D}(N) &= N_{D,5}\,XIC_{D,5} + N_{D,6}\,XIC_{D,6} + N_{D,7}\,XIC_{D,7} \\
I_{P,E}(N) &= N_{E,6}\,XIC_{E,6} + N_{E,7}\,XIC_{E,7} \\
I_{P,F}(N) &= N_{F,7}\,XIC_{F,7} + N_{F,8}\,XIC_{F,8}
\end{aligned}
\]

Peptide Q:
\[
\begin{aligned}
I_{Q,A}(N) &= N_{A,14}\,XIC_{A,14} + N_{A,15}\,XIC_{A,15} + N_{A,16}\,XIC_{A,16} \\
I_{Q,B}(N) &= N_{B,13}\,XIC_{B,13} + N_{B,14}\,XIC_{B,14} + N_{B,15}\,XIC_{B,15} + N_{B,16}\,XIC_{B,16} \\
I_{Q,C}(N) &= N_{C,13}\,XIC_{C,13} + N_{C,14}\,XIC_{C,14} + N_{C,15}\,XIC_{C,15} \\
I_{Q,D}(N) &= N_{D,14}\,XIC_{D,14} + N_{D,15}\,XIC_{D,15} \\
I_{Q,E}(N) &= N_{E,14}\,XIC_{E,14} + N_{E,15}\,XIC_{E,15} + N_{E,16}\,XIC_{E,16} \\
I_{Q,F}(N) &= N_{F,14}\,XIC_{F,14} + N_{F,15}\,XIC_{F,15}
\end{aligned}
\]

Peptide R:
\[
\begin{aligned}
I_{R,A}(N) &= N_{A,21}\,XIC_{A,21} + N_{A,22}\,XIC_{A,22} \\
I_{R,B}(N) &= N_{B,19}\,XIC_{B,19} + N_{B,20}\,XIC_{B,20} + N_{B,21}\,XIC_{B,21} \\
I_{R,C}(N) &= N_{C,20}\,XIC_{C,20} + N_{C,21}\,XIC_{C,21} + N_{C,22}\,XIC_{C,22} \\
I_{R,D}(N) &= N_{D,20}\,XIC_{D,20} + N_{D,21}\,XIC_{D,21} \\
I_{R,E}(N) &= N_{E,19}\,XIC_{E,19} + N_{E,20}\,XIC_{E,20} + N_{E,21}\,XIC_{E,21} \\
I_{R,F}(N) &= N_{F,20}\,XIC_{F,20} + N_{F,21}\,XIC_{F,21}
\end{aligned}
\]

\[
\begin{aligned}
H_P(N) &= \left(\log\frac{I_{P,A}(N)}{I_{P,B}(N)}\right)^2 + \left(\log\frac{I_{P,A}(N)}{I_{P,C}(N)}\right)^2 + \left(\log\frac{I_{P,A}(N)}{I_{P,D}(N)}\right)^2 + \text{other sample pairs} \\
H_Q(N) &= \left(\log\frac{I_{Q,A}(N)}{I_{Q,B}(N)}\right)^2 + \left(\log\frac{I_{Q,A}(N)}{I_{Q,C}(N)}\right)^2 + \left(\log\frac{I_{Q,A}(N)}{I_{Q,D}(N)}\right)^2 + \text{other sample pairs} \\
H_R(N) &= \left(\log\frac{I_{R,A}(N)}{I_{R,B}(N)}\right)^2 + \left(\log\frac{I_{R,A}(N)}{I_{R,C}(N)}\right)^2 + \left(\log\frac{I_{R,A}(N)}{I_{R,D}(N)}\right)^2 + \text{other sample pairs} \\
H(N) &= H_P(N) + H_Q(N) + H_R(N) + \text{other peptides}
\end{aligned}
\]
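To make the role of H(N) concrete, the sketch below evaluates it for a toy dataset and shows that suitable normalization factors lower its value. The data layout, the function names and the rule of skipping sample pairs in which the peptide is missing are assumptions. The optimizer that searches for N is not shown.

```python
import math
from itertools import combinations


def peptide_intensity(xic, norm, sample):
    """I_{P,S}(N): sum of N_{S,k} * XIC_{S,k} over the fractions k of sample S."""
    return sum(norm[(s, k)] * value for (s, k), value in xic.items() if s == sample)


def objective_h(peptides, norm, samples):
    """H(N): squared log ratios summed over peptides and sample pairs."""
    total = 0.0
    for xic in peptides.values():
        for a, b in combinations(samples, 2):
            i_a = peptide_intensity(xic, norm, a)
            i_b = peptide_intensity(xic, norm, b)
            if i_a > 0 and i_b > 0:  # assumption: skip pairs with a missing value
                total += math.log(i_a / i_b) ** 2
    return total


# toy data: {(sample, fraction): XIC} per peptide; sample B is measured 2x too low
peptides = {
    "P": {("A", 6): 100.0, ("B", 6): 50.0},
    "Q": {("A", 14): 40.0, ("B", 14): 20.0},
}
unit = {key: 1.0 for xic in peptides.values() for key in xic}
scaled = dict(unit)
scaled[("B", 6)] = 2.0
scaled[("B", 14)] = 2.0

h_unit = objective_h(peptides, unit, ["A", "B"])
h_scaled = objective_h(peptides, scaled, ["A", "B"])
```

Here h_scaled is zero because doubling both fractions of sample B removes the systematic offset, while h_unit is positive.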
Protein quantification
a. Protein sequence
>P63208
MPSIKLQSSDGEIFEVDVEIAKQSVTIKTMLEDLGMKDEGDD
DPVPLPNVNAAILKKVIQWCTHHKDDPPPPEDDENKEKRTDD
IPVWDQEFLKVDQGTLFELILAANYLDIKGLLDVTCKTVANM
IKGKTPEEIRKTFNIKNDFTEEEEAQVRKENQWCEEK
b. Peptides
Peptide  Sequence
P1       LQSSDGEIFEVDVEIAK
P2       TMLEDLGMK
P3       VIQWCTHHK
P4       RTDDIPVWDQEFLK
P5       TVANMIK
P6       TPEEIRK
P7       NDFTEEEEAQVR
c. Peptides marked "+" per sample (out of P1 to P7): A 2, B 3, C 6, D 5, E 3, F 2
d. Pairwise ratios
     A     B     C     D     E     F
A
B    r_BA
C    r_CA  r_CB
D    r_DA  r_DB  r_DC
E    r_EA  r_EB  r_EC  r_ED
F    r_FA  r_FB  r_FC  r_FD  r_FE
e. System of equations
r_BA = I_B / I_A
r_CA = I_C / I_A
r_CB = I_C / I_B
r_DA = I_D / I_A
r_DB = I_D / I_B
r_DC = I_D / I_C
r_EC = I_E / I_C
r_ED = I_E / I_D
I_F = 0
f. [Figure: intensity for samples A to F]
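The system of ratio equations in panel e can be solved for the intensities up to a common scale. The sketch below does this by least squares in log space and then rescales to a chosen total. The solver choice, the rescaling and the function name are assumptions; samples without any ratio get intensity 0, mirroring I_F = 0 on the slide.

```python
import numpy as np


def intensities_from_ratios(samples, ratios, total=1.0):
    """Solve r_XY = I_X / I_Y for the intensities by least squares on logs.

    ratios: {(numerator_sample, denominator_sample): ratio}.
    The overall scale is not determined by ratios alone, so the result is
    rescaled to sum to `total` (an assumption made for this sketch).
    """
    used = [s for s in samples if any(s in pair for pair in ratios)]
    col = {s: j for j, s in enumerate(used)}
    a = np.zeros((len(ratios), len(used)))
    b = np.zeros(len(ratios))
    for row, ((num, den), r) in enumerate(ratios.items()):
        a[row, col[num]] = 1.0   # + log I_num
        a[row, col[den]] = -1.0  # - log I_den
        b[row] = np.log(r)
    log_i, *_ = np.linalg.lstsq(a, b, rcond=None)
    rel = np.exp(log_i)
    rel *= total / rel.sum()
    return {s: (float(rel[col[s]]) if s in col else 0.0) for s in samples}


profile = intensities_from_ratios(
    ["A", "B", "C", "F"],
    {("B", "A"): 2.0, ("C", "A"): 4.0, ("C", "B"): 2.0},
    total=7.0,
)
```

Because every available ratio enters the fit, inconsistent ratios are reconciled in the least-squares sense instead of being chained through a single reference sample.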
Label-free quantification: benchmark dataset
HeLa and E. coli cell lysates were mixed, and the proteins were digested with trypsin.
In three replicates, peptides were separated by isoelectric focusing into 24 fractions.
This was repeated with the same amount of HeLa lysate but with the E. coli lysate tripled.
The result is six samples in which all human proteins have constant protein profiles, while E. coli proteins have a ratio of three between replicate groups.
LC-MS on an LTQ-Orbitrap mass spectrometer.
Data: Christian Luber
Identification results
1,234,125 MS isotope patterns identified by MS/MS
1,852,556 MS isotope patterns identified by matching between runs
3,086,681 MS isotope patterns in total
6,577 proteins
5,161 proteins in at least 3/6 samples
4,589 proteins in 6/6 samples
46,839 peptide sequences
Label-free quantification results
[Figure: axes Log(intensity) and Log(ratio)]
Dynamic range benchmark dataset
Comparison to SILAC
[Figure: protein ratio (0.1 to 10) vs. summed peptide intensity (1e5 to 1e11)]
Precision vs. recall
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
[Figure: precision (0.7 to 1) vs. recall (0 to 1) for t-test, Welch modified t-test, Wilcoxon-Mann-Whitney test and ratio; annotated values 0.98, 0.95, 0.88, 0.72]
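The two definitions translate directly into code. In the sketch below the proteins called as changing are compared with a set of truly changing proteins; treating the E. coli proteins of the benchmark as that set is an assumption, and the identifiers are made up.

```python
def precision_recall(called, truth):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN).

    called: proteins reported as changing; truth: proteins that truly change.
    Returns None for a quantity whose denominator is zero.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


# hypothetical identifiers: "ecoli_*" truly change, "human_*" do not
precision, recall = precision_recall(
    called={"ecoli_1", "ecoli_2", "human_1"},
    truth={"ecoli_1", "ecoli_2", "ecoli_3", "ecoli_4"},
)
```

Sweeping the significance threshold of a test and recomputing both values at each step traces out a curve of the kind shown on this slide.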
Pulldowns
Imputation
[Figure: axis Log(intensity)]
www.maxquant.org
groups.google.com/group/maxquant-list
Usability, documentation, software quality
Acknowledgements
Matthias Mann
Nadin Neuhauser
Richard Scheltema
Christoph Schaab
Christian Luber
All Mann lab members

Thank you for your attention
http://www.maxquant.org
http://groups.google.com/group/maxquant-list
http://groups.google.com/group/andromeda-list
http://groups.google.com/group/perseus-list
cox@biochem.mpg.de