Developing Algorithms for the Determination of Relative Abundances of Peptides from LC/MS Data
RIPS Team: Jake Marcus (Project Manager), Anne Eaton, Melanie Kanter, Aru Ray
Faculty Mentors: Shawn Cokus, Matteo Pellegrini
Industry Sponsors: Parag Mallick, Roland Luethy

Key terms
Protein: a large biomolecule that carries out various functions of a cell
Peptide: a fragment of a protein
[Diagram: Protein -> digestion -> Peptides]

What is proteomics?
Proteome: all the proteins expressed in an individual at a given time
Proteomics: the study of proteins, their structure, and their function
[Diagram: DNA (replication) -> transcription -> RNA -> translation -> Protein -> metabolic and bodily functions; 20-25 thousand genes, millions of proteins]

Why study proteomics?
Diagnosis of disease
Personalized medicine
Analysis

Our sponsor
Cedars-Sinai Health System, Spielberg Family Center for Applied Proteomics
Dedicated to developing proteomic technologies to guide doctors in patient management decisions
Focus on identifying and quantifying proteins using liquid chromatography/mass spectrometry (LC/MS)

Liquid chromatography
A method to separate substances based on their affinity for water
Retention time (RT): the amount of time a substance takes to pass through the chromatography column
[Figure: chromatography column shown at RT=1, RT=2, RT=3]

Mass spectrometry
A method to separate the components of a mixture according to molecular mass
Molecules are ionized, separated according to mass/charge, and detected
[Diagram: Sample -> Ionization and Acceleration -> Electromagnet; spectrum of Intensity vs. Mass/Charge]

LC/MS: combining liquid chromatography and mass spectrometry
[Figure: the sample is separated by retention time; retention time 1 and retention time 2 each give a spectrum of Intensity vs. Mass/Charge, arranged along the Retention Time axis]

The data
[Figure: LC/MS data with Retention Time and Mass/Charge axes]
List of Identifications

Retention Time   Mass/charge   Peptide        Protein   Confidence        ...
246              725.4         K.ACSQRPR.W    ADH       86% (0.86)
793              432.87        R.IGYADIK.W    EPO       12% (0.12)
1075             5367.91       K.LGANAILK.W   HB        99.45% (0.9945)

The problem
Determine the relative abundance of peptides in the original samples based on LC/MS data
[Figure: Intensity over Retention Time and Mass/Charge]

Challenges
Locate isotopes
Identifications not centered
Unknown spread along retention time
Noise
[Figure: Intensity plot with Isotopes and the Point of Identification marked]

Peptide quantification modules
1. Extract 2D Neighborhood
2. Squish Isotopes
3. Limit Retention Time Axis
4. Fit Curve
5. Quantify

Extract 2D neighborhood
Pick a 2D neighborhood around the identified location
Must be large enough to include the entire feature
[Figure: Intensity (x10^5) vs. Mass/Charge, with the Point of Identification marked]

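A minimal sketch of the neighborhood extraction, assuming the LC/MS data is available as a list of (retention time, mass/charge, intensity) points and that the neighborhood is a rectangle centered on the identification. The names `Point` and `extract_neighborhood`, the half-width parameters, and the sample values are hypothetical; the slides do not say how the window size was chosen.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    rt: float          # retention time (seconds)
    mz: float          # mass/charge
    intensity: float


def extract_neighborhood(points: List[Point], rt_id: float, mz_id: float,
                         rt_halfwidth: float, mz_halfwidth: float) -> List[Point]:
    """Return the points inside a rectangle centered on the identification."""
    return [p for p in points
            if abs(p.rt - rt_id) <= rt_halfwidth
            and abs(p.mz - mz_id) <= mz_halfwidth]


# Hypothetical data around an identification at RT 246, m/z 725.4.
points = [
    Point(240.0, 725.4, 1.0e5),
    Point(246.0, 725.9, 3.0e5),
    Point(246.0, 731.0, 2.0e4),   # outside the mass/charge window
    Point(400.0, 725.4, 5.0e4),   # outside the retention-time window
]
window = extract_neighborhood(points, rt_id=246.0, mz_id=725.4,
                              rt_halfwidth=60.0, mz_halfwidth=3.0)
```

The half-widths have to be generous, since the identification is not necessarily centered on the feature.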
Squish isotopes
Isotopes have similar retention times
Select relevant mass/charge values, extract corresponding data
[Figure: Intensity (x10^5) vs. Mass/Charge]

Squish isotopes: quantize
[Figure: Retention Time vs. Mass/Charge, showing signal, noise, and the actual mass/charges]

Squish isotopes: combine
[Figure: Intensity vs. Mass/Charge]

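One way the quantize and combine steps could look, as a sketch under stated assumptions rather than the team's implementation. It assumes the isotope peaks sit roughly 1.00335/charge apart in mass/charge (the 13C - 12C mass difference), snaps each observed point to the nearest expected isotope position within a tolerance, discards the rest as noise, and sums the surviving intensities at each retention time. The function names, the tolerance, and the data are hypothetical.

```python
from collections import defaultdict

ISOTOPE_SPACING = 1.00335  # approx. 13C - 12C mass difference in Da (assumed spacing)


def expected_isotope_mzs(mono_mz, charge, n_isotopes):
    """Mass/charge positions where the isotope peaks are expected."""
    return [mono_mz + k * ISOTOPE_SPACING / charge for k in range(n_isotopes)]


def squish_isotopes(points, mono_mz, charge, n_isotopes=3, tol=0.05):
    """points: iterable of (rt, mz, intensity).

    Quantize: keep a point only if it lies within `tol` of an expected isotope m/z.
    Combine: sum the kept intensities per retention time.
    Returns a sorted list of (rt, summed intensity).
    """
    targets = expected_isotope_mzs(mono_mz, charge, n_isotopes)
    trace = defaultdict(float)
    for rt, mz, intensity in points:
        nearest = min(targets, key=lambda t: abs(t - mz))
        if abs(nearest - mz) <= tol:
            trace[rt] += intensity
    return sorted(trace.items())


# Hypothetical neighborhood for a doubly charged peptide at m/z 500.0.
points = [
    (10.0, 500.01, 100.0),
    (10.0, 500.49, 60.0),
    (10.0, 500.25, 999.0),   # between isotope positions: treated as noise
    (11.0, 500.00, 200.0),
    (11.0, 501.01, 20.0),
]
trace = squish_isotopes(points, mono_mz=500.0, charge=2)
```

The result is a one-dimensional intensity trace along retention time, which is what the later modules operate on.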
Limit retention time
Find highest peak
Search along retention time until 4 out of 5 consecutive data points are below threshold
[Figure: Intensity (x10^5) vs. Retention Time (seconds), with the threshold marked]

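The stopping rule can be sketched as follows. This is one plausible reading of the slide: starting from the highest peak, slide a window of 5 consecutive points outward in each direction and stop at the first position where at least 4 of the 5 are below the threshold. How the threshold was chosen is not stated, so it is passed in as a parameter; the function names and the data are hypothetical.

```python
def _cut_index(intensities, start, step, threshold, window=5, needed=4):
    """Walk from `start` in direction `step` (+1 or -1) and return the index
    where a window of `window` points has at least `needed` below threshold."""
    n = len(intensities)
    i = start
    while True:
        idxs = [i + step * k for k in range(window)]
        if idxs[-1] < 0 or idxs[-1] >= n:
            return 0 if step < 0 else n - 1   # window ran off the data
        below = sum(1 for j in idxs if intensities[j] < threshold)
        if below >= needed:
            return i
        i += step


def limit_retention_time(intensities, threshold):
    """Return (left, right) inclusive indices bounding the feature."""
    peak = max(range(len(intensities)), key=lambda j: intensities[j])
    left = _cut_index(intensities, peak, -1, threshold)
    right = _cut_index(intensities, peak, +1, threshold)
    return left, right


# Hypothetical trace: the peak is at index 8, with a lone noise spike at index 12.
intensities = [1, 2, 1, 1, 2, 1, 8, 30, 90, 60, 25, 3, 9, 2, 1, 2, 1, 1]
bounds = limit_retention_time(intensities, threshold=5)
```

Requiring 4 of 5 points rather than 5 of 5 lets the search ignore a single noisy point such as the spike at index 12, which here falls outside the returned bounds.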
Fit curve
Gamma curve fit to data by nonlinear regression
Right-skewed
Limited by liquid chromatography flow-rate
[Figure: Intensity (scaled) vs. Retention Time (scaled), with the fitted Gamma curve, R^2 = 0.98137]

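A sketch of fitting a gamma-shaped curve using only the standard library. Note the substitution: instead of the nonlinear regression named on the slide, this uses a log-linear least-squares fit, which works because ln y = ln A + (shape - 1) ln t - t/scale is linear in its three coefficients. Such a fit could also serve as a starting guess for a nonlinear optimizer. The parameterization y = A * t^(shape-1) * exp(-t/scale), with t measured from the start of the feature so that t > 0, and all names and data are assumptions.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gamma_curve(t, amplitude, shape, scale):
    return amplitude * t ** (shape - 1) * math.exp(-t / scale)


def fit_gamma_loglinear(ts, ys):
    """Least squares on ln y = ln A + (shape-1) ln t - t/scale (needs t > 0, y > 0)."""
    rows = [(1.0, math.log(t), -t) for t in ts]
    targets = [math.log(y) for y in ys]
    # Normal equations: (X^T X) beta = X^T z
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * z for r, z in zip(rows, targets)) for i in range(3)]
    ln_a, shape_minus_1, inv_scale = solve_linear(xtx, xtz)
    return math.exp(ln_a), shape_minus_1 + 1.0, 1.0 / inv_scale


def r_squared(ys, fitted):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical noise-free data generated from known parameters.
ts = [float(t) for t in range(1, 21)]
ys = [gamma_curve(t, 2.0, 3.0, 4.0) for t in ts]
amplitude, shape, scale = fit_gamma_loglinear(ts, ys)
r2 = r_squared(ys, [gamma_curve(t, amplitude, shape, scale) for t in ts])
```

Fitting in log space weights small intensities more heavily than a direct nonlinear fit would, so on noisy data the two approaches can give different parameters.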
Quantify
Area under curve corresponds to peptide abundance
[Figure: Intensity (scaled) vs. Retention Time (scaled), with the area under the fitted curve]

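A small sketch of the quantification step, assuming the area is taken either numerically over sampled points (trapezoid rule) or in closed form for a gamma-shaped curve y = A * t^(shape-1) * exp(-t/scale), whose integral over t > 0 is A * Gamma(shape) * scale^shape. Which of these the team used is not stated; the function names and data are hypothetical. The relative abundance of a peptide in two samples is then the ratio of the two areas.

```python
import math


def trapezoid_area(ts, ys):
    """Area under sampled points by the trapezoid rule."""
    return sum((t1 - t0) * (y0 + y1) / 2.0
               for t0, t1, y0, y1 in zip(ts, ts[1:], ys, ys[1:]))


def gamma_area(amplitude, shape, scale):
    """Closed-form area under amplitude * t**(shape-1) * exp(-t/scale) for t > 0."""
    return amplitude * math.gamma(shape) * scale ** shape


# Hypothetical traces for the same peptide in two samples.
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
sample_a = [0.0, 2.0, 4.0, 2.0, 0.0]
sample_b = [0.0, 1.0, 2.0, 1.0, 0.0]

area_a = trapezoid_area(ts, sample_a)
area_b = trapezoid_area(ts, sample_b)
relative_abundance = area_a / area_b
```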
Evaluation
6 protein mix (~150 peptides): same amount in every sample (1x, 1x, 1x)
5 protein mix (~250 peptides): different amounts in each sample (1x, 2x, 3x)
SILAC: controlled proportion of two isotopes of each protein in a sample (1:2)

Data filtering
Remove peptides:
  Not derived from sample protein mix
  Identified with confidence < 0.99
  Difference in retention time > 100 seconds
[Figure: two panels of Intensity vs. Retention Time (retention-time ranges 1750-1840 and 2440-2540; intensity scales 10^3 and 10^6)]

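The three filters can be sketched as a single predicate. This assumes each record carries the parent protein, the identification confidence, and the retention times of the two features being compared (`rt_a`, `rt_b`); reading "difference in retention time" as the difference between those two features is an assumption. The field names, the protein set, and the records are hypothetical.

```python
def keep_peptide(peptide, mix_proteins, min_confidence=0.99, max_rt_diff=100.0):
    """True if the peptide passes all three filters from the slide."""
    return (peptide["protein"] in mix_proteins
            and peptide["confidence"] >= min_confidence
            and abs(peptide["rt_a"] - peptide["rt_b"]) <= max_rt_diff)


MIX_PROTEINS = {"ADH", "HB"}   # hypothetical sample protein mix

peptides = [
    # passes all filters
    {"sequence": "K.LGANAILK.W", "protein": "HB", "confidence": 0.9945,
     "rt_a": 1075.0, "rt_b": 1090.0},
    # confidence below 0.99
    {"sequence": "K.ACSQRPR.W", "protein": "ADH", "confidence": 0.86,
     "rt_a": 246.0, "rt_b": 250.0},
    # parent protein not in the mix
    {"sequence": "R.IGYADIK.W", "protein": "EPO", "confidence": 0.995,
     "rt_a": 793.0, "rt_b": 800.0},
    # retention times differ by more than 100 seconds
    {"sequence": "K.LGANAILK.W", "protein": "HB", "confidence": 0.999,
     "rt_a": 1075.0, "rt_b": 1300.0},
]

filtered = [p for p in peptides if keep_peptide(p, MIX_PROTEINS)]
```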
Optimizing the algorithm
Developed different versions of each module
Evaluated combinations of different versions of modules
[Figure: two plots across module combinations 1-7 and the final version; vertical axes labeled Obs/Exp and IQR]

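A sketch of how a module combination might be scored, assuming the two plotted quantities are the median of the observed/expected ratios and their interquartile range (IQR). That reading of the axis labels is an assumption, as are the function name and the data. A good combination would have a median near 1 and a small IQR.

```python
import statistics


def summarize_obs_exp(observed, expected):
    """Return (median, IQR) of the per-peptide observed/expected ratios."""
    ratios = [o / e for o, e in zip(observed, expected)]
    q1, _, q3 = statistics.quantiles(ratios, n=4, method="inclusive")
    return statistics.median(ratios), q3 - q1


# Hypothetical ratios for five peptides whose expected ratio is 2.0.
observed = [1.8, 2.0, 2.2, 2.4, 2.6]
expected = [2.0] * 5
median_ratio, iqr = summarize_obs_exp(observed, expected)
```

Running the same summary for each combination of module versions gives a direct way to compare them on the evaluation mixes.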
6 protein mix evaluations
[Figure: evaluation results; ~7 outliers per run not shown]

5 protein mix evaluations
[Figure: evaluation results; ~3 outliers per run not shown]

Concentration dependence
[Figure: Median observed ratio vs. Expected ratio, both axes from 0 to 30]

SILAC introduction
SILAC: Stable Isotope Labeling by Amino Acids in Cell Culture
Peptides labeled with isotopes
[Diagram: Light Isotope Medium and Heavy Isotope Medium, combined x:y -> Protein Isolation -> LC/MS; plot of Intensity vs. Mass/Charge for a single retention time slice]

Preliminary SILAC evaluations
[Figure: two panels; ~1 outlier per run excluded in one, ~6 outliers per run excluded in the other]

Future directions: better data filtering
User inputs a pair of matched features
If mismatched, ratio is meaningless
Potential to predict when features are mismatched
[Figure: two panels of Intensity vs. Retention Time]

Future directions: better data filtering
Report match confidence for every ratio
Possible diagnostic variables:
  Confidence of identifications
  Difference in retention time
  Difference in maximum intensity

Future directions: peptides to proteins
Combine data from peptides to estimate quantity of mutual parent protein
[Diagram: Sample 1 (x 50 protein) and Sample 2 (x 10 protein), expected ratio 5:1 -> digestion -> peptides (5:1, 5:1, 5:1) -> algorithm -> output (5.2:1, 4.9:1, 5.1:1) -> estimation -> mean ratio 5.07:1]

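The arithmetic in the diagram can be reproduced directly: the three peptide ratios 5.2, 4.9, and 5.1 average to 15.2 / 3, which rounds to 5.07. The sketch below uses the plain mean shown on the slide; the variable names are hypothetical.

```python
import statistics

# Peptide ratios reported by the algorithm for one parent protein (from the slide).
peptide_ratios = [5.2, 4.9, 5.1]
expected_ratio = 5.0

protein_ratio = statistics.mean(peptide_ratios)     # 15.2 / 3
observed_over_expected = protein_ratio / expected_ratio

print(f"Mean ratio: {protein_ratio:.2f}:1")         # Mean ratio: 5.07:1
```

Here the estimate is within about 1.3% of the expected 5:1 ratio.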
Future directions: peptides to proteins
Preliminary results
[Figure: Observed/Expected, axis marked at 1/10, 1, and 10]

Acknowledgements
Faculty Mentors: Shawn Cokus, Matteo Pellegrini
Industry Sponsors: Parag Mallick, Roland Luethy
Everyone at the Spielberg Center
IPAM
Thank You!
Jake, Aru, Anne, Melanie