Qualitative Proteomics (how to obtain high-confidence high-throughput protein identification!) James A. Mobley, Ph.D. Director of Research in Urology Associate Director of Mass Spectrometry (contact: mobleyja@uab.edu)
The Birth of Proteomics Proteomics was not born as many will say, but was indeed ignited with the discovery of very soft ionization techniques. For those in the field at the time, this was huge,, but still didn t t really take of until the mid to late 9 s!! However, prior to this....both hard and soft ionization processes were well established! These include, electron ionization (EI), chemical ionization (CI), fast atom bombardment (FAB), liquid secondary ion MS (LSIMS), and plasma desorption MS (PDMS). The newer techniques were initiated by and still include matrix associated desorption ionization (MALDI)- [Hillenkamp,, et. al. 1988] and electrospray ionization (ESI)- [Fenn et. al. 1989] based sources.
Soft Ionization Techniques ESI- (multiple charge states) MALDI- (singly charged ions) Baldwin M.A., Mass spectrometers for the analysis of biomolecules, Methods in Enzymology, Vol 42
Protein Characterization So.There are just three simple points to remember in the business of protein identification/ characterization. 1) Generation of a peptide-based mass spectra. [MS 1 only]; peptide mass fingerprinting [PMF] (protein must be nearly pure) [MS 2 ]; sequence data (high purity is not necessarily required) 2) Analysis. Matching algorithms based on in-silico digestion (MS 1 ) & fragmentation (MS 2 ) of known genes Denovo Sequencing (MS 2 ) 3) Validation. Immuno-affinity techniques (Western analysis, Immuno- histochemistry,, ELISA, Luminex,, etc). Mass Spectrometry (standard heavy isotope tags)
Proteins to Peptides. Even today, we are highly limited by decreased detection, resolving power, and poor fragmentation of whole proteins! Therefore, we digest proteins to peptides prior to MS analysis. Many chemical and enzymatic techniques have been published; however, trypsin remains the most commonly utilized enzyme for use in proteomics! This enzyme cleaves at arginine and lysine,, yielding peptides that are easily detected and fragmented in the most common mass analyzers today. Keeping in mind that utilizing multiple digestion procedures carried out on the same sample can be very complementing! http://donatello.ucsf.edu/ A lot of information here Take a look at Protein Prospector.
Sensitivity is Inversely Proportional to Mass (MALDI-ToF Example) R e lativ e P eak Area c Relative Peak Area (1 % set equal to 1.6 fmole) ( 1 % = 1.6 fm o le ) 1. 25. 6.3 1.6.4.1 1.6 femptomole 3.9 pg/ μl 1.6 fmole 1 2 3 4 5 Molecular Weight (kda) So.digestion is necessary, but does increase complexity! Common Enzymes: Trypsin* K-X, R-X Chymotrypsin* X-L, -F, -Y, -W Lys-C* K-X Arg-C* R-X Asp-N X-D Glu-C* E-X Common Chemicals: Cyanogen Bromide X-M HCL X-X * a tailing proline inhibits digestion Student Lab Example!
Concept of PMF [MS 1 ] MS 1 approaches spectra containing peptide parent molecules only! This type of unambiguous protein ID is referred to as peptide mass fingerprinting (PMF) introduced ~199. In this case, no sequence information is generated,, but it is very sensitive when very little sample is availible! The downfall is that the sample must be very pure!! Highly complementing for 2D PAGE work. However, high mass accuracy is a must as well! Overall, these days unless absolutely necessary. PMF should not be used! There are simply too many matches possible with this technique even with access to high resolution instruments.
MS 1 and/or MS 2 PMF CID Baldwin M.A., Mass spectrometers for the analysis of biomolecules, Methods in Enzymology, Vol 42
Example of [MS 1 ] From an In-Gel Digest * (ICLDLQAPLYKK) * * (TTSQVRPR) (HITSLEVIK) Micromass MALDI-ToF MS (7-1 ppm mass accuracy) * (ICLDLQAPLYK) * * (AGPHCPTAQLIATLK) (EAEEDGDLQCLCVK) * Nearly Complete Coverage! (Single Protein Matched by Mascot)
PMF Match! Predicted and determined mass for CXC4 (~ 8 kda Protein) (1)EAEEDGDLQC LCVKTTSQVR PRHITSLEVI KAGPHCPTAQ LIATLKNGRK ICLDLQAPLY KKIIKKLLES (7) Measured m/z Theoretical Mass (mi) Peptide Sequence Modi. M C Δm (Da) Start End 944.5486 944.5278 TTSQVRPR +.21 15 22 139.5997 139.6152 HITSLEVIK -.16 23 31 1333.5878 1333.719 ICLDLQAPLYK CAM-C -.131 51 61 1347.628 1347.7346 KICLDLQAPLYK Acry 1 -.132 5 61 1461.618 1461.8139 ICLDLQAPLYK CAM-C -.196 51 61 1577.647 1577.8474 AGPHCPTAQLIATLK CAM-C -.27 32 46 1591.6497 1591.8474 AGPHCPTAQLIATLK Acry -.198 32 46 1665.483 1665.71 EAEEDGDLQCLCVK -.227 1 14
Protein Fragmentation [MS 2 ] Both ESI and MALDI based tandem instruments are common in most core settings, each with a combination of mass analyzers. Each source and combination of mass analyzers have their selective advantages worthy of a second talk! Fragmentation is generally similar,, primarily with the generation of either. b and y ions; collision induced decay (CID) & infrared multiphoton dissociation (IRMPD) z and c ions; electron transfer dissociation (ETD) & electron capture dissociation (ECD)
ESI or MALDI? Quad, ToF,, Ion Trap, and or FT? Domon & Aebersold,, Science, V312, 14 April, 26
Peptide Fragmentation CID/ IRMPD ETD/ECD
LC-ESI ESI-MS(MS) 2 1 9. 3 3 2. 9 LCMS(BasePeak) 65 minute RF Elution Profile 2. 4 1 2 1. 6 23.74 minutes 2 3. 4 2 3. 6 1 2. 6 1 2. 8 5 3. 6 2 3 3. 6 8 4 2. 6 9 6. 8 1 8. 3 4 4 3. 3 1 5 1. 4 8 1 2 3 4 5 T i m e ( m i n ) jm _ 4 1 1 2 d _ p y m t 3 4 # 1 4 1 9 R T : 2 3. 7 4 A V : 1 N L : 1. 3 3 E 8 T : + c N S I F u l l m s [ 4. - 1 8. ] 8 4 6. 9 7 1 9 Relative Abundance 8 7 6 5 4 799.76 m/z 7 9 9. 7 6 Single Spectrum (MS) @23.74 minutes R.T. 3 2 7 2. 9 9 5 5 1. 6 8 1 7. 8 9 8. 7 5 4. 2 2 6 8 5. 4 7 1 1 6. 8 1 3 1 6. 9 9 1 7 2 2. 5 7 1 1 2 4. 2 1 1 5 6 5. 6 1 3 9 3. 2 8 1 6 7 8. 6 2 1 7 9 2. 3 1 4 6 8 1 1 2 1 4 1 6 1 8 m / z Sequence: (K)YMCENQATISSK Single Spectrum (MS 2 ) @799.76 precursor peak
MALDI-MS(MS) MS(MS) 2 1 1182.59 8 832.31 114.57 Cluster Area 6 1233.6 131.68 In Gel Digest (MS) 4 177.77 2383.96 2 679.52 669. 789.4 746.9 696.9 759.42 882.89 1136.56 1765.73 1365.64 1493.71 959.53 1232.6 1487.74 87.26 1883.89 999.47 1118.58 132.6 1529.74 1641.77 1791.77 1716.85 2224.9 9.9 1172.56 1451.74 987.51 186.55 1265.58 1383.68 1831.93 1953. 252.92 288. 1584.67 1794.78 2281.8 817.49 141.73 1472.78 1664.78 1274.71 1336.64 1872.87 1994.2 119.53 256.8 2125.88 2194.33 917.28 1198.63 2254.6 947.52 163.83 1838.89 8 1 12 14 16 18 2 22 Mass MS/MS (1182 ion)
Directed or Non-Directed Proteomics? (the road to global proteomics!) Complex Protein Mixture Chemical or Enzymatic Digestion Multidimensional Separations Affinity, CaP-LC Pure Protein Multidimensional Separations Affinity, 2D, HPLC Chemical or Enzymatic Digestion MS Analysis Computational Time Minimum MS/MS Analysis Computational Time Extensive Protein ID
Multi ltidimensional imensional Protein Identification Technologies (MuDPIT( MuDPIT) [+/-] Treated Cells Secreted or Cellular Proteins (Combined and Pre-fractionated) Tryptic Digestion Stepwise Elution SCX Column F1 F2F3 F4 F5 Gradient Elution (LC/MS) C18 Column Peptide Pairs Relative Quantification and Identification Automated Spectral Analysis LCMS(MS) 2 MudPIT developed in John Yate s Lab, Scripps Research Institute
Data Analysis A standard 1D LC-ESI run may have as many as 4,-6, MS files!! A MuDPIT run may contain 25,-6, files!! While LC-MALDI runs generate far fewer data files, they still contain too-much data to analyze by hand! Therefore, automated data analysis is required!!
Data Analysis Common Matching Algorithms; Sequest,, MASCOT, XTandem Automated Denovo Sequence Tools; Peaks, Rapid Denovo, DenovoX,, Mascot Distiller, PepNovo,, others.. Statistical Software Scaffold, Protein Profit, Finnigan,, others Standardizing the Field! Trans Proteomic Pipeline (Sashimi Project; mzxml based universal software package)
Matching Algorithms All matching algorithms are based on generating a score based on closeness of fit between the peptides measured in the mass spectrometer and the in-silico digestion of known genes or proteins in a database. The two most commonly used databases include: NCBI-NR NR and Uniprot
Peptide Fragmentation 88 145 292 45 534 663 778 97 12 1166 b ions S G F L E E D E L K 1166 18 122 875 762 633 54 389 26 147 y ions 1 y 6 % Intensity b 3 b5 b6 b7 b 4 y 4 y 5 y 9 y 7 y 2 y 3 y 8 25 5 75 1 b 8 b 9 m/z
De Novo Interpretation 1 % Intensity SGF L E E L F G KL E D E E D E L 25 5 75 1 m/z
Generating a Universal Score Mixture of distributions Number of spectra in each bin 2 18 16 incorrect 14 12 1 correct 8 6 4 2-3.9-2.3 -.7.9 2.5 4.1 5.7 7.3 Discriminant score (D)
So.We Have ID s,, Now What? Validation. Validation.. Validation.
(CXC4 Western & ELISA) IU/mL (x 1 3 ) 15 1 5 4-fold Norm CaP-local CaP-Met Norm CaP-local CaP-Met Norm CaP-met Western Blot CXC4 7.8 kda #1 #2 #3 #4 #1 #2 #3 #4
HTP Quantitative Validation Multiplex Bead Assay for Cytokines The highlighted area represent populations of fluorescent beads, distinctively labeled, and carrying capture antibodies for sandwich assay of different cytokines. All detection antibodies carry the same fluorophore, which is read in a third channel to quantify sample cytokine concentration
Summary Whether or not you do the MS work yourself.. Know the specifics! Know the limitations..! Sample Prep is always important..but which instrument you have access to is also important! Similarly important.how is your data analyzed?? Denovo? Matching Algorithm?? Mix of techniques?? What are the cut off points and does it make sense? Keep in mind that validation must be part of your workflow!! So, make sure you have confidence in your choices before going forward!
Useful Links! i-mass.com spectroscopynow.com expasy.ch/tools cprmap.com psidev.sourceforge.net prospector.ucsf.edu jeolusa.com/ms/docs/ionize.html asms.org (become a member!) hupo.org matrixscience.com proteomecenter.org/software.php ionsource.com bruker.com thermo.com appliedbiosystems.com shimadzu.com luminexcorp.com
Questions??