Site-specific Identification of Lysine Acetylation Stoichiometries in Mammalian Cells


Supplementary Information

Site-specific Identification of Lysine Acetylation Stoichiometries in Mammalian Cells

Tong Zhou 1,2, Ying-hua Chung 1,2, Jianji Chen 1, Yue Chen 1

1. Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota at Twin Cities, Minneapolis, MN 55455, USA
2. These authors contributed equally to this work

Correspondence: Dr. Yue Chen (YueChen@umn.edu)

Supplementary Figure S1-S5
Supplementary Table T1-T2
Supplementary Note 1

Supplementary Figures

Figure S1. Ah-NHS synthesis and labeling reactions. (A) Synthesis reactions for Ah-NHS starting from sodium acetate-¹³CD₃. (B) The labeling reactions of Ac-NHS or AcOAc at the lysine ε-NH2 and/or the α-NH2 group.

Figure S2. Co-elution profiles of four peptides with different isotope-labeled acetyl groups. Synthetic peptides (structure shown in Figure 4) containing different numbers of light acetyl groups were balanced with heavy acetyl groups. They were mixed, desalted and injected into the LC-MS. After balancing, peptide 1 has a molecular weight of 745.94, peptides 2 and 3 have a MW of 748.46, peptides 4 and 5 have a MW of 750.97, and peptides 6 and 7 have a MW of 753.48.

Figure S3. BSA spike-in validation experiment. In vitro chemically acetylated BSA was mixed at different heavy-to-light ratios to mimic 50%, 10% and 1% stoichiometries and spiked into the HeLa whole-cell lysate stoichiometry analysis at a ratio of roughly 1:50 (BSA/HeLa proteins, w/w). Three BSA peptides were detected, and their acetylation stoichiometries were quantified with the StoichAnalyzer software.

Figure S4. Schematic illustration of false-positive peak selection and quantification in acetylation stoichiometry analysis. (A) Precursor ion selection of light acetylated peptides may be interfered with by isotope peaks from other co-eluting species, which makes the stoichiometry data unreliable. By implementing deconvolution and deisotoping, the software removes these false-positive peak selections. (B) The identified heavy acetylated peptides may serve as internal standards that allow the software to correct the mass error in each spectrum and accurately select the corresponding light acetylated peptide peaks. (C) The software applies a filtering step based on database search results to remove incorrectly selected peak pairs resulting from other co-eluting peptides.

Figure S5. Reproducibility of two biological replicate experiments for untreated HeLa protein stoichiometry analysis.

Figure S6. MS and MS/MS spectra of a peptide containing two lysines. The precursor ion spectrum is shown in the upper panel. The annotated fragmentation spectrum of the middle precursor ion, which contains one heavy and one light acetyl group, is shown in the lower panel.

Figure S7. Clustering analysis of GO molecular function enrichment for HeLa cell proteins identified with acetylation stoichiometries in four quantiles: less than 1%, 1% to 5%, 5% to 20%, and more than 20%.

Supplementary Tables

Table S1. Identification of lysine acetylation stoichiometries in HeLa cells with no treatment. Each lysine in a peptide is assigned a site number in sequential order from the peptide N-terminus to the peptide C-terminus, with the lysine closest to the N-terminus designated K1.

Table S2. Identification of lysine acetylation stoichiometries in HeLa cells treated with sodium butyrate. Each lysine in a peptide is assigned a site number in sequential order from the peptide N-terminus to the peptide C-terminus, with the lysine closest to the N-terminus designated K1.

Supplementary Note 1. Software usage and the specific description of the mathematical model.

Figure S1.

Figure S2.

Figure S3.

Figure S4.

Figure S5.

Figure S6.

Figure S7.

Supplementary Note 1. Software usage and specific description of the mathematical model.

a. The usage of StoichAnalyzer software

- Operating environment: Linux or Unix operating system with the following software installed: Zlib, Perl, C++ compiler
- Step-by-step instructions:
1. Download the latest release file (.tar) from https://github.com/achievemn01/ptmquant-analysis.git.
2. Uncompress the file to a Target Directory. Four directories (MS1C, MS2C, MS1-perl, MS2-perl) and a bash script (stoichanalyzer.sh) will be generated.
3. Compile the MS1C program by entering the MS1C directory and executing the command make.
4. Compile the MS2C program by entering the MS2C directory and executing the command make.
5. Create a Project Directory in the same Target Directory to store the mzXML files and MaxQuant search results.
6. Upload the mzXML files as well as evidence.txt and msms.txt to the Project Directory.
7. Copy stoichanalyzer.sh to the Project Directory and execute it directly.
8. The outputs are saved as Output-1K.txt, Output-2K.txt, Output-3K.txt and Output-4K.txt.

b. Description of the algorithm and mathematical model.

This document states the mathematical relation between the intensities of fragments, the amounts of each acetyl isotope combination, and the total stoichiometry of each acetyl site among precursors of the same mass. N-terminal acetyl sites are excluded here; only acetyl sites on lysine are considered in the following. Peptides containing 2, 3 and 4 acetyl sites are discussed separately.

For a peptide containing 2 acetyl sites, the sites are named α and β in order from the N-terminus to the C-terminus, and the subscripts L and H denote a light or heavy acetyl group at that site. Its three possible MS1 peaks in the acetyl isotope assembly are named $I_1$, $I_2$, $I_3$ from light to heavy. Here $I_1$ maps to isotope composition $\alpha_L\beta_L$; $I_2$ maps to $\alpha_L\beta_H$ and $\alpha_H\beta_L$; $I_3$ maps to $\alpha_H\beta_H$. The intensities of the fragments of $I_2$ formed by breaking the precursor between the two acetyl sites reveal the ratio of the amounts of $\alpha_L\beta_H$ and $\alpha_H\beta_L$.
In theory, the ratio of the amounts of $\alpha_L\beta_H$ and $\alpha_H\beta_L$ can be expressed as:

$$
\frac{\text{intensity of light } b_{j} \text{ ion}}{\text{intensity of heavy } b_{j} \text{ ion}}
= \frac{\text{intensity of heavy } y_{n-j} \text{ ion}}{\text{intensity of light } y_{n-j} \text{ ion}}
= \frac{\text{amount of } \alpha_L\beta_H}{\text{amount of } \alpha_H\beta_L}
= r_1 \tag{eq.1}
$$

where j is equal to or larger than the position of site α but smaller than the position of site β, and n is the total number of amino acids in the peptide.
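As an illustration of eq.1, the sketch below lists which b/y fragment pairs satisfy the index condition on j and evaluates the two intensity ratios for one cleavage position. It is a minimal example and not part of StoichAnalyzer: the function names, the 10-residue peptide with sites at positions 3 and 7, and the intensities are all hypothetical.

```python
def informative_fragments(n, pos_alpha, pos_beta):
    """Return (j, n - j) index pairs of b_j / y_(n-j) ions cleaved between the two sites.

    Positions are 1-based residue indices; j satisfies pos_alpha <= j < pos_beta,
    so b_j contains site alpha only and y_(n-j) contains site beta only.
    """
    if not (1 <= pos_alpha < pos_beta <= n):
        raise ValueError("expected 1 <= pos_alpha < pos_beta <= n")
    return [(j, n - j) for j in range(pos_alpha, pos_beta)]


def ratio_estimates(b_light, b_heavy, y_light, y_heavy):
    """Two estimates of r1 from one cleavage position, following eq.1."""
    return b_light / b_heavy, y_heavy / y_light


# Hypothetical 10-residue peptide with acetyl sites at residues 3 and 7.
pairs = informative_fragments(10, 3, 7)

# Made-up intensities for one cleavage position (not real data).
r_from_b, r_from_y = ratio_estimates(
    b_light=300.0, b_heavy=100.0, y_light=50.0, y_heavy=150.0
)

print(pairs)
print(r_from_b, r_from_y)
```

In real spectra the b-ion and y-ion estimates will generally differ because of noise, which is why several observations are usually combined rather than relying on a single pair.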

The first two fractions in eq.1 are observed in the spectrum. Observations from multiple fragment ions can be pooled in a linear regression model, and the slope of the regression line gives the ratio $r_1$, which is statistically more robust than a single fragment pair. The third fraction is the ratio of the amounts of the two acetyl isotope combinations. It can be converted into the stoichiometry of each acetyl site within $I_2$:

$$
\begin{aligned}
\mathrm{stoi}_{\alpha_L, I_2} &= \frac{\text{amount of } \alpha_L\beta_H}{\text{amount of } \alpha_L\beta_H + \text{amount of } \alpha_H\beta_L} = \frac{r_1}{1 + r_1} \\
\mathrm{stoi}_{\beta_L, I_2} &= \frac{\text{amount of } \alpha_H\beta_L}{\text{amount of } \alpha_L\beta_H + \text{amount of } \alpha_H\beta_L} = \frac{1}{1 + r_1}
\end{aligned} \tag{eq.2}
$$

These expressions are conditional probabilities: within the group of peptides whose isotope composition is 1L1H, that is, those corresponding to MS1 peak $I_2$, the probability of finding site α acetylated with the light isotope is $\mathrm{stoi}_{\alpha_L, I_2}$, and similarly for $\mathrm{stoi}_{\beta_L, I_2}$. By the law of total probability, the total occupancy of each site is:

$$
\begin{aligned}
\mathrm{stoi}_{\alpha_L,\mathrm{total}} &= \frac{I_1 + I_2 \cdot \mathrm{stoi}_{\alpha_L, I_2}}{I_1 + I_2 + I_3} \\
\mathrm{stoi}_{\beta_L,\mathrm{total}} &= \frac{I_1 + I_2 \cdot \mathrm{stoi}_{\beta_L, I_2}}{I_1 + I_2 + I_3}
\end{aligned} \tag{eq.3}
$$

For a peptide containing 3 acetyl sites (α, β, γ), besides the lightest and heaviest MS1 peaks there are two more in the middle: $I_2$ and $I_3$. With the intensities of the fragments of $I_2$ formed by breaking the precursor between two acetyl sites, we can obtain the ratios of the amounts of $\alpha_L\beta_L\gamma_H$, $\alpha_L\beta_H\gamma_L$, $\alpha_H\beta_L\gamma_L$ and the occupancy ratios $\alpha_L/\alpha_H$, $\beta_L/\beta_H$ and $\gamma_L/\gamma_H$ by solving the joint equations (eq.4a and eq.4b), although the relation between the observations and the amount of each acetyl isotope combination is more complicated than in eq.1.

$$
\frac{\text{intensity of light } b_{j_2} \text{ ion}}{\text{intensity of heavy } b_{j_2} \text{ ion}}
= \frac{\text{intensity of heavy } y_{n-j_2} \text{ ion}}{\text{intensity of light } y_{n-j_2} \text{ ion}}
= \frac{\text{amount of } \alpha_L\beta_L\gamma_H + \text{amount of } \alpha_L\beta_H\gamma_L}{\text{amount of } \alpha_H\beta_L\gamma_L}
= r_1 \tag{eq.4a}
$$

$$
\frac{\text{intensity of light } b_{j_3} \text{ ion}}{\text{intensity of heavy } b_{j_3} \text{ ion}}
= \frac{\text{intensity of heavy } y_{n-j_3} \text{ ion}}{\text{intensity of light } y_{n-j_3} \text{ ion}}
= \frac{\text{amount of } \alpha_L\beta_L\gamma_H}{\text{amount of } \alpha_L\beta_H\gamma_L + \text{amount of } \alpha_H\beta_L\gamma_L}
= r_2 \tag{eq.4b}
$$

where $j_2$ is equal to or larger than the position of site α but smaller than the position of site β, $j_3$ is equal to or larger than the position of site β but smaller than the position of site γ, and n is the total number of amino acids in the peptide.

Another view of eq.4 is:

$$
\frac{\text{amount of } \alpha_L}{\text{amount of } \alpha_H} = r_1 \tag{eq.5a}
$$

$$
\frac{\text{amount of } \gamma_L}{\text{amount of } \gamma_H} = \frac{1}{r_2} \tag{eq.5b}
$$

With simple algebra we get:

$$
\frac{\text{amount of } \beta_L}{\text{amount of } \beta_H}
= \frac{\text{amount of } \alpha_L\beta_L\gamma_H + \text{amount of } \alpha_H\beta_L\gamma_L}{\text{amount of } \alpha_L\beta_H\gamma_L}
= \frac{r_1 r_2 + 2 r_2 + 1}{r_1 - r_2} \tag{eq.5c}
$$

Similar to eq.2, we obtain $\mathrm{stoi}_{\alpha_L, I_2}$, $\mathrm{stoi}_{\beta_L, I_2}$ and $\mathrm{stoi}_{\gamma_L, I_2}$ from eq.5: an occupancy ratio ρ (light over heavy) corresponds to a stoichiometry of ρ/(1 + ρ).

With the intensities of the fragments of $I_3$ formed by breaking the precursor between two acetyl sites, we can obtain the ratio of the amounts of $\alpha_L\beta_H\gamma_H$, $\alpha_H\beta_L\gamma_H$, $\alpha_H\beta_H\gamma_L$ by solving the joint equations (eq.6a and eq.6b):

$$
\frac{\text{intensity of light } b_{j_2} \text{ ion}}{\text{intensity of heavy } b_{j_2} \text{ ion}}
= \frac{\text{intensity of heavy } y_{n-j_2} \text{ ion}}{\text{intensity of light } y_{n-j_2} \text{ ion}}
= \frac{\text{amount of } \alpha_L\beta_H\gamma_H}{\text{amount of } \alpha_H\beta_L\gamma_H + \text{amount of } \alpha_H\beta_H\gamma_L}
= r_1 \tag{eq.6a}
$$

$$
\frac{\text{intensity of light } b_{j_3} \text{ ion}}{\text{intensity of heavy } b_{j_3} \text{ ion}}
= \frac{\text{intensity of heavy } y_{n-j_3} \text{ ion}}{\text{intensity of light } y_{n-j_3} \text{ ion}}
= \frac{\text{amount of } \alpha_L\beta_H\gamma_H + \text{amount of } \alpha_H\beta_L\gamma_H}{\text{amount of } \alpha_H\beta_H\gamma_L}
= r_2 \tag{eq.6b}
$$

where $j_2$, $j_3$ and n have the same definitions as in eq.4.

The forms of eq.5a and eq.5b still hold for $I_3$. However, eq.5c does not. Instead, we have:

$$
\frac{\text{amount of } \beta_L}{\text{amount of } \beta_H}
= \frac{\text{amount of } \alpha_H\beta_L\gamma_H}{\text{amount of } \alpha_L\beta_H\gamma_H + \text{amount of } \alpha_H\beta_H\gamma_L}
= \frac{r_2 - r_1}{r_1 r_2 + 2 r_1 + 1} \tag{eq.7c}
$$
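The three-site relations can be checked numerically. The sketch below pools made-up fragment intensities into $r_1$ and $r_2$ with a least-squares slope and then applies eq.5a, eq.5b, eq.5c and eq.7c. It rests on several assumptions: the regression is fitted through the origin (the note does not state the exact model form), the function names are hypothetical, and the numbers are illustrative rather than measured. It is not StoichAnalyzer's implementation.

```python
import math


def slope_through_origin(x, y):
    """Least-squares slope of y = r * x with no intercept (an assumed model form)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def to_stoi(ratio):
    """Convert a light/heavy occupancy ratio into a light-site fraction."""
    return ratio / (1.0 + ratio)


def site_ratios_three_sites(r1, r2, peak):
    """Light/heavy occupancy ratios (alpha, beta, gamma) for MS1 peak 'I2' or 'I3'."""
    alpha = r1            # eq.5a
    gamma = 1.0 / r2      # eq.5b
    if peak == "I2":
        beta = (r1 * r2 + 2 * r2 + 1) / (r1 - r2)    # eq.5c
    elif peak == "I3":
        beta = (r2 - r1) / (r1 * r2 + 2 * r1 + 1)    # eq.7c
    else:
        raise ValueError("peak must be 'I2' or 'I3'")
    return alpha, beta, gamma


# Made-up fragment intensities for peak I2 (not real data).
# x holds the denominators of eq.4 (heavy b ions, light y ions),
# y holds the numerators (light b ions, heavy y ions).
r1 = slope_through_origin([10.0, 20.0, 40.0], [10.0, 20.0, 40.0])
r2 = slope_through_origin([40.0, 80.0], [10.0, 20.0])

ratios_i2 = site_ratios_three_sites(r1, r2, "I2")
stoi_i2 = [to_stoi(r) for r in ratios_i2]

# For I3, amounts 2, 3, 5 of the three combinations give r1 = 0.25 and r2 = 1.0.
ratios_i3 = site_ratios_three_sites(0.25, 1.0, "I3")

print(r1, r2)
print(stoi_i2)
print(ratios_i3)
```

A useful sanity check is that the three conditional stoichiometries for $I_2$ sum to 2, because every peptide in that peak carries exactly two light acetyl groups. Note also that eq.5c requires $r_1 > r_2$ and eq.7c requires $r_2 > r_1$; noisy ratios that violate these conditions would give a negative amount for one combination.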

Similar to eq.2, we obtain $\mathrm{stoi}_{\alpha_L, I_3}$, $\mathrm{stoi}_{\beta_L, I_3}$ and $\mathrm{stoi}_{\gamma_L, I_3}$ from eq.7 (that is, eq.5a, eq.5b and eq.7c). With the conditional probabilities $\mathrm{stoi}_{\alpha_L, I_2}$, $\mathrm{stoi}_{\beta_L, I_2}$, $\mathrm{stoi}_{\gamma_L, I_2}$, $\mathrm{stoi}_{\alpha_L, I_3}$, $\mathrm{stoi}_{\beta_L, I_3}$ and $\mathrm{stoi}_{\gamma_L, I_3}$ and the law of total probability, the total occupancy of each site can be calculated. Here we list only $\mathrm{stoi}_{\alpha_L,\mathrm{total}}$ as an example:

$$
\mathrm{stoi}_{\alpha_L,\mathrm{total}} = \frac{I_1 + I_2 \cdot \mathrm{stoi}_{\alpha_L, I_2} + I_3 \cdot \mathrm{stoi}_{\alpha_L, I_3}}{I_1 + I_2 + I_3 + I_4} \tag{eq.8}
$$

For a peptide containing 4 acetyl sites (α, β, γ, δ), there are three MS1 peaks in the middle: $I_2$, $I_3$ and $I_4$. Because the isotope combination symbols and the formulas become more complex, we simplify the notation as follows. In the analysis of $I_2$, a, b, c, d represent the amounts of $\alpha_L\beta_L\gamma_L\delta_H$, $\alpha_L\beta_L\gamma_H\delta_L$, $\alpha_L\beta_H\gamma_L\delta_L$, $\alpha_H\beta_L\gamma_L\delta_L$ respectively. In the analysis of $I_3$, a, b, c, d, e, f represent the amounts of $\alpha_L\beta_L\gamma_H\delta_H$, $\alpha_L\beta_H\gamma_L\delta_H$, $\alpha_L\beta_H\gamma_H\delta_L$, $\alpha_H\beta_L\gamma_L\delta_H$, $\alpha_H\beta_L\gamma_H\delta_L$, $\alpha_H\beta_H\gamma_L\delta_L$ respectively. In the analysis of $I_4$, a, b, c, d again represent the amounts of $\alpha_L\beta_H\gamma_H\delta_H$, $\alpha_H\beta_L\gamma_H\delta_H$, $\alpha_H\beta_H\gamma_L\delta_H$, $\alpha_H\beta_H\gamma_H\delta_L$ respectively.

In analyzing $I_2$, similar to eq.4, the ratios are obtained from the regression model of the observations. The ratios can also be expressed in terms of the amounts of the acetyl isotope combinations:

$$
\frac{a + b + c}{d} = r_1, \qquad \frac{a + b}{c + d} = r_2, \qquad \frac{a}{b + c + d} = r_3 \tag{eq.9}
$$

Here we have three knowns ($r_1$, $r_2$, $r_3$), four unknowns (a, b, c, d) and three equations. What we need, however, is the ratios between the unknowns rather than the unknowns themselves. Letting $b' = b/a$, $c' = c/a$, $d' = d/a$ and rearranging the three equations so that all unknown variables are on the left, we obtain three joint equations in fewer variables:

$$
\begin{aligned}
-\frac{1}{r_1}(b' + c') + d' &= \frac{1}{r_1} \\
-\frac{1}{r_2} b' + c' + d' &= \frac{1}{r_2} \\
b' + c' + d' &= \frac{1}{r_3}
\end{aligned} \tag{eq.10}
$$

The joint equations are then solved by applying Cramer's rule. Similar to eq.5, we have:

$$
\begin{aligned}
\frac{\text{amount of } \alpha_L}{\text{amount of } \alpha_H} &= \frac{a + b + c}{d} = r_1 \\
\frac{\text{amount of } \beta_L}{\text{amount of } \beta_H} &= \frac{a + b + d}{c} = \frac{1 + b' + d'}{c'} \\
\frac{\text{amount of } \gamma_L}{\text{amount of } \gamma_H} &= \frac{a + c + d}{b} = \frac{1 + c' + d'}{b'} \\
\frac{\text{amount of } \delta_L}{\text{amount of } \delta_H} &= \frac{b + c + d}{a} = \frac{1}{r_3}
\end{aligned} \tag{eq.11}
$$

Similar to eq.2, we obtain $\mathrm{stoi}_{\alpha_L, I_2}$, $\mathrm{stoi}_{\beta_L, I_2}$, $\mathrm{stoi}_{\gamma_L, I_2}$ and $\mathrm{stoi}_{\delta_L, I_2}$ from eq.11.

In analyzing $I_3$, similar to eq.9, the ratios are obtained from the regression model of the observations. The difference is that there are three possible cases (containing 0, 1 or 2 heavy isotopes) for the b ions formed by breaking the precursor between site β and site γ. These three mass levels yield two equations. (In the $I_2$ analysis, two mass levels yield one equation.) The ratios are expressed in terms of the amounts of the acetyl isotope combinations as follows:

$$
\begin{aligned}
\frac{\text{intensity of light } b_{j_2} \text{ ion}}{\text{intensity of heavy } b_{j_2} \text{ ion}}
&= \frac{\text{intensity of heavy } y_{n-j_2} \text{ ion}}{\text{intensity of light } y_{n-j_2} \text{ ion}}
= \frac{a + b + c}{d + e + f} = r_1 \\
\frac{\text{intensity of lightest } b_{j_3} \text{ ion}}{\text{intensity of heaviest } b_{j_3} \text{ ion}}
&= \frac{\text{intensity of heaviest } y_{n-j_3} \text{ ion}}{\text{intensity of lightest } y_{n-j_3} \text{ ion}}
= \frac{a}{f} = r_2 \\
\frac{\text{intensity of medium } b_{j_3} \text{ ion}}{\text{intensity of heaviest } b_{j_3} \text{ ion}}
&= \frac{\text{intensity of medium } y_{n-j_3} \text{ ion}}{\text{intensity of lightest } y_{n-j_3} \text{ ion}}
= \frac{b + c + d + e}{f} = r_3 \\
\frac{\text{intensity of light } b_{j_4} \text{ ion}}{\text{intensity of heavy } b_{j_4} \text{ ion}}
&= \frac{\text{intensity of heavy } y_{n-j_4} \text{ ion}}{\text{intensity of light } y_{n-j_4} \text{ ion}}
= \frac{a + b + d}{c + e + f} = r_4
\end{aligned} \tag{eq.12}
$$

where $j_2$ is equal to or larger than the position of site α but smaller than the position of site β, $j_3$ is equal to or larger than the position of site β but smaller than the position of site γ, $j_4$ is equal to or larger than the position of site γ but smaller than the position of site δ, and n is the total number of amino acids in the peptide.

With 4 constraints (equations), we cannot determine the ratios of 6 unknowns. In other words, the ratios of the amounts of the combinations corresponding to $I_3$ cannot in theory be solved with MS2 information alone. However, the occupancy ratios $\alpha_L/\alpha_H$, $\beta_L/\beta_H$, $\gamma_L/\gamma_H$ and $\delta_L/\delta_H$ can still be solved. We skip the derivation here and list the results:

$$
\begin{aligned}
\frac{\text{amount of } \alpha_L}{\text{amount of } \alpha_H} &= \frac{a + b + c}{d + e + f} = r_1 \\
\frac{\text{amount of } \beta_L}{\text{amount of } \beta_H} &= \frac{a + d + e}{b + c + f} = \frac{r_1 r_2 + 2 r_2 + r_3 - r_1}{r_1 r_3 - r_2 + 2 r_1 + 1} \\
\frac{\text{amount of } \gamma_L}{\text{amount of } \gamma_H} &= \frac{b + d + f}{a + c + e} = \frac{(r_3 + 1)(r_4 + 1) - (r_2 + r_3 - r_4)}{r_2 (r_4 + 1) + (r_2 + r_3 - r_4)} \\
\frac{\text{amount of } \delta_L}{\text{amount of } \delta_H} &= \frac{c + e + f}{a + b + d} = \frac{1}{r_4}
\end{aligned} \tag{eq.13}
$$

Note that more than one correct mathematical expression exists for these ratios. We then obtain $\mathrm{stoi}_{\alpha_L, I_3}$, $\mathrm{stoi}_{\beta_L, I_3}$, $\mathrm{stoi}_{\gamma_L, I_3}$ and $\mathrm{stoi}_{\delta_L, I_3}$ from eq.13.

The analysis of $I_4$ is similar to the analysis of $I_2$: the ratios are obtained from the regression model of the observations and can also be expressed in terms of the amounts of the acetyl isotope combinations:

$$
\frac{a}{b + c + d} = r_1, \qquad \frac{a + b}{c + d} = r_2, \qquad \frac{a + b + c}{d} = r_3 \tag{eq.14}
$$

Letting $b' = b/a$, $c' = c/a$, $d' = d/a$ and rearranging the three equations so that all unknown variables are on the left, we obtain three joint equations in fewer variables:

$$
\begin{aligned}
b' + c' + d' &= \frac{1}{r_1} \\
-\frac{1}{r_2} b' + c' + d' &= \frac{1}{r_2} \\
-\frac{1}{r_3}(b' + c') + d' &= \frac{1}{r_3}
\end{aligned} \tag{eq.15}
$$

The joint equations are then solved by applying Cramer's rule. Similar to eq.11, we have:

$$
\begin{aligned}
\frac{\text{amount of } \alpha_L}{\text{amount of } \alpha_H} &= \frac{a}{b + c + d} = r_1 \\
\frac{\text{amount of } \beta_L}{\text{amount of } \beta_H} &= \frac{b}{a + c + d} = \frac{b'}{1 + c' + d'} \\
\frac{\text{amount of } \gamma_L}{\text{amount of } \gamma_H} &= \frac{c}{a + b + d} = \frac{c'}{1 + b' + d'} \\
\frac{\text{amount of } \delta_L}{\text{amount of } \delta_H} &= \frac{d}{a + b + c} = \frac{1}{r_3}
\end{aligned} \tag{eq.16}
$$

We then obtain $\mathrm{stoi}_{\alpha_L, I_4}$, $\mathrm{stoi}_{\beta_L, I_4}$, $\mathrm{stoi}_{\gamma_L, I_4}$ and $\mathrm{stoi}_{\delta_L, I_4}$ from eq.16.

Finally, with the obtained conditional probabilities and the law of total probability, the total occupancy of each site can be calculated. Here we list only $\mathrm{stoi}_{\alpha_L,\mathrm{total}}$ as an example:

$$
\mathrm{stoi}_{\alpha_L,\mathrm{total}} = \frac{I_1 + I_2 \cdot \mathrm{stoi}_{\alpha_L, I_2} + I_3 \cdot \mathrm{stoi}_{\alpha_L, I_3} + I_4 \cdot \mathrm{stoi}_{\alpha_L, I_4}}{I_1 + I_2 + I_3 + I_4 + I_5} \tag{eq.17}
$$
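The four-site case can be traced numerically as well. The sketch below solves the rearranged $I_2$ system with Cramer's rule, evaluates the closed-form occupancy ratios for $I_3$, and combines conditional stoichiometries into a total occupancy following the pattern of eq.17. It is a minimal illustration and not the StoichAnalyzer code: all function names are hypothetical, and the amounts, ratios and peak intensities are made up so that the results can be compared with the known answers.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def cramer3(A, rhs):
    """Solve A x = rhs for a 3x3 system with Cramer's rule."""
    d = det3(A)
    if d == 0:
        raise ValueError("singular system")
    sol = []
    for k in range(3):
        # Replace column k of A with the right-hand side.
        Ak = [row[:k] + [rhs[i]] + row[k + 1:] for i, row in enumerate(A)]
        sol.append(det3(Ak) / d)
    return sol


def ratios_i2(r1, r2, r3):
    """Light/heavy occupancy ratios (alpha..delta) for peak I2 via eq.10 and eq.11."""
    A = [[-1 / r1, -1 / r1, 1.0],
         [-1 / r2, 1.0, 1.0],
         [1.0, 1.0, 1.0]]
    bp, cp, dp = cramer3(A, [1 / r1, 1 / r2, 1 / r3])  # b', c', d'
    return r1, (1 + bp + dp) / cp, (1 + cp + dp) / bp, 1 / r3


def ratios_i3(r1, r2, r3, r4):
    """Light/heavy occupancy ratios (alpha..delta) for peak I3 via eq.13."""
    k = r2 + r3 - r4
    beta = (r1 * r2 + 2 * r2 + r3 - r1) / (r1 * r3 - r2 + 2 * r1 + 1)
    gamma = ((r3 + 1) * (r4 + 1) - k) / (r2 * (r4 + 1) + k)
    return r1, beta, gamma, 1 / r4


def total_light(peaks, cond):
    """Total light occupancy of one site from MS1 peaks and conditional stoichiometries.

    peaks = [I1, ..., I_last]; cond holds the site's stoichiometry in each middle peak.
    """
    if len(cond) != len(peaks) - 2:
        raise ValueError("need one conditional value per middle peak")
    return (peaks[0] + sum(i * s for i, s in zip(peaks[1:-1], cond))) / sum(peaks)


# Made-up amounts a..d = 1, 2, 3, 4 for peak I2 give these ratios (eq.9).
i2 = ratios_i2(r1=6 / 4, r2=3 / 7, r3=1 / 9)
# Made-up amounts a..f = 1..6 for peak I3 give these ratios (eq.12).
i3 = ratios_i3(r1=6 / 15, r2=1 / 6, r3=14 / 6, r4=7 / 14)
# Made-up MS1 intensities I1..I5 and conditional stoichiometries for one site.
total = total_light([10.0, 20.0, 30.0, 20.0, 20.0], [0.5, 0.5, 0.5])

print(i2)
print(i3)
print(total)
```

The $I_4$ peak can be handled the same way by building the matrix from eq.15 and applying eq.16. Because `total_light` only assumes that the first peak is fully light and the last fully heavy, the same helper also covers the two-site and three-site totals (eq.3 and eq.8).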