Site-specific Identification of Lysine Acetylation Stoichiometries in Mammalian Cells

Supplementary Information

Tong Zhou 1, 2, Ying-hua Chung 1, 2, Jianji Chen 1, Yue Chen 1

1. Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota at Twin Cities, Minneapolis, MN 55455, USA
2. These authors contributed equally to this work

Correspondence: Dr. Yue Chen (YueChen@umn.edu)

Supplementary Figure S1-S5
Supplementary Table T1-T2
Supplementary Note 1

Supplementary Figures

Figure S1. Ah-NHS synthesis and labeling reactions. (A) Synthesis reactions for Ah-NHS starting from sodium acetate-¹³CD₃. (B) The labeling reactions of Ac-NHS or AcOAc at the lysine ε-NH2 and/or the α-NH2.

Figure S2. Co-eluting profiles of four peptides with different isotope-labeled acetyl groups. Synthetic peptides (structure shown in Figure 4) containing different numbers of light acetyl groups were balanced with heavy acetyl groups. They were mixed and injected into the LC-MS after desalting. After balancing, peptide 1 has a molecular weight (MW) of 745.94, peptides 2 and 3 have an MW of 748.46, peptides 4 and 5 have an MW of 750.97, and peptides 6 and 7 have an MW of 753.48.

Figure S3. BSA spike-in validation experiment. In vitro chemically acetylated BSA was mixed at different heavy-to-light ratios to mimic 50%, 10% and 1% stoichiometries and spiked into the HeLa whole-cell lysate stoichiometry analysis at a roughly 1:50 ratio (BSA/HeLa proteins, w/w). Three BSA peptides were detected and their acetylation stoichiometries were quantified with the StoichAnalyzer software.

Figure S4. Schematic illustration of false positive peak selection and quantification in acetylation stoichiometry analysis. (A) Precursor ion selection of light acetyl peptides may be interfered with by isotope peaks from other co-eluting species, making the stoichiometry analysis data unreliable. By implementing deconvolution and deisotoping, the software removes these false positive peak selections. (B) The identified heavy acetylated peptides may serve as internal standards for the software to correct the mass error in each spectrum and accurately select the corresponding light acetylated peptide peaks. (C) The software applies a filtering step based on database search results to remove incorrectly selected peak pairs resulting from other co-eluting peptides.

Figure S5. Reproducibility of two biological replicate experiments for untreated HeLa protein stoichiometry analysis.

Figure S6. MS and MS/MS spectra of a peptide containing two lysines. The precursor ion spectrum is shown in the upper panel. The annotated fragmentation spectrum of the middle precursor ion, which contains one heavy and one light acetyl group, is shown in the lower panel.

Figure S7. Clustering analysis of GO molecular function enrichment for HeLa cell proteins identified with acetylation stoichiometries in four ranges: less than 1%, 1%-5%, 5%-20%, and more than 20%.

Supplementary Tables

Table S1. Identification of lysine acetylation stoichiometries in HeLa cells with no treatment. Each lysine in a peptide is assigned a site number, ordered sequentially from the peptide N-terminus to the peptide C-terminus, with the lysine closest to the N-terminus as K1.

Table S2. Identification of lysine acetylation stoichiometries in HeLa cells treated with sodium butyrate. Each lysine in a peptide is assigned a site number, ordered sequentially from the peptide N-terminus to the peptide C-terminus, with the lysine closest to the N-terminus as K1.

Supplementary Note 1. Software usage and the specific description of the mathematical model.


Supplementary Note 1. Software usage and specific description of the mathematical model.

a. The usage of StoichAnalyzer software

- Operating environment: Linux or Unix operating system with the following software installed: Zlib, Perl, and a C++ compiler.
- Step-by-step instructions:
1. Download the latest release file (.tar) from https://github.com/achievemn01/ptmquant-analysis.git.
2. Uncompress the file to a Target Directory. Four directories (MS1C, MS2C, MS1-perl, MS2-perl) and a bash script (stoichanalyzer.sh) will be generated.
3. Compile the MS1C program by entering the MS1C directory and executing the command make.
4. Compile the MS2C program by entering the MS2C directory and executing the command make.
5. Create a Project Directory in the same Target Directory to store the mzXML files and MaxQuant search results.
6. Upload the mzXML files as well as evidence.txt and msms.txt to the Project Directory.
7. Copy stoichanalyzer.sh to the Project Directory and execute it directly.
8. The outputs are saved as Output-1K.txt, Output-2K.txt, Output-3K.txt and Output-4K.txt.

b. Description of the algorithm and mathematical model

This section states the mathematical relations among the intensities of the fragment ions, the amount of each acetyl isotope combination, and the total stoichiometry of each acetyl site among precursors of the same mass. N-terminal acetyl sites are excluded; only acetyl sites on lysine are considered below. Peptides containing 2 to 4 acetyl sites are discussed separately.

For a peptide containing 2 acetyl sites, the sites are named α and β in order from the N-terminus to the C-terminus, and the subscripts L and H denote a light or heavy acetyl group at a site. The three possible MS1 peaks of the acetyl isotope assembly are named $I_1$, $I_2$, $I_3$ from light to heavy. Here $I_1$ maps to the isotope composition $\alpha_L\beta_L$; $I_2$ maps to $\alpha_L\beta_H$ and $\alpha_H\beta_L$; $I_3$ maps to $\alpha_H\beta_H$. The intensities of the $I_2$ fragments formed by breaking the precursor between the two acetyl sites reveal the ratio of the amounts of $\alpha_L\beta_H$ and $\alpha_H\beta_L$.
Theoretically, this can be expressed in mathematical form:

$$\frac{\text{intensity of light } b_{j} \text{ ion}}{\text{intensity of heavy } b_{j} \text{ ion}} = \frac{\text{intensity of heavy } y_{n-j} \text{ ion}}{\text{intensity of light } y_{n-j} \text{ ion}} = \frac{\text{amount of } \alpha_L\beta_H}{\text{amount of } \alpha_H\beta_L} = r_1 \tag{1}$$

where $j$ is equal to or larger than the position of site α but smaller than the position of site β, and $n$ is the total number of amino acids in the peptide.
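Because eq.1 equates several observable intensity ratios to the single unknown $r_1$, more than one fragment pair can contribute to its estimate. The following is a minimal sketch of one way to pool them, assuming a least-squares line through the origin (the note does not state the exact regression form used by StoichAnalyzer). The function names `slope_through_origin` and `estimate_r1` are hypothetical and are not part of the released software.

```python
from typing import Sequence


def slope_through_origin(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope r of the no-intercept model y = r * x."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all denominator intensities are zero")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


def estimate_r1(
    b_light: Sequence[float],
    b_heavy: Sequence[float],
    y_light: Sequence[float],
    y_heavy: Sequence[float],
) -> float:
    """Pool b- and y-ion pairs into one estimate of r1 (eq.1).

    Eq.1 gives b_light / b_heavy = y_heavy / y_light = r1, so the
    numerators (b_light, y_heavy) are regressed on the denominators
    (b_heavy, y_light). The through-origin fit is an assumption.
    """
    xs = list(b_heavy) + list(y_light)
    ys = list(b_light) + list(y_heavy)
    return slope_through_origin(xs, ys)
```

A call such as `estimate_r1([200.0, 400.0], [100.0, 200.0], [50.0], [100.0])` pools two b-ion pairs and one y-ion pair; intense fragments carry more weight in this fit than weak ones, which is one reason a regression can be preferred over averaging individual ratios.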

The first two fractions in eq.1 are observed in the spectrum. They can be pooled by fitting all such observations with a linear regression model; the slope of the regression line is the ratio $r_1$, which is statistically more robust than a ratio taken from a single fragment pair. The third fraction in eq.1 is the ratio of the amounts of the two acetyl isotope combinations. It can be converted to the stoichiometry of each acetyl site:

$$\mathrm{stoi}_{\alpha_L,\,I_2} = \frac{\text{amount of } \alpha_L\beta_H}{\text{amount of } \alpha_L\beta_H + \text{amount of } \alpha_H\beta_L} = \frac{r_1}{1 + r_1}, \qquad \mathrm{stoi}_{\beta_L,\,I_2} = \frac{\text{amount of } \alpha_H\beta_L}{\text{amount of } \alpha_L\beta_H + \text{amount of } \alpha_H\beta_L} = \frac{1}{1 + r_1} \tag{2}$$

These expressions are conditional probabilities: within the group of peptides whose isotope composition is 1L1H, that is, the group corresponding to MS1 peak $I_2$, the probability of finding site α acetylated with the light isotope is $\mathrm{stoi}_{\alpha_L,\,I_2}$, and similarly for $\mathrm{stoi}_{\beta_L,\,I_2}$. By the law of total probability, the total occupancy of each site is:

$$\mathrm{stoi}_{\alpha_L,\,\mathrm{total}} = \frac{I_1 + I_2\,\mathrm{stoi}_{\alpha_L,\,I_2}}{I_1 + I_2 + I_3}, \qquad \mathrm{stoi}_{\beta_L,\,\mathrm{total}} = \frac{I_1 + I_2\,\mathrm{stoi}_{\beta_L,\,I_2}}{I_1 + I_2 + I_3} \tag{3}$$

For a peptide containing 3 acetyl sites (α, β, γ), besides the lightest and heaviest MS1 peaks there are two more in the middle: $I_2$ and $I_3$. From the intensities of the $I_2$ fragments formed by breaking the precursor between two acetyl sites, we can obtain the ratios of the amounts of $\alpha_L\beta_L\gamma_H$, $\alpha_L\beta_H\gamma_L$, $\alpha_H\beta_L\gamma_L$ and the ratios of occupancies $\alpha_L/\alpha_H$, $\beta_L/\beta_H$, and $\gamma_L/\gamma_H$ by solving the joint equations (eq.4a & eq.4b), although the relation between the observations and the amount of each acetyl isotope combination is more complicated than in eq.1.

$$\frac{\text{intensity of light } b_{j_2} \text{ ion}}{\text{intensity of heavy } b_{j_2} \text{ ion}} = \frac{\text{intensity of heavy } y_{n-j_2} \text{ ion}}{\text{intensity of light } y_{n-j_2} \text{ ion}} = r_1 = \frac{\text{amount of } \alpha_L\beta_L\gamma_H + \text{amount of } \alpha_L\beta_H\gamma_L}{\text{amount of } \alpha_H\beta_L\gamma_L} \tag{4a}$$

$$\frac{\text{intensity of light } b_{j_3} \text{ ion}}{\text{intensity of heavy } b_{j_3} \text{ ion}} = \frac{\text{intensity of heavy } y_{n-j_3} \text{ ion}}{\text{intensity of light } y_{n-j_3} \text{ ion}} = r_2 = \frac{\text{amount of } \alpha_L\beta_L\gamma_H}{\text{amount of } \alpha_L\beta_H\gamma_L + \text{amount of } \alpha_H\beta_L\gamma_L} \tag{4b}$$

where $j_2$ is equal to or larger than the position of site α but smaller than the position of site β, $j_3$ is equal to or larger than the position of site β but smaller than the position of site γ, and $n$ is the total number of amino acids in the peptide. Another view of eq.4 is:

$$\frac{\text{amount of } \alpha_L}{\text{amount of } \alpha_H} = r_1 \tag{5a}$$

$$\frac{\text{amount of } \gamma_L}{\text{amount of } \gamma_H} = \frac{1}{r_2} \tag{5b}$$

With simple algebra we get:

$$\frac{\text{amount of } \beta_L}{\text{amount of } \beta_H} = \frac{\text{amount of } \alpha_L\beta_L\gamma_H + \text{amount of } \alpha_H\beta_L\gamma_L}{\text{amount of } \alpha_L\beta_H\gamma_L} = \frac{r_1 r_2 + 2r_2 + 1}{r_1 - r_2} \tag{5c}$$

Similar to equation 2, we obtain $\mathrm{stoi}_{\alpha_L,\,I_2}$, $\mathrm{stoi}_{\beta_L,\,I_2}$ and $\mathrm{stoi}_{\gamma_L,\,I_2}$ with eq.5.

From the intensities of the $I_3$ fragments formed by breaking the precursor between two acetyl sites, we can obtain the ratios of the amounts of $\alpha_L\beta_H\gamma_H$, $\alpha_H\beta_L\gamma_H$, $\alpha_H\beta_H\gamma_L$ by solving the joint equations (eq.6a & eq.6b):

$$\frac{\text{intensity of light } b_{j_2} \text{ ion}}{\text{intensity of heavy } b_{j_2} \text{ ion}} = \frac{\text{intensity of heavy } y_{n-j_2} \text{ ion}}{\text{intensity of light } y_{n-j_2} \text{ ion}} = r_1 = \frac{\text{amount of } \alpha_L\beta_H\gamma_H}{\text{amount of } \alpha_H\beta_L\gamma_H + \text{amount of } \alpha_H\beta_H\gamma_L} \tag{6a}$$

$$\frac{\text{intensity of light } b_{j_3} \text{ ion}}{\text{intensity of heavy } b_{j_3} \text{ ion}} = \frac{\text{intensity of heavy } y_{n-j_3} \text{ ion}}{\text{intensity of light } y_{n-j_3} \text{ ion}} = r_2 = \frac{\text{amount of } \alpha_L\beta_H\gamma_H + \text{amount of } \alpha_H\beta_L\gamma_H}{\text{amount of } \alpha_H\beta_H\gamma_L} \tag{6b}$$

where $j_2$, $j_3$, and $n$ have the same definitions as in eq.4. The mathematical forms of eq.5a and eq.5b still hold in the case of $I_3$. However, eq.5c does not hold here. Instead, we have:

$$\frac{\text{amount of } \beta_L}{\text{amount of } \beta_H} = \frac{\text{amount of } \alpha_H\beta_L\gamma_H}{\text{amount of } \alpha_L\beta_H\gamma_H + \text{amount of } \alpha_H\beta_H\gamma_L} = \frac{r_2 - r_1}{r_1 r_2 + 2r_1 + 1} \tag{7c}$$
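The closed forms in eq.2, eq.5 and eq.7c can be checked numerically against known amounts. The sketch below is illustrative only: the function names are hypothetical, and it assumes that $r_1$ and $r_2$ have already been estimated and are non-negative. Each occupancy ratio written as num/den is converted to the light fraction num/(num + den), which is algebraically the same as ρ/(1 + ρ) but stays finite when the heavy amount is zero.

```python
from typing import Tuple


def two_site_i2(r1: float) -> Tuple[float, float]:
    """Light fractions of (alpha, beta) within I2, following eq.2."""
    return r1 / (1 + r1), 1 / (1 + r1)


def three_site_i2(r1: float, r2: float) -> Tuple[float, float, float]:
    """Light fractions of (alpha, beta, gamma) within I2 (eq.5a-5c)."""
    beta_num = r1 * r2 + 2 * r2 + 1  # proportional to the amount of beta_L
    beta_den = r1 - r2               # proportional to the amount of beta_H
    alpha = r1 / (1 + r1)            # from alpha_L / alpha_H = r1
    beta = beta_num / (beta_num + beta_den)
    gamma = 1 / (1 + r2)             # from gamma_L / gamma_H = 1 / r2
    return alpha, beta, gamma


def three_site_i3(r1: float, r2: float) -> Tuple[float, float, float]:
    """Light fractions of (alpha, beta, gamma) within I3 (eq.5a, 5b, 7c)."""
    beta_num = r2 - r1               # proportional to the amount of beta_L
    beta_den = r1 * r2 + 2 * r1 + 1  # proportional to the amount of beta_H
    alpha = r1 / (1 + r1)
    beta = beta_num / (beta_num + beta_den)
    gamma = 1 / (1 + r2)
    return alpha, beta, gamma
```

As a check, take made-up $I_2$ amounts of 1, 2 and 3 for $\alpha_L\beta_L\gamma_H$, $\alpha_L\beta_H\gamma_L$ and $\alpha_H\beta_L\gamma_L$. Eq.4 then gives $r_1 = 1$ and $r_2 = 0.2$, and `three_site_i2(1.0, 0.2)` recovers the light fractions 3/6, 4/6 and 5/6 that follow directly from those amounts.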

Similar to equation 2, we obtain $\mathrm{stoi}_{\alpha_L,\,I_3}$, $\mathrm{stoi}_{\beta_L,\,I_3}$ and $\mathrm{stoi}_{\gamma_L,\,I_3}$ with eq.7 (that is, eq.5a, eq.5b and eq.7c). With the conditional probabilities $\mathrm{stoi}_{\alpha_L,\,I_2}$, $\mathrm{stoi}_{\beta_L,\,I_2}$, $\mathrm{stoi}_{\gamma_L,\,I_2}$, $\mathrm{stoi}_{\alpha_L,\,I_3}$, $\mathrm{stoi}_{\beta_L,\,I_3}$, $\mathrm{stoi}_{\gamma_L,\,I_3}$ and the law of total probability, the total occupancy of each site can be calculated. Here we list only $\mathrm{stoi}_{\alpha_L,\,\mathrm{total}}$ as an example:

$$\mathrm{stoi}_{\alpha_L,\,\mathrm{total}} = \frac{I_1 + I_2\,\mathrm{stoi}_{\alpha_L,\,I_2} + I_3\,\mathrm{stoi}_{\alpha_L,\,I_3}}{I_1 + I_2 + I_3 + I_4} \tag{8}$$

For a peptide containing 4 acetyl sites (α, β, γ, δ), there are three MS1 peaks in the middle: $I_2$, $I_3$, and $I_4$. Because the acetyl isotope combination symbols and the mathematical forms become more complex, we simplify the notation as follows:

- In the analysis of $I_2$, $a, b, c, d$ represent the amounts of $\alpha_L\beta_L\gamma_L\delta_H$, $\alpha_L\beta_L\gamma_H\delta_L$, $\alpha_L\beta_H\gamma_L\delta_L$, $\alpha_H\beta_L\gamma_L\delta_L$, respectively.
- In the analysis of $I_3$, $a, b, c, d, e, f$ represent the amounts of $\alpha_L\beta_L\gamma_H\delta_H$, $\alpha_L\beta_H\gamma_L\delta_H$, $\alpha_L\beta_H\gamma_H\delta_L$, $\alpha_H\beta_L\gamma_L\delta_H$, $\alpha_H\beta_L\gamma_H\delta_L$, $\alpha_H\beta_H\gamma_L\delta_L$, respectively.
- In the analysis of $I_4$, $a, b, c, d$ again represent the amounts of $\alpha_L\beta_H\gamma_H\delta_H$, $\alpha_H\beta_L\gamma_H\delta_H$, $\alpha_H\beta_H\gamma_L\delta_H$, $\alpha_H\beta_H\gamma_H\delta_L$, respectively.

In analyzing $I_2$, similar to equation 4, the ratios are obtained from the regression model of the observations. The ratios can also be expressed in terms of the amounts of the acetyl isotope combinations:

$$\frac{a + b + c}{d} = r_1, \qquad \frac{a + b}{c + d} = r_2, \qquad \frac{a}{b + c + d} = r_3 \tag{9}$$

Here we have three knowns ($r_1$, $r_2$, $r_3$), four unknowns ($a$, $b$, $c$, $d$), and three equations. What we need, however, are the ratios between the unknowns rather than the unknowns themselves. Letting $b' = b/a$, $c' = c/a$, $d' = d/a$ and rearranging the three equations so that all unknown variables are on the left, we obtain three new joint equations in fewer variables:

$$\begin{aligned}
-\frac{1}{r_1}(b' + c') + d' &= \frac{1}{r_1} \\
-\frac{1}{r_2}\,b' + c' + d' &= \frac{1}{r_2} \\
b' + c' + d' &= \frac{1}{r_3}
\end{aligned} \tag{10}$$

The joint equations are then solved by applying Cramer's rule. Similar to equation 5, we have

$$\begin{aligned}
\frac{\text{amount of } \alpha_L}{\text{amount of } \alpha_H} &= \frac{a + b + c}{d} = r_1 \\
\frac{\text{amount of } \beta_L}{\text{amount of } \beta_H} &= \frac{a + b + d}{c} = \frac{1 + b' + d'}{c'} \\
\frac{\text{amount of } \gamma_L}{\text{amount of } \gamma_H} &= \frac{a + c + d}{b} = \frac{1 + c' + d'}{b'} \\
\frac{\text{amount of } \delta_L}{\text{amount of } \delta_H} &= \frac{b + c + d}{a} = \frac{1}{r_3}
\end{aligned} \tag{11}$$

Similar to equation 2, we obtain $\mathrm{stoi}_{\alpha_L,\,I_2}$, $\mathrm{stoi}_{\beta_L,\,I_2}$, $\mathrm{stoi}_{\gamma_L,\,I_2}$ and $\mathrm{stoi}_{\delta_L,\,I_2}$ with eq.11.

In analyzing $I_3$, similar to equation 9, the ratios are obtained from the regression model of the observations. The difference is that there are three possible cases (containing 0, 1, or 2 heavy isotopes) for the b ions formed by breaking the precursor between site β and site γ. These three mass levels give two equations. (In the $I_2$ analysis, two mass levels give one equation.) The ratios are expressed in terms of the amounts of the acetyl isotope combinations as follows:

$$\begin{aligned}
\frac{\text{intensity of light } b_{j_2} \text{ ion}}{\text{intensity of heavy } b_{j_2} \text{ ion}} &= \frac{\text{intensity of heavy } y_{n-j_2} \text{ ion}}{\text{intensity of light } y_{n-j_2} \text{ ion}} = \frac{a + b + c}{d + e + f} = r_1 \\
\frac{\text{intensity of lightest } b_{j_3} \text{ ion}}{\text{intensity of heaviest } b_{j_3} \text{ ion}} &= \frac{\text{intensity of heaviest } y_{n-j_3} \text{ ion}}{\text{intensity of lightest } y_{n-j_3} \text{ ion}} = \frac{a}{f} = r_2 \\
\frac{\text{intensity of medium } b_{j_3} \text{ ion}}{\text{intensity of heaviest } b_{j_3} \text{ ion}} &= \frac{\text{intensity of medium } y_{n-j_3} \text{ ion}}{\text{intensity of lightest } y_{n-j_3} \text{ ion}} = \frac{b + c + d + e}{f} = r_3 \\
\frac{\text{intensity of light } b_{j_4} \text{ ion}}{\text{intensity of heavy } b_{j_4} \text{ ion}} &= \frac{\text{intensity of heavy } y_{n-j_4} \text{ ion}}{\text{intensity of light } y_{n-j_4} \text{ ion}} = \frac{a + b + d}{c + e + f} = r_4
\end{aligned} \tag{12}$$

where $j_2$ is equal to or larger than the position of site α but smaller than the position of site β, $j_3$ is equal to or larger than the position of site β but smaller than the position of site γ, $j_4$ is equal to or larger than the position of site γ but smaller than the position of site δ, and $n$ is the total number of amino acids in the peptide.

With only 4 constraints (equations), we cannot determine the ratios among the 6 unknowns. In other words, the ratios of the amounts of the individual combinations corresponding to $I_3$ cannot be solved from MS2 information alone. However, the ratios of occupancies $\alpha_L/\alpha_H$, $\beta_L/\beta_H$, $\gamma_L/\gamma_H$, and $\delta_L/\delta_H$ are still solvable. We skip the derivation and list the results below:

$$\begin{aligned}
\frac{\text{amount of } \alpha_L}{\text{amount of } \alpha_H} &= \frac{a + b + c}{d + e + f} = r_1 \\
\frac{\text{amount of } \beta_L}{\text{amount of } \beta_H} &= \frac{a + d + e}{b + c + f} = \frac{r_1 r_2 + 2r_2 + r_3 - r_1}{r_1 r_3 - r_2 + 2r_1 + 1} \\
\frac{\text{amount of } \gamma_L}{\text{amount of } \gamma_H} &= \frac{b + d + f}{a + c + e} = \frac{(r_3 + 1)(r_4 + 1) - (r_2 + r_3 - r_4)}{r_2 (r_4 + 1) + (r_2 + r_3 - r_4)} \\
\frac{\text{amount of } \delta_L}{\text{amount of } \delta_H} &= \frac{c + e + f}{a + b + d} = \frac{1}{r_4}
\end{aligned} \tag{13}$$

Note that these results can be written in more than one equivalent form. We then obtain $\mathrm{stoi}_{\alpha_L,\,I_3}$, $\mathrm{stoi}_{\beta_L,\,I_3}$, $\mathrm{stoi}_{\gamma_L,\,I_3}$ and $\mathrm{stoi}_{\delta_L,\,I_3}$ from eq.13.

The analysis of $I_4$ is similar to the analysis of $I_2$: the ratios are obtained from the regression model of the observations and can also be expressed in terms of the amounts of the acetyl isotope combinations:

$$\frac{a}{b + c + d} = r_1, \qquad \frac{a + b}{c + d} = r_2, \qquad \frac{a + b + c}{d} = r_3 \tag{14}$$

Letting $b' = b/a$, $c' = c/a$, $d' = d/a$ and rearranging the three equations so that all unknown variables are on the left, we obtain three new joint equations in fewer variables:

$$\begin{aligned}
b' + c' + d' &= \frac{1}{r_1} \\
-\frac{1}{r_2}\,b' + c' + d' &= \frac{1}{r_2} \\
-\frac{1}{r_3}(b' + c') + d' &= \frac{1}{r_3}
\end{aligned} \tag{15}$$

The joint equations are then solved by applying Cramer's rule. Similar to equation 11, we have

$$\begin{aligned}
\frac{\text{amount of } \alpha_L}{\text{amount of } \alpha_H} &= \frac{a}{b + c + d} = r_1 \\
\frac{\text{amount of } \beta_L}{\text{amount of } \beta_H} &= \frac{b}{a + c + d} = \frac{b'}{1 + c' + d'} \\
\frac{\text{amount of } \gamma_L}{\text{amount of } \gamma_H} &= \frac{c}{a + b + d} = \frac{c'}{1 + b' + d'} \\
\frac{\text{amount of } \delta_L}{\text{amount of } \delta_H} &= \frac{d}{a + b + c} = \frac{1}{r_3}
\end{aligned} \tag{16}$$

We then obtain $\mathrm{stoi}_{\alpha_L,\,I_4}$, $\mathrm{stoi}_{\beta_L,\,I_4}$, $\mathrm{stoi}_{\gamma_L,\,I_4}$ and $\mathrm{stoi}_{\delta_L,\,I_4}$ from eq.16.

Finally, with the obtained conditional probabilities and the law of total probability, the total occupancy of each site can be calculated. Here we list only $\mathrm{stoi}_{\alpha_L,\,\mathrm{total}}$ as an example:

$$\mathrm{stoi}_{\alpha_L,\,\mathrm{total}} = \frac{I_1 + I_2\,\mathrm{stoi}_{\alpha_L,\,I_2} + I_3\,\mathrm{stoi}_{\alpha_L,\,I_3} + I_4\,\mathrm{stoi}_{\alpha_L,\,I_4}}{I_1 + I_2 + I_3 + I_4 + I_5} \tag{17}$$
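The four-site case can be sketched in a few lines. This is not the StoichAnalyzer source; it is an illustrative reading of the equations, with hypothetical function names, that assumes all regression ratios are strictly positive. For $I_2$ and $I_4$ the normalized three-equation system (with $b' = b/a$, $c' = c/a$, $d' = d/a$) is solved with Cramer's rule, for $I_3$ the closed-form occupancy ratios are used, and the last function applies the weighted sum over MS1 peaks.

```python
from typing import List, Sequence, Tuple

Quad = Tuple[float, float, float, float]


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cramer3(m: List[List[float]], rhs: Sequence[float]) -> List[float]:
    """Solve the 3x3 linear system m x = rhs with Cramer's rule."""
    d0 = det3(m)
    if d0 == 0:
        raise ValueError("singular system")
    sol = []
    for col in range(3):
        mk = [row[:] for row in m]
        for r in range(3):
            mk[r][col] = rhs[r]  # replace one column by the right-hand side
        sol.append(det3(mk) / d0)
    return sol


def four_site_i2(r1: float, r2: float, r3: float) -> Quad:
    """Light fractions of (alpha, beta, gamma, delta) within I2."""
    m = [[-1 / r1, -1 / r1, 1.0], [-1 / r2, 1.0, 1.0], [1.0, 1.0, 1.0]]
    b, c, d = cramer3(m, [1 / r1, 1 / r2, 1 / r3])  # b/a, c/a, d/a
    t = 1 + b + c + d
    return (1 + b + c) / t, (1 + b + d) / t, (1 + c + d) / t, (b + c + d) / t


def four_site_i4(r1: float, r2: float, r3: float) -> Quad:
    """Light fractions of (alpha, beta, gamma, delta) within I4."""
    m = [[1.0, 1.0, 1.0], [-1 / r2, 1.0, 1.0], [-1 / r3, -1 / r3, 1.0]]
    b, c, d = cramer3(m, [1 / r1, 1 / r2, 1 / r3])  # b/a, c/a, d/a
    t = 1 + b + c + d
    return 1 / t, b / t, c / t, d / t


def four_site_i3(r1: float, r2: float, r3: float, r4: float) -> Quad:
    """Light fractions within I3 from the closed-form occupancy ratios."""
    bn = r1 * r2 + 2 * r2 + r3 - r1                # ~ amount of beta_L
    bd = r1 * r3 - r2 + 2 * r1 + 1                 # ~ amount of beta_H
    gn = (r3 + 1) * (r4 + 1) - (r2 + r3 - r4)      # ~ amount of gamma_L
    gd = r2 * (r4 + 1) + (r2 + r3 - r4)            # ~ amount of gamma_H
    return r1 / (1 + r1), bn / (bn + bd), gn / (gn + gd), 1 / (1 + r4)


def total_light_occupancy(peaks: Sequence[float], cond: Sequence[float]) -> float:
    """Total light occupancy of one site.

    peaks = [I1, ..., Ik]; cond = that site's light fractions within the
    middle peaks I2..I(k-1). I1 is fully light and Ik is fully heavy.
    """
    if len(cond) != len(peaks) - 2:
        raise ValueError("need one conditional fraction per middle peak")
    weighted = peaks[0] + sum(p * s for p, s in zip(peaks[1:-1], cond))
    return weighted / sum(peaks)
```

A simple way to sanity-check such a sketch is to start from made-up amounts (for example $a, b, c, d = 1, 2, 3, 4$), compute the ratios $r_1$, $r_2$, $r_3$ from them, and confirm that the functions return the light fractions implied by those amounts.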