Nature Structural and Molecular Biology: doi: /nsmb

Supplementary Figure 1 SUMOylation of proteins changes drastically upon heat shock, MG-132 treatment and PR-619 treatment. (a) Schematic overview of all SUMOylated proteins identified to be differentially regulated after PR-619 treatment, using Label Free Quantification (LFQ). (b) As (a), but for MG-132 treatment. (c) As (a), but for heat shock. (d) Principal Component Analysis on all 17 biological replicates, revealing correlation between all samples. (e) Pearson correlation (R) analysis between all biological replicates. Pearson correlation is shown for both proteins (LFQ) and sites. Internal refers to correlation between same-condition samples, and external refers to correlation between different-condition samples. Error bars represent standard deviation. Average R values for internal correlation are shown below each condition. (f) Hierarchical Cluster analysis on all biological replicates. Samples are grouped based on correlation. Protein LFQ intensities were Z-scored prior to clustering. Green is indicative of upregulation of SUMOylation, and red is indicative of downregulation of SUMOylation.

Supplementary Figure 2 Validation of the lysine-deficient SUMO-2 mutant and confirmation of new SUMO-2 targets. (a) Immunoblot analysis (IB) of total lysates and His10-pulldown samples (PD) from HeLa cells expressing His10-SUMO-2 wild-type (WT) or lysine-deficient (K0) mutant, which were mock treated, or treated with heat shock, PR-619 or MG-132. Ponceau-S staining on total lysates is shown as a loading control. The experiment shown was replicated in biological duplicate. (b) Immunoblot analysis of total lysates and His10-pulldown samples from HeLa cells or HeLa cells expressing His10-SUMO-2. The experiment shown was performed in biological duplicate. (c) Similar to (b), but additionally using either mock treated or PR-619 treated cells. PR-619 LFQ ratios are indicated below their respective antibodies. The experiment shown was performed in biological duplicate.

Supplementary Figure 3 A comparison of SUMOylated proteins and sites identified in this work to proteins and sites previously identified in other SUMOylation studies. (a) Schematic overview of SUMOylated proteins identified in this study, as compared to previously published studies. From Golebiowski et al. 1, control SUMOylated proteins and heat shock SUMOylated proteins were classed separately. From Bruderer et al. 2, all SUMOylated proteins and molecular-weight corrected SUMOylated proteins were classed separately. From Becker et al. 3, SUMO-2 modified proteins and all SUMO-1 and SUMO-2 modified proteins were classed separately. From Tammsalu et al. 4, proteins SUMOylated after heat shock were compared with proteins SUMOylated after heat shock in this study, and separately with proteins identified under all other conditions. (b) Schematic overview of SUMOylation sites identified in this study, as compared to Matic et al. 5, Schimmel et al. 6, and all known SUMOylation sites from PhosphoSitePlus (PSP). Sites identified after heat shock by Tammsalu et al. were compared to sites identified after heat shock in this study, as well as sites detected under standard growth conditions and sites detected after MG-132 or PR-619 treatment in this study.

Supplementary Figure 4 SUMOylation occurs on phylogenetically highly conserved proteins. (a) Phylogenetic conservation analysis (within orthologues) of SUMOylation, as compared to the other major PTMs (ubiquitylation, acetylation, methylation and phosphorylation), as well as all human proteins (total). The conservation line descends by average percentage of conservation from human to 62 Ensembl-annotated eukaryotic organisms. The analysis was performed within orthologues, and only genes that have an orthologue were considered on a per-organism basis. P values for significance of difference between all datasets are indicated, and were determined by two-tailed paired Student's t test. (b) Phylogenetic conservation analysis outside of orthologues. As (a), with the exception that genes lacking an orthologue were considered to be 0% conserved on a per-organism basis. P values for significance of difference between all datasets are indicated, and were determined by two-tailed paired Student's t test.

Supplementary Figure 5 The glutamate in the SUMOylation consensus motif of RanGAP1 is required for efficient SUMOylation, and Tel contains an inverted ExK SUMOylation site at Lys302. (a) Immunoblot analysis (IB) of total lysates from HeLa cells or HeLa cells expressing His10-SUMO-2 which were transfected with wild-type (WT) HA-RanGAP1, HA-RanGAP1 E526D, or a SUMOylation-deficient RanGAP1 (ΔGL). On the left side, S2 indicates SUMO-modified RanGAP1, and asterisks indicate non-specific bands. Ponceau-S staining is shown as a loading control. The experiment shown was replicated in biological duplicate. (b) Similar to (a), but using U2-OS cells. The experiment shown was replicated in biological duplicate. (c) Immunoblot analysis of His10-pulldown samples (PD) from HeLa cells or HeLa cells expressing His10-SUMO-2 which were transfected with the indicated HA-Tel constructs. On the left side, S2 indicates SUMO-modified Tel, and asterisks indicate non-specific bands. SART1 is shown as a pulldown control, and Ponceau-S staining is shown as a pulldown and loading control. HA-Tel-K11R was used to monitor SUMOylation of K300, because Tel harbors a dominant major SUMOylation site at lysine-11 7. The experiment shown was replicated in biological triplicate. (d) Total lysates corresponding to (c), analyzed by SDS-PAGE and immunoblotting. Ponceau-S staining is shown as a loading control. The experiment shown was replicated in biological triplicate.

Supplementary Figure 6 SUMO and other PTMs compete for modification of lysines in a set of highly SUMOylated and functionally interconnected proteins. (a) STRING network analysis of all proteins where SUMOylation and acetylation occur on at least one of the same lysines. STRING interaction confidence was set at 0.7 or greater, and the size and color of individual proteins correspond to the number of SUMOylation sites per protein. (b) As (a), but with SUMOylation and ubiquitylation occurring on at least one of the same lysines. (c) As (a), but with SUMOylation, acetylation and ubiquitylation occurring on at least one of the same lysines. (d) Schematic overview of the number of SUMOylation sites identified per protein in total, or within the core STRING network (Fig. 6A), or within the three networks corresponding to (a), (b), and (c). Sites shared with the respective other PTMs are displayed in red. Additionally, the total number of modification sites per protein for ubiquitylation and acetylation is shown. (e) Overview of STRING analyses depicted in (a), (b), and (c). Enrichment is the ratio of the observed number of interactions to the expected number of interactions. Connected refers to the percentage of input proteins connected to the core cluster. P values for all individual analyses are < 1E-15. (f) Overview of relative network score. This score was computed through multiplication of the interaction enrichment ratio, protein network connectivity, and the average STRING confidence of all interactions.
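As a rough illustration of panels (e) and (f), the relative network score can be sketched as a product of three quantities. The function and argument names are hypothetical, the numbers in the usage line are invented for the example, and network connectivity is assumed here to be expressed as a fraction rather than a percentage, which the source does not specify.

```python
def interaction_enrichment(observed, expected):
    """Ratio of observed to expected interactions, as reported by STRING."""
    return observed / expected


def relative_network_score(observed, expected, connected_fraction, mean_confidence):
    """Enrichment ratio x connectivity x average STRING confidence.

    connected_fraction: share of input proteins connected to the core
    cluster, on a 0-1 scale (an assumption; the source reports a percentage).
    mean_confidence: average STRING confidence over all interactions.
    """
    return interaction_enrichment(observed, expected) * connected_fraction * mean_confidence


# Invented example values, not taken from the study.
example_score = relative_network_score(200, 50, 0.8, 0.9)
```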

Supplementary Figure 7 Six interconnected networks of proteins modified by SUMO. MCODE clusters 4-9 corresponding to Fig. 6A. The size and color of the individual proteins correspond to the number of SUMOylation sites identified in the protein.

Supplementary Note

Statistical validation of reproducibility

Pearson correlations (R) were calculated with Perseus, using scatter plot analysis on the unfiltered site and protein data and individually comparing all biological replicates to each other on a per-site or per-protein basis. Internal correlation refers to correlation within same-condition replicates, and external correlation refers to correlation between different-condition samples. Principal Component Analysis was performed using Perseus, using the filtered and imputed data as input. For heatmap generation, the filtered and imputed data were Z-scored using Perseus, and then subjected to Euclidean hierarchical clustering.

IceLogo generation

For SUMOylation site analysis of all identified sites, amino acid sequence windows of 15 amino acids downstream as well as 15 amino acids upstream of the modified lysine were extracted from the corresponding proteins, generating 31 amino acid (31AA) sequence windows. IceLogo software version 1.2 8 was used to overlay 31AA SUMOylation site sequence windows in order to generate a consensus sequence, compensated against the expected random occurrence frequencies of amino acids across all human proteins (IceLogo). Alternatively, subsets of modification sites were compared directly to other subsets of modification sites, generating consensus sequences showing differential occurrence of amino acids between the subsets (SubLogo). Heat maps were generated in a similar fashion to IceLogos. For IceLogos, SubLogos and heat maps, all amino acids shown as enriched or depleted are significant with P < 0.05, as determined by IceLogo software version 1.2.

Secondary structure analysis

31AA sequence windows corresponding to SUMOylation sites were analyzed using NetSurfP version 1.1 9 in order to predict localized protein surface accessibility and secondary structure. For each amino acid within the 31AA windows, probabilities for alpha-helix, beta-strand, and coil were calculated. Additionally, amino acids were predicted to be buried or solvent-exposed, with a threshold of 25% exposure. In addition to calculating the properties of the central lysine, the average properties of the central lysine +/- 5 AA were calculated, as well as the average properties of the entire 31AA sequence windows. As a reference set, random selection of lysines from SUMOylated proteins was performed by dividing all SUMOylated proteins into 60 amino acid sequences, and selecting the first lysine in each sequence. 31AA sequence windows were assigned to half of these random sites (5,726 out of 11,451). Duplicate sequence windows were discarded. The reference set was processed identically to the SUMOylation sites set.

SUMOylation and PTM site overlap analysis

For comparative analysis, all 103 SUMOylation sites identified by Matic et al. 5, all 202 SUMOylation sites identified by Schimmel et al. 6, and all 1,002 SUMOylation sites identified by Tammsalu et al. 4 were assigned to matching Uniprot IDs and 31AA sequence windows were parsed. Furthermore, all unique human PhosphoSitePlus (PSP; www.phosphosite.org) 10 SUMOylation sites were used (639), including MS/MS-identified sites as well as identifications made with low-throughput methodology. Additionally, 26,345 MS/MS-identified ubiquitylation sites, 7,463 acetylation sites, and 812 lysine-methylation sites were extracted from PSP, and 31AA sequence windows were assigned. From within each data set, duplicate sequence windows were removed. Perseus software was used to generate a matrix where all sequence windows from all PTMs were cross-referenced to each other. Corresponding parental proteins were assigned, and multiple modifications targeting the same lysines within the same proteins were further investigated using STRING network analysis.

Term enrichment analysis

Statistical enrichment analysis for protein and gene properties was performed using Perseus software. The human proteome was annotated with Gene Ontology terms 11, including Biological Processes (GOBP), Molecular Functions (GOMF), and Cellular Compartments (GOCC). Additional annotation was performed with the Kyoto Encyclopedia of Genes and Genomes (KEGG) 12, Protein families (Pfam) 13, Gene Set Enrichment Analysis (GSEA) 14, Keywords, and Comprehensive Resource of Mammalian Protein Complexes (CORUM) 15 terms for comparative enrichment analysis. SUMOylated proteins or subgroups of SUMOylated proteins were compared by annotation terms to the entire human proteome, using Fisher Exact Testing.
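The comparison of an annotation term between SUMOylated proteins and the whole proteome can be sketched as a Fisher exact test on a 2x2 table. This is a minimal illustration, not the Perseus implementation: the function names are hypothetical, and a one-sided test for over-representation is assumed because the source does not state which alternative was used.

```python
from math import comb


def fisher_exact_enrichment(k, n, K, N):
    """One-sided Fisher exact P value for over-representation.

    N: proteins in the background proteome
    K: background proteins carrying the annotation term
    n: proteins in the subset of interest (e.g. SUMOylated proteins)
    k: subset proteins carrying the annotation term
    Returns P(X >= k) under the hypergeometric distribution.
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible tables contribute nothing to the tail sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def enrichment_ratio(k, n, K, N):
    """Term frequency in the subset divided by frequency in the background."""
    return (k / n) / (K / N)
```

Exact integer arithmetic keeps the tail sum free of floating-point error until the final division, at the cost of speed for proteome-sized inputs.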
Lysines modified by SUMO or other PTMs were compared against a background of the total number of lysines in the human proteome, calculated from the average lysine occurrence multiplied by the number of proteins and the average protein size in the human proteome, using Fisher Exact Testing. Benjamini and Hochberg FDR was applied to P values to correct for multiple hypothesis testing, and final corrected P values were filtered to be less than 2%. In the figures, categories are either ranked by the negative log10 of the P value, or by a relative score calculated as follows: log10(P value) * (log2(enrichment ratio))^5.

STRING network analysis

STRING network analysis was performed using the online STRING database 16, using all SUMOylated proteins or subgroups of SUMOylated proteins as input. Enrichment analysis was performed allowing network interactions at high or greater confidence (p>0.7). P values corresponding to the individual analyses were directly taken from the STRING database output. Protein interaction enrichment was calculated from the number of interactions in the networks, as compared to the randomly expected number of interactions, with both variables directly derived from the STRING database output. Network participation was measured as the percentage of proteins connected to the core cluster. The total network interconnectivity score was calculated by multiplication of the interaction enrichment, the network participation, and the average STRING confidence of all individual interactions. Visualization of interaction networks was performed using Cytoscape version 3.0.2 17, and highly interconnected sub-clusters were localized using the Cytoscape plugin Molecular Complex Detection (MCODE) version 1.4.0-beta2 18. For sub-cluster localization the following settings were used: a degree cut-off of 3, a node score cut-off of 0.1, a maximum depth of 2, a K-Core of 5, and haircut.

Phylogenetic conservation analysis

Perseus software was used to annotate the human proteome using phylogenetic conservation scores from the Ensembl Database 19, which contains phylogenetic orthologous information ranging over many eukaryotic species. Only transcripts and genes marked as known were extracted from the Ensembl Database, and only unique results were retrieved. Phylogenetic conservation target scores from 62 eukaryotic species were mapped by their human Ensembl Protein ID to human Uniprot IDs. Subsequently, proteins identified as SUMOylated in this work under standard growth conditions were aligned to the annotated human proteome. All MS/MS-identified phosphorylation, acetylation, ubiquitylation and methylation sites were extracted from PSP, mapped to their respective parental proteins, and aligned to the annotated human proteome. Phylogenetic conservation scores were calculated for the entire human proteome as compared to all individual 62 eukaryotic species as a reference, and scores were separately calculated for SUMOylation in addition to all other PTMs. For phylogenetic conservation within orthologues, all human proteins lacking an orthologue were excluded from the analysis on a per-species basis. For phylogenetic conservation outside of orthologues, all human proteins lacking an orthologue were set to 0% conservation. Statistical significance between different phylogenetic conservation rates of the PTM and total protein groups was calculated using two-tailed paired Student's t test, using the average conservation scores of all 62 eukaryotic organisms. As such, the resulting P values are indicative of significance of difference across the entire eukaryotic dataset.
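The two ways of treating proteins without an orthologue, and the paired comparison across species, can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, a missing orthologue is represented by `None`, and only the paired t statistic is computed. Converting it to a two-tailed P value requires the t distribution with n - 1 degrees of freedom, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev


def mean_conservation(scores, outside_orthologues=False):
    """Average percent conservation of a protein group against one species.

    scores: one value per human protein, None where no orthologue exists.
    Within orthologues (default), proteins lacking an orthologue are excluded.
    Outside of orthologues, they are counted as 0% conserved.
    """
    if outside_orthologues:
        values = [0.0 if s is None else s for s in scores]
    else:
        values = [s for s in scores if s is not None]
    return mean(values)


def paired_t_statistic(group_a, group_b):
    """Paired t statistic over per-species averages (one pair per species)."""
    diffs = [a - b for a, b in zip(group_a, group_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```

In the analysis described above, each of the 62 species would contribute one pair of averages, for example the SUMOylated group against the total proteome.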

SUMO target protein overlap analysis

For SUMO target protein analysis, all proteins identified in this work with at least one SUMO site were selected. Proteins successfully identified by various peptides, but lacking a SUMO-peptide, were ignored. For comparative analysis, identified SUMO target proteins were compared to other studies. SUMO-2 target proteins from Becker et al. 3 were selected by a SUMO-2 / Control ratio of greater than 2, and additionally filtered for a SUMO-2 intensity of greater than 10% as compared to SUMO-1 and control. All SUMO target proteins (SUMO-1 and SUMO-2) from Becker et al. were selected by a SUMO-1 / Control ratio or a SUMO-2 / Control ratio of greater than 2. SUMO-2 target proteins from Golebiowski et al. 1 were selected for a SUMO-2 / Control SILAC ratio of greater than 1.5. Heat-shock inducible SUMO-2 target proteins from Golebiowski et al. were filtered by a SUMO-2-HEAT / SUMO-2 SILAC ratio of greater than 1.5, in addition to a (SUMO-2-HEAT / SUMO-2 SILAC ratio) * (SUMO-2 / Control SILAC ratio) of greater than 1.5. Putative poly-SUMO-modified proteins from Bruderer et al. 2 were selected as all identified proteins. Putative poly-SUMO-modified proteins from Bruderer et al. were additionally filtered for an observed molecular weight shift of 2 times the molecular weight of SUMO or greater, as compared to the expected protein molecular weight. SUMO-2 target proteins from Matic et al. 5 were considered to be all proteins in which at least one site of SUMOylation was identified. SUMO-2 target proteins from Schimmel et al. 6 were selected as all proteins with a SUMO-2 / Control SILAC ratio of over 2. SUMO-2 target proteins from Tammsalu et al. 4 were considered to be all proteins in which at least one SUMOylation site was identified. Where required, gene IDs were mapped to the corresponding Uniprot IDs.
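The compound criterion applied to the heat-shock data of Golebiowski et al. combines two ratio cut-offs, which a short sketch may make easier to follow. The function and parameter names are hypothetical. Only the 1.5 cut-off and the two conditions come from the text above.

```python
def is_heat_shock_inducible(heat_over_sumo2, sumo2_over_control, cutoff=1.5):
    """Apply both SILAC ratio filters for heat-shock inducible SUMO-2 targets.

    heat_over_sumo2: SUMO-2-HEAT / SUMO-2 SILAC ratio
    sumo2_over_control: SUMO-2 / Control SILAC ratio
    A protein passes when the heat ratio exceeds the cut-off and the product
    of both ratios also exceeds it.
    """
    induced = heat_over_sumo2 > cutoff
    above_control = heat_over_sumo2 * sumo2_over_control > cutoff
    return induced and above_control
```

The second condition removes proteins that respond to heat shock but whose combined ratio over the control stays at or below the cut-off.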
Additionally, where multiple Uniprot IDs were listed for a single protein identification, a major Uniprot ID was selected by taking the first Uniprot ID in the list starting with a P, or otherwise the first Uniprot ID starting with a Q, or otherwise the first Uniprot ID in the list. Perseus 20 software was used to generate a complete gene list for all known human proteins, and all identified SUMO target proteins from our study as well as the above-mentioned studies were aligned based on matching Uniprot IDs.

Supplementary References

1. Golebiowski, F. et al. System-wide changes to SUMO modifications in response to heat shock. Sci. Signal. 2, ra24 (2009).
2. Bruderer, R. et al. Purification and identification of endogenous polysumo conjugates. EMBO Rep. 12, 142-148 (2011).
3. Becker, J. et al. Detecting endogenous SUMO targets in mammalian cells and tissues. Nat. Struct. Mol. Biol. 20, 525-531 (2013).
4. Tammsalu, T. et al. Proteome-Wide Identification of SUMO2 Modification Sites. Sci. Signal. 7, rs2 (2014).
5. Matic, I. et al. Site-specific identification of SUMO-2 targets in cells reveals an inverted SUMOylation motif and a hydrophobic cluster SUMOylation motif. Mol. Cell 39, 641-652 (2010).
6. Schimmel, J. et al. Uncovering SUMOylation Dynamics during Cell-Cycle Progression Reveals FoxM1 as a Key Mitotic SUMO Target Protein. Mol. Cell (2014).
7. Roukens, M.G. et al. Identification of a new site of sumoylation on Tel (ETV6) uncovers a PIAS-dependent mode of regulating Tel function. Mol. Cell Biol. 28, 2342-2357 (2008).
8. Colaert, N., Helsens, K., Martens, L., Vandekerckhove, J. & Gevaert, K. Improved visualization of protein consensus sequences by iceLogo. Nat. Methods 6, 786-787 (2009).
9. Petersen, B., Petersen, T.N., Andersen, P., Nielsen, M. & Lundegaard, C. A generic method for assignment of reliability scores applied to solvent accessibility predictions. BMC Struct. Biol. 9, 51 (2009).
10. Hornbeck, P.V. et al. PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined posttranslational modifications in man and mouse. Nucleic Acids Res. 40, D261-D270 (2012).
11. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25-29 (2000).
12. Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, D109-D114 (2012).
13. Punta, M. et al. The Pfam protein families database. Nucleic Acids Res. 40, D290-D301 (2012).
14. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545-15550 (2005).
15. Ruepp, A. et al. CORUM: the comprehensive resource of mammalian protein complexes. Nucleic Acids Res. 36, D646-D650 (2008).
16. Franceschini, A. et al. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 41, D808-D815 (2013).
17. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498-2504 (2003).
18. Bader, G.D. & Hogue, C.W. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 4, 2 (2003).
19. Flicek, P. et al. Ensembl 2013. Nucleic Acids Res. 41, D48-D55 (2013).
20. Cox, J. & Mann, M. 1D and 2D annotation enrichment: a statistical method integrating quantitative proteomics with complementary high-throughput data. BMC Bioinformatics 13 Suppl 16, S12 (2012).