Analyzing six types of protein-protein interfaces. Yanay Ofran and Burkhard Rost

Similar documents
Measuring quaternary structure similarity using global versus local measures.

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

Characterization of Protein Protein Interfaces

Analysis of Relevant Physicochemical Properties in Obligate and Non-obligate Protein-protein Interactions

From Amino Acids to Proteins - in 4 Easy Steps

Protein-protein Interaction Prediction using Desolvation Energies and Interface Properties

PDBe TUTORIAL. PDBePISA (Protein Interfaces, Surfaces and Assemblies)

SUPPLEMENTARY MATERIALS

Protein Secondary Structure Prediction

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Sequence analysis and comparison

Motif Prediction in Amino Acid Interaction Networks

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU

Packing of Secondary Structures

Structure and evolution of the spliceosomal peptidyl-prolyl cistrans isomerase Cwc27

1-D Predictions. Prediction of local features: Secondary structure & surface exposure

Bi 8 Midterm Review. TAs: Sarah Cohen, Doo Young Lee, Erin Isaza, and Courtney Chen

Protein structure. Protein structure. Amino acid residue. Cell communication channel. Bioinformatics Methods

ORGANIC - BROWN 8E CH AMINO ACIDS AND PROTEINS.

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION

Protein Secondary Structure Prediction

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions?

Lec.1 Chemistry Of Water

Useful background reading

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

Structure, mechanism and ensemble formation of the Alkylhydroperoxide Reductase subunits. AhpC and AhpF from Escherichia coli

The prediction of membrane protein types with NPE

Protein Structure Prediction

Insights into Protein Protein Interfaces using a Bayesian Network Prediction Method

Protein Structure. Hierarchy of Protein Structure. Tertiary structure. independently stable structural unit. includes disulfide bonds

PREDICTION OF PROTEIN BINDING SITES BY COMBINING SEVERAL METHODS

Acta Cryst. (2017). D73, doi: /s

Protein Structures. 11/19/2002 Lecture 24 1

Week 10: Homology Modelling (II) - HHpred

Bioinformatics: Secondary Structure Prediction

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Using Information Theory to Reduce Complexities of Neural Networks in Protein Secondary Structure Prediction

ARE BINDING RESIDUES CONSERVED?

BIOINFORMATICS: An Introduction

Analysis of Molecular Recognition Features (MoRFs)

Statistical Analysis and Prediction of Protein Protein Interfaces

QUANTA Protein Design MAY 2006

Lecture 4 Model Answers to Problems

Modeling for 3D structure prediction

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Optimization and Frustration:

Annotation Error in Public Databases ALEXANDRA SCHNOES UNIVERSITY OF CALIFORNIA, SAN FRANCISCO OCTOBER 25, 2010

Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Hands-On Nine The PAX6 Gene and Protein

Automatic Epitope Recognition in Proteins Oriented to the System for Macromolecular Interaction Assessment MIAX

Protein Structures. Sequences of amino acid residues 20 different amino acids. Quaternary. Primary. Tertiary. Secondary. 10/8/2002 Lecture 12 1

Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre

DATE A DAtabase of TIM Barrel Enzymes

Christian Sigrist. November 14 Protein Bioinformatics: Sequence-Structure-Function 2018 Basel

CAP 5510 Lecture 3 Protein Structures

QUANTA. Protein Design. Release December Scranton Road San Diego, CA / Fax: 858/

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm. Alignment scoring schemes and theory: substitution matrices and gap models

Detection of Protein Binding Sites II

Sequence comparison: Score matrices

Data analysis and Geostatistics - lecture VI

Figure S1. Interaction of PcTS with αsyn. (a) 1 H- 15 N HSQC NMR spectra of 100 µm αsyn in the absence (0:1, black) and increasing equivalent

Prediction and Classif ication of Human G-protein Coupled Receptors Based on Support Vector Machines

Protein Structure Determination

Quantifying sequence similarity

Resonance assignments in proteins. Christina Redfield

Homology Modeling. Roberto Lins EPFL - summer semester 2005

C h a p t e r 2 A n a l y s i s o f s o m e S e q u e n c e... methods use different attributes related to mis sense mutations such as

Better Bond Angles in the Protein Data Bank

NMR, X-ray Diffraction, Protein Structure, and RasMol

CHAPTER 29 HW: AMINO ACIDS + PROTEINS

Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

PAM-1 Matrix 10,000. From: Ala Arg Asn Asp Cys Gln Glu To:

Sequence Bioinformatics. Multiple Sequence Alignment Waqas Nasir

Supplementary Figure 1 Crystal contacts in COP apo structure (PDB code 3S0R)

Prediction of the subcellular location of apoptosis proteins based on approximate entropy

mechanisms: Discrete molecular dynamics study

Interfacing chemical biology with the -omic sciences and systems biology

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

BLAST: Target frequencies and information content Dannie Durand

Viewing and Analyzing Proteins, Ligands and their Complexes 2

Polypeptide Folding Using Monte Carlo Sampling, Concerted Rotation, and Continuum Solvation

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

Supplemental Information for. Quaternary dynamics of B crystallin as a direct consequence of localised tertiary fluctuations in the C terminus

Sunhats for plants. How plants detect dangerous ultraviolet rays

Protein Structure Prediction Using Neural Networks

Molecular Modeling Lecture 11 side chain modeling rotamers rotamer explorer buried cavities.

SUPPLEMENTARY INFORMATION

Ch. 9 Multiple Sequence Alignment (MSA)

Problem Set 1

Table S1. Primers used for the constructions of recombinant GAL1 and λ5 mutants. GAL1-E74A ccgagcagcgggcggctgtctttcc ggaaagacagccgcccgctgctcgg

arxiv: v1 [cond-mat.soft] 22 Oct 2007

Francisco M. Couto Mário J. Silva Pedro Coutinho

Protein Structure Prediction, Engineering & Design CHEM 430

Performing a Pharmacophore Search using CSD-CrossMiner

Aqueous solutions. Solubility of different compounds in water

Transcription:

Analyzing six types of protein-protein interfaces Yanay Ofran and Burkhard Rost

Goal of the paper To check 1. If there is significant difference in amino acid composition in various interfaces of protein-protein interactions 2. If there is preference b/w pairs of residues in each of the interfaces

Difference in data with previous studies Different types of interactions are just classified as internal and external NO DIFFERECE B/W HOMOMULTIMERS AND HETERO-MULTIMERS NO DIFFERENCE B/W PERMANENT INTERACTIONS AND TRANCIENT INTERACTIONS Different types of interactions are classified into six categories Hand selected Automated Patches Contacts

WHY DIFFERENCIATE B/W Patches and Contacts? Such surface patches may not capture all aspects of protein interactions. For example, slightly buried residues with long sidechains may be missed, although they participate in interfaces. Some residues identified as part of a surface patch may form important contacts, while others may not form contacts at all. Therefore, the analysis of surface patches may not capture all residue residue contacts that underlie the interaction.

Advantage of automated way to differentiate multimers There is no simple way to distinguish automatically between (i) interfaces between two chains that belong to one multi-chain protein and (ii) interfaces between two different proteins.

Data set: Largest possible non-redundant subset of PDB: no pair of proteins in that set had more than 25% identity over 100 aligned residues The non-redundant set included 1812 high-resolution structures. We excluded NMR structures, theoretical models, and chains shorter than 30 residues. Of these proteins, 936 (51%)had resolutions below 2A, 74 proteins (4%) had resolutions above 3A.

Generation of dataset Identifying interacting residues from complexes in PDB We defined a residue pair to be in contact if the distance between the closest of their respective atoms was 6 A and their sequence separation was three or more residues. Previous studies used distances between 4A and 12A between two Cα or Cβ atoms.

Data set statistics Type of interface Number of contacts Internal intra-domain 1 3,340,485 domain-domain 2 255,144 External homo-obligomers 3 218,104 homo-complexes 4 3,077 hetero-obligomers 5 18,886 hetero-complexes 6 166,412

Data mining To differentiate b/w 1. Homo-multimers & heteromultimers more than 10% difference in sequence 2. Permenent homo-multimers & transient homomultimers PDB & DIP - annotated as functional monomer 3. Permenent Hetero multimers & transient Hetero multimers cross reference to SWISSSPROT- if different chain in PDB included in same swissprot file it is permanent complex

Amino acids in interfaces Amino acid composition of six interface types. The propensities of all residues found in SWISS-PROT were used as background. If the frequency of an amino acid is similar to its frequency in SWISS-PROT, the height of the bar is close to zero. Over-representation results in a positive bar, and under-representation results in a negative bar. Ofran, Rost, J. Mol. Biol. 325, 377 (2003)

Whats wrong with Standard test of significance? Problematic when the data sets are small Another problem with these tests that is not commonly noted, is that the application of significance tests to very large data sets with a few degrees of freedom, like those analyzed here, can lead to severe inferential mistakes

sample P JS?... sample Q1 sample Q2 sample Qn... data set 1 data set 2 data set n

Amino acid composition 1. Pick a random sample P with 1000 residues from one of the six sets of residue residue contacts. 2. Pick six random samples Q1 Q6 with 1000 residues from each of the six sets. 3. Find the set Qp from Q1 Q6 least divergent from set P by measuring the JS divergence between P and Q1 Q6. 4. Record the types of interactions from which P and QP were sampled. We repeated this procedure 6000 times (1000 for each of the six types of interactions). If the residue composition differed significantly between the types, we expect Qp to be, in most cases, the sample that was sampled from the same population of P. That is, we expect that in most cases, P and the sample most similar to P were sampled from contacts of the same interface type.

Measuring the divergence between distributions JS (p 1,p 2 ) = H (π 1 p 1 + π 2 p 2 ) - π 1 H (p 1 ) - π 2 H (p 2 ) with: π 1 and π 2 >= 0, and π 1 + π 2 =1 and H(p) =- Σ i p(x i )log 2 (x i ) P1 and P2 are prior probabilities for distributions p1 and p2 Lin (1991) IEEE Transactions on Information Theory 37:145-151

Significant difference between amino acid composition of six interface types internal external intradomain domaindomain homoobligomer homocomplex heteroobligomer heterocomplex intra-domain 75.9 19.4 0.4-4.3 0.2 domain-domain 18.6 62.7 0.9-16.4 1.4 Homo-obligomer 0.9 1.5 78.0-2.2 17.4 Homo-complex - - 0.2 99.8 - - Hetero-obligomer 3.9 17.2 1.9-70.8 6.2 Hetero-complex 0.1 1.2 16.3-6.3 76.1

Significant difference in composition of six interfaces and surface residues internal intradomain domaindomain homoobligomer homocomplex heteroobligome r heterocomplex Surface exposed intra-domain 75.7 19.6 0.3 4.4 domain-domain 16.7 64.5 1.6 16.6 0.6 Homo-obligomer 0.4 2.1 73.6 3.4 20.5 Homo-complex 0.1 99.9 - - Hetero-obligomer 3.2 15.2 3.4 73.1 5.1 Hetero-complex 1.9 17.0 4.5 76.6 Exposed 100

Significant difference b/w contact preferences for six interface types internal external intradomain domaindomain homoobligomer homocomplex heteroobligomer heterocomplex intra-domain 63.0 29.5 1.7-4.7 1.1 domain-domain 21.8 61.2 0.9-12.6 3.5 Homo-obligomer 1.5 1.6 79.6-1.9 15.4 Homo-complex - - - 100.0 - - Hetero-obligomer 4.2 9.6 0.9-81.6 3.7 Hetero-complex 0.6 4.6 15.9-6.9 72.0

CONCLUSIONS ON STUDY OF RESIDUE CONTACT PREFERENCES Homo-complexes depleted in salt-bridges and rich in contacts between identical residues: Hydrophobic hydrophilic interactions dominated intra-domain, domain domain and heterocomplex interfaces Cysteine bridges were observed more often than expected for all interface types. Salt bridges were common with the exception of homocomplexes, for which they were observed less often than expected. Homo-complexes exhibited an extreme general preference for interactions between identical amino acid residues. Furthermore, overall the contact preferences also stood out most for homo-complexes.

Measuring residue residue preferences After it had been established that the amino acid composition differed between the six interface types, the six lists was to compute the likelihood of forming contacts between each pair of amino acid residues. In particular, the log odds ratio of the observed frequency of the pair over its expected frequency is compiled

Lx(i,j) = log 2 (P(i,j)/(P(i)P(j))) where the subscript x represents one of the six types of interfaces (intra-domain, domain domain, homoobligomers, homo-complex, hetero-obligomers and hetero-complex), i and j are types of amino acid residue, P(i,j ) is the probability of a contact between amino acids of type i and j in interfaces of type x, and P(i ) and P( j ) are the probability of occurrence for amino acid residues i and j, respectively, in the interfaces of type x. Hence, the denominator describes the probability of a contact between i and j if the formation of contacts between i and j were random. On the basis of this equation, we generated six matrices for the likelihood for all possible contacts in each interface type.

Pairing frequencies at interfaces Residue residue preferences. (A) Intra-domain: hydrophobic core is clear (B) domain domain, (C) obligatory homooligomers (homo-obligomers), (D) transient homo-oligomers (homo-complexes), (E) obligatory hetero-oligomers (heteroobligomers), and (F) transient heterooligomers (hetero-complexes). A red square indicates that the interaction occurs more frequently than expected; a blue square indicates that it occurs less frequently than expected. The amino acid residues are ordered according to hydrophobicity, with isoleucine as the most hydrophobic and arginine as the least hydrophobic. Ofran, Rost, J. Mol. Biol. 325, 377 (2003)

Observing the Result For the other interface types, we observed strong self-interaction preferences exclusively for cysteine residues, which are known to stabilise interactions through forming cysteine bridges Confirming earlier findings, we found saltbridges abundant in interfaces

Observing the Result Based on amino acid composition, some studies hypothesize that hydrophobic interactions are more frequent in permanent interactions than in transient interactions. This hypothesis is confirmed by our residue residue preference data, both for homo-obligomers and for hetero obligomers

Novel components that yielded results Set of interfaces were data-mined from PDB which is far larger than datasets analyzed before Analysis on a more finely grained distinction of interfaces than explored previously. Distinguished b/w two types of internal interactions Distinguished b/w four types of external interactions analyzed interface contacts rather than surface patches Established the statistical significance of differences through a rigorous information theory derived procedure.

Critical analysis Criteria for differentiating different types of complexes - no validation dependent on annotation if DIP

CONCLUSIONS Six types of interfaces analysed differed in amino acid composition residue contact preferences

Discussion/questions?