Improving De novo Protein Structure Prediction using Contact Maps Information

Similar documents
Contact map guided ab initio structure prediction

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

Protein Structure Prediction, Engineering & Design CHEM 430

Presenter: She Zhang

Supporting Online Material for

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Protein Structure Prediction

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Ab-initio protein structure prediction

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

Protein Structure Determination

Building 3D models of proteins

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

A genetic algorithm for the ligand-protein docking problem

AB initio protein structure prediction or template-free

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

Prediction and refinement of NMR structures from sparse experimental data

Francisco Melo, Damien Devos, Eric Depiereux and Ernest Feytmans

A new prediction strategy for long local protein. structures using an original description

Template Free Protein Structure Modeling Jianlin Cheng, PhD

CAP 5510 Lecture 3 Protein Structures

Protein structure analysis. Risto Laakso 10th January 2005

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

The typical end scenario for those who try to predict protein

Finding Similar Protein Structures Efficiently and Effectively

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009

Template Free Protein Structure Modeling Jianlin Cheng, PhD

Protein Structure Prediction

HOMOLOGY MODELING. The sequence alignment and template structure are then used to produce a structural model of the target.

) P = 1 if exp # " s. + 0 otherwise

Portal. User Guide Version 1.0. Contributors

Detecting unfolded regions in protein sequences. Anne Poupon Génomique Structurale de la Levure IBBMC Université Paris-Sud / CNRS France

DISCRETE TUTORIAL. Agustí Emperador. Institute for Research in Biomedicine, Barcelona APPLICATION OF DISCRETE TO FLEXIBLE PROTEIN-PROTEIN DOCKING:

Template-Based Modeling of Protein Structure

FlexPepDock In a nutshell

Programme Last week s quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues

Conformational Sampling in Template-Free Protein Loop Structure Modeling: An Overview

Week 10: Homology Modelling (II) - HHpred

Template Based Protein Structure Modeling Jianlin Cheng, PhD

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Comparative Study of Machine Learning Models in Protein Structure Prediction

Evolutionary design of energy functions for protein structure prediction

proteins The dual role of fragments in fragmentassembly methods for de novo protein structure prediction

Bioinformatics. Macromolecular structure

Bioinformatics: Secondary Structure Prediction

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment

Computational engineering of cellulase Cel9A-68 functional motions through mutations in its linker region. WT 1TF4 (crystal) -90 ERRAT PROVE VERIFY3D

Protein Structures. 11/19/2002 Lecture 24 1

Protein Structure Prediction

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Protein Modeling Methods. Knowledge. Protein Modeling Methods. Fold Recognition. Knowledge-based methods. Introduction to Bioinformatics

Bioinformatics: Secondary Structure Prediction

Protein Structure Prediction 11/11/05

Bioinformatics III Structural Bioinformatics and Genome Analysis Part Protein Secondary Structure Prediction. Sepp Hochreiter

Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins

Basics of protein structure

PROTEIN STRUCTURE PREDICTION II

Introduction to" Protein Structure

Computational Molecular Biology (

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU

Predicting protein contact map using evolutionary and physical constraints by integer programming (extended version)

Protein Structure Analysis and Verification. Course S Basics for Biosystems of the Cell exercise work. Maija Nevala, BIO, 67485U 16.1.

proteins Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field

Dihedral Angles. Homayoun Valafar. Department of Computer Science and Engineering, USC 02/03/10 CSCE 769

Better Bond Angles in the Protein Data Bank

ALL LECTURES IN SB Introduction

Prediction of Protein Backbone Structure by Preference Classification with SVM

Protein Structure Determination from Pseudocontact Shifts Using ROSETTA

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like

HIV protease inhibitor. Certain level of function can be found without structure. But a structure is a key to understand the detailed mechanism.

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

Protein Structure Analysis with Sequential Monte Carlo Method. Jinfeng Zhang Computational Biology Lab Department of Statistics Harvard University

Docking. GBCB 5874: Problem Solving in GBCB

Chemical Shift Restraints Tools and Methods. Andrea Cavalli

Reconstruction of Protein Backbone with the α-carbon Coordinates *

Packing of Secondary Structures

Goals. Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein DNA Interactions

proteins Comparison of structure-based and threading-based approaches to protein functional annotation Michal Brylinski, and Jeffrey Skolnick*

proteins Estimating quality of template-based protein models by alignment stability Hao Chen 1 and Daisuke Kihara 1,2,3,4 * INTRODUCTION

Assignment 2 Atomic-Level Molecular Modeling

Kd = koff/kon = [R][L]/[RL]

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller

April, The energy functions include:

All-atom ab initio folding of a diverse set of proteins

Chemical properties that affect binding of enzyme-inhibiting drugs to enzymes

Supplementing information theory with opposite polarity of amino acids for protein contact prediction

Generalized Pattern Search Algorithm for Peptide Structure Prediction

LOCAL MINIMA HOPPING ALONG THE PROTEIN ENERGY SURFACE

Protein quality assessment

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

1-D Predictions. Prediction of local features: Secondary structure & surface exposure

Protein Structure Prediction on GPU: a Declarative Approach in a Multi-agent Framework

Transcription:

CIBCB 2017 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology Improving De novo Protein Structure Prediction using Contact Maps Information Karina Baptista dos Santos PhD student in Computational Modeling National Laboratory for Scientific Computing (LNCC) Petrópolis, RJ - Brazil

Introduction Protein Structure Prediction (PSP) have a large potentital for biotechnological applications MQTVLKKRKKSGYGYIPD IADIRDFSYTPEKSVIAA LPPKVDLTPSFQVYDQGR IGSCTANALAAAIQFERI HDKQSPEFIPSRLFI Protein Function o Can provide essential clues to understanding diseases o Allows the creation of new proteins and the development of new drugs 2

Introduction CASP (Critical Assessment of Techniques for Protein Structure Prediction) http://predictioncenter.org/ o 12th CASP (1994-2016) o Assess the state of the art of protein structure prediction methods Evaluation via process of blind prediction o Progress in PSP methodologies Use of protein contact maps in the search of native conformation o Highlight where the future effort may be focused Proceedings of the National Academy of Sciences pages 7298-7303, 30 NOV 2006 http://www.pnas.org/content/103/19/7298/f1.expansion.html 3

Introduction Protein Contact Maps CASP10, CASP11 and CASP12 o A minimalist representation of the 3D protein structure. o Two residues in a protein are in contact if the Euclidean distance between their Cβ atoms (Cα for Glycine) is at most some threshold value, usually 8Å. Cβ i Cβ j 8.0Å Proteins: Structure, Function, and Bioinformatics pages 84-97, 16 OCT 2013 DOI: 10.1002/prot.24367 http://onlinelibrary.wiley.com/doi/10.1002/prot.24367/full#prot24367-fig-0001 4

Introduction Protein Contact Maps prediction Residue covariation analysis o Sufficiently large number of homologous proteins (MSA) Which residue pairs are most likely to be in contact? Which residue pairs are important to be maintained by evolution? MSA inference Contact in 3D covariation These contacts are not template! Contact Map Precision threshold [0, 1] Probability of such residues being in contact Plos One, v.6, n.12 page e28766, 7 DEC 2011 DOI: 10.1002/prot.24367 https://doi.org/10.1371/journal.pone.0028766 5

Introduction Protein Contact Maps prediction o MetaPSICOV 1 (the CASP11 winner) Stage 1 Classic contact prediction features 3 three coevolution-based methods PSICOV, mfdca-free Contact and CCMpred Infer correlations signals in MSA Stage 2 Fill in the gaps in contact map and remove outliers (Filter) Hydrogen bonds patterns between residues in contact 1 Kosciolek, Tomasz, and David T. Jones. "Accurate contact predictions using covariation techniques and machine learning." Proteins: Structure, Function, 6 and Bioinformatics (2015).

Objectives Develop a strategy to use the information of protein contact maps in the GAPF PSP program Contact Map Term a new term of the GAPF Fitness Function Assess the predictive information capacity of the contact maps provided by MetaPSICOV Stage 1 7

Methods GAPF 2, 3 - Genetic Algorithm for Protein Folding o Phenotypic crowding-based steady-state genetic algorithm o Candidate solutions are encoded by the backbone dihedral angles o Coarse-Grained representation for the side chain atoms 2 GAPF - Custódio, F. L.. Algoritmos Genéticos para Predição Ab Initio de Estrutura de Proteínas. Laboratório Nacional de Computação Científica. 2008. 3 Custódio, Fábio Lima, Helio J. C. Barbosa, and Laurent Emmanuel Dardenne. A Multiple Minima Genetic Algorithm for Protein Structure Prediction. Applied Soft Computing Journal 15:88 99. 2014.

Methods GAPF - Genetic Algorithm for Protein Folding o Protein Fragment Libraries Profrager 4 www.lncc.br/sinapad/profrager/ Fragment Structure Protein Model o Secondary Structure Prediction PSIPRED 5 4 Trevizani, R., Custódio, F.L., dos Santos, K.B., and Dardenne, L.E. Critical Features of Fragment Libraries for Protein Structure Prediction. PLOS ONE, vol.12, no.1, pp.1-22, 2017. 5 Jones, D.t., Protein secondary structure prediction based on position specific scoring matrices. Journal of molecular biology, vol.292, no2, pp.195-202,1999.

Methods GAPF - Fitness Function 7 o Based on the energy from the interaction between the atoms of the protein Dihedral potential (from Gromos96 force field); Atomic repulsive term; Hydrophobic compaction term; 4 Hydrogen bond terms Hydrogen bonds 1a00 10 7 Rocha, Gregório Kappaun. Desenvolvimento de Metodologias Para Predição de Estruturas de Proteínas Independente de Moldes. Laboratório Nacional de Computação Científica. 2015.

Methods Contact Map Term Distance constraints λ = Contribution value add to the predicted model energy γ = Precision threshold associated with the probability of such residues being in contact σ(i, j) = Euclidean distance between Cβ atoms of residues i and j o The more residues are respecting the distance constrainsts, the greater the contribution to the fitness function 11

Methods Test Set Experimentally determined 3D structures 3FIL 56 residues T0773-D1 67 residues (PDB 2N2U) T0820-D1 90 residues T0769-D1 97 residues (PDB 2MQ8) T0766-D1 108 residues (PDB 4Q53) 12

Methods Contact Maps o Native Map All residues pairs found in native structure (PDB file) Precision threshold (PT) = 1.0 R1 R2 d min d max PT 83 115 0 8 1.0 64 74 0 8 1.0 6 51 0 8 1.0 33 40 0 8 1.0... 13

Methods Contact Maps o Filtered Map Only the residues pairs predicted by MetaPSICOV Stage1 that are present in the Native Map. Precision threshold (PT) = MetaPSICOV Predicted value R1 R2 d min d max PT 83 115 0 8 0.71 64 74 0 8 0.56 6 51 0 8 0.44 33 40 0 8 0.34... 14

Methods Contact Maps o Filtered Map Only the residues pairs predicted by MetaPSICOV Stage1 that are present in the Native Map. Precision threshold (PT) = MetaPSICOV Predicted value R1 R2 d min d max PT 83 115 0 8 0.71 64 74 0 8 0.56 6 51 0 8 0.44 33 40 0 8 0.34... Native X Filtered Same Residues / Different Precision Threshold 15

Filtered Map - Precision threshold profile 0.63 0.50 0.38 0.66 0.88 16

Native x False contacts in the MetaPSICOV predicted map Contact Pairs Filter native contacts from predicted map Great challenge 6000 5000 4000 3655 4278 5356 3000 2000 1000 0 1953 1275 107 132 78 204 256 3filA T0773-D1 T0820-D1 T0769-D1 T0766-D1 All MetaPSICOV Stage1 contacts Native contacts o No method described is able to do this selection efficiently Precision threshold above some value (e.g. 0.5, 0.4 ) Number of contacts associated with the sequence length ( e.g., L/5, L/2 ) 17

Results Impact of the use of Contact Maps o 4000 models predicted by GAPF Best of All Models Predicted Top5 Best Energy Models o RMSD 4.0Å Good predictions 18

Results Impact of the use of Contact Maps o 4000 models predicted by GAPF Best of All Models Predicted T0820-D1 Top5 Best Energy Models 6Å >10Å o RMSD 4.0Å Good predictions 19

Results Satisfied contact does not mean correct contact! Best of All Top 5 Best Energy Native Map 60% 65% Filtered Map 48% 44% Cβ i Cβ j 8.0Å Native Structure Cβ- Cβ = 7.71 Å Model Cβ- Cβ = 4.23 Å 20

Results RMSD (Å) RMSD (Å) RMSD (Å) RMSD x Energy (Native Map) The greatest contribution to GAPF 3FIL T0773-D1 Energy (Kcal/mol) Energy (Kcal/mol) T0820-D1 Energy (Kcal/mol) 21

Results RMSD x Energy (Standard Procotol) o Difficult to select the best structure model methodology to select decoys 3FIL T0820-D1 22

Results T0820-D1 - Summary of improvements GAFP_St GAFP_Cm Native map GAFP_Cm Filtered map Best of All Models RMSD = 9.60 Å GDT = 40.28 % RMSD = 3.19 Å GDT = 68.89 % RMSD = 3.73 Å GDT = 66.11 % Top 5 Best Energy Models RMSD = 13.79 Å GDT = 26.94 % RMSD = 3.61 Å GDT = 66.11 % RMSD = 3.65 Å GDT = 65.28 % Experimental Structure Model predicted by GAPF program 23

Conclusions Contact Map in the form of Distance Constrains Useful Strategy for PSP Naive Potential Properly combined with GAPF fitness function Predictors give probability for all possible residue-residue contacts o Lack of full confidence in the prediction of the contacts Development of strategies to filter and enhance contact maps o Which contacts are most valuable for PSP 24

Acknowledgment Thank you! karinabs@lncc.br 25