Template Free Protein Structure Modeling Jianlin Cheng, PhD

Similar documents
Template Free Protein Structure Modeling Jianlin Cheng, PhD

Protein Structure Prediction, Engineering & Design CHEM 430

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009

Supporting Online Material for

Protein Structure Prediction

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

CMPS 3110: Bioinformatics. Tertiary Structure Prediction

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Protein Structure Prediction

Protein quality assessment

Building 3D models of proteins

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

Protein Structure Prediction

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Protein Structure Determination from Pseudocontact Shifts Using ROSETTA

Orientational degeneracy in the presence of one alignment tensor.

HOMOLOGY MODELING. The sequence alignment and template structure are then used to produce a structural model of the target.

Protein Structure Determination

Programme Last week s quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues

Improving the Physical Realism and Structural Accuracy of Protein Models by a Two-Step Atomic-Level Energy Minimization

FlexPepDock In a nutshell

09/06/25. Computergestützte Strukturbiologie (Strukturelle Bioinformatik) Non-uniform distribution of folds. Scheme of protein structure predicition

Protein Structures. 11/19/2002 Lecture 24 1

Ab-initio protein structure prediction

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Analysis and Prediction of Protein Structure (I)

CAP 5510 Lecture 3 Protein Structures

Contact map guided ab initio structure prediction

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Conformational Sampling in Template-Free Protein Loop Structure Modeling: An Overview

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability

Abstract. Introduction

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

proteins Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Ab initio protein structure prediction Corey Hardin*, Taras V Pogorelov and Zaida Luthey-Schulten*

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Presenter: She Zhang

Monte Carlo Simulations of Protein Folding using Lattice Models

Template Based Protein Structure Modeling Jianlin Cheng, PhD

Prediction and refinement of NMR structures from sparse experimental data

From Amino Acids to Proteins - in 4 Easy Steps

Why Do Protein Structures Recur?

ALL LECTURES IN SB Introduction

Protein Structures: Experiments and Modeling. Patrice Koehl

Bioinformatics: Secondary Structure Prediction

Protein Threading. BMI/CS 776 Colin Dewey Spring 2015

Template-Based Modeling of Protein Structure

Prediction of Protein Backbone Structure by Preference Classification with SVM

Dihedral Angles. Homayoun Valafar. Department of Computer Science and Engineering, USC 02/03/10 CSCE 769

CS612 - Algorithms in Bioinformatics

Protein Modeling Methods. Knowledge. Protein Modeling Methods. Fold Recognition. Knowledge-based methods. Introduction to Bioinformatics

Protein Structure Prediction and Protein-Ligand Docking

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Improved Recognition of Native-Like Protein Structures Using a Combination of Sequence-Dependent and Sequence-Independent Features of Proteins

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Lecture 2-3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability

Can a continuum solvent model reproduce the free energy landscape of a β-hairpin folding in water?

Chemical Shift Restraints Tools and Methods. Andrea Cavalli

Molecular Modeling lecture 2

proteins Comprehensive description of protein structures using protein folding shape code Jiaan Yang* INTRODUCTION

Computational Molecular Modeling

Protein Structure Prediction and Display

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence

3DRobot: automated generation of diverse and well-packed protein structure decoys

GOAP: A Generalized Orientation-Dependent, All-Atom Statistical Potential for Protein Structure Prediction

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

Coordinate Refinement on All Atoms of the Protein Backbone with Support Vector Regression

clustq: Efficient Protein Decoy Clustering Using Superposition-free Weighted Internal Distance Comparisons

Hidden Markov Models in computational biology. Ron Elber Computer Science Cornell

DISCRETE TUTORIAL. Agustí Emperador. Institute for Research in Biomedicine, Barcelona APPLICATION OF DISCRETE TO FLEXIBLE PROTEIN-PROTEIN DOCKING:

Novel Monte Carlo Methods for Protein Structure Modeling. Jinfeng Zhang Department of Statistics Harvard University

Protein sidechain conformer prediction: a test of the energy function Robert J Petrella 1, Themis Lazaridis 1 and Martin Karplus 1,2

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

AbInitioProteinStructurePredictionviaaCombinationof Threading,LatticeFolding,Clustering,andStructure Refinement

Minimalist Representations and the Importance of Nearest Neighbor Effects in Protein Folding Simulations

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Steps in protein modelling. Structure prediction, fold recognition and homology modelling. Basic principles of protein structure

Bioinformatics: Secondary Structure Prediction

3D Structure. Prediction & Assessment Pt. 2. David Wishart 3-41 Athabasca Hall

EBBA: Efficient Branch and Bound Algorithm for Protein Decoy Generation

Syllabus BINF Computational Biology Core Course

Protein Structure Prediction 11/11/05

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION

Bioinformatics. Macromolecular structure

Lecture 34 Protein Unfolding Thermodynamics

Lecture 18 Generalized Belief Propagation and Free Energy Approximations

Modeling for 3D structure prediction

Protein Structure Prediction

Homology modeling. Dinesh Gupta ICGEB, New Delhi 1/27/2010 5:59 PM

Solvent accessible surface area approximations for rapid and accurate protein structure prediction

Protein Secondary Structure Prediction

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

Week 10: Homology Modelling (II) - HHpred

Transcription:

Template Free Protein Structure Modeling Jianlin Cheng, PhD Associate Professor Computer Science Department Informatics Institute University of Missouri, Columbia 2013

Protein Energy Landscape & Free Sampling http://pubs.acs.org/subscribe/archive/mdd/v03/i09/html/willis.html

Two Approaches for 3D Structure Prediction Ab Initio Structure Prediction Physical force field protein folding Contact map - reconstruction Template-Based Structure Prediction MWLKKFGINLLIGQSV Simulation Select structure with minimum free energy Query protein MWLKKFGINKH Fold Recognition Alignment Protein Data Bank Template

Energy Functions T. Lazaridis, M. Karplus. Effective energy functions for protein structure prediction. Current Opinion in Structural Biology. 2000 A. Liwo, C. Czaplewski, S. Oldiej, H.A. Scheraga. Computational techniques for efficient conformational sampling of proteins. 2008 K. Simons et al. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. JMB. 1997. (Rosetta a case study)

Protein Energy Function The native state of a protein is the state of lowest free energy under physiological conditions This state corresponds to the lowest basin of the effective energy surface. The term effective energy refers to the free energy of the system (protein plus solvent)

Two Kinds of Energy Functions Physical effective energy function (PEEF): fundamental analysis of forces between particles Statistical effective energy function: data derived from known protein structures (e.g., statistics concerning pair contacts and surface area burial)

Statistical Effective Energy Function (SEEF) Less sensitive to small displacements Because of their statistical nature, they can, in principle, include all known and unrecognized, physical effects. Works better for protein structure prediction

SEEF Employ a reduced representation of the protein: a single interaction center at Ca or Cb for each residue. Basic idea: log (P ab / P a * P b ). P ab : is the observed probability that residues a and b are in contact. P a is frequency of a and P b is the frequency of b More info: use secondary structure, solvent accessibility, distance as conditions.

Energy Terms Pairwise contact potentials Hydrogen bonds Torsion angle Sidechain orientation coupling Rotamer energy Burial energy

Physical Effective Energy Function (PEEF) CHARMM implementation AMBER implementation

Benchmark Can a function select a native structure from a large pool of decoys? Can a function be used effectively in conformation sampling to generate a high proportion of near-native conformations?

Representation for Conformation Sampling How to change position of one residue?

Torsion Angles How to change position of one residue?

Vector Space

Simulated Annealing Accept a move based on a probability related to temperature, e.g., P ~ e^ (-ΔE / T) Temperature (T) controls the degree of exploration. Higher temperature, more exploration? Why? Temperature decreases as the sampling process progresses (from iteration to iteration): cool schedule

An Example

Pseudo Code

A TFM Example: Rosetta K. Simons, C. Kooperberg, E. Huang, D. Baker. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. JMB, 1997.

Basic Idea Short sequence segments are restricted to the local structures adopted by the most closely related sequences in the PDB Use the observed local conformations of similar local sequences to reduce sampling space

Scoring Functions of Selecting Local Conformations Knowledge-based potential functions Bayesian scoring function One native assumption is P(structure) = 1 / # of structures.

P(a structure) 0 for configurations with overlaps between atoms Proportional to exp(-radius of gyration^2) for all other configurations. Independent of secondary structure elements

Considering Beta-Sheet Pairing

Scoring P(Sequence Structure) E i can represent a variety of features of the local structural environment around residue i.

Implementation Second term: for pairs separated for more than 10 residues along the chain Buried environment: >16 other Cb atoms within 10 Angstrom of the Cb atom of the residue; otherwise, exposed

Negative Log of Interaction Probability Function

Structure Generation Initialization: Splicing together fragments of proteins of known structure with similar local sequences and evaluating them initially using equation (6).

Simulated Annealing Low scoring conformations with distributions of residues similar to those of known proteins are resampled by simulated annealing in conjunction with a simple move set that involves replacing the torsion angles of a segment of the chain with the torsion angles of a different protein fragment with a related amino acid sequence. The simulated conformation is evaluated by (8)

Methods Structures are represented using a simplified model consisting of heavy atoms of the mainchain and the C b atom of the side chain. All bond lengths and angles are held constant according to the ideal geometry of alanine (Engh & Huber 91); the only remaining variables are the backbone torsional angles.

Fragment Databases Nimers / trimers (sequences) and their conformations extracted from known structures in the database Identify sequence neighbors: simple amino acid frequency matching score. e S(aa,i) and X(aa,i) are the frequencies of amino acid aa at position i in nine residue segments of either the sequence being folded (S) or of one of the proteins of known structure based on multiple sequence alignment

Simulation The starting configuration in all simulations was the fully extended chain. A move consists of substituting the torsional angles of a randomly chosen neighbor at a randomly chosen position for those of the current configuration. Moves which bring two atoms within 2.5 Angstrom are immediately rejected; other moves are evaluated according to the Metropolis criterion using the scoring equation. Simulated annealing was carried out by reducing the temperature from 2500 to 10 linearly over the course of 10,000 cycles (attempted moves).

Simulated Structure Examples

Project 2 Develop a simple prototype of fragment assembly protein structure prediction system

Project Plan Representative protein structure database: http://bioinfo.mni.th-mh.de/pdbselect/, or something I will post at the course web site. Energy function: Rosetta 3, Dfire energy function (executable available), Yang Zhang s RW potential (executable available) Sampling approach Testing: 3 CASP10 TFM targets Present your plan Wednesday

Technical Issues and Resources Conversion between torsional angles and Cartesian coordinates: https://www.rosettacommons.org/content/conversion-dihedralangle-representation-cartesian-representation; Convert coordinates to torsion angles http://www.math.fsu.edu/~quine/mb_10/6_torsion.pdf RW potential: http://zhanglab.ccmb.med.umich.edu/rw/ Dfire energy: https://www.rosettacommons.org/manuals/archive/ rosetta3.4_user_guide/df/d11/ classcore_1_1scoring_1_1methods_1_1dfire_1_1_d_f_i_r_e energy.html Rosetta: https://www.rosettacommons.org/

Rosetta Resource How to Use Rosetta: http://www.cs.huji.ac.il/~fora/81855/ exercises/81855_ex3_2012.pdf Rosetta Online Document: https://www.rosettacommons.org/manuals/ archive/rosetta3.4_user_guide/index.html