Evolutionary design of energy functions for protein structure prediction

Similar documents
CMPS 3110: Bioinformatics. Tertiary Structure Prediction

Many proteins spontaneously refold into native form in vitro with high fidelity and high speed.

3D HP Protein Folding Problem using Ant Algorithm

Protein Structure Prediction

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

Protein Structure Determination from Pseudocontact Shifts Using ROSETTA

Protein quality assessment

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Thermodynamics. Entropy and its Applications. Lecture 11. NC State University

Second Law Applications. NC State University

Template Free Protein Structure Modeling Jianlin Cheng, PhD

Algorithm for protein folding problem in 3D lattice HP model

Template-Based Modeling of Protein Structure

Protein structure prediction. CS/CME/BioE/Biophys/BMI 279 Oct. 10 and 12, 2017 Ron Dror

Generalized ensemble methods for de novo structure prediction. 1 To whom correspondence may be addressed.

Template Free Protein Structure Modeling Jianlin Cheng, PhD

Protein Structure Prediction, Engineering & Design CHEM 430

Energy Landscapes and Accelerated Molecular- Dynamical Techniques for the Study of Protein Folding

) P = 1 if exp # " s. + 0 otherwise

Why Do Protein Structures Recur?

Selecting protein fuzzy contact maps through information and structure measures

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall How do we go from an unfolded polypeptide chain to a

arxiv:cond-mat/ v1 [cond-mat.soft] 19 Mar 2001

Lecture 21 (11/3/17) Protein Stability, Folding, and Dynamics Hydrophobic effect drives protein folding

Improving De novo Protein Structure Prediction using Contact Maps Information

How Nature Circumvents Low Probabilities: The Molecular Basis of Information and Complexity. Peter Schuster

Protein Structure Prediction

Ab-initio protein structure prediction

Contact map guided ab initio structure prediction

Introduction to Optimization

GOAP: A Generalized Orientation-Dependent, All-Atom Statistical Potential for Protein Structure Prediction

All-atom ab initio folding of a diverse set of proteins

Paul Sigler et al, 1998.

Research article Ab initio modeling of small proteins by iterative TASSER simulations Sitao Wu 1, Jeffrey Skolnick 2 and Yang Zhang* 1

Supporting Online Material for

Lecture 11: Protein Folding & Stability

Protein Folding & Stability. Lecture 11: Margaret A. Daugherty. Fall Protein Folding: What we know. Protein Folding

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

Optimization Methods via Simulation

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

The role of form in molecular biology. Conceptual parallelism with developmental and evolutionary biology

Ab initio protein structure prediction Corey Hardin*, Taras V Pogorelov and Zaida Luthey-Schulten*

Outline. The ensemble folding kinetics of protein G from an all-atom Monte Carlo simulation. Unfolded Folded. What is protein folding?

Evolutionary Computation

Effect of Sequences on the Shape of Protein Energy Landscapes Yue Li Department of Computer Science Florida State University Tallahassee, FL 32306

proteins Extracting knowledge from protein structure geometry Peter Røgen1* and Patrice Koehl2 INTRODUCTION

Protein Folding In Vitro*

Evolution of functionality in lattice proteins

clustq: Efficient Protein Decoy Clustering Using Superposition-free Weighted Internal Distance Comparisons

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

TASSER: An Automated Method for the Prediction of Protein Tertiary Structures in CASP6

The protein folding problem consists of two parts:

Protein Folding Prof. Eugene Shakhnovich

Motif Prediction in Amino Acid Interaction Networks

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009

Folding of small proteins using a single continuous potential

Improving the Physical Realism and Structural Accuracy of Protein Models by a Two-Step Atomic-Level Energy Minimization

HUGO! Human Strategy based Genetic Optimizer

arxiv:cond-mat/ v1 [cond-mat.stat-mech] 1 Oct 1999

Molecular Modelling. part of Bioinformatik von RNA- und Proteinstrukturen. Sonja Prohaska. Leipzig, SS Computational EvoDevo University Leipzig

Human and Server CAPRI Protein Docking Prediction Using LZerD with Combined Scoring Functions. Daisuke Kihara

Protein structure forces, and folding

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Energetics and Thermodynamics

Protein Structure Prediction

Computational methods for predicting protein-protein interactions

Short Announcements. 1 st Quiz today: 15 minutes. Homework 3: Due next Wednesday.

Protein Structure Prediction ¼ century of disaster

Prediction and refinement of NMR structures from sparse experimental data

A new combination of replica exchange Monte Carlo and histogram analysis for protein folding and thermodynamics

Protein Folding. I. Characteristics of proteins. C α

Elucidation of the RNA-folding mechanism at the level of both

3DRobot: automated generation of diverse and well-packed protein structure decoys

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

Identifying the Protein Folding Nucleus Using Molecular Dynamics

Overview. Optimization. Easy optimization problems. Monte Carlo for Optimization. 1. Survey MC ideas for optimization: (a) Multistart

Protein Structure Prediction on GPU: a Declarative Approach in a Multi-agent Framework

Protein Structure Prediction

Novel Monte Carlo Methods for Protein Structure Modeling. Jinfeng Zhang Department of Statistics Harvard University

Protein folding. Today s Outline

13 De Novo Protein Structure Prediction

Application of the Markov State Model to Molecular Dynamics of Biological Molecules. Speaker: Xun Sang-Ni Supervisor: Prof. Wu Dr.

Computer simulations of protein folding with a small number of distance restraints

ARTIFICIAL NEURAL NETWORK WITH HYBRID TAGUCHI-GENETIC ALGORITHM FOR NONLINEAR MIMO MODEL OF MACHINING PROCESSES

Are the native states of proteins kinetic traps? Abstract

arxiv:cond-mat/ v1 [cond-mat.soft] 5 May 1998

Curso de Doctorado Universidad del Centro de la Provincia de Buenos Aires, Argentina, 24 al 27 de Abril, 2006

Why Proteins Fold. How Proteins Fold? e - ΔG/kT. Protein Folding, Nonbonding Forces, and Free Energy

Protein Folding Challenge and Theoretical Computer Science

Protein structure alignments

Molecular modeling. A fragment sequence of 24 residues encompassing the region of interest of WT-

A Genetic Algorithm for Energy Minimization in Bio-molecular Systems

Protein Structure Prediction 11/11/05

Protein structure and folding

proteins Prediction Report Template-based modeling and free modeling by I-TASSER in CASP7 Yang Zhang* 108 PROTEINS VC 2007 WILEY-LISS, INC.

AbInitioProteinStructurePredictionviaaCombinationof Threading,LatticeFolding,Clustering,andStructure Refinement

O 3 O 4 O 5. q 3. q 4. Transition

AM 121: Intro to Optimization Models and Methods: Fall 2018

Table S1. Primers used for the constructions of recombinant GAL1 and λ5 mutants. GAL1-E74A ccgagcagcgggcggctgtctttcc ggaaagacagccgcccgctgctcgg

Transcription:

Evolutionary design of energy functions for protein structure prediction Natalio Krasnogor nxk@ cs. nott. ac. uk Paweł Widera, Jonathan Garibaldi 7th Annual HUMIES Awards 2010-07-09

Protein structure prediction From 1D sequence to 3D structure LFSKELRCMMYGFGDDQNPYTESVDILEDLVIEFITEMTHKAMSIFSEEQLNRYEMYRRSAFPKAA IKRLIQSITGTSVSQNVVIAMSGISKVFVGEVVEEALDVCEKWGEMPPLQPKHMREAVRRLKSKGQIP Protein basics 20 amino acid alphabet sequence encodes structure structure determines activity ratio structures sequences = 0.2% Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 2 / 14

The algorithm of folding Anfinsen s thermodynamic hypothesis [Anfinsen, 1973] Refolding experiment folds to the same native state native state is energetically stable Energy funnel roll down free energy hill avoid local minima traps [Dill and Chan, 1997] Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 3 / 14

The two aspects of folding Towards practical prediction Energy landscape all-atom force field statistical potential Search method random walk structure optimisation [Dill and Chan, 1997] Folding@home 8.5 peta FLOPS 10 000 CPU days for 10µs of folding Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 4 / 14

The two aspects of folding Towards practical prediction Energy landscape all-atom force field statistical potential Search method random walk structure optimisation [Dill and Chan, 1997] Folding@home 8.5 peta FLOPS 10 000 CPU days for 10µs of folding Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 4 / 14

Community wide prediction experiment Critical Assessment of techniques for protein Structure Prediction CASP facts biannual competition started in 1994 parallel prediction and experimental verification model assessment by human experts 9th edition of CASP 150 human groups 140 server groups Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 5 / 14

How to find good quality models? Correlation between energy and distance to the native structure energy Requirements energy reflects distance distance reflects similarity native state distance Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 6 / 14

How the best of CASP do it? Energy of models vs. distance to a target structure Similarity measure RMSD = 1 i=n δ 2 N i i=1 Decoys generated by I-TASSER [Wu et al., 2007] Robetta [Rohl et al., 2004] Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 7 / 14

How the best of CASP do it? Energy of models vs. distance to a target structure Similarity measure RMSD = 1 i=n δ 2 N i i=1 Decoys generated by I-TASSER [Wu et al., 2007] Robetta [Rohl et al., 2004] Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 7 / 14

How the energy function is designed? Weighted sum vs. free combination of terms F( T ) = w 1 T 1 +... w n T n [Zhang et al., 2003] «F( T ) = T 1 T 3 w 1 log(t 2 ) + sin T 4 w 2 T 1 T 5 exp(cos(w 1 T 3 )) Decision support local numerical approximation GP input terminals: T 1,..., T 8 functions: add sub mul div sin cos exp log random ephemerals in range [0,1] Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 8 / 14

How the energy function is designed? Weighted sum vs. free combination of terms F( T ) = w 1 T 1 +... w n T n [Zhang et al., 2003] «F( T ) = T 1 T 3 w 1 log(t 2 ) + sin T 4 w 2 T 1 T 5 exp(cos(w 1 T 3 )) Decision support local numerical approximation GP input terminals: T 1,..., T 8 functions: add sub mul div sin cos exp log random ephemerals in range [0,1] Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 8 / 14

How the energy function is designed? Weighted sum vs. free combination of terms F( T ) = w 1 T 1 +... w n T n [Zhang et al., 2003] «F( T ) = T 1 T 3 w 1 log(t 2 ) + sin T 4 w 2 T 1 T 5 exp(cos(w 1 T 3 )) [Widera et al., 2010] Decision support local numerical approximation GP input terminals: T 1,..., T 8 functions: add sub mul div sin cos exp log random ephemerals in range [0,1] Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 8 / 14

Can GP improve over a weighted sum of terms? Nelder-Mead downhill simplex optimisation spearman-sigmoid correlation method d-100 all d-100 all simplex 0.734 0.638 0.650 0.166 GP 0.835 0.714 *0.740 *0.200 Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 9 / 14

Criteria for human-competitivness CRITERION F result >= past achievement in the field CRITERION E result >= most recent human-created solution to a long-standing problem CRITERION H result holds its own in a competition involving human contestants Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 10 / 14

Criteria for human-competitivness CRITERION F result >= past achievement in the field CRITERION E result >= most recent human-created solution to a long-standing problem CRITERION H result holds its own in a competition involving human contestants Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 10 / 14

Criteria for human-competitivness CRITERION F result >= past achievement in the field CRITERION E result >= most recent human-created solution to a long-standing problem CRITERION H result holds its own in a competition involving human contestants Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 10 / 14

Comparison to the human made solution 1 automated method to discover the best combination of the energy terms 2 human-competitive improvement to the solution of a long-standing problem 3 challenge weighted sum of terms with expert-picked weights Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 11 / 14

Potential impact 1 automated energy design using a free functional combination of terms haven t been used before 2 energy functions determines the search landscape and its smoothness is a key to the efficient prediction 3 long-term effects in protein science that the improvement in prediction quality could bring Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 12 / 14

Why this is the best entry? 1 innovates the field with a novel approach to a long-standing problem 2 could be a step towards more accurate prediction and in a long-term improve drug design and identification of disease-causing mutations 3 represent a new and difficult challange for GP http://www.infobiotics.org/gpchallenge/ Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 13 / 14

References Anfinsen, C. (1973). Principles that Govern the Folding of Protein Chains. Science, 181(4096):223 30. Dill, K. A. and Chan, H. S. (1997). From Levinthal to pathways to funnels. Nat Struct Mol Biol, 4(1):10 19. Rohl, C. A., Strauss, C. E. M., Misura, K. M. S., and Baker, D. (2004). Protein Structure Prediction Using Rosetta. In Brand, L. and Johnson, M. L., editors, Numerical Computer Methods, Part D, volume Volume 383 of Methods in Enzymology, pages 66 93. Academic Press. Widera, P., Garibaldi, J., and Krasnogor, N. (2009). Evolutionary design of the energy function for protein structure prediction. In IEEE Congress on Evolutionary Computation 2009, pages 1305 1312, Trondheim, Norway. Widera, P., Garibaldi, J., and Krasnogor, N. (2010). GP challenge: evolving energy function for protein structure prediction. Genetic Programming and Evolvable Machines, 11(1):61 88. Wu, S., Skolnick, J., and Zhang, Y. (2007). Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biol, 5(1):17. Zhang, Y., Kolinski, A., and Skolnick, J. (2003). TOUCHSTONE II: A New Approach to Ab Initio Protein Structure Prediction. Biophys. J., 85(2):1145 1164. Natalio Krasnogor Evolutionary design of energy functions for PSP HUMIES 2010 14 / 14