The Computational Core of Molecular Modeling (The What? Why? And How?)


CS Seminar, Feb 09. Alexey Onufriev, Dept. of Computer Science; Dept. of Physics and the GBCB program.

Thanks to: Faculty co-authors: T.M. Murali, M. Prisant, L. Heath, C. Simmerling. Student co-authors: J. Gordon, J. Myers, A. Fenley, J. Ruscio, R. Anandakrishnan, D. Kumar, M. Shukla, V. Sojia. Sponsors: NIH, VT. Support: System-X team, N. Polys (myoglobin graphics).

Will focus on the computational side of solving problems in the following areas: a. Rational drug design. b. From atomic motion to biological function. c. The grand challenge of computational science: the protein folding problem.

The emergence of in virtuo science: in vivo, in vitro, in virtuo.

Biological function = f(3D molecular structure). DNA sequence (A, T, G, C) -> protein structure -> biological function. How can we predict, understand, or modify function if we know the structure? Can we predict the structure? Key challenges: biomolecular structures are complex (e.g. compared to crystalline solids). Biology works on many time scales. Experiments can only go so far. A solution: computational methods. Model movements of individual atoms.

The paradigm: all things are made of atoms, and everything that living things do can be understood in terms of the wigglings and jigglings of atoms (R. Feynman). This suggests the approach: model what nature does, i.e. let the molecule evolve with time according to the underlying laws of physics.

A protein on a surface. Atomic resolution

Theme I. Rational (structure-based) design of new medicines. Picture: design of an HIV protease inhibitor. Hornak, V.; Okur, A.; Rizzo, R. and Simmerling, C., HIV-1 protease flaps spontaneously open and reclose in molecular dynamics simulations, Proc. Natl. Acad. Sci. USA, 103:915-920 (2006).

Example: rational drug design. If you block the enzyme's function, you kill the virus. E.g. a viral protease (chops up proteins) and the drug agent that blocks it.

Example of successful computer-aided (rational) drug design: one of the drugs that helped slow down the AIDS epidemic (part of the antiretroviral cocktail). The drug blocks the function of a key viral protein. To design the drug, one needs a precise 3D structure of that protein.

A computational challenge: need high-quality, complete protein structures. But the experiment (X-ray) does not "see" hydrogen atoms. These have to be placed computationally.

Combinatorial explosion problem: a molecule with N sites, each of which can be occupied by a proton (H+) or be empty, has $2^N$ possible states. All of the possible charge (protonation) states must be taken into account. For a typical protein with 100 ionizable amino acids this means $2^{100} \sim 10^{30}$ possible variants to consider! And we need to find the minimum energy state among them (for the experts: what we really need is, of course, the partition function Z, see below, which is an even more computationally demanding job). [Figure: sites X1, X2, X3 with a pairwise interaction W23.] Free energy of protonation state k (out of $2^N$): $\Delta G_k(\mathrm{pH}) = \sum_{i=1}^{N} x_i^k \left( kT \ln 10 \cdot \mathrm{pH} - \Delta E_i^{calc} \right) + \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} x_i^k x_j^k W_{ij}$, where $x_i^k$ = 1 or 0 (site i occupied or empty in state k) and $W_{ij}$ is the matrix of site-site interactions. The double sum is the source of the trouble. Partition function: $Z = \sum_{\text{all states } k} \exp(-\Delta G_k / kT)$.
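To make the bookkeeping concrete, here is a minimal Python sketch that enumerates all $2^N$ protonation states for a tiny molecule and evaluates the free energy and partition function from the slide's formula. The numerical values of `dE_calc` and `W`, the choice kT = 1, and the exclusion of i = j terms from the double sum are illustrative assumptions, not data from the talk; brute-force enumeration like this is only feasible for small N.

```python
import itertools
import math

def state_free_energy(x, pH, dE_calc, W, kT=1.0):
    """Free energy of one protonation state x (tuple of 0/1 occupancies)."""
    n = len(x)
    # Single-site terms: x_i * (kT ln10 * pH - dE_i_calc)
    single = sum(x[i] * (kT * math.log(10) * pH - dE_calc[i]) for i in range(n))
    # Pairwise terms: 1/2 * sum_ij x_i x_j W_ij (i == j skipped, an assumption)
    pair = 0.5 * sum(x[i] * x[j] * W[i][j]
                     for i in range(n) for j in range(n) if i != j)
    return single + pair

def enumerate_states(pH, dE_calc, W, kT=1.0):
    """Brute-force sum over all 2^N states: returns Z, mean occupancies, state count."""
    n = len(dE_calc)
    states = list(itertools.product((0, 1), repeat=n))
    weights = [math.exp(-state_free_energy(x, pH, dE_calc, W, kT) / kT)
               for x in states]
    Z = sum(weights)
    occupancy = [sum(w * x[i] for w, x in zip(weights, states)) / Z
                 for i in range(n)]
    return Z, occupancy, len(states)

# Hypothetical 3-site toy molecule (all numbers are made up for illustration).
dE_calc = [16.0, 14.0, 18.0]
W = [[0.0, 1.0, 0.2],
     [1.0, 0.0, 0.5],
     [0.2, 0.5, 0.0]]
Z, occupancy, n_states = enumerate_states(7.0, dE_calc, W)
print(n_states, Z, occupancy)
```

The list of states doubles with every added site, which is exactly why this direct approach cannot be applied to a protein with 100 ionizable groups.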

Beating the combinatorics: the clustering approach. Example: a protein with N = 6 ionizable groups. Total number of protonation states = $2^N$ = 64. Every site interacts with every other site: a complete graph. Solution: neglect the "weak" edges and cluster the strong ones (cluster 1, N = 3; cluster 2, N = 3). After clustering: total number of states = $2^3 + 2^3$ = 16 < 64.

http://biophysics.cs.vt.edu/h++ A web server that adds hydrogens to molecular structures. Launched by Onufriev's group in June 2005. ~1000 registered users since.

Intrigued? Suggested reading: The Many Roles of Computation in Drug Discovery, W. Jorgensen, Science 303, 1813 (2004), and references therein.
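The clustering idea can be sketched in a few lines of Python: drop edges of the interaction graph whose strength is below a cutoff, take the connected components as clusters, and count the states per cluster. The interaction matrix, the cutoff value and the function names are hypothetical; the talk does not specify how the clustering is actually done, so this is only one plausible reading.

```python
def cluster_sites(W, threshold):
    """Connected components of the graph that keeps only edges with |W_ij| >= threshold."""
    n = len(W)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and abs(W[i][j]) >= threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters

def states_to_enumerate(clusters):
    """Each cluster is enumerated independently: sum of 2^(cluster size)."""
    return sum(2 ** len(c) for c in clusters)

# Toy 6-site matrix: strong coupling (5.0) inside {0,1,2} and {3,4,5},
# weak coupling (0.1) between the two groups. Values are illustrative only.
W = [[0.0 if i == j else (5.0 if (i < 3) == (j < 3) else 0.1)
      for j in range(6)] for i in range(6)]

clusters = cluster_sites(W, threshold=1.0)
print(clusters, states_to_enumerate(clusters), 2 ** len(W))
```

With these toy numbers the two clusters of three sites require 8 + 8 states instead of 64, matching the count on the slide.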

Theme II. The protein folding challenge. Nature does it all the time. Can we? Amino-acid sequence (translated genetic code): MET ALA ALA ASP GLU GLU ... How? Experiment: the amino-acid sequence uniquely determines the protein's 3D shape (ground state). Why bother: the protein's shape determines its biological function.

Protein Structure in 3 steps. Step 1. Two amino acids together (di-peptide): amino acid #1, peptide bond, amino acid #2.

Protein Structure in 3 steps. Step 2: Most flexible degrees of freedom:

A protein is simply a chain of amino acids, with angles φ1, φ2, φ3, φ4, ... Each configuration {φ1, φ2, ..., φk} has some energy. The folded (biologically functional) protein has the lowest possible energy: the global minimum. So just find this conformation by some kind of a minimization algorithm. What's the big deal?

The magnitude of the protein folding challenge: an enormous number of possible conformations of the polypeptide chain (φ1, φ2, φ3, φ4, ...). A typical protein is a chain of ~100 amino acids. Assume that each amino acid can take up only 10 conformations (a vast underestimate). Total number of possible conformations: $10^{100}$. Suppose each energy estimate is just 1 floating-point operation. Suppose you have a peta-FLOP supercomputer ($10^{15}$ operations per second). An exhaustive search for the global minimum would take $10^{85}$ seconds ~ $3 \times 10^{77}$ years. Age of the Universe ~ $1.4 \times 10^{10}$ years.

Finding a global minimum in a multidimensional case is easy only when the landscape is smooth. No matter where you start (1, 2 or 3), you quickly end up at the bottom: the Native (N), functional state of the protein. [Figure: free energy vs. folding coordinate, with starting points 1, 2 and 3. Adapted from Ken Dill's web site at UCSF.]

Realistic landscapes are much more complex, with multiple local minima (folding traps). Proteins trapped in those minima may lead to disease, such as Alzheimer's. Adapted from Ken Dill's web site at UCSF.

Adapted from Dobson, Nature 426, 884 (2003).

Since minimization won't work, choose an alternative. Do what Nature does: just let it fold on its own, at normal temperature. Method: Molecular Dynamics.
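The estimate is easy to check with exact integer arithmetic. The sketch below assumes a machine performing $10^{15}$ operations per second and a 365.25-day year; the variable names are ours.

```python
# Exhaustive conformational search: back-of-the-envelope cost.
residues = 100                 # typical chain length from the slide
conformations_per_residue = 10 # deliberately low estimate from the slide
operations_per_second = 10 ** 15  # assumed machine speed

total_conformations = conformations_per_residue ** residues
seconds = total_conformations // operations_per_second  # one operation per conformation

seconds_per_year = 365.25 * 24 * 3600
years = seconds / seconds_per_year

print(f"conformations: 1e{len(str(total_conformations)) - 1}")
print(f"seconds:       1e{len(str(seconds)) - 1}")
print(f"years:         {years:.2e}")
```

Whatever the exact prefactor, the search time exceeds the age of the Universe by more than sixty orders of magnitude.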

Principles of Molecular Dynamics (MD): each atom moves by Newton's 2nd Law, $F = ma$, with the force given by $F = -dE/dr$. System's energy: $E = K r^2$ (bond stretching, each bond acting as a spring) $+ A/r^{12} - B/r^{6}$ (VDW interaction) $+ Q_1 Q_2 / r$ (electrostatic forces) $+ \dots$
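As an illustration of how forces follow from the energy terms, the Python sketch below writes the three terms as functions of a single distance r and obtains the force as the negative derivative by a central finite difference. The parameter values are arbitrary, the bond term is measured from an assumed equilibrium length `r0`, and the Coulomb constant is set to 1; real force fields such as AMBER use fitted parameters, analytic derivatives and additional terms.

```python
def bond_energy(r, K=100.0, r0=1.0):
    """Harmonic bond stretching: K * (r - r0)^2."""
    return K * (r - r0) ** 2

def vdw_energy(r, A=1.0, B=2.0):
    """Van der Waals (Lennard-Jones form): A / r^12 - B / r^6."""
    return A / r ** 12 - B / r ** 6

def coulomb_energy(r, q1=0.5, q2=-0.5):
    """Electrostatics in reduced units: q1 * q2 / r."""
    return q1 * q2 / r

def force(energy, r, h=1e-6):
    """F = -dE/dr, estimated with a central finite difference of step h."""
    return -(energy(r + h) - energy(r - h)) / (2.0 * h)

print(force(bond_energy, 1.1))     # stretched bond: restoring force, negative
print(force(vdw_energy, 1.0))      # at the VDW minimum for A=1, B=2: about zero
print(force(coulomb_energy, 2.0))  # opposite charges: attractive, negative
```

A negative value here means the force acts to decrease r, i.e. it pulls the two atoms together.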

Now we have positions of all atoms as a function of time. Can compute statistical averages, fluctuations; Analyze side chain movements, Cavity dynamics, Domain motion, Etc.

Molecular Dynamics. PRINCIPLE: given the position x(t) of each atom at time t, its position at the next time step t + Δt is given by $x(t + \Delta t) \approx x(t) + v(t)\,\Delta t + \frac{1}{2} \frac{F}{m} (\Delta t)^2$, where F is the force. Key parameter: the integration time step Δt. It controls the accuracy and speed of numerical integration routines. Smaller Δt is more accurate, but needs more steps. How many are needed to simulate biology? How many can one afford?
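The position update above can be tried on a one-dimensional harmonic oscillator. The sketch uses the velocity Verlet scheme, a common choice whose position step is the formula on the slide; the talk does not say which integrator its simulations use, so treat the scheme, the unit mass and the unit spring constant as assumptions.

```python
def velocity_verlet(x, v, force, m, dt, n_steps):
    """Advance position x and velocity v by n_steps time steps of size dt."""
    f = force(x)
    for _ in range(n_steps):
        # Position update from the slide: x + v*dt + 1/2 * (F/m) * dt^2
        x = x + v * dt + 0.5 * (f / m) * dt ** 2
        f_new = force(x)
        # Velocity update uses the average of old and new forces
        v = v + 0.5 * (f + f_new) / m * dt
        f = f_new
    return x, v

# Harmonic oscillator with m = 1, k = 1: the period is 2*pi.
spring_force = lambda x: -x
x_end, v_end = velocity_verlet(x=1.0, v=0.0, force=spring_force,
                               m=1.0, dt=0.001, n_steps=6283)
energy_end = 0.5 * v_end ** 2 + 0.5 * x_end ** 2
print(x_end, v_end, energy_end)
```

After roughly one period the particle is back near x = 1 and the total energy stays close to its initial value of 0.5; rerunning with a much larger `dt` shows how accuracy degrades as the step grows.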

As a result, we cannot quite get into the biological time scales. [Figure: axis of characteristic time scales in seconds, with ticks at $10^{-14}$, $10^{-6}$ and $10^{0}$, labels "H-C bond vibration" and "Protein folding", and ranges marked "currently accessible times" and "biology".] For stability, the time step Δt must be at least an order of magnitude less than the period of the fastest motion, i.e. Δt ~ $10^{-15}$ s. Example: to simulate folding of the fastest folding protein, at least $10^{-6} / 10^{-15} = 10^{9}$ steps will be needed.

The bottleneck of the methodology: computation of long-range interactions. Electrostatic interactions fall off as the inverse distance between atoms. Too strong to neglect. Need to account for all of them. Very expensive. Up to 99% of total cost for a protein.

Massively parallel machines help. Virginia Tech's supercomputer, System-X.

The worst problem for parallel computations: the force acting on each atom depends upon the positions of every other atom in the system. Computed coordinates have to be communicated between all processors at each step. [Figure: four processors, each holding atom coordinates and forces, e.g. X1, F(on X1) on processor #1 and X2, F(on X2) on processor #2.]

Without approximations, computation of long-range electrostatic forces will cost you $O(N^2)$, where the number of atoms N may be as large as N ~ $10^{6}$. Too expensive for large systems. Every atom interacts with every other atom: a complete graph. A solution: combine charges (vertices) into groups, that is, use coarse-graining (in the figure, a group of charges is replaced by a single charge of +3). After coarse-graining, costs can be as low as $N \log N$. Works because macromolecules are naturally partitioned into hierarchical levels: atoms -> amino acids -> proteins -> complexes ...

Simulated refolding pathway of the 46-residue protein. Molecular dynamics based on AMBER-7. Movie available at: www.scripps.edu/~onufriev/research/in_virtuo.html NB: due to the absence of viscosity, folding occurs on a much shorter time scale than in an experiment.
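The step count is a one-line calculation; the sketch below does it with exact fractions to avoid floating-point rounding. The helper name is ours, and the two input values are the ones quoted on the slide.

```python
from fractions import Fraction

def steps_needed(simulated_time_s, time_step_s):
    """Number of integration steps required to cover simulated_time_s."""
    return simulated_time_s / time_step_s

folding_time = Fraction(1, 10 ** 6)   # fastest-folding protein, ~1 microsecond
time_step = Fraction(1, 10 ** 15)     # ~1 femtosecond
n_steps = steps_needed(folding_time, time_step)
print(int(n_steps))
```

Each of those steps requires a full force evaluation over all atoms, which is why the time step, and not the hardware alone, sets the reachable time scale.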
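A single level of the grouping idea can be sketched as follows: a distant group of charges is replaced by one charge equal to their sum, placed at the charge-weighted center, and the result is compared with the exact pairwise sum. The coordinates, the three unit charges and the reduced units (Coulomb constant set to 1) are toy assumptions. This shows only the approximation step, not the hierarchical scheme needed to reach N log N cost, and not the method actually used in the talk.

```python
import math

def coulomb_direct(probe, q_probe, charges):
    """Exact interaction energy: one term per charge in the group."""
    return sum(q_probe * q / math.dist(probe, pos) for pos, q in charges)

def coarse_grain(charges):
    """Replace a group by its total charge at the |q|-weighted center."""
    total_q = sum(q for _, q in charges)
    weight = sum(abs(q) for _, q in charges)
    center = tuple(sum(abs(q) * pos[k] for pos, q in charges) / weight
                   for k in range(3))
    return center, total_q

# Three unit charges close together, and a probe charge far away.
group = [((0.0, 0.0, 0.0), 1.0),
         ((0.5, 0.0, 0.0), 1.0),
         ((0.0, 0.5, 0.0), 1.0)]
probe, q_probe = (20.0, 0.0, 0.0), 1.0

exact = coulomb_direct(probe, q_probe, group)
center, total_q = coarse_grain(group)
approx = q_probe * total_q / math.dist(probe, center)  # a single term
rel_error = abs(approx - exact) / abs(exact)
print(exact, approx, rel_error)
```

The approximation is good only when the probe is far from the group compared with the group's size, which is why practical schemes apply it to distant groups and treat nearby atoms exactly.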

Intrigued? Suggested articles: 1. Protein Folding and Misfolding, C. Dobson, Nature 426, 884 (2003). 2. Design of a Novel Globular Protein Fold with Atomic-Level Accuracy, Kuhlman et al., Science 302, 1364 (2003), and references therein.

Theme III: how does oxygen get inside the protein myoglobin? Trivia: it is myoglobin (with oxygen bound to it) that gives fresh meat its red color. The protein stores oxygen.

Myoglobin: the "hydrogen atom" of structural biology. Exactly what are the migration pathways of small non-polar ligands (O2, CO, NO) inside the protein? A 50-year-old question.

The heme group: the heart of myoglobin.

Myoglobin: protein responsible for oxygen storage in all breathing organisms. How does oxygen, or another small non-polar ligand, get in to bind to the Fe atom in its core?

After 50 years of research many questions still remain. Example: one or many pathways? Single dominant pathway (HIS gate), e.g. Olson et al. Multiple pathways, e.g. Elber and Karplus.

Two complementary approaches: #1 Simulate explicit ligand (CO) trajectories via microseconds of room temperature, explicit solvent MD. #2 Compute dynamic volume fluctuations in the protein.

Approach #1: Straightforward MD (explicit solvent, PME, NPT, 300 K, 7 µs total). 1. Simulation type #1: model photodetachment of the carbon monoxide (CO) ligand. 20 trajectories, 90 ns each. The ligand starts at the docking site (Fe) and comes out, sometimes. 2. Simulation type #2: model diffusion of the ligand from the outside. The ligand starts in the solvent, and sometimes diffuses all the way to the docking site inside. 48 trajectories, each 90 ns long.

Raw result #1: a sample of simulated trajectories that show ligand migration paths (multiple individual trajectories combined in one slide). Panels: photodetachment experiment, CO initially at the binding site; CO initially in the water.

Summary of the MD trajectories: Pathway #1 Pathway #2 Concern: Have we explored ALL the pathways? Challenge: can we get the same pathways cheaper?

Approach #2. PATHFINDER: dynamic free volume analysis. Run a separate, much shorter (15 ns) MD in which the ligand does not move. In each snapshot, identify cavities large enough to hold the ligand (a purely steric criterion). Combine the cavities over the entire trajectory and determine their connectivity to obtain the pathways.

The challenge: the protein shape is very complex, and the internal cavities hiding within also have a very complex shape. Connectivity to the outside is a non-trivial issue. In this case, it is critical to have a SIMPLE and robust algorithm, so as to have just ONE complexity to deal with, not a product of the two: (problem complexity) x (algorithm complexity).
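The free-volume idea can be illustrated with a toy grid model in Python. This is not the PATHFINDER code: the grid, the clearance value, the function names and the two-snapshot corridor are all hypothetical, chosen only to show the three steps named on the slide (steric cavity detection per snapshot, combination over snapshots, connectivity search).

```python
import math
from collections import deque

def free_cells(atoms, candidate_cells, min_clearance):
    """Cells whose center is at least min_clearance from every atom (purely steric)."""
    return {cell for cell in candidate_cells
            if all(math.dist(cell, atom) >= min_clearance for atom in atoms)}

def connected(cells, start, goal):
    """Breadth-first search over face-sharing neighbor cells."""
    if start not in cells or goal not in cells:
        return False
    queue, seen = deque([start]), {start}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            return True
        for axis in range(3):
            for step in (-1, 1):
                neighbor = list(cell)
                neighbor[axis] += step
                neighbor = tuple(neighbor)
                if neighbor in cells and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
    return False

def pathway_exists(snapshots, candidate_cells, min_clearance, start, goal):
    """Combine free cells over all snapshots, then test connectivity."""
    combined = set()
    for atoms in snapshots:
        combined |= free_cells(atoms, candidate_cells, min_clearance)
    return connected(combined, start, goal)

# Toy corridor of 5 cells; one atom blocks a different cell in each snapshot.
corridor = [(i, 0, 0) for i in range(5)]
snapshot_a = [(3, 0, 0)]
snapshot_b = [(1, 0, 0)]
site, outside = (0, 0, 0), (4, 0, 0)

print(pathway_exists([snapshot_a], corridor, 0.9, site, outside))
print(pathway_exists([snapshot_b], corridor, 0.9, site, outside))
print(pathway_exists([snapshot_a, snapshot_b], corridor, 0.9, site, outside))
```

In this toy case neither snapshot alone connects the binding site to the outside, but the combined free volume does, which is the kind of transient pathway that a single static structure would miss.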

Free volume analysis (PATHFINDER)

Pathways from volume fluctuations (PATHFINDER):

Combining the two approaches: explicit trajectories (MD) and free volume fluctuations. J. Ruscio et al., Atomic level computational identification of ligand migration pathways between solvent and binding site in myoglobin, Proc. Natl. Acad. Sci. USA 105, 9204 (2008).

The end result: oxygen diffusion pathways in myoglobin: The pathways (blue) occur inside the rigid scaffold structures (helices, white).

The law of (bio/phys/chem)-computational science: [Figure: speed vs. accuracy plot with regions labeled easy, hard and genius.]

Lunch time