Case Study: Protein Folding using Homotopy Methods

James Robert White

Since the late 1990s, the world of genomics has exploded. Beginning with the bacterium Haemophilus influenzae in 1995, researchers have sequenced the genomes of thousands of different organisms, including dog, rat, mouse, and two versions of human. The central dogma of molecular biology is that DNA is transcribed to RNA, which is then translated to an amino acid sequence, or protein. As the protein is translated, it folds into a distinct shape that determines its function. Under stable conditions, a protein will always fold the same way because its folded shape has minimum free energy.

A mutation in a gene at the DNA level has the potential to change the amino acid composition of the resulting protein. This new protein may not fold in the same way as the original, and the function of the protein may be destroyed. Many diseases are linked to protein misfolding, including Alzheimer's, Huntington's, Parkinson's, and several types of cancer.

The pharmaceutical industry depends on knowing the structure of a folded protein. Once the structure has been determined, researchers can create drugs that interact with a specific region of the protein. Current methods such as X-ray crystallography are tremendously expensive and time-consuming; it can take years to decipher the structure of a single protein. In an effort to accelerate the pace of research, several groups are working to develop computational methods that determine the three-dimensional structure of a protein from its amino acid sequence. No sufficiently accurate methods have been developed yet, but databases of known structures are helping researchers gain insight into the major features of new proteins. The intuition is this: if you know the structure of one protein and you want to know how a similar protein folds, then use the structure you know as a starting guess. This paradigm of computational protein folding fits the concept of homotopy very well.

In this case study, we will use the Homotopy Optimization Method (HOM) to predict how chains of particles fold in two dimensions. In addition, we shall implement an extension of HOM known as Homotopy Optimization using Perturbations and Ensembles, or HOPE. The HOPE algorithm was developed by Daniel Dunlavy at the University of Maryland, and this case study parallels the experiments in his thesis. See the Pointer at the end of this case study for additional information.

The Free Energy of 2-D Charged Particle Chains

Suppose that we have a chain of $n$ charged particles in two dimensions. The charge on each particle is either +1 or -1, and consecutive particles are joined by bonds of fixed length $r$. Then the angles between all pairs of adjacent bonds define the structure of the chain. Let $x = (x_1, \ldots, x_n) \in \mathbb{R}^{2n}$ be the vector of coordinates in the xy-plane, with $x_k \in \mathbb{R}^2$ representing the two-dimensional coordinates of the $k$th particle. For example, the five particles $x_1, \ldots, x_5$ are represented in the following figure:

Figure 1: An example of particle structure with bond angles.

Let the first particle in the chain be fixed at $(r, 0)$ and the second particle be fixed at $(0, 0)$. Then, given a vector of $n-2$ angles, $\theta = (\theta_1, \ldots, \theta_{n-2})$, where $\theta_i \in [0, 2\pi)$ is the bond angle between $x_i$, $x_{i+1}$, and $x_{i+2}$, we can use the following recursion to compute the position of every particle in the chain:

$$x_k = x_{k-1} + \begin{pmatrix} \cos\theta_{k-2} & -\sin\theta_{k-2} \\ \sin\theta_{k-2} & \cos\theta_{k-2} \end{pmatrix} (x_{k-2} - x_{k-1}), \qquad k = 3, \ldots, n.$$

CHALLENGE 1
To familiarize yourself with the recursion, write a function in MATLAB that plots the coordinates of a particle chain given a vector of bond angles. Let r = 1.5. Recall that we fix the first and second particles at (r,0) and (0,0), respectively. To get a balanced view of the chain, use axis equal when plotting. Plot the coordinates of the chain for the following input bond angle vectors: $\theta^0$ = [pi/2; 7*pi/8; 7*pi/8], [pi; pi/2; pi/3; pi/4; pi/5], and [4*pi/5; 7*pi/5; 4*pi/5; 7*pi/5; 4*pi/5; 7*pi/5].

We define the distance between two particles $x_i$ and $x_j$ to be $r_{ij} = \|x_i - x_j\|_2$. Now that we have a way to construct the chain's position from its internal bond angles, we may define the potential energy function $E : \mathbb{R}^{n-2} \to \mathbb{R}$ as

$$E(\theta) = E_{vdw}(\theta) + E_{el}(\theta),$$

where

$$E_{vdw}(\theta) = \sum_{i=1}^{n-3} \sum_{j=i+3}^{n} \epsilon \left( \left( \frac{\sigma}{r_{ij}} \right)^{12} - 2 \left( \frac{\sigma}{r_{ij}} \right)^{6} \right)$$

and

$$E_{el}(\theta) = \sum_{i=1}^{n-2} \sum_{j=i+2}^{n} \frac{q_i q_j}{r_{ij}}.$$

$E_{vdw}(\theta)$ is the van der Waals term, which is the sum of all interactions between particles along the chain. $E_{el}(\theta)$ is the electrostatic potential, and $q_i$ is the charge of particle $x_i$ (+1 or -1). Note that although $\theta$ is the input of the energy function, it is nowhere to be found on the right-hand side. To evaluate the energy, we must first use the recursion to calculate the coordinates from the vector of bond angles.

Figure 2 shows a chain of 40 particles of alternating charges which has a very high energy value. A natural chain would quickly unfold from this state because it seeks its minimum free energy. So, given a chain of charged particles, we are trying to find $\theta_{opt} \in \mathbb{R}^{n-2}$ such that $E(\theta_{opt}) = \min_\theta E(\theta)$.

CHALLENGE 2
Use the code from the previous challenge to write a function which calculates $E(\theta)$ for a chain of particles, assuming all particles are positively charged. Let epsilon = 0.4 and sigma = 3.6 in the van der Waals term. Find the energy of a chain of particles with each of the following starting values: $\theta^0$ = [pi/2; 7*pi/8; 7*pi/8], [pi; pi/2; pi/3; pi/4; pi/5], and [4*pi/5; 7*pi/5; 4*pi/5; 7*pi/5; 4*pi/5; 7*pi/5].

Defining a Homotopy

To design a proper homotopy for the free energy function, we must think about which terms actually change in the equations as we transition from the energy of our initial chain, $E^0(\theta)$, to that of our final chain, $E^1(\theta)$. We assume that the initial and final chains have the same number of particles. Therefore, the only things that change from one molecule to the next are the charges on the particles. This means we do not need to define the homotopy with respect to $E_{vdw}(\theta)$, which does not involve the charges. We are only concerned with deforming the initial charges $q^0$ into the final charges $q^1$. For clarity, we let $q_k^0$ denote the charge of the $k$th particle in the initial chain and $q_k^1$ its charge in the final chain.
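Because the homotopy changes only the charges, it is convenient to have the energy available as a function of both the bond angles and the charge vector. The following is a minimal Python sketch of the recursion and of $E(\theta)$; it is not a solution in the MATLAB form the challenges ask for. The names `chain_coords` and `energy` are illustrative, the counterclockwise rotation convention is an assumption, and the default parameter values are those stated in Challenges 1 and 2.

```python
import math


def chain_coords(theta, r=1.5):
    """Return the particle positions [(x, y), ...] for bond angles theta.

    The first two particles are fixed at (r, 0) and (0, 0); each further
    particle is found by rotating the previous bond about its end point.
    Assumption: the rotation is counterclockwise.
    """
    pts = [(r, 0.0), (0.0, 0.0)]
    for t in theta:
        (px, py), (cx, cy) = pts[-2], pts[-1]
        dx, dy = px - cx, py - cy              # x_{k-2} - x_{k-1}
        c, s = math.cos(t), math.sin(t)
        pts.append((cx + c * dx - s * dy, cy + s * dx + c * dy))
    return pts


def energy(theta, q, r=1.5, eps=0.4, sigma=3.6):
    """Evaluate E(theta) = E_vdw(theta) + E_el(theta) for charges q.

    q must have len(theta) + 2 entries, each +1 or -1. A chain that folds
    back onto itself has some r_ij = 0 and raises ZeroDivisionError.
    """
    pts = chain_coords(theta, r)
    n = len(pts)
    total = 0.0
    for i in range(n):
        for j in range(i + 2, n):              # electrostatic pairs: j >= i + 2
            d = math.hypot(pts[i][0] - pts[j][0], pts[i][1] - pts[j][1])
            total += q[i] * q[j] / d
            if j >= i + 3:                     # van der Waals pairs: j >= i + 3
                s6 = (sigma / d) ** 6
                total += eps * (s6 * s6 - 2.0 * s6)
    return total
```

For example, `energy([math.pi, math.pi], [1, 1, 1, 1])` evaluates a straight four-particle chain. Passing a different charge list reuses the same geometry and changes only the electrostatic terms, which is exactly the part of the energy that the homotopy deforms.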

Figure 2: A chain of particles organized in a structure with energy 3.6320e+03. This is nowhere near the natural state of the chain; it just looks cool.

We therefore define the homotopy function by interpolating each charge linearly between its initial and final value:

$$H(\theta, \lambda) = \sum_{i=1}^{n-2} \sum_{j=i+2}^{n} \frac{(\lambda q_i^1 + (1-\lambda) q_i^0)(\lambda q_j^1 + (1-\lambda) q_j^0)}{r_{ij}} + E_{vdw}(\theta).$$

At $\lambda = 0$ every charge takes its initial value, so $H(\theta, 0) = E^0(\theta)$; at $\lambda = 1$ every charge takes its final value, so $H(\theta, 1) = E^1(\theta)$.

The Homotopy Optimization Method (HOM)

Recall that in Chapter 24 we discussed the homotopy method for solving nonlinear equations. The following variation allows us to use the homotopy function to solve an unconstrained minimization problem. To illustrate the application to our problem, we state the algorithm in terms of our homotopy function.

Algorithm 1 HOM Algorithm
Begin with a homotopy function $H(\theta, \lambda)$ and a local minimizer $\theta^0$ of $H(\theta, 0)$. Set $\lambda = 0$ and $\theta^{(0)} = \theta^0$, and let $m$ be some integer such that $m \geq 1$.
for $k = 1, \ldots, m$
    Increase $\lambda$ by $1/m$.
    Using your favorite algorithm, minimize $H(\theta, \lambda)$ starting with $\theta^{(k-1)}$, obtaining $\theta^{(k)}$.
At the end, we have computed $\theta^{(m)}$, which is a local minimizer of $H(\theta, 1)$, so $\theta^{(m)}$ is a local minimizer for our original problem.

CHALLENGE 3
For the free energy equation, let $\sigma = 3.6$, $\epsilon = 0.4$, and $r = 1.5$.
(a) Use the Quasi-Newton BFGS method (fminunc) to find a global minimizer of the energy of the molecule q0 = [+1 +1 +1 +1 +1 +1]. Turn off the LargeScale option for fminunc. The minimum energy of this chain is 0.7741. Give one example of a starting value for which Quasi-Newton does not converge to a global minimizer, and one example for which it does.
(b) Implement the HOM algorithm to solve for the minimum energy of the molecule q1 = [-1 -1 +1 +1 +1 +1], using the global minimizer from part (a) as your starting value and q0 as your starting molecule. Use Quasi-Newton BFGS for minimization. Let m = 10, so the method takes 10 steps. After finishing the algorithm, plot (side-by-side) the initial molecule at its global minimizer and the final molecule at its minimizer. Mark negatively charged particles with a red circle, and positively charged particles with a blue circle with a + in the center. At the top of each plot, give the minimum energy found for that chain. Do the same starting with the non-global minimizer from (a). The global minimum energy of q1 is -3.4178. Did you find a global minimizer in both cases?

The HOPE Method

HOM is an improvement over standard Newton-like methods, but we usually seek the global minimizer of the problem. HOM produces a sequence of points that converges to a local minimizer; if that minimizer is not the global one, then the predicted structure of the chain of particles will not be correct.
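The limitation just described can be seen on a one-dimensional toy problem. The following Python sketch of Algorithm 1 is an illustration under stated assumptions, not the MATLAB implementation requested in Challenge 3. The target `f1`, the convex-combination homotopy built by `make_H`, and the crude gradient-descent routine `local_min` (standing in for fminunc) are all hypothetical choices made here.

```python
def local_min(f, x, step=0.1, tol=1e-9, max_iter=10000):
    """Crude 1-D local minimizer: gradient descent with backtracking.

    Stands in for "your favorite algorithm"; any local method would do.
    """
    h = 1e-6
    for _ in range(max_iter):
        g = (f(x + h) - f(x - h)) / (2 * h)        # central-difference slope
        t = step
        while f(x - t * g) > f(x) and t > 1e-14:   # halve step until no increase
            t *= 0.5
        x_new = x - t * g
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def hom(H, theta0, m):
    """Algorithm 1: follow a minimizer of H(., lam) as lam goes from 0 to 1."""
    theta = theta0
    for k in range(1, m + 1):
        lam = k / m
        theta = local_min(lambda x: H(x, lam), theta)
    return theta


def f1(x):
    """Toy target: global minimum near x = -1.02, local minimum near x = 0.97."""
    return (x * x - 1.0) ** 2 + 0.2 * x


def make_H(start):
    """Toy homotopy from the easy problem (x - start)^2 to f1."""
    return lambda x, lam: (1.0 - lam) * (x - start) ** 2 + lam * f1(x)


right = hom(make_H(2.0), 2.0, m=10)     # begins at the minimizer of (x - 2)^2
left = hom(make_H(-2.0), -2.0, m=10)    # begins at the minimizer of (x + 2)^2
print(right, f1(right))
print(left, f1(left))
```

Both runs follow a continuous path of local minimizers, but only the run that starts on the left ends at the global minimizer of `f1`. Which minimizer HOM reaches is decided entirely by the starting problem.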
To improve the probability of finding the true global minimizer, an extension of HOM generates an ensemble of paths that converge to different local minimizers simultaneously. The HOPE algorithm takes the set of minimizers found in the previous step and creates several perturbations of each minimizer. It then minimizes the problem starting from each of these perturbed points, as well as from the unperturbed minimizers, and keeps the points that give the lowest unique values of the function.

For the perturbation of each minimizer in our problem, we define a function $\xi : \mathbb{R}^N \to \mathbb{R}^N$, where $\xi(\theta)$ is a stochastically perturbed version of the minimizer $\theta$ (here $N = n-2$). As shown in Algorithm 2, the HOPE algorithm has two additional parameters. $c_{max}$ is an integer giving the maximum ensemble size allowed at each step. This is important because, with no upper bound, the ensemble size would increase exponentially with the number of steps. $\hat{c}$ is the number of perturbed versions of each minimizer to create at each step.

Algorithm 2 HOPE Algorithm
Begin with a homotopy function $H(\theta, \lambda)$ and a local minimizer $\theta^0$ of $H(\theta, 0)$. Set $\lambda = 0$, $c^{(0)} = 1$, and $\theta_1^{(0)} = \theta^0$, and let $m$, $c_{max}$, and $\hat{c}$ be integers such that $m \geq 1$, $c_{max} \geq 1$, and $\hat{c} \geq 0$.
for $k = 1, \ldots, m$
    Increase $\lambda$ by $1/m$.
    for $i = 1, \ldots, c^{(k-1)}$
        Minimize $H(\theta, \lambda)$ starting with $\theta_i^{(k-1)}$, obtaining $\theta_{i,0}^{(k)}$.
        if $\hat{c} > 0$ then
            for $j = 1, \ldots, \hat{c}$
                Minimize $H(\theta, \lambda)$ starting with $\xi(\theta_i^{(k-1)})$, obtaining $\theta_{i,j}^{(k)}$.
    $c^{(k)} = \min\{c^{(k-1)}(\hat{c} + 1), c_{max}\}$
    $\theta_1^{(k)}, \ldots, \theta_{c^{(k)}}^{(k)}$ = the best (unique) local minimizers out of $\theta_{i,j}^{(k)}$, $i = 1, \ldots, c^{(k-1)}$, $j = 0, \ldots, \hat{c}$.
At the end, we choose the minimizer with the lowest function value among $\theta_i^{(m)}$, $i = 1, \ldots, c^{(m)}$, to be our output.

By creating an ensemble of different minimizers, the HOPE algorithm increases the likelihood of finding the true global minimizer. This method is very effective for problems that have many local minima, but the number of minimization calls can strain computational resources. Tuning the parameters appropriately for each situation is essential.
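The ensemble bookkeeping in Algorithm 2 can be sketched in Python for a scalar unknown. This is a hedged illustration, not Dunlavy's implementation: the toy homotopy from $(x-2)^2$ to $f_1(x) = (x^2-1)^2 + 0.2x$, the Gaussian perturbation used for $\xi$, the uniqueness tolerance `same_tol`, and the simple minimizer `local_min` are all assumptions made here. The ensemble starts with the single point $\theta^0$, and when fewer than $c_{max}$ distinct minimizers are found the ensemble is simply smaller.

```python
import random


def local_min(f, x, step=0.1, tol=1e-9, max_iter=10000):
    """Crude 1-D local minimizer: gradient descent with backtracking."""
    h = 1e-6
    for _ in range(max_iter):
        g = (f(x + h) - f(x - h)) / (2 * h)        # central-difference slope
        t = step
        while f(x - t * g) > f(x) and t > 1e-14:   # halve step until no increase
            t *= 0.5
        x_new = x - t * g
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def hope(H, theta0, m, c_hat, c_max, perturb, same_tol=1e-4):
    """Algorithm 2 for a scalar unknown; returns the best final minimizer."""
    ensemble = [theta0]                             # one starting minimizer
    for k in range(1, m + 1):
        lam = k / m
        f = lambda x, lam=lam: H(x, lam)
        candidates = []
        for theta in ensemble:
            candidates.append(local_min(f, theta))              # j = 0
            for _ in range(c_hat):                               # j = 1..c_hat
                candidates.append(local_min(f, perturb(theta)))
        candidates.sort(key=f)                      # lowest H first
        ensemble = []
        for cand in candidates:                     # keep at most c_max unique
            if all(abs(cand - kept) > same_tol for kept in ensemble):
                ensemble.append(cand)
            if len(ensemble) == c_max:
                break
    return min(ensemble, key=lambda x: H(x, 1.0))


def f1(x):
    """Toy target: global minimum near x = -1.02, local minimum near x = 0.97."""
    return (x * x - 1.0) ** 2 + 0.2 * x


def H(x, lam):
    """Toy homotopy from (x - 2)^2 at lam = 0 to f1 at lam = 1."""
    return (1.0 - lam) * (x - 2.0) ** 2 + lam * f1(x)


rng = random.Random(0)                              # fixed seed for repeatability
perturb = lambda x: x + rng.gauss(0.0, 1.5)         # one possible choice of xi

best = hope(H, 2.0, m=10, c_hat=3, c_max=2, perturb=perturb)
print(best, f1(best))
```

The number of calls to the local minimizer at step $k$ is $c^{(k-1)}(\hat{c} + 1)$, so the cost grows quickly with $\hat{c}$ and $c_{max}$. For the chain problem, $\xi$ would instead perturb a vector of bond angles.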

CHALLENGE 4
(a) For the HOPE algorithm, if $c_{max}$ were $\infty$, what would be the value of $c$ after the $k$th iteration of the program? What values of $\hat{c}$ and $c_{max}$ reduce the HOPE algorithm to HOM?
(b) Consider our folding problem, and choose a stochastic perturbation function to generate perturbed versions of the minimizers found in previous steps of HOPE. Discuss why you chose this function over other possibilities.
(c) Implement the HOPE algorithm with your perturbation function from (b). Let $m = 5$, $\hat{c} = 3$, and $c_{max} = 6$. Begin by finding a local minimizer of the energy for q0 = [+1 +1 +1 +1 +1 +1 +1 +1 +1 +1 +1 +1], given a starting guess $\theta^0$ in which every bond angle equals 9*pi/10 (a chain of 12 particles has $n - 2 = 10$ bond angles). Then use HOPE to find a minimizer for the energy of q1 = [+1 +1 +1 -1 -1 -1 +1 +1 +1 -1 -1 -1]. Turn off the LargeScale option for fminunc. After finishing the algorithm, plot (side-by-side) the initial molecule at its local minimizer and the final molecule at its minimizer. Mark negatively charged particles with a red circle, and positively charged particles with a blue circle with a + in the center. At the top of each plot, give the minimum energy found for that chain.
(d) Using the same input parameters as in part (c), compare your results with those of your HOM function from the previous challenge. Which method resulted in the better structure prediction? Can HOPE ever do worse than HOM? Why?

POINTER. The HOPE method was developed at the University of Maryland by Dunlavy in 2005.

For extensive results on 2-D folding of charged particles, see:
Dunlavy, D. (2005) Homotopy Optimization Methods and Protein Structure Prediction. PhD thesis, University of Maryland, College Park.

For information on the extension of HOPE to 3-D advanced protein folding problems, see:
Dunlavy, D. et al. (2005) HOPE: A Homotopy Optimization Method for Protein Structure Prediction. Journal of Computational Biology, 12, 1275-1288.

For other information on computational protein folding, see:
Li, H. et al. (1996) Emergence of Preferred Structures in a Simple Model of Protein Folding. Science, 273, 666-669.
Radford, S. and C. Dobson. (1999) From Computer Simulations to Human Disease: Emerging Themes in Protein Folding. Cell, 97, 291-298.
http://folding.stanford.edu/

The Critical Assessment of Techniques for Protein Structure Prediction (CASP) is a competition held every two years in which groups try to predict the structure of a new protein whose true folded shape has been determined experimentally and kept secret. See http://predictioncenter.org/casp7/