Laidebeure Stéphane
Contents
Context of the project
What is protein design?
I The algorithms
    A Dead-end elimination procedure
    B Monte-Carlo simulation
II The model
    A The molecular model
    B The energy model
    C Generating problems
III Results
Conclusion
Context of the project

CS273 is a class that introduces the computational approach to structure and motion in molecular biology. The class gives an overview of kinematic models of molecules, algorithms for structure and sequence similarity, and structure prediction. For the project, I wanted to cover a field that was not really covered in class, so I decided to work on protein design.

To make that work interesting, I first wanted to build it inside Gromacs, a molecular dynamics package. That way my system could have used all the functions of Gromacs, especially its actual energy functions, its pre-built models for molecules and amino acids, and its ability to read large molecule databases to extract useful features. Unfortunately, this turned out to be a bad idea: most of the Gromacs code has no comments, and where comments exist they assume you already know what the function does overall. I could not find any good documentation of the code, only tutorials on how to use the program. Since I could not build my project on code that I did not fully understand, I had to fall back on a simpler project, still involving protein design algorithms but applied to smaller and simpler problems, using a toy model instead of real molecules.

What is protein design?

The aim of protein design is to obtain proteins matching a certain structure, and therefore certain properties. More precisely, we are given a structure, namely a backbone, and we want to find a sequence of amino acids that would fit that specific structure. To find such a sequence, we look for molecules that could be stable in this configuration. The idea is to build the sequence which, in that configuration, has the lowest energy.
That is, we try to get a protein $P$ such that $C = \arg\min_{C'} E(P, C')$, where $E(P, C)$ is the energy of protein $P$ in configuration $C$; but we do that by computing $P_{\min}(C) = \arg\min_{P} E(P, C)$. Consequently, this model assumes that the protein having the lowest energy in that configuration must be stable in this configuration.

I The algorithms

As we have seen, the problem is to find a sequence of amino acids having a certain property. The naive method would be to apply backtracking; but for a molecule of length $n$, it leads to $20^n$ steps of backtracking, which for reasonable values of $n$ (at least 20, which would be a rather small protein) would take far too much time. We therefore need either approximate algorithms, which lead to a good solution but not necessarily the best one, or algorithms that reduce the search without changing the accuracy. To reduce the search space, the most common approach is the dead-end elimination procedure.
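To put the size of the naive search in perspective, here is the count for $n = 20$. The rate of $10^9$ energy evaluations per second is an illustrative assumption, not a measurement from this project:

```latex
20^{20} = 2^{20}\cdot 10^{20} \approx 1.05\times 10^{26}\ \text{sequences},
\qquad
\frac{1.05\times 10^{26}}{10^{9}\ \text{s}^{-1}}
  \approx 1.05\times 10^{17}\ \text{s}
  \approx 3.3\times 10^{9}\ \text{years}.
```

Even under that optimistic rate, exhaustive enumeration is out of reach for the smallest proteins of interest.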
A Dead-end elimination procedure

The idea of the dead-end elimination procedure is to find amino acids which cannot possibly be part of a solution, and eliminate them.

Notation: a sequence $P$ is shorthand for $P_0, P_1, \ldots, P_{n-1}$, and $P|_{P_i = A}$ denotes the sequence $P$ with amino acid $A$ at position $i$.

The optimal criterion to eliminate a candidate amino acid $A_j$ at position $i$ in the chain is the following: if

$$\exists A_k \neq A_j,\ \forall P_0, \ldots, P_{i-1}, P_{i+1}, \ldots, P_{n-1}:\quad E(P|_{P_i = A_j}, C) > E(P|_{P_i = A_k}, C),$$

then we can be sure that the amino acid $A_j$ cannot be at position $i$, since for every possible combination over the rest of the molecule, there is another amino acid which would reduce the energy. But computing this criterion also has a cost of the order of $O(20^n)$, so we cannot expect it to be a better choice. However, when the energy function can be expressed as a sum of unary terms and pairwise terms, we can reduce that formula to a weaker form which happens to be efficient:

$$\exists A_k \neq A_j:\quad E(P_i = A_j) - E(P_i = A_k) + \sum_{l \neq i} \min_{m} \left[ E(P_i = A_j, P_l = A_m) - E(P_i = A_k, P_l = A_m) \right] > 0$$

This means that we check whether the amino acid $A_j$, in the configuration which is the most advantageous for it against $A_k$, is still worse than this other amino acid. If that is the case, then we can be sure that $A_j$ in position $i$ cannot be part of the optimal solution. In the code, the function running one step of dead-end elimination is as follows:

    bool deadendeliminate()
    {
        bool res=false; // The result variable, indicates if there is an update.
        for(int i=0;i<nb;i++) // For all positions in the chain
        {
            vector<int> tmp=allowed[i];
            allowed[i].clear();
            for(int j=0;j<tmp.size();j++) // for all rotamers possibly at that place
            {
                bool test=false;
                for(int k=0;!test && k<tmp.size();k++) // for any other rotamer at that place
                {
                    // We first compute the difference in the unary terms
                    double minidiff=precomputedselfenergy[i][tmp[j]]
                                   -precomputedselfenergy[i][tmp[k]];
                    for(int l=0;l<nb;l++) if (l!=i) // for all other positions in the chain
                    {
                        double mindiff=infinity;
                        for(int m=0;m<allowed[l].size();m++) // for all rotamers there
                        {
                            double v=precomputedcoupleenergy[i][tmp[j]][l][allowed[l][m]]
                                    -precomputedcoupleenergy[i][tmp[k]][l][allowed[l][m]];
                            if (v<mindiff) mindiff=v;
                        }
                        minidiff+=mindiff;
                    }
                    if (minidiff>0) test=true; // the other rotamer is always better
                }
                if (!test) allowed[i].push_back(tmp[j]); // This rotamer is still allowed
                else res=true; // or not
            }
        }
        return res;
    }
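Because each elimination shrinks the sets over which the inner minimum is taken, a rotamer that survives one pass may fall in the next, so the step is naturally repeated until it reports no update. The following is a minimal self-contained sketch of that loop on a two-position toy problem. The names `ToyProblem`, `eliminateOnce`, `eliminateToFixedPoint` and `makeToyProblem`, and the energy values, are illustrative assumptions and not part of the project code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical container for a toy problem (names are illustrative only).
struct ToyProblem {
    std::vector<std::vector<double>> self;  // self[i][a]: unary energy of rotamer a at position i
    // pair[i][a][l][b]: pairwise energy between rotamer a at position i and rotamer b at position l
    std::vector<std::vector<std::vector<std::vector<double>>>> pair;
    std::vector<std::vector<int>> allowed;  // allowed[i]: rotamers still allowed at position i
};

// One pass of the weak elimination criterion; returns true if a rotamer was removed.
bool eliminateOnce(ToyProblem& p) {
    bool changed = false;
    const std::size_t n = p.allowed.size();
    for (std::size_t i = 0; i < n; ++i) {
        const std::vector<int> candidates = p.allowed[i];
        std::vector<int> kept;
        for (int a : candidates) {
            bool dominated = false;
            for (int b : candidates) {
                if (a == b) continue;
                double diff = p.self[i][a] - p.self[i][b];  // unary part
                for (std::size_t l = 0; l < n; ++l) {
                    if (l == i) continue;
                    assert(!p.allowed[l].empty());
                    double best = std::numeric_limits<double>::infinity();
                    for (int m : p.allowed[l])  // most favourable context for a against b
                        best = std::min(best, p.pair[i][a][l][m] - p.pair[i][b][l][m]);
                    diff += best;
                }
                if (diff > 0) { dominated = true; break; }  // b is always better than a
            }
            if (!dominated) kept.push_back(a);
        }
        if (kept.size() != candidates.size()) changed = true;
        p.allowed[i] = kept;
    }
    return changed;
}

// Repeats the pass until nothing changes; returns the number of passes that removed something.
int eliminateToFixedPoint(ToyProblem& p) {
    int passes = 0;
    while (eliminateOnce(p)) ++passes;
    return passes;
}

// Two positions, two rotamers each; cross[a][b] couples rotamer a at position 0
// with rotamer b at position 1. The values are made up for the illustration.
ToyProblem makeToyProblem() {
    ToyProblem p;
    p.self = {{0.0, 1.0}, {0.0, 5.0}};
    const double cross[2][2] = {{0.0, 0.0}, {0.0, -3.0}};
    p.pair.assign(2, std::vector<std::vector<std::vector<double>>>(
                         2, std::vector<std::vector<double>>(2, std::vector<double>(2, 0.0))));
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b) {
            p.pair[0][a][1][b] = cross[a][b];
            p.pair[1][b][0][a] = cross[a][b];
        }
    p.allowed = {{0, 1}, {0, 1}};
    return p;
}
```

In this toy problem the first pass can only remove rotamer 1 at position 1; once it is gone, the second pass can also remove rotamer 1 at position 0, which a single pass would have missed.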
Several variants of this heuristic are also used in real-life problems, in particular pairwise heuristics on the choice of amino acids (keeping track of the fact that some amino acid would not fit well with another one while running dead-end elimination).

B Monte-Carlo simulation

The Monte-Carlo method is, in many situations, the best way to get an approximate solution when minimizing an energy function over a large space. The general idea is to build a first solution (at random), and to try to minimize the energy function by repeatedly applying small changes (changing one amino acid, or changing two amino acids). Done in a deterministic way, this would be a greedy algorithm, equivalent to a coordinate descent; but if we add some kind of simulated annealing, we get a procedure which is fairly good at leaving local minima, because it authorizes changes that increase the total energy, with a probability that decreases over time. The procedure can be expressed as follows:

    Procedure MCAlgorithm
        Create random assignment;
        Compute total energy TE;
        BE=TE; Store as best assignment;
        T=MAX_TEMP;
        While (T>MIN_TEMP)
            Select small change;
            Compute updated energy E;
            If (rand()>min(1,exp((TE-E)/T)))
                reject change;
            else
                accept change; TE=E;
                If (TE<BE) BE=TE; Store as best assignment;
            Update T;
        End While
    End Procedure

Here rand() draws a uniform number in [0, 1), so a change that lowers the energy is always accepted, and a change that raises it by ΔE is accepted with probability exp(-ΔE/T).

The initial and final values of T, as well as the way it is updated, lead to diverse behaviors of the program, from a fully greedy algorithm (if T is too small) to a purely random algorithm (if T is too big). The type of update (multiplicative or additive) determines the tendency to be greedier at the end (which is better when we do not store the temporarily best solution), and the number of steps is chosen according to the expected speed of the program. In my code, the temperature values were not very big (starting at 1000, down to 0.001) and the update was multiplicative (leading to more greediness).
This choice was made because bad sequences tend to have an extremely large energy, so I wanted to get rid of them early by keeping T from being too big; and at the end, we want to make sure that small conformational changes cannot improve the result, because we are more interested in a stable sequence.

I also added a restart feature: after every 100 update steps without finding a new optimum, the program may, with a low probability, reset the current configuration to the best assignment found so far. This makes the search tend to stay in interesting areas while still having a chance to visit the whole space; for that, the probability must be low, so that any point in the space can still be reached. Adding random restart procedures on top of this lets us search the space of solutions in a more random way, avoiding getting stuck in a deep local minimum as could happen in a single run.
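The annealing loop described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the function name `anneal`, the `energy` callback, the cooling factor 0.99 and the single-position move are all hypothetical choices, and the reset-to-best feature is left out for brevity. The temperature bounds (1000 down to 0.001) and the multiplicative update come from the text.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <random>
#include <vector>

struct AnnealResult {
    std::vector<int> best;  // best assignment seen during the run
    double bestEnergy;      // its energy
};

// energy: hypothetical callback returning the total energy of an assignment.
// numChoices: number of rotamers per position; cooling: multiplicative factor in (0, 1).
AnnealResult anneal(const std::function<double(const std::vector<int>&)>& energy,
                    int length, int numChoices, unsigned seed,
                    double tMax = 1000.0, double tMin = 0.001, double cooling = 0.99) {
    assert(length > 0 && numChoices > 0 && cooling > 0.0 && cooling < 1.0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pickPos(0, length - 1);
    std::uniform_int_distribution<int> pickRot(0, numChoices - 1);
    std::uniform_real_distribution<double> unit(0.0, 1.0);

    std::vector<int> current(length);
    for (int& r : current) r = pickRot(rng);  // random initial assignment
    double currentEnergy = energy(current);
    AnnealResult result{current, currentEnergy};

    for (double t = tMax; t > tMin; t *= cooling) {
        std::vector<int> candidate = current;
        candidate[pickPos(rng)] = pickRot(rng);  // small change: one position
        const double e = energy(candidate);
        // Metropolis rule: always accept improvements, and accept a worse move
        // with probability exp(-(e - currentEnergy) / t).
        if (e <= currentEnergy || unit(rng) < std::exp((currentEnergy - e) / t)) {
            current = candidate;
            currentEnergy = e;
            if (e < result.bestEnergy) {
                result.best = current;
                result.bestEnergy = e;
            }
        }
    }
    return result;
}
```

With these defaults the loop runs for roughly 1400 steps. Storing the best assignment separately means the returned solution never gets worse when a late uphill move is accepted.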
II The model

A The molecular model

The toy model I designed is a system of rigid sticks: an amino acid is a sequence of a certain number of main sticks (the backbone), which may rotate freely about one another but may not bend freely at the knees (the joints between sticks). Therefore, this model has no notion of torsion angles (this assumption was made to simplify the computation, and would not be difficult to lift in the code if needed), but it does have a notion of bond angles. I will refer to the knees as atoms, because that is the equivalent notion in real-life problems. The lengths of the sticks are also fixed, and are the equivalent of bond distances.

In addition to the backbone, the molecular system also defines side chains, by adding new sticks, each associated with a position on the chain and with two angles giving the orientation of that stick. When the geometry at an atom is deformed, the side chains are automatically moved to the position which makes the global structure (in terms of relative positions) as similar as possible to the original one. For example, if a bond angle is reduced, the angle in that same direction with the side chain is multiplied by the same factor, so that the side chain keeps the same position relative to those two atoms.

At each run, the program reads a library containing a certain number of variants of those amino acids, and tries to find an optimal sequence fitting the target carbon chain, given by the set of 3D coordinates of its atoms.

B The energy model

The energy model is meant to be very similar to actual energy models, so that the computation resembles real-life situations and the results are reasonable.
The energy model can be divided into two parts:

a) Bonded atoms energy model

In the molecular model, bonded atoms have rigid sticks between them, so their distance should be fixed; yet we allow a certain variability in that distance, at a certain cost in energy. If $d_0$ is the default length of the stick, the energy for having those two atoms at distance $d$ is

$$E = A \left[ \left(\frac{d_0}{d}\right)^{12} - 2 \left(\frac{d_0}{d}\right)^{6} \right];$$

this has the form of a van der Waals (Lennard-Jones) energy. The angles at the bonds are also an important part of the model. For them, in the same way, we have a default angle $a_0$, and the cost for having an angle $a$ is

$$E = K (a - a_0)^2.$$

Both energy functions have their minimum at the default values, leading to better stability when the amino acids are in their standard configuration.

b) Non-bound atoms model
For non-bound atoms, the energy model contains only a term concerning their distance. In the same way as before, the energy depends on a default value $d_0$, and the energy function is

$$E = A \left[ \left(\frac{d_0}{d}\right)^{12} - 2 \left(\frac{d_0}{d}\right)^{6} \right].$$

The final energy function for the model is the sum of all those terms.

C Generating problems

To generate problems to solve, and because I could not do so by reading PDB files, I wrote scripts that generate chains and amino acids independently. For the chain, starting from the origin with the first atom, I set the distance to the next atom using a normal distribution centered on 1; then for each new atom I generate a new random unit vector, average it with the preceding one (to get some continuity in the backbone), and scale it to the randomly chosen length. The library of amino acids is generated by randomly choosing the bond distance between each pair of bonded atoms, as well as the bond angle for each three consecutive atoms. The side chains are generated by randomly choosing a parent atom, an angle (in the direction of the bond), and a second angle (torsion relative to the bond).

III Results

Average number of remaining rotamers after dead-end elimination (20 at the beginning), depending on the folding of the chain and on its length (in number of amino acids):

[Table: rows are the folding categories (almost straight, average, lots of folding), columns are chain lengths; the numerical entries did not survive in this transcription.]

Dead-end elimination allows us to reduce the amount of search in the process of protein design without losing accuracy. Used separately, both dead-end elimination and Monte-Carlo simulation give perfect and fast results on small problems; but when the size of the domain grows, Monte-Carlo simulation stays fast but lacks accuracy, while the backtracking step that follows dead-end elimination becomes too slow to be usable.
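As a numerical check on the energy terms of section II B, the two functional forms can be written down directly. This is only a sketch: the function names `distanceEnergy` and `angleEnergy` are hypothetical, and the constants used when calling them are arbitrary illustration values.

```cpp
#include <cassert>
#include <cmath>

// Distance term A * [ (d0/d)^12 - 2 (d0/d)^6 ], used for both bonded and
// non-bound atom pairs in the text. Its minimum, -A, is reached at d = d0.
double distanceEnergy(double d, double d0, double A) {
    assert(d > 0.0 && d0 > 0.0);
    const double r6 = std::pow(d0 / d, 6);
    return A * (r6 * r6 - 2.0 * r6);
}

// Angle term K * (a - a0)^2. Its minimum, 0, is reached at a = a0.
double angleEnergy(double a, double a0, double K) {
    const double delta = a - a0;
    return K * delta * delta;
}
```

Evaluating `distanceEnergy` slightly below and slightly above `d0` gives values above `-A` on both sides, which matches the statement that the default values are the most stable configuration. The well is much steeper on the compressed side, because of the twelfth-power term.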
Using both techniques together, even if it does not theoretically guarantee an optimal solution, tends to give extremely good results when dead-end elimination works well, combining the advantages of both techniques in most situations: where dead-end elimination performs well, the Monte-Carlo simulation run on the reduced domains gives the optimal solution; in the other cases it performs, quite logically, just like the standard Monte-Carlo simulation.

The dead-end elimination procedure does not perform equally well on all problems. It is extremely efficient when the protein has almost no folds, reducing the search to a greedy algorithm most of the time. On the other hand, it seems to perform extremely poorly on proteins with many folds, probably because in this situation every rotamer tends to have at least one advantage over the others.

In terms of speed, the margin between the domain where the algorithm reduces to a greedy search and the domain where it becomes intractable is really tight. For straight chains, I could run it on fairly long chains without any trouble (most of the computational time was due to the dead-end elimination rather than to the backtracking); on the other hand, even for chains of only 10 amino acids, in situations where the dead-end elimination was not extremely efficient, the backtracking would take far too long.

Conclusion

In this project, once the stage of finding documents and defining the subject was over, the most difficult part was, surprisingly, generating interesting sets of sticks (coherent, but different enough to produce different behaviors), together with the chain. Because the chain could not be generated from the sticks themselves (otherwise the solution would have had an energy so much lower than the other combinations that the algorithm would necessarily have found it), the difficulty was to generate a chain which could still be built with them.

The poor results of dead-end elimination on my data can probably be attributed to two factors. First, the absence of torsion angles makes the shape of the sticks much easier to change, so that they can be made to fit reasonably well on almost any part of the molecule. Second, the sticks are single long entities, whereas amino acid side chains can be considered more or less independently, leading to more local changes.

One way to save a lot of computational time on this model would be to add a forward-checking step to the backtracking, by calling the dead-end elimination at each new labeling; this would take advantage of the reduced domains of all the instantiated variables to find more rotamers to eliminate for the next step.
CS 781 Lecture 9 March 10, 2011 Topics: Local Search and Optimization Metropolis Algorithm Greedy Optimization Hopfield Networks Max Cut Problem Nash Equilibrium Price of Stability Coping With NP-Hardness
More informationExample questions for Molecular modelling (Level 4) Dr. Adrian Mulholland
Example questions for Molecular modelling (Level 4) Dr. Adrian Mulholland 1) Question. Two methods which are widely used for the optimization of molecular geometies are the Steepest descents and Newton-Raphson
More informationBU CAS CS 538: Cryptography Lecture Notes. Fall itkis/538/
BU CAS CS 538: Cryptography Lecture Notes. Fall 2005. http://www.cs.bu.edu/ itkis/538/ Gene Itkis Boston University Computer Science Dept. Notes for Lectures 3 5: Pseudo-Randomness; PRGs 1 Randomness Randomness
More informationMolecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment
Molecular Modeling 2018-- Lecture 7 Homology modeling insertions/deletions manual realignment Homology modeling also called comparative modeling Sequences that have similar sequence have similar structure.
More informationCan a continuum solvent model reproduce the free energy landscape of a β-hairpin folding in water?
Can a continuum solvent model reproduce the free energy landscape of a β-hairpin folding in water? Ruhong Zhou 1 and Bruce J. Berne 2 1 IBM Thomas J. Watson Research Center; and 2 Department of Chemistry,
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationPapers listed: Cell2. This weeks papers. Chapt 4. Protein structure and function. The importance of proteins
1 Papers listed: Cell2 During the semester I will speak of information from several papers. For many of them you will not be required to read these papers, however, you can do so for the fun of it (and
More informationIntroduction to Optimization
Introduction to Optimization Blackbox Optimization Marc Toussaint U Stuttgart Blackbox Optimization The term is not really well defined I use it to express that only f(x) can be evaluated f(x) or 2 f(x)
More informationLocal Search (Greedy Descent): Maintain an assignment of a value to each variable. Repeat:
Local Search Local Search (Greedy Descent): Maintain an assignment of a value to each variable. Repeat: I I Select a variable to change Select a new value for that variable Until a satisfying assignment
More informationCS 125 Section #12 (More) Probability and Randomized Algorithms 11/24/14. For random numbers X which only take on nonnegative integer values, E(X) =
CS 125 Section #12 (More) Probability and Randomized Algorithms 11/24/14 1 Probability First, recall a couple useful facts from last time about probability: Linearity of expectation: E(aX + by ) = ae(x)
More informationLecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability
Lecture 2 and 3: Review of forces (ctd.) and elementary statistical mechanics. Contributions to protein stability Part I. Review of forces Covalent bonds Non-covalent Interactions: Van der Waals Interactions
More informationMethods for finding optimal configurations
CS 1571 Introduction to AI Lecture 9 Methods for finding optimal configurations Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Search for the optimal configuration Optimal configuration search:
More informationBioengineering 215. An Introduction to Molecular Dynamics for Biomolecules
Bioengineering 215 An Introduction to Molecular Dynamics for Biomolecules David Parker May 18, 2007 ntroduction A principal tool to study biological molecules is molecular dynamics simulations (MD). MD
More informationAlgorithms. NP -Complete Problems. Dong Kyue Kim Hanyang University
Algorithms NP -Complete Problems Dong Kyue Kim Hanyang University dqkim@hanyang.ac.kr The Class P Definition 13.2 Polynomially bounded An algorithm is said to be polynomially bounded if its worst-case
More informationNP Completeness and Approximation Algorithms
Chapter 10 NP Completeness and Approximation Algorithms Let C() be a class of problems defined by some property. We are interested in characterizing the hardest problems in the class, so that if we can
More informationICCP Project 2 - Advanced Monte Carlo Methods Choose one of the three options below
ICCP Project 2 - Advanced Monte Carlo Methods Choose one of the three options below Introduction In statistical physics Monte Carlo methods are considered to have started in the Manhattan project (1940
More informationUsing Higher Calculus to Study Biologically Important Molecules Julie C. Mitchell
Using Higher Calculus to Study Biologically Important Molecules Julie C. Mitchell Mathematics and Biochemistry University of Wisconsin - Madison 0 There Are Many Kinds Of Proteins The word protein comes
More informationJoana Pereira Lamzin Group EMBL Hamburg, Germany. Small molecules How to identify and build them (with ARP/wARP)
Joana Pereira Lamzin Group EMBL Hamburg, Germany Small molecules How to identify and build them (with ARP/wARP) The task at hand To find ligand density and build it! Fitting a ligand We have: electron
More informationComputational Protein Design
11 Computational Protein Design This chapter introduces the automated protein design and experimental validation of a novel designed sequence, as described in Dahiyat and Mayo [1]. 11.1 Introduction Given
More informationReinforcement Learning and Control
CS9 Lecture notes Andrew Ng Part XIII Reinforcement Learning and Control We now begin our study of reinforcement learning and adaptive control. In supervised learning, we saw algorithms that tried to make
More informationAlpha-Beta Pruning: Algorithm and Analysis
Alpha-Beta Pruning: Algorithm and Analysis Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 Introduction Alpha-beta pruning is the standard searching procedure used for 2-person
More informationMonte Carlo (MC) Simulation Methods. Elisa Fadda
Monte Carlo (MC) Simulation Methods Elisa Fadda 1011-CH328, Molecular Modelling & Drug Design 2011 Experimental Observables A system observable is a property of the system state. The system state i is
More informationFundamentals of Metaheuristics
Fundamentals of Metaheuristics Part I - Basic concepts and Single-State Methods A seminar for Neural Networks Simone Scardapane Academic year 2012-2013 ABOUT THIS SEMINAR The seminar is divided in three
More informationCS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares
CS 542G: Robustifying Newton, Constraints, Nonlinear Least Squares Robert Bridson October 29, 2008 1 Hessian Problems in Newton Last time we fixed one of plain Newton s problems by introducing line search
More informationBayes Nets III: Inference
1 Hal Daumé III (me@hal3.name) Bayes Nets III: Inference Hal Daumé III Computer Science University of Maryland me@hal3.name CS 421: Introduction to Artificial Intelligence 10 Apr 2012 Many slides courtesy
More information3D HP Protein Folding Problem using Ant Algorithm
3D HP Protein Folding Problem using Ant Algorithm Fidanova S. Institute of Parallel Processing BAS 25A Acad. G. Bonchev Str., 1113 Sofia, Bulgaria Phone: +359 2 979 66 42 E-mail: stefka@parallel.bas.bg
More informationUniversal Similarity Measure for Comparing Protein Structures
Marcos R. Betancourt Jeffrey Skolnick Laboratory of Computational Genomics, The Donald Danforth Plant Science Center, 893. Warson Rd., Creve Coeur, MO 63141 Universal Similarity Measure for Comparing Protein
More informationMotivation, Basic Concepts, Basic Methods, Travelling Salesperson Problem (TSP), Algorithms
Motivation, Basic Concepts, Basic Methods, Travelling Salesperson Problem (TSP), Algorithms 1 What is Combinatorial Optimization? Combinatorial Optimization deals with problems where we have to search
More informationAnalog Computing: a different way to think about building a (quantum) computer
Analog Computing: a different way to think about building a (quantum) computer November 24, 2016 1 What is an analog computer? Most of the computers we have around us today, such as desktops, laptops,
More informationThis semester. Books
Models mostly proteins from detailed to more abstract models Some simulation methods This semester Books None necessary for my group and Prof Rarey Molecular Modelling: Principles and Applications Leach,
More informationALL LECTURES IN SB Introduction
1. Introduction 2. Molecular Architecture I 3. Molecular Architecture II 4. Molecular Simulation I 5. Molecular Simulation II 6. Bioinformatics I 7. Bioinformatics II 8. Prediction I 9. Prediction II ALL
More informationNumerical Studies of the Quantum Adiabatic Algorithm
Numerical Studies of the Quantum Adiabatic Algorithm A.P. Young Work supported by Colloquium at Universität Leipzig, November 4, 2014 Collaborators: I. Hen, M. Wittmann, E. Farhi, P. Shor, D. Gosset, A.
More informationClustering. Léon Bottou COS 424 3/4/2010. NEC Labs America
Clustering Léon Bottou NEC Labs America COS 424 3/4/2010 Agenda Goals Representation Capacity Control Operational Considerations Computational Considerations Classification, clustering, regression, other.
More informationFlexPepDock In a nutshell
FlexPepDock In a nutshell All Tutorial files are located in http://bit.ly/mxtakv FlexPepdock refinement Step 1 Step 3 - Refinement Step 4 - Selection of models Measure of fit FlexPepdock Ab-initio Step
More informationHOMOLOGY MODELING. The sequence alignment and template structure are then used to produce a structural model of the target.
HOMOLOGY MODELING Homology modeling, also known as comparative modeling of protein refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental
More informationBayesian networks: approximate inference
Bayesian networks: approximate inference Machine Intelligence Thomas D. Nielsen September 2008 Approximative inference September 2008 1 / 25 Motivation Because of the (worst-case) intractability of exact
More informationLearning Energy-Based Models of High-Dimensional Data
Learning Energy-Based Models of High-Dimensional Data Geoffrey Hinton Max Welling Yee-Whye Teh Simon Osindero www.cs.toronto.edu/~hinton/energybasedmodelsweb.htm Discovering causal structure as a goal
More informationThe protein folding problem consists of two parts:
Energetics and kinetics of protein folding The protein folding problem consists of two parts: 1)Creating a stable, well-defined structure that is significantly more stable than all other possible structures.
More information