Scoring functions. Talk Overview. Eran Eyal. Scoring functions what and why

Talk overview
- Scoring functions: what and why
- Force fields: based on an approximation of molecular forces as we understand them
- Knowledge-based potentials: let the data speak

May 2011, Eran Eyal

Atoms are composed of smaller particles. Negatively charged electrons are distributed with some probability function around the nucleus. In order to perform accurate calculations regarding the forces acting between atoms and molecules, sophisticated quantum-mechanical (QM) calculations are needed to calculate probabilities and the quantum energy states.

QM calculations require CPU time which grows exponentially with the number of atoms. Today's hardware can handle calculations with up to a few hundred atoms, so they are feasible only for small molecules or for restricted regions of macromolecules. To perform calculations on larger molecules we need a simpler representation that approximates the detailed physical description of the system.

In molecular mechanics we treat an atom as a ball with a defined radius. This radius doesn't represent a clear physical border. Instead, it represents the space where electrons are distributed most of the time.

Atom radii:

element                    H     C     N     O     F     P     S     Cl
van der Waals radius / Å   1.20  1.70  1.55  1.52  1.47  1.80  1.80  1.89

Force fields

"Force field" is the name given to an expression that empirically accounts for all forces acting on molecules. Molecular mechanics tools make use of force fields to evaluate forces and energies in molecular systems. Force fields are used predominantly to evaluate conformations during modeling procedures and to rank different solutions. There are various force fields, which differ in the energy components they consider and in the parameterization of equivalent terms.

Components of force fields can be divided into two types:
- Intramolecular
- Intermolecular

Intramolecular terms include potentials resulting from deviation from canonical geometry between atoms separated by up to 3 covalent bonds. Intermolecular terms include potentials resulting from all physical forces acting between atoms that are not covalently bonded but form electrostatic interactions in space.

The most popular force fields are CHARMM (implemented within the CHARMM package and the CHARMm commercial software), AMBER (implemented within the AMBER package) and GROMOS (implemented within GROMACS).

Intra-molecular potential

Intra-molecular potential = potential resulting from bond stretching/shortening + potential resulting from angle distortion + potential resulting from conformational distortion + ...

Deviation from optimal bond length
The most common way to account for deviation from the optimal bond length is a parabolic function of the deviation.

Deviation from optimal angles
Deviations from optimal angle values (e.g. 109° in tetrahedral carbon) are penalized.
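The bonded terms listed above can be sketched in a few lines of Python. This is a minimal illustration, not the functional form of any particular force field: the harmonic forms for bonds and angles follow the "parabolic function" description, the torsional term uses a generic cosine form, and every function name, force constant and reference value is a hypothetical placeholder.

```python
import math

def bond_energy(r, r0, k_bond):
    """Harmonic (parabolic) penalty for a bond of length r with optimum r0."""
    return k_bond * (r - r0) ** 2

def angle_energy(theta_deg, theta0_deg, k_angle):
    """Harmonic penalty for a bond angle; angles are given in degrees."""
    delta = math.radians(theta_deg) - math.radians(theta0_deg)
    return k_angle * delta ** 2

def dihedral_energy(phi_deg, k_phi, n, delta_deg=0.0):
    """Periodic (cosine) torsional term with multiplicity n and phase delta."""
    return k_phi * (1.0 + math.cos(n * math.radians(phi_deg) - math.radians(delta_deg)))

if __name__ == "__main__":
    # Illustrative numbers only (assumed, not taken from a real parameter set).
    print(bond_energy(1.63, 1.53, 300.0))
    print(angle_energy(112.0, 109.5, 50.0))
    print(dihedral_energy(60.0, 1.0, 3))
```

With multiplicity 3 and zero phase, the torsional term in this sketch has minima at 60°, 180° and 300°, which is the kind of periodic behaviour a cosine function provides.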

Deviation from optimal dihedral angles
The potential resulting from deviation from optimal dihedral angles is expressed using a periodic function such as cosine.

Energy of non-bonded atoms
Composed of forces acting between atoms which are not covalently attached.

The main forces are van der Waals attraction forces, van der Waals repulsion forces, and electrostatic forces.

Van der Waals forces are meaningful only at short distances. The van der Waals attraction and repulsion forces are frequently represented by the 6-12 equation (Lennard-Jones equation). Van der Waals attraction forces are also known as London forces and act between any two atoms, including neutral atoms, although they also have an electrostatic nature.

Johannes Diderik van der Waals (1837-1923): Dutch scientist with a distinguished contribution to physics and thermodynamics (the van der Waals equation). Winner of the 1910 Nobel Prize in physics.

Electrostatic forces
Acting between charged bodies. Similar charges repel each other, while opposite charges attract each other. The potential is determined by Coulomb's law.
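As a rough illustration of the two non-bonded terms just described, the sketch below evaluates a 6-12 Lennard-Jones potential and a Coulomb potential for a single atom pair. The parameterization (well depth `epsilon`, minimum-energy distance `r_min`), the function names and the example values are assumptions made for this sketch; the conversion constant is the commonly used value for energies in kcal/mol with distances in Å and charges in units of the electron charge.

```python
# Coulomb conversion factor in kcal*Å/(mol*e^2); an assumed unit convention.
COULOMB_CONSTANT = 332.06

def lennard_jones(r, epsilon, r_min):
    """6-12 potential: r^-12 repulsion and r^-6 attraction, minimum -epsilon at r_min."""
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)

def coulomb(r, q1, q2, dielectric=1.0):
    """Coulomb potential between point charges q1 and q2 separated by r."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)

if __name__ == "__main__":
    # Hypothetical pair: distances in Å, charges in units of e.
    for r in (3.0, 3.5, 4.0, 6.0):
        print(r, lennard_jones(r, 0.1, 3.5), coulomb(r, 0.4, -0.4))
```

The Lennard-Jones term decays much faster with distance than the Coulomb term, which is why van der Waals forces matter only at short range.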

Hydrogen bonds

Formed between a hydrogen attached to a small electronegative atom and another small electronegative atom.

Element                       H    C    N    O    F    P    S    Cl
Electronegativity (Pauling)   2.1  2.5  3.0  3.5  4.0  2.1  2.5  3.0

Hydrogen bonds are responsible for the special properties of water, including its relatively high melting temperature. Hydrogen bonds are much stronger than attractive van der Waals forces but weaker than covalent bonds. Hydrogen bonds have an essential role in shaping the structure of macromolecules.

A major determinant of protein stability is the ability to satisfy as many hydrogen bonds as possible. When the protein is not folded, all hydrogen bonds are satisfied with water molecule partners. To reach a folded state which is more stable, the protein must have the vast majority of its potential hydrogen bonds satisfied within the molecule.

[Figure: hydrogen bond network in an immunoglobulin]

Salt bridges

A combination of two noncovalent interactions: hydrogen bonding and electrostatic interactions. The salt bridge most often arises from the anionic carboxylate (RCOO-) of either aspartic acid or glutamic acid and the cationic ammonium (RNH3+) of lysine or the guanidinium (RNHC(NH2)2+) of arginine. Salt bridges are short-range interactions (< 4 Å).

Free energy is composed of energy and entropy

$\Delta G = \Delta H - T\Delta S$

Here ΔG is the free energy, ΔH the potential energy term and TΔS the entropy term. Usually we compare two possible states of the system and therefore talk about ΔG. What matters is the difference between the two states and not the absolute energy values.

ΔG and ΔΔG

ΔG between two given conformations determines the stability of the molecule with respect to these conformations. We can then calculate the ratio between the numbers of molecules in the two conformations.

Often we would like to compare the stability of two different molecules. For this purpose we compare ΔG of one molecule to that of the other molecule. The difference between the two is designated ΔΔG.

The Protherm database holds thermodynamic data on thousands of mutations in hundreds of different proteins and is the most comprehensive collection of such data: http://gibk26.bse.kyutech.ac.jp/jouhou/protherm/protherm.html
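The ratio between the numbers of molecules in two conformations follows from ΔG through the Boltzmann factor. The sketch below assumes ΔG = G1 − G2 is given per mole in kcal/mol, so the gas constant R replaces Boltzmann's constant, and it assumes the sign convention ΔΔG = ΔG(mutant) − ΔG(wild type); other conventions are in use. The function names are hypothetical.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K); assumes energies are given per mole

def population_ratio(delta_g, temperature=298.15):
    """Ratio N1/N2 of molecules in states 1 and 2 for delta_g = G1 - G2."""
    return math.exp(-delta_g / (GAS_CONSTANT * temperature))

def delta_delta_g(dg_mutant, dg_wild_type):
    """Stability difference between two molecules (assumed sign convention)."""
    return dg_mutant - dg_wild_type

if __name__ == "__main__":
    # State 1 lower in free energy by 1 kcal/mol at room temperature.
    print(population_ratio(-1.0))
    print(delta_delta_g(-4.0, -5.5))
```

Under these assumptions a free energy difference of only 1 kcal/mol already shifts the population ratio by roughly a factor of five at room temperature.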

Protherm data statistics (5/2011): http://gibk26.bio.kyutech.ac.jp/cgi-bin/jouhou/protherm/pp_stat.pl

[Figure: two states with free energies G1 and G2; ΔG = G1 − G2]

The hydrophobic effect is mainly an entropic effect

There is no such thing as a hydrophobic force. The presence of many hydrophobic groups in a hydrophilic environment creates a large contact surface area between water molecules and these groups. In these regions the water molecules are relatively ordered, and this reduces the overall entropy. When the hydrophobic groups of the protein contact each other, fewer water molecules are constrained and the system has larger overall entropy.

[Figure: energy landscape, E versus conformations, showing local minima and the global minimum]

Knowledge-based potentials

- The persistent need for an accurate scoring function for the evaluation of structures
- The problem of our insufficient understanding of the forces that drive molecular interactions
- The rapid increase in the size of biological databases

Residue level/atom level

Choosing a highly detailed system composed of many types of elements, for example each atom type, might provide detailed information. The disadvantage is that a lot of data is required in order to obtain accurate potentials for the many pair potentials in the system.

Choosing a lower level of representation (for example residue types) might lead to less informative potentials. The advantages are a system which is less sensitive to minor structural inaccuracies and more robust statistics available for obtaining the potentials.

Representation of residues

When working at the residue level, it is necessary to determine how to represent the residue. Usually the representation is simply a point. The most common positions are Cα and Cβ.

Another well accepted position is the center of mass or center of geometry of the side chain atoms.

Kocher et al., 1994; Zhang et al., 2003

Residues can also be represented by several atoms or by vectors, which also provide some information about the general orientation of the residue in space.

[Figure: residues i and j represented by their Cα and Cβ atoms]

Database used to derive the potentials
- Reliable data
- Non-redundant data (representative data)
- Oriented database (toward a specific goal)
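To make the point representation of a residue concrete, the sketch below reduces a set of side chain atom coordinates to their center of geometry and measures the distance between two such points. The coordinates and function name are invented for illustration; a center of mass would additionally weight each atom by its mass.

```python
import math

def center_of_geometry(coords):
    """Unweighted mean position of a list of (x, y, z) atom coordinates."""
    n = len(coords)
    return tuple(sum(atom[axis] for atom in coords) / n for axis in range(3))

if __name__ == "__main__":
    # Hypothetical side chain atom coordinates (Å) for two residues.
    side_chain_i = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)]
    side_chain_j = [(5.0, 1.0, 0.0), (7.0, 1.0, 0.0)]
    point_i = center_of_geometry(side_chain_i)
    point_j = center_of_geometry(side_chain_j)
    print(point_i, point_j, math.dist(point_i, point_j))
```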

Boltzmann distribution: the relation between probabilities and energy

$p_i = \frac{e^{-E_i/kT}}{Z}, \qquad Z = \sum_{k,l} e^{-E_{k,l}/kT}$

k is Boltzmann's constant, T is the absolute temperature, p is the probability density function and Z is the partition function. In order to know the value of Z we would need to know the energy value of every state in the system.

[Figure: Boltzmann distribution dependence on temperature; populations P1, P2, P3 of three states with E1 < E2 < E3, shown at two temperatures T1 > T2]

Inverse Boltzmann relation

$f_i = \frac{e^{-E_i/kT}}{Z}$

$E_i = -kT\ln[f_i] - kT\ln[Z]$

$E_{k,l} = -kT\ln[f_{k,l}] - kT\ln[Z]$

$\Delta E = E_{k,l} - E_{ref} = -kT\ln[f_{k,l}] - kT\ln[Z] - (-kT\ln[f_{ref}] - kT\ln[Z])$

$\Delta E = -kT\ln[f_{k,l}] + kT\ln[f_{ref}]$

$\Delta E = -kT(\ln[f_{k,l}] - \ln[f_{ref}])$

Sometimes we work with pseudo-energetic scores which are linearly related to the energy, but it is not exactly clear (and not really important) how. In this case we can also ignore the constants, whose physical meaning is not clear in our case anyway:

$\Delta E \propto -(\ln[f_{k,l}] - \ln[f_{ref}])$

$\Delta S = \ln[f_{k,l}] - \ln[f_{ref}] \propto -\Delta E$
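The inverse Boltzmann relation can be turned into a small calculation: observed counts are converted to relative frequencies and compared with reference frequencies. The sketch below reports pseudo-energies in units of kT; the state labels, counts and function name are invented for illustration, and states with zero observations are skipped because their logarithm is undefined.

```python
import math

def inverse_boltzmann(counts, reference):
    """Pseudo-energy per state, in units of kT: -(ln f_state - ln f_ref)."""
    total = sum(counts.values())
    energies = {}
    for state, n in counts.items():
        if n == 0:
            continue  # ln(0) is undefined; sparse-data corrections address this
        frequency = n / total
        energies[state] = -(math.log(frequency) - math.log(reference[state]))
    return energies

if __name__ == "__main__":
    # Hypothetical counts of two states, with a uniform reference state.
    observed = {"contact": 75, "no_contact": 25}
    uniform = {"contact": 0.5, "no_contact": 0.5}
    print(inverse_boltzmann(observed, uniform))
```

A state observed more often than the reference predicts receives a negative (favorable) pseudo-energy, and a state observed less often receives a positive one.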

What we have from known data is the relative frequencies of the different states, $f_i$ with $\sum_i f_i = 1$, which we hope represent well the real-life probabilities $p_i$ with $\sum_i p_i = 1$. It is therefore crucial that the set of frequencies f be as close as possible to the real probabilities p.

The reference state

A crucial factor in building knowledge-based potentials is the reference state. The reference state is the probability of some event that we expect by chance alone. If a particular conformation is found to have the reference state probability, then it gives no information about the system. Every probability should be normalized with respect to the reference state probability. Another way to look at the reference state: it is the state whose energy is equal to zero.

Corrections for the problem of small sample size

For a very large sample size n, the real probability distribution and the observed one are similar:

$\lim_{n \to \infty} f = p$

In real life this is often not the case, and the sample size is small. In such cases, for many states f is not a good approximation of p. If we cannot increase the number of observations (which depends on the database size), we are forced to work with an insufficient amount of data. The best we can do is to minimize the damage from large possible deviations between f and p.

Sippl was the first to introduce a method to account for this problem. The idea is to weight the information we take from the database according to the number of observations we have.

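$f'_{k,l} = \frac{1}{z}\,[f_{ref} + \sigma n_{k,l}]$

$f_{ref}$ is the reference probability, σ is a constant which represents the weight of each observation in the database, and z is the new sum of pseudo-frequencies, needed for normalization. Because the reference probabilities sum to 1 over all states:

$z = \sum_{k,l} (f_{ref} + \sigma n_{k,l}) = 1 + \sigma n_{total}$

$f'_{k,l} = \frac{1}{1 + \sigma n_{total}}\,[f_{ref} + \sigma n_{k,l}] = \frac{1}{1 + \sigma n_{total}} f_{ref} + \frac{n_{total}\sigma}{1 + n_{total}\sigma} \cdot \frac{n_{k,l}}{n_{total}}$

$f'_{k,l} = \frac{1}{1 + \sigma n_{total}} f_{ref} + \frac{n_{total}\sigma}{1 + n_{total}\sigma} f_{k,l}$

If there is plenty of data, we rely on the data to derive accurate potentials:

$\lim_{n_{total} \to \infty} \left[ \frac{1}{1 + \sigma n_{total}} f_{ref} + \frac{n_{total}\sigma}{1 + n_{total}\sigma} f_{k,l} \right] = f_{k,l}$

If there is an insufficient amount of data, we prefer to use the mean value, namely the reference state probability:

$\lim_{n_{total} \to 0} \left[ \frac{1}{1 + \sigma n_{total}} f_{ref} + \frac{n_{total}\sigma}{1 + n_{total}\sigma} f_{k,l} \right] = f_{ref}$

Contact potentials: Kocher et al., 1994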
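The small-sample correction above can be checked numerically. The sketch below assumes the reference probabilities are given per state and sum to 1, so that the normalization z equals 1 + σ·n_total; the value of σ, the state names and the function name are arbitrary choices made for this illustration.

```python
def corrected_frequencies(counts, reference, sigma=0.02):
    """Blend observed counts with reference probabilities, weighted by data volume."""
    n_total = sum(counts.get(state, 0) for state in reference)
    z = 1.0 + sigma * n_total  # sum of all pseudo-frequencies before normalization
    return {
        state: (reference[state] + sigma * counts.get(state, 0)) / z
        for state in reference
    }

if __name__ == "__main__":
    reference = {"near": 0.5, "far": 0.5}
    print(corrected_frequencies({"near": 0, "far": 0}, reference))        # no data
    print(corrected_frequencies({"near": 3, "far": 1}, reference))        # sparse data
    print(corrected_frequencies({"near": 30000, "far": 10000}, reference))  # plenty of data
```

With no observations the result equals the reference probabilities, and with many observations it approaches the raw observed frequencies, matching the two limits given above.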

Problems with the Boltzmann model when applied to proteins?

The Boltzmann model was originally introduced for the gas state. Is it appropriate for proteins?

The distribution of peptide bonds in proline is correctly predicted based on the Boltzmann distribution. On the other hand, interactions are not independent, and protein atoms are constrained by covalent bonds. The connectivity between atoms might introduce bias. Values in the potentials might be influenced by the dominance of other interactions.

Factors influencing the prediction

Examples of applications of knowledge-based potentials

Knowledge-based potentials are applied to a variety of problems in proteins. One application is to determine the thermostability of proteins and of mutants. The problem is to decide which amino acid is better in a given position.

$S = S_{geometry} + S_{contact} + S_{neighbors} + S_{clashing} + S_{backbone} + S_{rotamer} + S_{local}$

Geometry potential
The geometry potential $S_{geometry}$ determines how likely the interaction between a specific pair of residues is. This is done according to the probability of finding the pair of residues in that specific geometry in the database.
[Figure: residues i and j, each represented by its Cα atom and side chain centroid]
[Equation: $S_{geometry}$ as a sum of log-probability terms over residue pairs (i, j)]

Contact potential
The contact potential $S_{contact}$ determines how likely the interaction between a given pair of residues is. This is determined by the probability of finding this pair of residues in close contact, relative to other residue pairs.
[Equation: $S_{contact}$ as a sum of log-probability terms over contacting residue pairs (i, j)]
[Table: contact potential values for pairs of residue types]

Neighbors potential
The neighbors potential $S_{neighbors}$ evaluates the environment of a given residue. This is determined by counting the number of neighbors of a given residue and taking the probability of finding the residue with this number of neighbors in the database.
[Equation: $S_{neighbors}$ as a sum of log-probability terms over residues i, with neighbor counts grouped into bins]

Backbone potential
The backbone potential $S_{backbone}$ reflects the probability of finding the residue given the local backbone conformation (which is fixed in our problem). For example, Pro in an α-helix will have a small $S_{backbone}$.
[Equation: $S_{backbone}$ as a sum of log-probability terms over residues i, given the backbone conformation bb(i)]

Rotamer potential
The rotamer potential $S_{rotamer}$ reflects the probability of finding this rotamer in the protein database given the local backbone conformation. The probabilities of the rotamers were taken from a pre-calculated rotamer library.
[Equation: $S_{rotamer}$ as a sum of log-probability terms over residues i, for rotamer r(i) given the backbone conformation bb(i)]

Local potential
The local potential $S_{local}$ reflects the probability of finding the residue in the local environment (±3 amino acids along the sequence) given both the sequence and the structural conformation in that region.
[Equation: $S_{local}$ as a sum of log-probability terms over residue pairs (i, j) with sequence separation s and distance d]

Weighting the terms

$S = S_{geometry} + K_c S_{contact} + K_n S_{neighbors} + K_w S_{clashing} + K_d S_{dipeptide} + K_b S_{backbone} + K_l S_{local}$

Optimization of the K's was done on dataset471 using a Monte Carlo procedure. The K's were the parameters to be optimized, and the correlation coefficient (r) between the calculated scores and the experimental scores was the objective function.
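The weight optimization described above can be sketched as a simple Monte Carlo search over the K values with the Pearson correlation coefficient as the objective. This is not the procedure actually used in the talk: the move set, acceptance rule, step size, temperature, the toy data and all function names are assumptions made for illustration. Following the weighted sum above, the first term (geometry) keeps a fixed weight of 1.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def total_score(terms, weights):
    """Weighted sum of the individual score terms for one mutant."""
    return sum(w * t for w, t in zip(weights, terms))

def optimize_weights(term_table, experimental, steps=2000, step_size=0.1,
                     temperature=0.01, seed=0):
    """Metropolis-style search for weights that maximize r (needs >= 2 terms)."""
    rng = random.Random(seed)
    weights = [1.0] * len(term_table[0])

    def objective(w):
        return pearson_r([total_score(t, w) for t in term_table], experimental)

    current = objective(weights)
    best, best_weights = current, list(weights)
    for _ in range(steps):
        trial = list(weights)
        k = rng.randrange(1, len(trial))  # index 0 (geometry) stays fixed at 1
        trial[k] += rng.uniform(-step_size, step_size)
        r = objective(trial)
        # Always accept improvements; accept worse moves with Boltzmann-like probability.
        if r >= current or rng.random() < math.exp((r - current) / temperature):
            weights, current = trial, r
            if r > best:
                best, best_weights = r, list(trial)
    return best_weights, best

if __name__ == "__main__":
    # Toy data: two score terms per mutant and synthetic "experimental" values.
    terms = [(1.0, 0.2), (0.5, 1.0), (0.1, 0.4), (0.9, 0.9), (0.3, 0.1)]
    experimental = [t[0] + 3.0 * t[1] for t in terms]
    print(optimize_weights(terms, experimental))
```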

[Figure: calculated versus experimental scores; panels report r = 0.73, trend = 0.78; r = 0.54, trend = 0.77; r = 0.57, trend = 0.74]

Sequence design

Knowledge-based potentials can be used to design sequences that will be compatible with a given structure. For each position the potentials help to determine which residue is the most appropriate.

Ota et al., 1997

Detection and evaluation of sequence-structure compatibility

By far the most common application of knowledge-based potentials, especially at the residue level, is the evaluation of protein structures. This includes:
- Ranking different models of a given protein
- Evaluating individual protein structures
- 1D-3D alignment
- Inverse folding problem: searching sequence databases with a given structure