Computational Molecular Modeling Lecture 1: Structure Models, Properties Chandrajit Bajaj
Today's Outline: Intro to atoms, bonds, structure, biomolecules, Geometry of Proteins, Nucleic Acids, Ribosomes, Viruses, Space Occupancy, Bonded, Non-Bonded, Areas, Volumes, Derivatives, Dynamic Maintenance and Bioinformatics
The Tree of Life!
[Figure: Eukaryotic cells, Prokaryotic cell, Ribosome, Viruses? (The World of the Cell, 1996)]
"The problems of chemistry and biology can be greatly helped if our ability to see what we are doing, and to do things on an atomic level, is ultimately developed - a development which I think cannot be avoided." Richard Feynman, 1959, Caltech
X-ray Diffraction Analysis for Atomic Resolution Structure Determination
- X-ray crystallography (diffraction)
- Atomic resolution
- Difficulties (experimental, computational)
[Figure: human deoxy-hemoglobin, Protein Data Bank]
X-ray Crystallography: elucidating structure
- Periodicity and Symmetry in a Crystal
- Diffraction Pattern and Bragg's Law
- Reciprocal Space and Fourier Transform
- Phase Problem and Solutions
- Fitting, Refinement, and Validation
Crystal -> Diffraction pattern -> Electron density -> Model
Molecular Structure of Hemoglobin
- Secondary, tertiary, quaternary structure
- One myoglobin chain contains eight α-helices and no β-sheets.
- Nobel Prize
The PDB file
ATOM   1  N   GLU A 27  41.211  44.533   94.570  1.00  85.98
ATOM   2  CA  GLU A 27  42.250  44.748   95.621  1.00  86.10
ATOM   3  C   GLU A 27  42.601  43.408   96.271  1.00  85.99
ATOM   4  O   GLU A 27  43.691  42.865   96.065  1.00  85.71
ATOM   5  CB  GLU A 27  41.725  45.720   96.687  1.00  86.36
ATOM   6  CG  GLU A 27  42.804  46.349   97.563  1.00  86.44
ATOM   7  CD  GLU A 27  43.628  47.387   96.817  1.00  86.98
ATOM   8  OE1 GLU A 27  44.194  47.051   95.754  1.00  87.40
ATOM   9  OE2 GLU A 27  43.713  48.540   97.296  1.00  87.02
ATOM  10  N   ARG A 28  41.662  42.882   97.053  1.00  85.65
ATOM  11  CA  ARG A 28  41.839  41.607   97.739  1.00  85.29
ATOM  12  C   ARG A 28  41.380  40.458   96.835  1.00  85.31
ATOM  13  O   ARG A 28  42.184  39.619   96.424  1.00  85.09
ATOM  14  CB  ARG A 28  41.035  41.607   99.045  1.00  84.62
ATOM  15  CG  ARG A 28  39.564  41.944   98.851  1.00  84.07
ATOM  16  CD  ARG A 28  38.845  42.152  100.169  1.00  84.00
ATOM  17  NE  ARG A 28  37.423  42.439   99.980  1.00  84.27
ATOM  18  CZ  ARG A 28  36.945  43.413   99.208  1.00  84.53
ATOM  19  NH1 ARG A 28  37.771  44.208   98.537  1.00  83.83
ATOM  20  NH2 ARG A 28  35.634  43.598   99.111  1.00  84.38...
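A minimal sketch of reading one of the ATOM records above into a typed record. It assumes the whitespace-separated layout shown on this slide; real PDB files are fixed-column, so splitting on whitespace can fail when adjacent fields run together. The `Atom` class and `parse_atom_line` function are illustrative names, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float


def parse_atom_line(line: str) -> Atom:
    """Parse one whitespace-separated ATOM record as printed on the slide.

    Assumption: fields are separated by blanks. A production parser should
    slice the fixed PDB columns instead.
    """
    f = line.split()
    if len(f) < 11 or f[0] != "ATOM":
        raise ValueError(f"not a complete ATOM record: {line!r}")
    return Atom(
        serial=int(f[1]),
        name=f[2],
        res_name=f[3],
        chain_id=f[4],
        res_seq=int(f[5]),
        x=float(f[6]),
        y=float(f[7]),
        z=float(f[8]),
        occupancy=float(f[9]),
        temp_factor=float(f[10]),
    )


atom = parse_atom_line("ATOM 2 CA GLU A 27 42.250 44.748 95.621 1.00 86.10")
print(atom.name, atom.res_name, atom.res_seq, (atom.x, atom.y, atom.z))
```

The last two numeric fields are the occupancy and the temperature factor, in that order.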
PDB -> PQR
Replace the occupancy and temperature-factor columns of a PDB file with per-atom charge (Q) and radius (R).

PDB:
Field  Atom_num  Atom_name  Res_name  Chain_ID  Res_num  X       Y        Z      Occupancy  Temp   Element
ATOM   76368     CB         LYS       L         57       87.677  124.547  7.349  1.00       35.51  C
ATOM   76369     CG         LYS       L         57       86.549  125.304  6.741  1.00       37.35  C
ATOM   76370     CD         LYS       L         57       85.427  124.333  6.451  1.00       38.17  C

PQR:
Field  Atom_num  Atom_name  Res_name  Chain_ID  Res_num  X       Y        Z      Charge  Radius  Element
ATOM   76368     CB         LYS       L         57       87.677  124.547  7.349   0.211  1.908   C
ATOM   76369     CG         LYS       L         57       86.549  125.304  6.741  -0.303  1.908   C
ATOM   76370     CD         LYS       L         57       85.427  124.333  6.451   0.799  1.908   C

Two widely used approaches: PDB2PQR and Amber
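The column swap can be sketched in a few lines. This is an illustration only, not how PDB2PQR or Amber assign parameters: the lookup table `EXAMPLE_PARAMS` is hypothetical and simply holds the charge and radius values from the three LYS rows on this slide, and the input is assumed to be whitespace-separated.

```python
# Hypothetical lookup: (residue name, atom name) -> (charge, radius in Å).
# Values are copied from the slide's example rows, not from a real force field file.
EXAMPLE_PARAMS = {
    ("LYS", "CB"): (0.211, 1.908),
    ("LYS", "CG"): (-0.303, 1.908),
    ("LYS", "CD"): (0.799, 1.908),
}


def pdb_line_to_pqr(line: str, params: dict) -> str:
    """Rewrite one whitespace-separated ATOM record as a PQR-style record.

    The occupancy and temperature-factor fields are dropped and replaced
    by the charge and radius looked up in `params`.
    """
    f = line.split()
    if len(f) < 11 or f[0] != "ATOM":
        raise ValueError(f"not a complete ATOM record: {line!r}")
    serial, name, res, chain, res_num = f[1:6]
    x, y, z = (float(v) for v in f[6:9])
    charge, radius = params[(res, name)]  # KeyError if the atom type is unknown
    # Blank-separated output so that a negative charge never fuses with Z.
    return (
        f"ATOM {int(serial):6d} {name:<4s} {res:>3s} {chain} {int(res_num):4d} "
        f"{x:8.3f} {y:8.3f} {z:8.3f} {charge:7.3f} {radius:6.3f}"
    )


pqr_line = pdb_line_to_pqr(
    "ATOM 76369 CG LYS L 57 86.549 125.304 6.741 1.00 37.35 C", EXAMPLE_PARAMS
)
print(pqr_line)
```

Keeping a blank between every field avoids the fused `6.741-0.303` pattern that fixed-width output can produce when a charge is negative.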
Atomic Charges: what is the atomic charge?
- Based on atomic electronegativity, optimized for a given force field. Example: Gasteiger charges.
- Based on atomic electronegativity and the resulting electric field. Example: charge equilibration (QEq) charges.
- Based on the electronic distribution calculated by quantum mechanics (QM). Example: Mulliken charges.
- Based on the electrostatic potential near the molecule, calculated by a non-empirical method (or determined experimentally). Examples: CHELP, CHELPG, RESP.
Center for Computational Visualization, Institute for Computational and Engineering Sciences, Sep 2011
Proteins
- Amino acids contain an amino group, a side chain (the R group) and a carboxyl group.
- Proteins are polypeptide chains, made from amino acids joined by peptide bonds.
[Figure: a single amino acid (H2N-CαHR-COOH) and a polypeptide backbone of repeating N-Cα-C units with side chains R]
Amino Acids (I) Unlabeled atoms are either carbon or hydrogen. C alpha atoms are shaded. Double bonds and partial double bonds are shown in bold.
Amino Acids (II) Unlabeled atoms are either carbon or hydrogen. C alpha atoms are shaded. Double bonds and partial double bonds are shown in bold.
Protein Geometry: Backbone + SideChains
Figure Phil Bradley: pbradley@fhcrc.org
RNA: Ribonucleic Acids
- Bases: Adenine, Guanine, Cytosine, Uracil.
- A nucleotide consists of phosphoric acid, a sugar and a base; the RNA polymer is a chain of nucleotides.
- Torsion angles: α, β, γ, δ, ε, ζ along the backbone P-O5'-C5'-C4'-C3'-O3'-P, and χ about the bond joining C1' to the base.
- Can also specify ribose dihedral angles and puckering phase, amplitude.
[Figure: nucleotide with labeled sugar atoms C1' to C5' and torsion angles; RNA polymer of nucleotides]
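Each torsion angle listed above is the dihedral angle defined by four consecutive atoms. A minimal sketch of that computation in plain Python follows; the function name and the test coordinates are illustrative, and the sign of the result should be checked against whichever convention is in use.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond.

    0 means p0 and p3 are eclipsed (cis), +/-180 means anti (trans).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    s0 = _dot(b0, b1)
    s2 = _dot(b2, b1)
    v = (b0[0] - s0 * b1[0], b0[1] - s0 * b1[1], b0[2] - s0 * b1[2])
    w = (b2[0] - s2 * b1[0], b2[1] - s2 * b1[1], b2[2] - s2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


cis = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
trans = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
print(cis, trans)
```

For α, for example, the four points would be the coordinates of O3' of the previous nucleotide, then P, O5' and C5'.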
Bases
Molecular Models I
- Solvent molecule modeled as a probe sphere. Water: radius 1.4 Å.
- VDW (van der Waals): union of spheres with VDW radii.
- SAS (solvent accessible): locus of the probe center.
- SES/SCS (solvent excluded / solvent contact): the molecular surface.
[Figure: molecule 1NT5 with probe sphere, showing the VDW, SAS and SES/SCS surfaces]
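Since the SAS is the boundary of the union of spheres with radii inflated by the probe radius, its area can be estimated numerically. The sketch below uses a Shrake-Rupley style point-sampling estimate. That technique is brought in here for illustration and is not a method from this lecture; the function names, the brute-force neighbor loop and the sample count are all assumptions.

```python
import math


def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral layout)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        phi = k * golden
        pts.append((r * math.cos(phi), r * math.sin(phi), z))
    return pts


def sas_area(atoms, probe=1.4, n=500):
    """Estimate the solvent accessible surface area in Å^2.

    atoms: list of (x, y, z, vdw_radius). Each atom's inflated sphere
    (radius + probe) is sampled; a sample counts as exposed if it lies
    outside every other inflated sphere. O(N^2 n) brute force.
    """
    unit = sphere_points(n)
    total = 0.0
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                rr = rj + probe
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < rr * rr:
                    buried = True
                    break
            if not buried:
                exposed += 1
        total += 4.0 * math.pi * big_r * big_r * exposed / n
    return total


single = sas_area([(0.0, 0.0, 0.0, 1.908)])
pair = sas_area([(0.0, 0.0, 0.0, 1.908), (1.5, 0.0, 0.0, 1.908)])
print(single, pair)
```

For an isolated atom every sample is exposed, so the estimate equals the exact sphere area 4π(r + 1.4)²; overlapping atoms give less than the sum of their individual areas.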
Power Diagram & Union of Balls (VDW)
[Figure panels: Union of disks (CPK); Laguerre Voronoi (Power) Diagram; Regular Triangulation; Skeletal Complex]
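The power (Laguerre Voronoi) diagram assigns each point to the ball with the smallest power distance |x - c|² - r², rather than the nearest center. A small sketch with illustrative function names shows the definition and its link to the union of balls: a point lies in the union exactly when its smallest power distance is non-positive.

```python
def power_distance(x, center, radius):
    """Power distance of point x to the ball (center, radius): |x - c|^2 - r^2."""
    return sum((a - b) ** 2 for a, b in zip(x, center)) - radius ** 2


def power_cell(x, balls):
    """Index of the ball whose power cell contains x (smallest power distance)."""
    return min(range(len(balls)), key=lambda i: power_distance(x, *balls[i]))


def in_union(x, balls):
    """True if x lies inside or on at least one ball."""
    return any(power_distance(x, c, r) <= 0.0 for c, r in balls)


# Two disks in the plane: a small one at the origin and a larger one at (3, 0).
balls = [((0.0, 0.0), 1.0), ((3.0, 0.0), 2.0)]
query = (1.2, 0.0)
print(power_cell(query, balls), in_union(query, balls))
```

The query point is closer to the first center in the Euclidean sense (1.2 versus 1.8), yet it belongs to the power cell of the second, larger disk, which is also the disk that contains it.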
Laguerre Geometry & Union of Balls
H. Edelsbrunner. The union of balls and its dual shape. Discrete Comput. Geom., 13:415-440, 1995.
C. Bajaj, V. Pascucci, A. Shamir, R. Holt, A. Netravali. Dynamic maintenance and visualization of molecular surfaces. Discrete Applied Mathematics, 127:23-51, 2003.
Adaptive Grids & Union of Balls
Legend
Gridpoint classes:
- VDW (red)
- SAS (green)
- OUT (unmarked)
Gridcell classes:
- Buried (brown)
- VDWBoundary (light green)
- SASBand (dark blue)
- SASBoundary (light blue)
- Out (white)
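One plausible reading of the three gridpoint classes, sketched below, is: VDW if the point lies inside some atom's van der Waals ball, SAS if it lies inside some ball inflated by the probe radius but inside no van der Waals ball, and OUT otherwise. That rule and the function name are assumptions made for illustration; the slide gives only the legend.

```python
def classify_gridpoint(p, atoms, probe=1.4):
    """Return 'VDW', 'SAS' or 'OUT' for point p = (x, y, z).

    atoms: list of (x, y, z, vdw_radius). Assumed rule: VDW takes
    priority over SAS, and SAS over OUT.
    """
    label = "OUT"
    for cx, cy, cz, r in atoms:
        d2 = (p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2
        if d2 <= r * r:
            return "VDW"
        if d2 <= (r + probe) ** 2:
            label = "SAS"
    return label


atoms = [(0.0, 0.0, 0.0, 1.5)]
h = 0.5  # grid spacing in Å (illustrative value)
# Classify the gridpoints along the positive x axis.
labels = [classify_gridpoint((i * h, 0.0, 0.0), atoms) for i in range(8)]
print(labels)
```

With a single atom of radius 1.5 Å, points up to 1.5 Å from the center are VDW, points up to 2.9 Å are SAS, and the rest are OUT.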
Adaptive Grids & Updating Union of Balls: Add
- Gridpoints can be reclassified: SAS -> VDW, OUT -> SAS, OUT -> VDW.
- Gridcell classification is also changed based on the new classification of gridpoints.
- The new atom is marked as exposed if its insertion resulted in marking a gridpoint as SAS.
- Previously exposed atoms intersecting the new atom are marked buried if their SAS volume does not contain any gridpoint marked SAS.
- Add is O(1) under the assumption that the grid spacing h = O(r_s) and r_s = O(r_max).
Adaptive Grids & Updating Union of Balls: Remove
- Gridpoints can be reclassified: SAS -> OUT, VDW -> SAS, VDW -> OUT.
- Gridcell classification is also changed based on the new classification of gridpoints.
- Previously buried atoms intersecting the removed atom can become exposed if their SAS volume now contains any gridpoint marked SAS.
- Removal is O(1) under the assumption that the grid spacing h = O(r_s) and r_s = O(r_max).
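The add and remove reclassifications can be sketched with two coverage counters per gridpoint. This is an assumed bookkeeping scheme for illustration, not the data structure of the cited paper: the class `BallGrid`, its uniform (non-adaptive) integer grid, and the counter design are all hypothetical, and the exposed/buried marking of atoms and the gridcell classes are left out.

```python
import math
from collections import defaultdict


class BallGrid:
    """Uniform grid of points (i*h, j*h, k*h) with per-point coverage counts."""

    def __init__(self, h, probe=1.4):
        self.h = h
        self.probe = probe
        self.vdw = defaultdict(int)  # gridpoint -> number of VDW balls covering it
        self.sas = defaultdict(int)  # gridpoint -> number of inflated balls covering it

    def label(self, g):
        if self.vdw.get(g, 0) > 0:
            return "VDW"
        if self.sas.get(g, 0) > 0:
            return "SAS"
        return "OUT"

    def _covered(self, atom):
        """Yield (gridpoint, inside_vdw) for points within r + probe of the atom."""
        x, y, z, r = atom
        big_r = r + self.probe
        lo = [math.ceil((c - big_r) / self.h) for c in (x, y, z)]
        hi = [math.floor((c + big_r) / self.h) for c in (x, y, z)]
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    d2 = ((i * self.h - x) ** 2 + (j * self.h - y) ** 2
                          + (k * self.h - z) ** 2)
                    if d2 <= big_r * big_r:
                        yield (i, j, k), d2 <= r * r

    def _update(self, atom, delta):
        for g, inside_vdw in self._covered(atom):
            self.sas[g] += delta
            if inside_vdw:
                self.vdw[g] += delta

    def add(self, atom):
        self._update(atom, +1)

    def remove(self, atom):
        self._update(atom, -1)


grid = BallGrid(h=0.5)
a = (0.0, 0.0, 0.0, 1.5)
b = (2.0, 0.0, 0.0, 1.5)
g = (5, 0, 0)  # the gridpoint at x = 2.5 Å
history = [grid.label(g)]
grid.add(a)
history.append(grid.label(g))
grid.add(b)
history.append(grid.label(g))
grid.remove(b)
history.append(grid.label(g))
grid.remove(a)
history.append(grid.label(g))
print(history)
```

Each update touches only the gridpoints inside one inflated ball, roughly ((r + r_s)/h)³ of them, which is bounded by a constant when the grid spacing is proportional to the probe radius and the probe radius is comparable to the largest atom radius.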
Structural Properties I
Structural Properties II
Structural Properties III
We shall correct this critical problem in the next lecture on Smooth Structural Interfaces!
Challenge #2: Biomolecular Models in Solvent
Computational Problems
- Smooth Interfaces
- Parameterization
- Area and Volume Derivatives
- Dynamic Updates
Techniques
- NFFT for fast summations
- Fast Dynamic Particle Maintenance
- Integrals, Quadrature
Challenges
- Protein Flexibility
- Protein Folding, Rotamer Packing
- Spontaneous Assembly
NEXT: Aperiodic Quasi-crystals or Quasi-lattices
Some Useful Software Links
Some useful structure links:
- The molecular energetics and docking client/viewer TexMol and a user manual
- The molecular viewer PyMOL and this tutorial
- The program MODELLER
Some useful databases/servers:
- The PDB, the repository of experimentally determined protein structures
- NCBI PSI-BLAST
- MUSCLE protein multiple sequence alignment server
- PredictProtein protein sequence analysis web server: secondary structure prediction, coiled-coils, transmembrane helices, fold recognition,...
- PROSITE protein sequence patterns
- SCOP structure classification database
- matrix2png, a handy bioinformatics tool
- Bioinfo MetaServer, consensus fold recognition server