Nucleic Acid Reactivity: Challenges for Next-Generation Semiempirical Quantum Models

WWW.C-CHEM.ORG FULL PAPER

Ming Huang, [a,b] Timothy J. Giese, [b] and Darrin M. York* [b]

Semiempirical quantum models are routinely used to study mechanisms of RNA catalysis and phosphoryl transfer reactions using combined quantum mechanical (QM)/molecular mechanical methods. Herein, we provide a broad assessment of how well existing semiempirical quantum models describe nucleic acid structure and reactivity, in order to quantify their limitations and guide the development of next-generation quantum models with improved accuracy. Neglect of diatomic differential overlap and self-consistent density-functional tight-binding semiempirical models are evaluated against high-level QM benchmark calculations for seven biologically important datasets. The datasets include: proton affinities, polarizabilities, nucleobase dimer interactions, dimethyl phosphate anion, nucleoside sugar and glycosidic torsion conformations, and RNA phosphoryl transfer model reactions. As an additional baseline, comparisons are made with several commonly used density-functional models, including M062X and B3LYP (in some cases with dispersion corrections). The results show that, among the semiempirical models examined, the AM1/d-PhoT model is the most robust at predicting proton affinities. AM1/d-PhoT and DFTB3-3ob/OPhyd reproduce the MP2 potential energy surfaces of six associative RNA phosphoryl transfer model reactions reasonably well. Further, a recently developed linear-scaling modified divide-and-conquer model gives the most accurate binding energies for both hydrogen bonded and stacked nucleobase dimers. The semiempirical models considered here are shown to underestimate the isotropic polarizabilities of neutral molecules by approximately 30%.
The semiempirical models also fail to adequately describe torsion profiles for the dimethyl phosphate anion, the nucleoside sugar ring puckers, and the rotations about the nucleoside glycosidic bond. The modeling of pentavalent phosphorus, particularly with thio substitutions often used experimentally as mechanistic probes, was problematic for all of the models considered. Analysis of the strengths and weaknesses of the models suggests that the creation of robust next-generation models should emphasize the improvement of relative conformational energies and barriers, and of nonbonded interactions.

© 2015 Wiley Periodicals, Inc. DOI: 10.1002/jcc.23933

Introduction

Semiempirical quantum mechanical (QM) methods play an important role in theoretical chemistry by providing atomic and electronic levels of detail to complex chemical problems while retaining the computational efficiency necessary to routinely investigate large systems. [1-3] Semiempirical QM methods can be combined with molecular mechanical (MM) models to form QM/MM methods, [4,5] or linear-scaling quantum force fields (QMFFs), [6,7] which are applied within molecular dynamics simulations to achieve the level of statistical sampling often necessary to make direct comparison with experiment. Semiempirical methods achieve their efficiency [8,9] by replacing expensive multicenter integrals encountered in robust ab initio methods with computationally tractable approximations that are parametrized so that the resulting method retains suitably accurate small molecule geometries, heats of formation, and electronic properties. A broadly applicable semiempirical model can be made by parametrizing to a large set of reference data or by choosing parameters from theoretical arguments. Alternatively, specialized parameter sets can potentially achieve even higher accuracy for a limited subset of chemistry.
New limitations or deficiencies in a model become apparent only when it is applied to and tested on chemistries extending beyond the training set. [9] In this spirit, recent effort has been made to test and validate semiempirical methods [10-17] to promote the development of new methods, [8,9,18-22] with particular emphasis on improving the description of biologically relevant systems and nonbonded interactions, such as hydrogen bonding and dispersion interactions. [20,23-30]

In this work, we use large datasets related to biocatalysis, with particular emphasis on RNA catalysis, [31,32] to make extensive comparisons of neglect of diatomic differential overlap (NDDO) and self-consistent-charge density-functional tight-binding (SCC-DFTB) semiempirical methods, including PM7, [9] PM6, [19] AM1/d-PhoT, [18] AM1, [33] DFTB3-3ob, [20] and DFTB2-mio, [34] and their closely related variations, [3,21,24,25,28,35] with high-level ab initio calculations. The datasets are categorized as follows: proton affinities, polarizabilities, binding energies of nucleobase dimers, 2D conformational profiles of dimethyl phosphate anion (DMP), 2D conformational profiles of nucleoside sugar rings, 1D conformational profiles of nucleoside glycosidic torsions, and 2D (and 1D) energy surfaces of RNA phosphoryl transfer model reactions.

The proton affinity and polarizability datasets are composed of a wide range of important molecules in biocatalysis [36] involving amino acid side chain and backbone residues, nucleobases in both keto and enol tautomeric forms, A- and B-form nucleic acid sugar rings, and various phosphates and phosphoranes relevant to phosphoryl transfer reactions [37] and RNA catalysis. [31,32,38] The complexes in the nucleobase dimer dataset are subcategorized into hydrogen bonding and dispersion interactions. These interactions are responsible for the flexible tertiary and quaternary structures of macromolecules and their function. [39] Dimethyl phosphate (DMP) is the simplest molecule to mimic the phosphodiester linkage in the highly charged nucleic acid backbone, and has been widely used as a model compound to simulate the properties of the phosphate group. [40-43] Ribose and deoxyribose sugar rings in nucleic acids form the flexible link between the nucleobase and phosphate backbone. [44,45] The nucleobase and sugar ring are connected through a glycosidic bond, and the glycosidic torsion around this bond determines their relative orientation. The nucleoside sugar ring pucker conformation and glycosidic torsion play important roles in the structure and function of nucleic acids, [46-50] and have been studied extensively with quantum electronic structure methods. [51-53] RNA phosphoryl transfer is a fundamental reaction [51,54,55] in biology that is catalyzed by both proteins and RNA enzymes. [56-61]

The evaluation of the NDDO-based and SCC-DFTB semiempirical methods presents a clearer perspective of the strengths and weaknesses of the methods and provides insights for their usage and further improvement. The collected benchmark datasets also serve as an important tool for parametrization and/or verification of new computational methods.

[a] M. Huang, Department of Chemistry, Scientific Computation, University of Minnesota, 207 Pleasant St. SE, Minneapolis, Minnesota 55455-0431
[b] M. Huang, T. J. Giese, D. M. York, Center for Integrative Proteomics Research, BioMaPS Institute for Quantitative Biology, Department of Chemistry and Chemical Biology, Rutgers University, 174 Frelinghuysen Road, Piscataway, New Jersey 08854-8076. E-mail: york@biomaps.rutgers.edu
Journal of Computational Chemistry 2015, DOI: 10.1002/jcc.23933

Methods

The methods section is subdivided into subsections that describe details about the semiempirical and density-functional QM models that are tested, and the reference data used for comparison (which in many cases involve high-level quantum calculations). The datasets used to compare the models and the ab initio methods used to construct the reference values are summarized in Table 1 and described in more detail below.

Table 1. Benchmark datasets used and calculated in this work. PES stands for potential energy surface and N is the number of structures in each dataset.

| Dataset | N | Level of theory |
| --- | --- | --- |
| Proton affinity [71] | 118 | CBS-QB3 |
| Polarizability | 58 | MP2/6-311++G(3df,2p)//MP2/6-31++G(d,p) |
| Nucleobase dimer [76] | 34 | CCSD(T)/CBS//MP2/TZVPP and MP2/CBS//MP2/cc-pVTZ |
| 2D PES of DMP conformation | 5184 | MP2/6-311++G(3df,2p)//MP2/6-31++G(d,p) |
| 2D PES of sugar conformation [22] | 8 × 441 | MP2/6-311++G(3df,2p)//MP2/6-31++G(d,p) |
| 1D PES of glycosidic torsion | 8 × 72 | MP2/6-311++G(3df,2p)//MP2/6-31++G(d,p) |
| 2D PES of RNA model reaction | 2 × 441 | MP2/6-311++G(3df,2p)//MP2/6-31++G(d,p) |
| 1D PES of RNA model reaction | 6 × 111 + 69 | MP2/6-311++G(3df,2p)//MP2/6-31++G(d,p) |

QM models

Table 2 summarizes the theoretical models used in this work. The QM models used for comparisons include the PM7, [9] PM6, [19] AM1/d-PhoT, [18] and AM1 [33] NDDO methods; the DFTB3-3ob [20] and DFTB2-mio [34] tight-binding methods; and the B3LYP [62,63] and M062X [64] density functional theory (DFT) methods. Many variations of these methods containing special corrections to improve the description of hydrogen bonds and dispersion interactions are also used in this manuscript.
These variations include: PM6-DH1, [28] PM6-D3, [65] AM1-D, [24,25] DFTB2-D, [35] DFTB2-γh, [8] the dispersion-corrected DFT methods B3LYP-D3 [65] and M062X-D3, [65] and the long-range-corrected B3LYP methods CAM-B3LYP [66] and CAM-B3LYP-D3. [65,66] DFTB2-D-γh and DFTB3-D denote the DFTB2-γh and DFTB3-3ob methods, respectively, augmented with the dispersion correction developed for DFTB2-D. [8] We note that PM7 natively includes hydrogen bonding and dispersion corrections, [9] and M062X has been parameterized to account for mid-range dispersion. [64] For brevity, we refer to AM1/d-PhoT, DFTB3-3ob, and DFTB2-mio as AM1/d, DFTB3, and DFTB2, respectively, unless explicitly noted otherwise.

In addition to the models above, which include modified functional forms, we also make use of some DFTB3 models with specialized parameters, which we label DFTB3-NH, DFTB3-OP, and DFTB3-HNOP. DFTB3-NH is equivalent to the DFTB3 NHmix model used in Ref. [8]. Specifically, DFTB3-NH applies the reparametrized H-N tight binding matrix element splines developed in Ref. [20] to the sp³ hybridized nitrogens while retaining the DFTB3-3ob parameters for all other atom pairs. DFTB3-OP is equivalent to the DFTB3 OPhyd model developed in Ref. [21], which differs from standard DFTB3-3ob only through the use of reparametrized O-P tight binding matrix element splines. We note that the OPhyd DFTB3 parameters were specifically developed to improve the description of phosphate hydrolysis reactions; [21] however, we also test them outside of their original application area, for example in proton affinity calculations. We use DFTB3-HNOP as an abbreviation for a method that uses DFTB3-NH for nitrogen-containing molecules and DFTB3-OP for phosphorus-containing molecules. The recently developed mDC linear-scaling QM force field, [2,3] which is based on DFTB3, is also included in the comparison of nucleobase dimer interactions.
Several density-functional methods, including B3LYP [62,63] and M062X [64] (including in some cases corrections for dispersion), post-Hartree-Fock (MP2), and multilevel (CBS-QB3) [67,68] methods were used in this work. All of these ab initio calculations were performed using Gaussian 09. [69,70] The DFT calculations used an ultrafine numerical integration grid.

Table 2. Theoretical models applied in this work. NDDO and SCC-DFTB denote neglect of diatomic differential overlap and self-consistent-charge density-functional tight-binding methods, respectively.

| Model | Description and references |
| --- | --- |
| CBS | CBS-QB3 [67,68] |
| MP2 | MP2/6-311++G(3df,2p)//MP2/6-31++G(d,p) |
| B3LYP | B3LYP/6-311++G(3df,2p)//B3LYP/6-31++G(d,p) |
| B3LYP-D3 | B3LYP-D3/6-311++G(3df,2p)//B3LYP-D3/6-31++G(d,p) (B3LYP with dispersion correction [65]) |
| CAM-B3LYP | CAM-B3LYP/6-311++G(3df,2p)//CAM-B3LYP/6-31++G(d,p) (B3LYP with long range correction [66]) |
| CAM-B3LYP-D3 | CAM-B3LYP-D3/6-311++G(3df,2p)//CAM-B3LYP-D3/6-31++G(d,p) [66] |
| M062X | M062X/6-311++G(3df,2p)//M062X/6-31++G(d,p) |
| M062X-D3 | M062X-D3/6-311++G(3df,2p)//M062X-D3/6-31++G(d,p) (M062X with dispersion correction [65]) |
| AM1 | NDDO-based Austin Model 1 [33] |
| AM1-D | AM1 including dispersion correction [24,25] |
| AM1/d | AM1/d-PhoT, AM1/d with specific reaction parametrization for phosphoryl transfer reactions [18] |
| PM6 | NDDO-based Parameterized Model 6 [19] |
| PM6-DH1 | PM6 with dispersion and hydrogen-bonding corrections [28] |
| PM6-D3 | PM6 augmented with the dispersion correction developed for DFT [65] |
| PM7 | NDDO-based Parameterized Model 7 [9] |
| DFTB2 | SCC-DFTB or DFTB2-mio developed for materials and biological systems [34] |
| DFTB2-γh | DFTB2-mio with the γh function [8] |
| DFTB2-D | DFTB2-mio with dispersion correction [35] |
| DFTB2-D-γh | DFTB2-γh augmented with the dispersion correction developed for DFTB2 |
| DFTB3-mio | SCC-DFTB including the third-order expansion of the DFT total energy [8] |
| DFTB3 | DFTB3-3ob developed for organic and biological application [20] |
| DFTB3-HN | DFTB3-3ob with specialized H-N (NHmod) parameters for the sp³ hybridized nitrogen atoms [8,20] |
| DFTB3-OP | DFTB3-3ob with specialized O-P (OPhyd) parameters for phosphate hydrolysis reactions, DFTB3-3ob/OPhyd [21] |
| DFTB3-HNOP | DFTB3-HN for sp³ N-containing molecules and DFTB3-OP for P-containing molecules |
| DFTB3-D | DFTB3-3ob augmented with the dispersion correction developed for DFTB2 |
| mDC | A linear-scaling quantum mechanical force field based on DFTB3-3ob [2,3] |

Proton affinity reference data

The benchmark proton affinity data were systematically calculated at the CBS-QB3 [67,68] level for the molecules studied in our previous work, [71] except for P(O)(SCH)(O)(OH), which was excluded from the statistics because it often underwent dissociation on geometry optimization. The reference data were computed with CBS-QB3 because this level has previously been found to provide reliable proton affinities when compared to experiment. [71] With the exception of the NDDO models, the proton affinity values were computed from approximate enthalpy differences corresponding to the reaction

$$\mathrm{A^{-}_{(g)} + H^{+}_{(g)} \rightarrow HA_{(g)}} \qquad (1)$$

after geometry optimization at the corresponding level of theory. The enthalpies used for the ab initio and approximate DFTB methods were computed from standard normal mode analysis. The NDDO-based semiempirical methods, however, implicitly include enthalpic corrections through the manner of their parametrization. The NDDO models underestimate the heat of formation of H+; therefore, the experimental value of 367.2 kcal/mol was used in its place. [13,72-75] A complete list of all proton affinities can be found in the Supporting Information.

Table 3 summarizes the error statistics, including the root mean square error (RMSE), the Pearson correlation coefficient (R) between the model and CBS-QB3 reference values, and the mean unsigned error (MUE) and mean signed error (MSE) with their corresponding standard deviations (STD). The error statistics in Table 3 and Figure 1 are divided into six categories.
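The proton affinity definition based on Eq. (1) can be illustrated with a minimal Python sketch. It assumes the proton affinity is the negative of the enthalpy change of Eq. (1), with all quantities in kcal/mol. The function names and the example numbers are hypothetical. The ideal-gas enthalpy of the proton (5/2 RT, with zero electronic energy) is an assumption of this sketch and is not stated in the text; the value 367.2 kcal/mol for the heat of formation of H+ is the one quoted above for the NDDO models.

```python
# Gas constant in kcal/(mol K).
R_KCAL = 1.987204e-3

# Experimental heat of formation of H+ (kcal/mol) quoted in the text for NDDO models.
HF_PROTON_EXPERIMENTAL = 367.2


def proton_affinity_from_enthalpies(h_anion, h_acid, temperature=298.15):
    """Proton affinity from absolute gas-phase enthalpies (ab initio or DFTB style).

    Assumption of this sketch: the proton is an ideal gas with no electronic
    energy, so its enthalpy is 5/2 RT.
    """
    h_proton = 2.5 * R_KCAL * temperature
    return h_anion + h_proton - h_acid


def proton_affinity_from_heats_of_formation(hf_anion, hf_acid,
                                            hf_proton=HF_PROTON_EXPERIMENTAL):
    """Proton affinity from heats of formation (NDDO style).

    The NDDO heat of formation of H+ is replaced by the experimental value.
    """
    return hf_anion + hf_proton - hf_acid


if __name__ == "__main__":
    # Hypothetical heats of formation (kcal/mol), for illustration only.
    print(proton_affinity_from_heats_of_formation(-100.0, -120.0))
```

Both functions return a positive number when protonation is exothermic, which is the usual sign convention for tabulated proton affinities.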
These categories are defined as follows. (1) The amino acid subset includes model compounds for amino acid side chains capped by a methyl group, together with glycine and proline as amino acid backbone models. (2) The nucleobase subset contains multiple protonation sites of adenine (A), cytosine (C), guanine (G), thymine (T), and uracil (U). (3) The entries in the nucleobase tautomer subset are not technically proton affinities, but are instead the relative proton affinities between the two relevant tautomeric forms of the nucleobase. (4) The ribose subset contains three model compounds for the RNA ribose: 2-hydroxytetrahydrofuran (THF), 2-hydroxy-1,3,4-trimethyltetrahydrofuran (Methyl), and 2-hydroxy-1,4-dimethyl-3-methoxytetrahydrofuran (Ribose), which are successively more similar to the sugar ring of RNA. (5) The phosphate and (6) phosphorane subsets consist of metaphosphates, phosphates, and phosphoranes representing vital states along the reaction path of RNA phosphoryl transfer reactions, as well as their thio-substituted analogs, which are useful mechanistic probes of ribozyme catalysis. In addition, the adjusted phosphorane set in Table 3 excludes the pentavalent phosphorus species that incorrectly geometry optimize to form tetravalent phosphorus compounds.

Table 3. Error statistics of proton affinities (kcal/mol).

| Subset | Error | M062X | B3LYP | AM1/d | AM1 | PM7 | PM6 | DFTB3-HNOP | DFTB3 | DFTB2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| All data | RMSE (R) | 1.1 (1.00) | 1.6 (1.00) | 4.9 (1.00) | 7.4 (1.00) | 12.2 (0.99) | 9.2 (1.00) | 11.2 (0.99) | 15.4 (0.99) | 16.3 (0.99) |
| | MUE (STD) | 0.9 (0.7) | 1.2 (1.0) | 3.3 (3.7) | 5.3 (5.2) | 8.4 (8.9) | 8.1 (4.4) | 7.8 (8.1) | 10.0 (11.7) | 10.4 (12.6) |
| | MSE (STD) | -0.0 (1.1) | 0.7 (1.4) | -0.3 (4.9) | -0.7 (7.4) | -6.8 (10.1) | -7.5 (5.3) | 2.7 (10.9) | 0.5 (15.4) | -5.6 (15.3) |
| Amino Acid | RMSE (R) | 1.1 (1.00) | 1.1 (1.00) | 4.3 (1.00) | 6.1 (1.00) | 7.0 (1.00) | 7.8 (1.00) | 5.8 (1.00) | 6.1 (1.00) | 13.4 (1.00) |
| | MUE (STD) | 1.0 (0.5) | 0.9 (0.7) | 3.2 (2.9) | 5.4 (2.8) | 6.4 (2.9) | 6.9 (3.6) | 5.2 (2.7) | 5.6 (2.5) | 10.6 (8.2) |
| | MSE (STD) | -0.7 (0.8) | 0.3 (1.1) | 0.2 (4.3) | -1.3 (5.9) | -6.4 (2.9) | -6.3 (4.6) | 5.2 (2.7) | 2.8 (5.5) | -9.4 (9.6) |
| Nucleobase | RMSE (R) | 1.2 (1.00) | 2.0 (1.00) | 3.3 (1.00) | 3.3 (1.00) | 9.7 (1.00) | 9.6 (1.00) | 10.5 (1.00) | 10.5 (1.00) | 7.2 (1.00) |
| | MUE (STD) | 0.8 (0.8) | 1.6 (1.2) | 2.7 (2.0) | 2.5 (2.2) | 8.4 (4.8) | 8.0 (5.3) | 9.6 (4.4) | 9.6 (4.4) | 5.1 (5.0) |
| | MSE (STD) | -0.5 (1.1) | 1.6 (1.3) | -0.0 (3.3) | -1.4 (3.0) | -7.8 (5.8) | -6.8 (6.8) | 9.0 (5.4) | 9.0 (5.4) | -4.2 (5.8) |
| Nucleobase Tautomer | RMSE (R) | 0.6 (1.00) | 0.9 (1.00) | 1.9 (0.98) | 2.1 (1.00) | 3.0 (0.96) | 5.3 (0.69) | 4.4 (0.85) | 4.4 (0.85) | 6.7 (0.59) |
| | MUE (STD) | 0.5 (0.2) | 0.8 (0.4) | 1.5 (1.2) | 1.9 (1.1) | 2.6 (1.6) | 4.6 (2.6) | 4.0 (2.0) | 4.0 (2.0) | 5.7 (3.5) |
| | MSE (STD) | -0.3 (0.5) | 0.5 (0.7) | 1.5 (1.2) | 1.9 (1.1) | 2.1 (2.2) | 3.3 (4.1) | -3.3 (3.0) | -3.3 (3.0) | -4.1 (5.3) |
| Ribose | RMSE (R) | 1.1 (0.96) | 2.1 (0.88) | 5.4 (-0.51) | 4.8 (0.93) | 3.8 (0.68) | 8.7 (0.72) | 7.3 (0.94) | 7.3 (0.94) | 1.6 (0.96) |
| | MUE (STD) | 0.8 (0.7) | 1.8 (1.2) | 4.1 (3.5) | 4.6 (1.6) | 3.1 (2.2) | 8.4 (2.4) | 7.2 (1.4) | 7.2 (1.4) | 1.3 (1.0) |
| | MSE (STD) | 0.3 (1.0) | -1.5 (1.5) | 3.8 (3.8) | 4.6 (1.6) | -3.0 (2.4) | -8.4 (2.4) | 7.2 (1.4) | 7.2 (1.4) | -1.3 (1.0) |
| Phosphate | RMSE (R) | 1.2 (1.00) | 1.2 (1.00) | 4.5 (1.00) | 12.6 (0.99) | 4.3 (1.00) | 8.1 (1.00) | 4.4 (1.00) | 4.7 (1.00) | 13.2 (1.00) |
| | MUE (STD) | 1.0 (0.7) | 1.0 (0.7) | 3.5 (2.7) | 10.0 (7.8) | 3.7 (2.2) | 7.2 (3.7) | 3.5 (2.6) | 3.8 (2.8) | 9.3 (9.4) |
| | MSE (STD) | 0.4 (1.2) | 0.7 (1.0) | 0.5 (4.4) | 0.1 (12.6) | 1.4 (4.0) | -7.0 (4.0) | 2.5 (3.5) | 2.5 (4.0) | 4.2 (12.5) |
| Phosphorane | RMSE (R) | 1.0 (0.98) | 1.1 (0.98) | 7.7 (0.63) | 7.1 (0.84) | 22.4 (0.60) | 10.7 (0.85) | 18.8 (0.43) | 29.8 (0.38) | 30.3 (0.61) |
| | MUE (STD) | 0.7 (0.6) | 0.8 (0.7) | 4.2 (6.4) | 6.1 (3.7) | 16.9 (14.7) | 10.3 (2.9) | 11.4 (15.0) | 21.5 (20.6) | 24.4 (18.0) |
| | MSE (STD) | 0.4 (0.9) | 0.1 (1.1) | -2.8 (7.1) | -1.1 (7.0) | -16.1 (15.6) | -10.3 (2.9) | -11.2 (15.2) | -20.9 (21.2) | -20.2 (22.6) |
| Phosphorane (adjusted) | RMSE (R) | 1.0 (0.98) | 1.1 (0.98) | 2.8 (0.91) | 7.1 (0.84) | 3.2 (0.95) | 10.7 (0.85) | 4.6 (0.85) | 4.8 (0.80) | 9.6 (0.86) |
| | MUE (STD) | 0.7 (0.6) | 0.8 (0.7) | 2.3 (1.6) | 6.1 (3.7) | 2.6 (1.9) | 10.3 (2.9) | 3.1 (3.3) | 3.2 (3.6) | 7.5 (6.0) |
| | MSE (STD) | 0.4 (0.9) | 0.1 (1.1) | -0.8 (2.7) | -1.1 (7.0) | -0.9 (3.1) | -10.3 (2.9) | -2.9 (3.5) | -2.1 (4.3) | 0.9 (9.5) |
| | N | | | 2 | | 12 | | 6 | 11 | 12 |

The adjusted phosphorane set excludes the tetravalent phosphorus species that are supposed to be pentavalent. N is the number of such tetravalent phosphorus compounds in the phosphorane set.

Figure 1. MUE and MSE with their STDs of proton affinities (kcal/mol). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Polarizability reference data

Isotropic dipole polarizabilities of the neutral molecules within the proton affinity dataset were computed with each method and compared to MP2/6-311++G(3df,2p)//MP2/6-31++G(d,p) results. The ab initio polarizabilities were computed analytically within Gaussian 09, whereas the semiempirical model polarizabilities were computed through finite differentiation of the molecular dipole moment with respect to the applied electric dipole field strength. Figure 2 and Table 4 summarize the error statistics of isotropic polarizabilities, which are divided into five categories: the amino acid, nucleobase, ribose, phosphate, and phosphorane subsets described in the proton affinity reference data section. A complete list of all isotropic polarizabilities is presented in the Supporting Information.

Figure 2. MUE and MSE with their STDs of isotropic polarizabilities (Bohr³).

Nucleobase dimer reference data

The benchmark calculations of the nucleobase dimers were taken from the JSCH-2005 database, [76,77] whose reference interaction energies were calculated with CCSD(T)/CBS//MP2/TZVPP or MP2/CBS//MP2/cc-pVTZ, as described in Ref. [76]. The hydrogen bonded or stacked dimers in this test set include only those neutral complexes from the JSCH-2005 database [76,77] that result from geometry optimization; that is, we do not include the dimers placed in artificial geometries. A complete list of dimers and extended analysis are provided in the Supporting Information. The nucleobase interactions are compared in two ways. Table 5 and Figures 3 and 4 compare the interactions after geometry optimization of both the dimer and the monomers, whereas Table 6 and Figures 5 and 6 compare the interactions based on single point calculations of dimers evaluated using the reference structures, without inclusion of monomer deformation energies.

Table 4. Error statistics of isotropic polarizabilities (Bohr³).
| Subset | Error | M062X | B3LYP | PM7 | PM6 | AM1 | AM1/d | DFTB3 | DFTB2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| All data | RMSE (R) | 1.8 (1.00) | 1.8 (1.00) | 18.4 (0.95) | 18.0 (0.95) | 18.7 (0.96) | 23.2 (0.97) | 26.3 (0.99) | 26.8 (0.99) |
| | MUE (STD) | 1.6 (0.86) | 1.5 (0.96) | 17.4 (5.92) | 17.1 (5.76) | 17.9 (5.42) | 22.5 (5.37) | 25.7 (5.48) | 26.2 (5.48) |
| | MSE (STD) | -1.5 (0.90) | 1.5 (1.05) | -17.4 (5.92) | -17.1 (5.76) | -17.9 (5.42) | -22.5 (5.37) | -25.7 (5.48) | -26.2 (5.48) |
| Amino Acid | RMSE (R) | 0.8 (1.00) | 1.3 (1.00) | 18.7 (0.99) | 18.3 (0.99) | 18.3 (0.99) | 18.0 (0.99) | 21.0 (0.99) | 21.4 (0.99) |
| | MUE (STD) | 0.8 (0.28) | 1.3 (0.33) | 18.4 (3.40) | 18.0 (3.31) | 18.0 (3.27) | 17.6 (3.84) | 20.5 (4.24) | 21.0 (4.07) |
| | MSE (STD) | -0.6 (0.52) | 1.3 (0.33) | -18.4 (3.40) | -18.0 (3.31) | -18.0 (3.27) | -17.6 (3.84) | -20.5 (4.24) | -21.0 (4.07) |
| Nucleobase | RMSE (R) | 3.1 (1.00) | 0.5 (1.00) | 20.1 (0.99) | 19.0 (0.99) | 19.0 (0.99) | 19.6 (1.00) | 29.3 (0.98) | 31.1 (0.99) |
| | MUE (STD) | 3.0 (0.46) | 0.4 (0.17) | 20.1 (1.24) | 19.0 (1.39) | 18.9 (1.62) | 19.6 (1.41) | 29.0 (4.35) | 30.9 (3.58) |
| | MSE (STD) | -3.0 (0.46) | 0.2 (0.43) | -20.1 (1.24) | -19.0 (1.39) | -18.9 (1.62) | -19.6 (1.41) | -29.0 (4.35) | -30.9 (3.58) |
| Sugar Pucker | RMSE (R) | 0.4 (1.00) | 2.8 (1.00) | 29.9 (1.00) | 29.3 (1.00) | 28.9 (1.00) | 27.9 (1.00) | 28.0 (1.00) | 28.7 (1.00) |
| | MUE (STD) | 0.4 (0.13) | 2.7 (0.80) | 29.2 (6.55) | 28.6 (6.40) | 28.2 (6.28) | 27.2 (6.00) | 27.4 (5.79) | 28.0 (6.09) |
| | MSE (STD) | -0.4 (0.13) | 2.7 (0.80) | -29.2 (6.55) | -28.6 (6.40) | -28.2 (6.28) | -27.2 (6.00) | -27.4 (5.79) | -28.0 (6.09) |
| Phosphate | RMSE (R) | 1.7 (1.00) | 1.5 (1.00) | 13.8 (0.96) | 13.5 (0.96) | 15.1 (0.94) | 22.4 (0.95) | 25.5 (0.96) | 25.4 (0.96) |
| | MUE (STD) | 1.6 (0.37) | 1.3 (0.66) | 13.2 (3.97) | 12.9 (3.88) | 14.4 (4.53) | 22.0 (4.53) | 25.0 (5.28) | 24.9 (4.94) |
| | MSE (STD) | -1.6 (0.37) | 1.2 (0.77) | -13.2 (3.97) | -12.9 (3.88) | -14.4 (4.53) | -22.0 (4.53) | -25.0 (5.28) | -24.9 (4.94) |
| Phosphorane | RMSE (R) | 1.5 (1.00) | 2.6 (1.00) | 15.4 (1.00) | 15.9 (1.00) | 17.5 (0.99) | 27.8 (1.00) | 28.0 (1.00) | 28.1 (0.99) |
| | MUE (STD) | 1.5 (0.32) | 2.5 (0.72) | 15.3 (1.29) | 15.8 (1.90) | 17.5 (1.64) | 27.7 (2.82) | 27.8 (3.69) | 27.9 (3.62) |
| | MSE (STD) | -1.5 (0.32) | 2.5 (0.72) | -15.3 (1.29) | -15.8 (1.90) | -17.5 (1.64) | -27.7 (2.82) | -27.8 (3.69) | -27.9 (3.62) |
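The finite-difference procedure described for the semiempirical polarizabilities can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a central difference of the dipole moment with respect to a uniform applied field, a hypothetical step size of 0.001 atomic units, and the sign convention in which the induced dipole increases with the field. The `toy_dipole` function is a hypothetical linear-response stand-in for a call to a semiempirical program.

```python
def polarizability_tensor(dipole, field_step=1.0e-3):
    """Central-difference estimate of alpha[i][j] = d(mu_i)/d(F_j) at zero field.

    `dipole` is any callable that takes a field vector [Fx, Fy, Fz] and
    returns the dipole vector [mu_x, mu_y, mu_z] in consistent units.
    """
    alpha = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        field_plus = [0.0, 0.0, 0.0]
        field_minus = [0.0, 0.0, 0.0]
        field_plus[j] = field_step
        field_minus[j] = -field_step
        mu_plus = dipole(field_plus)
        mu_minus = dipole(field_minus)
        for i in range(3):
            alpha[i][j] = (mu_plus[i] - mu_minus[i]) / (2.0 * field_step)
    return alpha


def isotropic_polarizability(alpha):
    """Isotropic polarizability: one third of the trace of the tensor."""
    return (alpha[0][0] + alpha[1][1] + alpha[2][2]) / 3.0


# Hypothetical linear-response model used only to exercise the functions above.
MODEL_ALPHA = [[10.0, 0.5, 0.0],
               [0.5, 12.0, 0.0],
               [0.0, 0.0, 14.0]]
MODEL_MU0 = [0.3, 0.0, -0.1]


def toy_dipole(field):
    """Permanent dipole plus linear induced dipole, mu = mu0 + alpha . F."""
    return [MODEL_MU0[i] + sum(MODEL_ALPHA[i][j] * field[j] for j in range(3))
            for i in range(3)]


if __name__ == "__main__":
    tensor = polarizability_tensor(toy_dipole)
    print(isotropic_polarizability(tensor))
```

In a real calculation the step size would be a trade-off: too large a field mixes in hyperpolarizability contributions, while too small a field amplifies convergence noise in the dipole moment.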

Table 5. Error statistics of binding energies (kcal/mol) of optimized nucleobase dimers.

| Subset | Error | M062X-D3 | M062X | CAM-B3LYP-D3 | B3LYP-D3 | CAM-B3LYP | B3LYP | PM7 | PM6-DH1 | PM6-D3 | PM6 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| All data | RMSE (R) | 2.7 (0.98) | 3.1 (0.98) | 2.8 (0.89) | 2.3 (0.93) | 3.9 (0.88) | 5.6 (0.86) | 4.1 (0.80) | 3.0 (0.88) | 3.4 (0.87) | 8.7 (0.88) |
| All data | MUE (STD) | 2.4 (1.0) | 2.9 (1.2) | 1.4 (2.5) | 1.4 (1.8) | 3.7 (1.4) | 5.3 (1.7) | 3.1 (2.7) | 2.3 (1.9) | 2.8 (1.9) | 8.2 (2.9) |
| All data | MSE (STD) | 2.3 (1.4) | 2.8 (1.3) | -0.1 (2.8) | 0.7 (2.2) | 2.7 (2.9) | 4.7 (3.0) | -1.6 (3.8) | 1.0 (2.8) | 1.7 (2.9) | 8.2 (2.9) |
| H-bonded | RMSE (R) | 2.8 (0.99) | 3.3 (0.99) | 0.9 (0.99) | 1.4 (0.99) | 3.7 (1.00) | 5.4 (0.99) | 3.2 (0.86) | 2.6 (0.94) | 3.0 (0.96) | 9.0 (0.97) |
| H-bonded | MUE (STD) | 2.6 (1.0) | 3.2 (1.0) | 0.8 (0.5) | 1.1 (0.8) | 3.6 (0.9) | 5.3 (1.2) | 2.7 (1.8) | 2.1 (1.5) | 2.6 (1.4) | 8.7 (2.5) |
| H-bonded | MSE (STD) | 2.6 (1.0) | 3.2 (1.0) | 0.6 (0.7) | 1.1 (0.9) | 3.6 (0.9) | 5.3 (1.2) | -1.1 (3.1) | 1.3 (2.2) | 2.4 (1.7) | 8.7 (2.5) |
| Stacked | RMSE (R) | 1.1 (0.99) | 1.1 (0.99) | 7.8 (0.98) | 5.6 (0.69) | 5.3 (0.96) | 6.6 (0.80) | 8.1 (0.97) | 5.2 (0.66) | 5.6 (0.98) | 5.7 (0.97) |
| Stacked | MUE (STD) | 1.0 (0.5) | 1.0 (0.5) | 6.1 (4.9) | 3.7 (4.2) | 4.2 (3.3) | 5.4 (3.8) | 6.1 (5.3) | 3.8 (3.6) | 4.2 (3.6) | 4.6 (3.4) |
| Stacked | MSE (STD) | -0.5 (1.0) | 0.5 (1.0) | -4.9 (6.1) | -1.7 (5.3) | -4.1 (3.4) | 0.2 (6.6) | -5.3 (6.1) | -1.1 (5.1) | -3.5 (4.3) | 4.6 (3.4) |
| Stacked | N | | | 2 | 1 | 4 | 4 | 2 | 1 | 2 | 2 |

| Subset | Error | AM1-D | AM1/d | AM1 | mdc | DFTB3-D | DFTB3 | DFTB2-D-γh | DFTB2-D | DFTB2-γh | DFTB2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| All data | RMSE (R) | 5.1 (0.85) | 8.3 (0.84) | 11.6 (0.86) | 1.0 (0.99) | 5.2 (0.90) | 7.1 (0.86) | 3.6 (0.91) | 4.6 (0.97) | 5.6 (0.86) | 6.6 (0.87) |
| All data | MUE (STD) | 4.7 (2.1) | 7.8 (2.9) | 11.2 (3.2) | 0.9 (0.5) | 4.9 (1.9) | 6.8 (2.3) | 3.2 (1.8) | 4.1 (2.0) | 5.2 (2.1) | 6.2 (2.3) |
| All data | MSE (STD) | 4.1 (3.0) | 7.7 (3.3) | 11.2 (3.2) | -0.1 (1.0) | 4.6 (2.5) | 6.5 (3.0) | 2.7 (2.5) | 4.1 (2.0) | 4.7 (3.0) | 6.0 (2.9) |
| H-bonded | RMSE (R) | 5.3 (0.95) | 8.8 (0.93) | 12.0 (0.92) | 1.1 (0.99) | 5.4 (0.96) | 7.5 (0.95) | 3.6 (0.97) | 4.8 (0.98) | 5.6 (0.96) | 6.8 (0.97) |
| H-bonded | MUE (STD) | 5.0 (1.9) | 8.5 (2.2) | 11.7 (3.0) | 0.9 (0.5) | 5.2 (1.7) | 7.2 (2.0) | 3.2 (1.7) | 4.5 (1.9) | 5.2 (1.9) | 6.5 (2.0) |
| H-bonded | MSE (STD) | 4.9 (2.0) | 8.5 (2.2) | 11.7 (3.0) | -0.0 (1.1) | 5.2 (1.7) | 7.2 (2.0) | 3.2 (1.8) | 4.5 (1.9) | 5.2 (1.9) | 6.5 (2.0) |
| Stacked | RMSE (R) | 3.0 (0.84) | 3.6 (0.90) | 8.0 (0.91) | 0.7 (1.00) | 3.2 (0.77) | 3.6 (0.96) | 3.8 (0.73) | 2.2 (1.00) | 5.6 (0.87) | 5.2 (0.88) |
| Stacked | MUE (STD) | 2.4 (1.8) | 2.7 (2.4) | 7.6 (2.4) | 0.6 (0.3) | 2.7 (1.7) | 3.3 (1.3) | 2.6 (2.7) | 1.8 (1.3) | 4.7 (2.9) | 4.1 (3.2) |
| Stacked | MSE (STD) | -2.1 (2.2) | 1.5 (3.3) | 7.6 (2.4) | -0.3 (0.6) | 0.2 (3.2) | 1.0 (3.4) | -0.9 (3.7) | 1.8 (1.3) | 0.7 (5.5) | 2.0 (4.8) |
| Stacked | N | 2 | 4 | 4 | | 1 | 4 | 1 | | 3 | 3 |

N denotes the number of stacked dimers that devolved into hydrogen bonded configurations.

Journal of Computational Chemistry 2015, DOI: 10.1002/jcc.23933
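The quantities tabulated in Table 5 can be illustrated with a short Python sketch. It assumes the conventional definitions (error = model value minus reference value), takes R to be the Pearson correlation coefficient, and uses the population standard deviation for STD. These conventions, and the function name `error_statistics`, are assumptions made for illustration and are not taken from the Methods section.

```python
import math


def error_statistics(model, reference):
    """Return RMSE, R, MUE, MSE and the STDs of the unsigned and signed errors.

    Assumed conventions (not confirmed by the paper): error = model - reference,
    R = Pearson correlation, STD = population standard deviation.
    """
    if len(model) != len(reference) or len(model) < 2:
        raise ValueError("need two equal-length sequences with at least two entries")

    errors = [m - r for m, r in zip(model, reference)]
    abs_errors = [abs(e) for e in errors]

    def mean(values):
        return sum(values) / len(values)

    def pstd(values):
        mu = mean(values)
        return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))

    mean_m, mean_r = mean(model), mean(reference)
    cov = sum((m - mean_m) * (r - mean_r) for m, r in zip(model, reference))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_r = sum((r - mean_r) ** 2 for r in reference)

    return {
        "RMSE": math.sqrt(mean([e * e for e in errors])),
        # Raises ZeroDivisionError if either data set has zero variance.
        "R": cov / math.sqrt(var_m * var_r),
        "MUE": mean(abs_errors),
        "MUE_STD": pstd(abs_errors),
        "MSE": mean(errors),
        "MSE_STD": pstd(errors),
    }


if __name__ == "__main__":
    # Made-up binding energies (kcal/mol), for illustration only.
    reference = [-10.0, -15.0, -20.0]
    model = [-9.0, -14.0, -19.0]
    for name, value in error_statistics(model, reference).items():
        print(f"{name}: {value:.3f}")
```

When every error has the same sign, the MUE and MSE rows coincide in magnitude, which is the pattern seen for several columns of Table 5.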

Figure 3. MUE and MSE with their STDs of the nucleobase dimer binding energies (kcal/mol) evaluated from geometry optimized dimers and monomers using semiempirical models with dispersion correction. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Figure 4. MUE and MSE with their STDs of the nucleobase dimer binding energies (kcal/mol) evaluated from geometry optimized dimers and monomers using semiempirical models without dispersion correction. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

2D profiles of DMP conformation reference data

The conformational energy profiles of the dimethyl phosphate anion (DMP) are displayed in Figure 8 as 2D contours of the backbone torsion angles ζ and α (see Fig. 7a). The contours were constructed by scanning ζ and α in a series of constrained geometry optimizations from 0° to 360° in steps of 5°. The benchmark scans were performed using MP2/6-311++G(3df,2p)//MP2/6-31++G(d,p), which we shall refer to as MP2 henceforth, unless specifically noted otherwise. Table 7 lists the torsion angles and the relative energies of the minima and transition states of DMP. The minima labeled gg, tg, and tt denote the gauche-gauche, trans-gauche, and trans-trans conformations of DMP, respectively. Similarly, gg-tg, tg-tt, tg-gt, and tt-tt indicate the transition states connecting the two corresponding minima.

2D profiles of sugar conformation reference data

Figures 9 and 10 display sugar pucker energy profiles of the deoxyadenosine (da), deoxyguanosine (dg), deoxycytidine (dc), and thymidine (dt) DNA nucleosides and the adenosine (ra), guanosine (rg), cytidine (rc), and uridine (ru) RNA nucleosides, respectively. The $Z_x$ and $Z_y$ axes in these figures are pucker parameters, which are related to the proper torsions $\nu_1$ and $\nu_3$ of the sugar ring shown in Figure 7 by

$$Z_x = \frac{\nu_1 + \nu_3}{2\cos(4\pi/5)} \tag{2}$$

$$Z_y = \frac{\nu_1 - \nu_3}{2\sin(4\pi/5)} \tag{3}$$

The 2D scans are constructed from partial geometry optimizations, whereby $Z_x$ and $Z_y$ are constrained to their desired values while enforcing the torsion constraints listed in Table 8 to mimic the nucleoside connection to the B-DNA or A-RNA backbone and to avoid intramolecular hydrogen bonding interactions. Following our previous work, [22] we use MP2/6-311++G(3df,2p)//MP2/6-31++G(d,p) to construct the reference 2D profiles.
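As an illustration of Eqs. (2) and (3), the following Python sketch converts the two ring torsions into the pucker parameters and inverts the relations, which is the conversion a scan over target ($Z_x$, $Z_y$) values would need in order to express them as torsion constraints. The function names are hypothetical, the inverse follows from simple algebra on the two equations, and the example torsions are the B-DNA values of $\nu_1$ and $\nu_3$ from Table 8.

```python
import math

# Angle appearing in Eqs. (2) and (3): 4*pi/5 radians (144 degrees).
PHASE = 4.0 * math.pi / 5.0


def pucker_parameters(nu1_deg, nu3_deg):
    """Return (Z_x, Z_y) in degrees from the ring torsions nu1 and nu3 (degrees)."""
    z_x = (nu1_deg + nu3_deg) / (2.0 * math.cos(PHASE))
    z_y = (nu1_deg - nu3_deg) / (2.0 * math.sin(PHASE))
    return z_x, z_y


def torsions_from_pucker(z_x, z_y):
    """Invert Eqs. (2) and (3): return (nu1, nu3) in degrees for given Z_x, Z_y."""
    nu1 = z_x * math.cos(PHASE) + z_y * math.sin(PHASE)
    nu3 = z_x * math.cos(PHASE) - z_y * math.sin(PHASE)
    return nu1, nu3


if __name__ == "__main__":
    # B-DNA torsions from Table 8 (degrees).
    z_x, z_y = pucker_parameters(28.7, 29.1)
    print(f"Z_x = {z_x:.2f}, Z_y = {z_y:.2f}")
    print("round trip:", torsions_from_pucker(z_x, z_y))
```

Because cos(4π/5) is negative, two positive torsions of similar size map to a negative $Z_x$ and a $Z_y$ close to zero.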

Table 6. Error statistics of interaction energies (kcal/mol) of nucleobase dimers evaluated from single point calculations at the reference structure geometries.

| Subset | Error | M062X-D3 | M062X | CAM-B3LYP-D3 | B3LYP-D3 | CAM-B3LYP | B3LYP | PM7 | PM6-DH1 | PM6-D3 | PM6 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| All data | RMSE (R) | 1.2 (0.98) | 1.5 (0.99) | 1.1 (1.00) | 0.6 (1.00) | 4.0 (0.92) | 6.2 (0.86) | 2.9 (0.90) | 1.6 (0.97) | 1.7 (0.98) | 8.2 (0.98) |
| All data | MUE (STD) | 1.1 (0.6) | 1.4 (0.4) | 1.0 (0.5) | 0.6 (0.3) | 3.0 (2.7) | 5.1 (3.5) | 2.4 (1.7) | 1.2 (1.0) | 1.5 (0.8) | 8.0 (1.4) |
| All data | MSE (STD) | 0.5 (1.1) | 1.1 (1.0) | -0.9 (0.7) | -0.4 (0.5) | 3.0 (2.7) | 5.1 (3.5) | -1.0 (2.8) | 0.5 (1.5) | 1.1 (1.3) | 8.0 (1.4) |
| H-bonded | RMSE (R) | 1.0 (1.00) | 1.5 (1.00) | 1.2 (1.00) | 0.6 (1.00) | 2.1 (1.00) | 3.9 (1.00) | 3.1 (0.89) | 1.6 (0.97) | 1.8 (0.98) | 8.2 (0.98) |
| H-bonded | MUE (STD) | 0.9 (0.4) | 1.4 (0.4) | 1.0 (0.5) | 0.6 (0.3) | 2.0 (0.5) | 3.8 (0.6) | 2.6 (1.7) | 1.2 (1.1) | 1.6 (0.8) | 8.1 (1.4) |
| H-bonded | MSE (STD) | 0.9 (0.4) | 1.4 (0.4) | -1.0 (0.5) | -0.5 (0.4) | 2.0 (0.5) | 3.8 (0.6) | -1.1 (2.9) | 0.5 (1.6) | 1.3 (1.2) | 8.1 (1.4) |
| Stacked | RMSE (R) | 2.4 (1.00) | 1.4 (1.00) | 0.7 (0.99) | 0.5 (0.99) | 10.2 (0.88) | 14.5 (0.80) | 1.4 (0.98) | 0.9 (0.98) | 1.1 (1.00) | 7.9 (0.94) |
| Stacked | MUE (STD) | 2.4 (0.3) | 1.4 (0.3) | 0.6 (0.2) | 0.5 (0.3) | 10.0 (1.8) | 14.3 (2.3) | 1.3 (0.6) | 0.8 (0.5) | 1.0 (0.5) | 7.8 (1.3) |
| Stacked | MSE (STD) | -2.4 (0.3) | -1.4 (0.3) | 0.4 (0.5) | 0.3 (0.4) | 10.0 (1.8) | 14.3 (2.3) | 0.2 (1.4) | 0.6 (0.7) | -0.4 (1.0) | 7.8 (1.3) |

| Subset | Error | AM1-D | AM1/d | AM1 | mdc | DFTB3-D | DFTB3 | DFTB2-D-γh | DFTB2-D | DFTB2-γh | DFTB2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| All data | RMSE (R) | 2.1 (0.95) | 11.1 (0.92) | 14.5 (0.82) | 1.3 (0.99) | 5.0 (0.96) | 7.7 (0.95) | 3.8 (0.96) | 4.9 (0.95) | 6.4 (0.94) | 7.6 (0.95) |
| All data | MUE (STD) | 1.8 (1.1) | 10.9 (2.3) | 14.1 (3.3) | 1.1 (0.7) | 4.7 (1.6) | 7.5 (1.8) | 3.3 (1.7) | 4.6 (1.9) | 6.1 (2.0) | 7.3 (2.0) |
| All data | MSE (STD) | 1.0 (1.8) | 10.9 (2.3) | 14.1 (3.3) | -0.0 (1.3) | 4.7 (1.6) | 7.5 (1.8) | 3.3 (1.7) | 4.6 (1.9) | 6.1 (2.0) | 7.3 (2.0) |
| H-bonded | RMSE (R) | 2.1 (0.96) | 11.1 (0.93) | 14.3 (0.85) | 1.3 (0.99) | 5.2 (0.98) | 7.3 (0.97) | 4.0 (0.97) | 5.2 (0.97) | 6.0 (0.96) | 7.3 (0.96) |
| H-bonded | MUE (STD) | 1.8 (1.1) | 10.8 (2.3) | 13.9 (3.4) | 1.1 (0.7) | 5.0 (1.4) | 7.1 (1.6) | 3.6 (1.6) | 4.9 (1.7) | 5.7 (1.8) | 7.0 (1.9) |
| H-bonded | MSE (STD) | 1.3 (1.7) | 10.8 (2.3) | 13.9 (3.4) | -0.0 (1.3) | 5.0 (1.4) | 7.1 (1.6) | 3.6 (1.6) | 4.9 (1.7) | 5.7 (1.8) | 7.0 (1.9) |
| Stacked | RMSE (R) | 1.6 (1.00) | 11.7 (0.88) | 16.4 (0.86) | 1.5 (0.97) | 2.3 (0.99) | 9.9 (0.93) | 1.4 (1.00) | 1.8 (1.00) | 9.1 (0.97) | 9.5 (0.96) |
| Stacked | MUE (STD) | 1.4 (0.8) | 11.5 (2.4) | 16.2 (2.3) | 1.3 (0.8) | 2.2 (0.6) | 9.9 (1.2) | 1.4 (0.3) | 1.7 (0.5) | 9.1 (0.8) | 9.4 (0.9) |
| Stacked | MSE (STD) | -1.3 (0.9) | 11.5 (2.4) | 16.2 (2.3) | 0.1 (1.5) | 2.2 (0.6) | 9.9 (1.2) | 1.4 (0.3) | 1.7 (0.5) | 9.1 (0.8) | 9.4 (0.9) |
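Tables 5 and 6 can be read side by side: a model whose error is much larger after geometry optimization than in a single point calculation at the reference geometry has changed the dimer geometry substantially. The Python sketch below makes that comparison for the stacked-dimer MUEs of a subset of models, with values copied from the two tables. The 2 kcal/mol cutoff and all names are arbitrary illustrative choices, not quantities from the paper.

```python
# Stacked-dimer MUEs (kcal/mol) for a subset of models:
# (Table 5, after geometry optimization; Table 6, single point at reference geometry).
STACKED_MUE = {
    "M062X-D3": (1.0, 2.4),
    "M062X": (1.0, 1.4),
    "CAM-B3LYP-D3": (6.1, 0.6),
    "B3LYP-D3": (3.7, 0.5),
    "PM7": (6.1, 1.3),
    "PM6-DH1": (3.8, 0.8),
    "mdc": (0.6, 1.3),
    "DFTB2-D": (1.8, 1.7),
}


def optimization_penalty(mue_table):
    """Return MUE(optimized) - MUE(single point) for each model."""
    return {model: opt - sp for model, (opt, sp) in mue_table.items()}


def flag_geometry_drift(mue_table, cutoff=2.0):
    """Return the models whose MUE grows by more than `cutoff` on optimization."""
    penalties = optimization_penalty(mue_table)
    return sorted(model for model, delta in penalties.items() if delta > cutoff)


if __name__ == "__main__":
    for model, delta in optimization_penalty(STACKED_MUE).items():
        print(f"{model:>14s}: {delta:+.1f} kcal/mol")
    print("flagged:", flag_geometry_drift(STACKED_MUE))
```

With this subset, the models left unflagged are M062X-D3, M062X, mdc, and DFTB2-D, the same four that the discussion singles out as locating all of the stable stacked dimers on geometry optimization.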

Figure 5. MUE and MSE with their STDs of the nucleobase dimer interaction energies (kcal/mol) evaluated from single point calculations on the reference structure geometries using semiempirical models with dispersion correction. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

The pseudorotation angles, puckering amplitudes, and relative energies of the stationary points of the DNA and RNA nucleosides are listed in Tables 9 and 10, respectively. MIN1 and MIN2 are the Eastern (large $Z_x$) and Western (small $Z_x$) minima observed in the 2D contours, and TS1 and TS2 are the Northern (large $Z_y$) and Southern (small $Z_y$) transition states.

1D profiles of glycosidic torsion reference data

Figure 11 displays 1D energy profiles of the deoxyribonucleosides (da, dg, dc, and dt) and ribonucleosides (ra, rg, rc, and ru) as a function of the glycosidic torsion coordinate. The 1D profiles were constructed by scanning the glycosidic torsion χ from 0° to 360° in steps of 5°. Several additional torsion constraints were imposed on the geometry optimizations to mimic the nucleoside in B-DNA or A-RNA and to avoid intramolecular hydrogen bonding interactions. These torsion constraints are listed in Table 8, together with the constraint parameters from the NAB program. [78] The benchmark scans were performed using MP2/6-311++G(3df,2p)//MP2/6-31++G(d,p). Tables 11 and 12 list the glycosidic torsions and relative energies of the stationary points along the 1D reaction coordinate computed from M062X, AM1/d-PhoT, PM6, and DFTB3. Comparisons involving the remaining methods have been made available in the Supporting Information. MIN1 and MIN2 are the local and global minima, with syn and anti glycosidic torsions, respectively. TS1 and TS2 are the transition states with the lower and higher barriers, which have glycosidic torsions of about 120° and 0°, respectively.
Figure 6. MUE and MSE with their STDs of the nucleobase dimer interaction energies (kcal/mol) evaluated from single point calculations on the reference structure geometries using semiempirical models without dispersion correction. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Figure 7. Definition of torsion angles in the a) dimethyl phosphate anion and b) nucleoside sugar ring. The sugar puckering parameters $Z_x$ and $Z_y$ are defined as $Z_x = (\nu_1 + \nu_3)/(2\cos(4\pi/5))$ and $Z_y = (\nu_1 - \nu_3)/(2\sin(4\pi/5))$.

1D and 2D profiles of phosphoryl transfer reference data

Figure 13 contains 2D contours of 2′-O-transphosphorylation (RNA transesterification) model reactions: the departure of methoxide (MethO) from ethylene phosphate (EP), and the departure of ethoxide (EthO) from a 2′,3′-cyclic phosphate model (SP). Henceforth, these reactions will be referred to as MethOEP and EthOSP, respectively. The transition state structures of the model reactions are shown in Figure 12. These energy profiles were constructed from relaxed potential energy scans of the P-O2′ forming bond and the P-O5′ breaking bond from 1.60 Å to 2.60 Å in steps of 0.05 Å. MP2 is used as the reference benchmark. Table 13 lists the bond lengths and relative energies of the stationary points. TS1 and MIN2 refer to the stationary points whose P-O2′ forming bond is longer than the P-O5′ breaking bond. TS2 and MIN3 denote the stationary points whose P-O2′ forming bond is shorter than the P-O5′ breaking bond. TS1 and TS2 are also called the early and late TS, respectively. [51,55]

Table 14 lists the values of the reaction coordinate and the relative energies at the stationary points of 7 RNA phosphoryl transfer reaction profiles computed in the gas phase. [51] Four of the reactions follow associative, concerted mechanisms: CH3OEP, PhOEP, 4-CNPhOEP, and CH3COOEP. Two of the reactions follow associative, stepwise mechanisms: CH3CH3CHOEP and CH3CH2OEP. One of the reactions follows a dissociative mechanism: CH3OPO3. The associative reaction profiles are a function of ΔR, where $\Delta R = R_{\mathrm{P-O5'}} - R_{\mathrm{P-O2'}}$. The dissociative reaction profile is a function of the P-O bond length. Figures and error statistics of these energy profiles are provided in the Supporting Information.
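The naming convention for the stationary points and the ΔR reaction coordinate described above can be sketched in Python as follows. The function names are hypothetical, and the example bond lengths are taken from the CH3OEP rows of Table 13 (R1 is the P-O2′ forming bond and R2 the P-O5′ breaking bond, both in Å).

```python
def delta_r(r_p_o5, r_p_o2):
    """Reaction coordinate: R(P-O5') - R(P-O2'), in the units of the inputs (Å)."""
    return r_p_o5 - r_p_o2


def label_stationary_point(r_forming, r_breaking, is_transition_state):
    """Label a stationary point using the convention stated in the text.

    Forming P-O2' bond longer than breaking P-O5' bond  -> TS1 (early) or MIN2.
    Forming P-O2' bond shorter than breaking P-O5' bond -> TS2 (late) or MIN3.
    """
    if r_forming > r_breaking:
        return "TS1 (early)" if is_transition_state else "MIN2"
    if r_forming < r_breaking:
        return "TS2 (late)" if is_transition_state else "MIN3"
    raise ValueError("equal bond lengths are not covered by the convention")


if __name__ == "__main__":
    # MP2 transition state of CH3OEP (Table 13): R1 = 1.81 Å, R2 = 2.56 Å.
    print(label_stationary_point(1.81, 2.56, True), delta_r(2.56, 1.81))
    # DFTB3-OP intermediate of CH3OEP (Table 13): R1 = 1.92 Å, R2 = 1.85 Å.
    print(label_stationary_point(1.92, 1.85, False), delta_r(1.85, 1.92))
```

Under this convention a positive ΔR corresponds to the late (TS2/MIN3) side of the profile and a negative ΔR to the early (TS1/MIN2) side.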
Results and Discussion

Proton affinity

Proton affinities and related metrics, such as gas phase acidities and basicities, have frequently been used to test and compare semiempirical models. [8,21,72–75] For example, Dewar [72] found that AM1 adequately reproduces the experimental proton affinities of neutral molecules. Burk [73,74] further compared the proton affinities of neutral compounds computed with AM1 and PM3 and concluded that AM1 is superior. More recently, Toomsalu [75] compared AM1, PM3, PDDG, and PM6, and found that PM6 and AM1 best reproduce gas phase basicities, whereas AM1 and PM3 best reproduce gas phase acidities. As described in the Methods section, the benchmark proton affinities used as reference values in the comparisons that follow were computed at the CBS-QB3 level and reported previously. [71]

Figure 8. 2D conformational contours of the dimethyl phosphate anion (DMP). The energies along each contour curve are in kcal/mol, and all energy values are calculated with respect to the global minimum.

Table 7. Torsion angles ζ and α (°), and relative energies ΔE (kcal/mol) of minima (gg, tg, and tt) and transition states (gg-tg, tg-tt, tg-gt, and tt-tt) of dimethyl phosphate anion (DMP). Each cell lists ζ, α, ΔE.

| Method | gg | tg | tt | gg-tg | tg-tt | tg-gt | tt-tt |
|---|---|---|---|---|---|---|---|
| MP2 | 69.9, 69.9, 0.0 | -162.3, 69.3, 1.7 | 151.5, 151.5, 3.7 | 130.6, 69.9, 2.6 | 157.5, 126.6, 3.8 | -87.3, 87.3, 2.8 | 180.0, 180.0, 3.7 |
| M062X | 70.1, 70.1, 0.0 | -166.4, 70.1, 1.5 | 154.7, 154.7, 3.5 | 133.1, 70.7, 2.5 | 159.1, 132.3, 3.6 | -85.5, 86.0, 2.6 | 180.0, 180.0, 3.4 |
| B3LYP | 73.6, 73.6, 0.0 | -167.2, 71.6, 1.1 | 148.1, 148.1, 2.8 | 132.3, 73.3, 1.9 | 154.7, 130.7, 2.8 | -88.1, 88.1, 2.2 | 180.0, 180.0, 2.9 |
| DFTB3-OP | 78.2, 78.2, 0.0 | -179.6, 79.0, 0.3 | 154.0, 154.0, 0.7 | 126.9, 79.9, 0.4 | 157.0, 121.7, 0.6 | | 180.0, 180.0, 0.5 |
| DFTB3 | 79.0, 79.0, 0.0 | 179.6, 80.0, 0.3 | 160.3, 160.3, 0.4 | 126.6, 81.2, 0.4 | 157.9, 121.9, 0.5 | | 180.0, 180.0, 0.4 |
| DFTB2 | 77.3, 77.3, 0.0 | | | | | | 180.0, 180.0, 1.0 |
| PM7 | 66.7, 66.7, 0.0 | | | | | | |
| PM6 | 75.9, 75.9, 0.0 | | | | | | |
| AM1/d | 78.6, 78.6, 0.0 | | | | | | |
| AM1 | 78.6, 78.6, 0.0 | | | | | | |

The results in Figure 1 and Table 3 show that all the DFT and semiempirical models evaluated here give excellent overall linear correlations. Both M062X and B3LYP are in close agreement with the CBS-QB3 calculations, with overall MUEs of 0.9 and 1.2 kcal/mol, respectively. The NDDO-based semiempirical models systematically reproduce the proton affinities better than the approximate DFTB models. Among the semiempirical methods, AM1/d-PhoT and AM1 produce the lowest overall MUEs of 3.3 and 5.3 kcal/mol, respectively, which is consistent with previous studies. [72,74,75] The lowest STDs of the overall MSEs were found for AM1/d-PhoT (4.9 kcal/mol) and PM6 (5.3 kcal/mol), suggesting that they are the best for estimating relative proton affinities, although PM6 systematically underestimates absolute proton affinities. Use of the NHmod and OPhyd DFTB3 parameters slightly improves the amino acid and phosphorane proton affinities, respectively.
Abnormally large negative proton affinity errors occur for the sulfur-containing molecules in the phosphorane set with AM1/d-PhoT, PM7, and all DFTB models, because these models fail to reproduce the expected pentavalent phosphorus geometry, although low MUEs were obtained for the phosphate set with all of these models except DFTB2. Note that AM1/d-PhoT has not been specifically parameterized for sulfur, and the sulfur parameters in AM1 are used here without modification. When those molecules are removed from the statistics, both DFTB3-HNOP and DFTB3 systematically overestimate the proton affinities for all sets except the phosphorane set. Chemical modifications that involve the substitution of phosphoryl oxygens with sulfur represent important experimental probes to study metal ion binding and ribozyme mechanisms. [37,79,80] Consequently, the modeling of pentavalent phosphorus both with and without thio substitutions is an important area that warrants attention in the development of next-generation semiempirical quantum models for nucleic acid reactivity.

Polarizability

Figure 2 and Table 4 show good linear correlations between the model and reference isotropic polarizabilities for all models. Both M062X and B3LYP closely match the MP2 calculations, with overall MSEs of -1.5 and 1.5 a.u.; that is, M062X and B3LYP slightly under- and over-estimate the static isotropic polarizabilities, respectively. All the semiempirical methods evaluated here systematically underestimate the reference isotropic polarizabilities by approximately 30%, which is consistent with previous studies. [81–83] The systematic underestimation of polarizability by semiempirical methods is attributed to the use of a minimal basis set, because the accuracy of the polarizability depends largely on the size of the basis. [81,82,84] For the DFTB models, it is exacerbated by the monopole representation of charge fluctuations. [82]

Figure 9. 2D contours of sugar pseudorotation of DNA nucleosides. The energies along each contour curve are in kcal/mol, and all energy values are calculated with respect to the global minimum. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Nucleobase dimer

The JSCH-2005 dataset of hydrogen bonded and stacked nucleobase dimers [76] has been widely used to test noncovalent interactions. [3,17,24,27–29] The intermolecular interactions contained within JSCH-2005 involve both geometry optimized complexes and ad hoc, artificial geometries chosen to mimic the relative orientation of dimers within DNA or RNA. We limit the scope of our comparisons to those dimers whose reference structures were optimized to a stable minimum; however, we compare the models to the reference calculations twice. In the first comparison, the interactions are computed after optimization of the complex's geometry (Figs. 3 and 4, and Table 5). In the second comparison, we compute the model interactions from single point calculations at the reference geometry (Figs. 5 and 6, and Table 6). Large differences between these two comparisons indicate a significant change in geometry on optimization. For example, previous comparisons of AM1, PM3, AM1-D, and PM3-D found that the inclusion of dispersion corrections not only greatly improves the stacked dimer interactions, but is also vital to prevent destacking of the dimers into hydrogen bonded complexes on geometry optimization. [24,25] Hydrogen-bonded DNA base pair interactions have also been shown to be improved with PM6-DH1 [29] and DFTB2-D. [35] Our results indicate that the models with dispersion and/or hydrogen bonding corrections greatly improve the binding energies of hydrogen bonded and stacked dimers. [16,24,27,28,85] However, most of the dispersion corrected models (all except M062X-D3, M062X, mdc, and DFTB2-D) still fail to correctly locate all of the stable stacked dimers on geometry optimization. Among the models examined here, mdc produces the lowest overall interaction energy MUE (0.9 kcal/mol) on geometry optimization.
PM6-DH1 and DFTB2-D-γh give the lowest overall MUEs among the NDDO and DFTB methods, respectively. The best models for hydrogen bonded nucleobase dimers are CAM-B3LYP-D3, mdc, and B3LYP-D3, whose MUEs are 0.7, 0.9, and 1.1 kcal/mol, respectively. The most accurate results for stacked nucleobase dimers are obtained with mdc, M062X-D3, and M062X, with MUEs of 0.6, 1.0, and 1.0 kcal/mol, respectively. The single point calculations indicate that all DFTB models underestimate the interaction energies.

2D profiles of DMP conformation

The MP2 energy profile of the dimethyl phosphate anion (DMP) in Figure 8 has a plane of symmetry and contains three unique minima (gg, tg, and tt) and four significant transition states (gg-tg, tg-tt, tg-gt, and tt-tt). Their relative energies are all within 4 kcal/mol of the global minimum, as shown in Table 7. The MP2 stationary point relative energies are slightly higher than those reported in previous works, [86,87] but agree to within 0.4 kcal/mol. The M062X energy profile is in excellent agreement with MP2; the M062X stationary point energies differ from the MP2 values by -0.16 kcal/mol on average. The B3LYP and MP2 energy profiles are qualitatively similar, but the B3LYP stationary point relative energies are 25% lower. The DFTB3-OPhyd and DFTB3 energy profiles are very similar to each other, but are in poor agreement with the ab initio profiles. The relative energies of their local minima and transition states are all below 0.8 kcal/mol, about five times smaller than the MP2 and M062X values, which suggests a lack of preference among the gg, tg, and tt conformations. Furthermore, the tg-gt transition state is incorrectly predicted to be a shallow minimum. The DFTB2 and NDDO-based semiempirical models correctly predict only the gg global minimum and, like DFTB3, produce a shallow minimum where the tg-gt transition state would be expected. The DFTB2 tt-tt conformer is actually a gg-gg transition state connecting the two gg conformers.

Figure 10. 2D contours of sugar pseudorotation of RNA nucleosides. The energies along each contour curve are in kcal/mol, and all energy values are calculated with respect to the global minimum. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Table 8. Constrained dihedral angles (°) in nucleosides.

| Dihedral angle | Atoms | B-DNA | A-RNA |
|---|---|---|---|
| β | H5′-O5′-C5′-C4′ | -151.5 | -179.9 |
| γ | O5′-C5′-C4′-C3′ | 30.9 | 47.4 |
| ε | C4′-C3′-O3′-H3′ | 159.1 | -151.7 |
| | C3′-C2′-O2′-H2′ | | -169.7 |
| χ_R | O4′-C1′-N9-C4 | -99.4 | -166.1 |
| χ_Y | O4′-C1′-N1-C2 | -99.4 | -166.1 |
| ν1 | O4′-C1′-C2′-C3′ | 28.7 | -25.8 |
| ν3 | C2′-C3′-C4′-O4′ | 29.1 | -36.2 |

The constrained torsion values are taken from the NAB program in the AmberTools 13 program suite. [78] The χ and the ν1, ν3 torsion constraints are imposed on geometry optimizations in the construction of the sugar conformation and glycosidic torsion profiles, respectively. The atom numbering of the glycosidic torsion (χ) differs for purines (R) and pyrimidines (Y); the two cases are labeled by subscripts and listed separately.

Table 9. Pseudorotation phases P_h (°), pucker amplitudes A_r (°), and relative energies ΔE (kcal/mol) of the minima (MIN) and transition states (TS) of DNA nucleosides. Each cell lists P_h, A_r, ΔE.

| Nucleoside | Method | MIN1 | MIN2 | TS1 | TS2 | Single minimum |
|---|---|---|---|---|---|---|
| da | MP2 | 358.7, 34.3, 2.9 | 189.8, 34.7, 0.0 | 83.7, 40.6, 4.5 | 295.0, 13.4, 4.9 | |
| da | DFTB3 [a] | 347.0, 23.4, 0.6 | 194.6, 28.3, 0.0 | 66.5, 28.6, 1.1 | 289.1, 8.1, 0.9 | |
| da | DFTB3 [b] | 349.3, 22.4, 0.3 | 191.9, 26.9, 0.0 | 58.7, 26.9, 0.6 | 299.9, 8.3, 0.6 | |
| da | PM6 | | | | | 286.6, 11.0, 0.0 |
| da | PM7 | | | | | 251.9, 8.7, 0.0 |
| dg | MP2 | 357.8, 33.9, 3.1 | 190.6, 34.8, 0.0 | 83.2, 40.4, 4.7 | 293.7, 14.2, 5.0 | |
| dg | DFTB3 [a] | 345.1, 23.4, 0.7 | 195.3, 28.6, 0.0 | 67.8, 28.3, 1.3 | 290.8, 8.9, 1.0 | |
| dg | DFTB3 [b] | 346.8, 22.3, 0.4 | 193.1, 27.2, 0.0 | 61.9, 26.7, 0.7 | 283.7, 7.9, 0.7 | |
| dg | PM6 | | | | | 280.4, 11.4, 0.0 |
| dg | PM7 | | | | | 243.2, 10.0, 0.0 |
| dc | MP2 | 36.8, 37.5, 2.8 | 180.2, 35.4, 0.0 | 89.6, 39.9, 3.9 | 344.2, 3.6, 4.9 | |
| dc | DFTB3 [a] | 353.1, 18.4, 0.5 | 190.3, 27.1, 0.0 | 91.4, 29.0, 1.1 | 282.6, 4.2, 0.7 | |
| dc | DFTB3 [b] | 357.1, 18.1, 0.1 | 187.6, 25.5, 0.0 | | 235.2, 4.0, 0.4 | |
| dc | PM6 | | | | | 303.4, 10.1, 0.0 |
| dc | PM7 | | | | | 315.9, 7.9, 0.0 |
| dt | MP2 | 26.5, 33.5, 3.2 | 180.8, 35.4, 0.0 | 90.6, 38.3, 4.5 | 4.9, 3.4, 5.0 | |
| dt | DFTB3 [a] | 354.9, 17.3, 0.7 | 189.3, 27.5, 0.0 | 95.2, 22.5, 1.7 | 310.2, 4.0, 0.8 | |
| dt | DFTB3 [b] | 358.2, 17.0, 0.3 | 186.9, 26.0, 0.0 | | 269.9, 2.4, 0.5 | |
| dt | PM6 | | | | | 299.8, 8.1, 0.0 |
| dt | PM7 | | | | | 302.5, 3.6, 0.0 |

[a] DFTB3-mio. [b] DFTB3-3ob.

2D profiles of sugar conformation

Figures 9 and 10 display the 2D profiles of the DNA and RNA sugar pseudorotation potential energy surfaces, respectively. In our previous work, [22] AM1/d-PhoT, PM6, and DFTB3-mio were evaluated by comparison with benchmark MP2 calculations. Here, we focus on the latest semiempirical methods, PM7 and DFTB3-3ob. The DFTB3-3ob energy profiles resemble those of DFTB3-mio, and their agreement with the MP2 energy profiles is poor. The conspicuous feature of the DFTB3-3ob and DFTB3-mio DNA profiles is the flatness of the potential energy surface. The relative energies of the DFTB3-mio local minima and transition states are 3 to 5 times smaller than those of MP2, suggesting a lack of preference between the C2′-endo and C3′-endo conformations. The relative energies of the DFTB3-3ob stationary points are less than 0.8 kcal/mol, even smaller than those of DFTB3-mio. The puckering amplitudes of the DFTB3-3ob stationary points are also slightly smaller than those of DFTB3-mio.
The northern pseudorotation path shown in the DFTB3-3ob and DFTB3-mio profiles is not preferred, which is counter to steric arguments [44] and previous ab initio computations. [22,88] Pyrimidine deoxynucleosides from DFTB3-3ob do not have a northern TS1. In the DFTB3-mio RNA nucleoside profiles, the pyrimidines have only one minimum, whereas the local minima of the purine nucleosides are barely bound. The DFTB3-3ob ribose energy profiles have only one minimum. The PM7 energy profiles are similar to PM6; they both predict a single circular-shaped minimum for both DNA and RNA nucleosides. The puckering amplitudes of the PM6 stationary points are 2–3 times smaller than those of MP2, indicative of a preference for a flat sugar ring. The PM7 DNA and RNA nucleoside pucker amplitudes are smaller and larger than those of PM6, respectively. We hypothesize that the failure of semiempirical methods to reproduce sugar ring conformation is largely caused by the use of a minimal basis set and, in the case of NDDO-based models, further aggravated by the atomic orbital orthogonality condition. An ad hoc correction potential has recently been developed to improve sugar pucker profiles for both NDDO and DFTB semiempirical models. [22]

Table 10. Pseudorotation phases P_h (°), pucker amplitudes A_r (°), and relative energies ΔE (kcal/mol) of the minima (MIN) and transition states (TS) of RNA nucleosides. Each cell lists P_h, A_r, ΔE.

| Nucleoside | Method | MIN1 | MIN2 | TS1 | Single minimum |
|---|---|---|---|---|---|
| ra | MP2 | 8.5, 42.5, 0.0 | 200.8, 33.1, 0.0 | 87.9, 32.2, 3.7 | |
| ra | DFTB3 [a] | 14.3, 31.8, 2.2 | 172.8, 30.2, 0.0 | 36.5, 27.8, 2.4 | |
| ra | DFTB3 [b] | | | | 166.9, 29.6, 0.0 |
| ra | PM6 | | | | 206.4, 15.8, 0.0 |
| ra | PM7 | | | | 204.7, 24.8, 0.0 |
| rg | MP2 | 9.1, 42.5, 0.1 | 197.9, 33.3, 0.0 | 85.5, 34.0, 3.7 | |
| rg | DFTB3 [a] | 14.5, 31.6, 2.6 | 173.0, 30.5, 0.0 | 36.2, 27.1, 2.7 | |
| rg | DFTB3 [b] | | | | 167.2, 30.0, 0.0 |
| rg | PM6 | | | | 203.2, 17.9, 0.0 |
| rg | PM7 | | | | 204.1, 25.3, 0.0 |
| rc | MP2 | 13.5, 41.4, 0.0 | 207.5, 31.1, 0.9 | 91.2, 12.6, 3.8 | |
| rc | DFTB3 [a] | | | | 193.2, 24.2, 0.0 |
| rc | DFTB3 [b] | | | | 189.7, 21.9, 0.0 |
| rc | PM6 | | | | 197.7, 11.1, 0.0 |
| rc | PM7 | | | | 199.9, 20.1, 0.0 |
| ru | MP2 | 12.7, 42.0, 0.0 | 207.8, 31.9, 0.9 | 89.2, 14.3, 4.3 | |
| ru | DFTB3 [a] | | | | 193.4, 26.1, 0.0 |
| ru | DFTB3 [b] | | | | 190.9, 24.1, 0.0 |
| ru | PM6 | | | | 198.2, 12.6, 0.0 |
| ru | PM7 | | | | 200.0, 21.3, 0.0 |

[a] DFTB3-mio. [b] DFTB3-3ob.

Figure 11. 1D conformational contours of the glycosidic torsion of nucleosides. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

1D profiles of glycosidic torsion

The MP2 nucleoside glycosidic torsion profiles shown in Figure 11 have two minima: an anti-conformation global minimum (MIN2) and a syn-conformation local minimum (MIN1). The two minima are connected by two transition states (TS1 and TS2). The TS1 pathway is favored by 6.93 kcal/mol, on average. The M062X energy profiles of the DNA and RNA nucleosides closely match the benchmark MP2 results. The energy profiles from the semiempirical methods are in qualitative agreement with MP2 and M062X. The semiempirical models correctly predict the locations of the global MIN2 and TS2; however, AM1/d and DFTB3 fail to predict at least one of the MIN1 local minima within the set of DNA nucleosides, as indicated in Table 11, and AM1/d fails to predict a local minimum for the rc and ru RNA nucleosides. Even when the semiempirical models do correctly predict the existence of a MIN1 local minimum, its depth is far too shallow (0.5 kcal/mol) in comparison to the ab initio results. Among the semiempirical models, DFTB3 underpredicts the height of the TS2 barrier by approximately 3 kcal/mol, whereas AM1/d and PM6 agree with MP2 to within 1 kcal/mol.

Table 11. Glycosidic torsion χ (°) and relative energies ΔE (kcal/mol) of the minima (MIN) and transition states (TS) of DNA nucleosides. Each cell lists χ, ΔE.

| Nucleoside | Method | MIN1 | MIN2 | TS1 | TS2 |
|---|---|---|---|---|---|
| da | MP2 | 87.2, 2.8 | 239.3, 0.0 | 117.6, 3.3 | 346.9, 7.9 |
| da | M062X | 88.3, 3.4 | 239.4, 0.0 | 112.6, 3.6 | 347.2, 8.2 |
| da | AM1/d | | 242.6, 0.0 | | 351.1, 7.7 |
| da | PM6 | 29.0, 7.0 | 239.7, 0.0 | 50.4, 7.3 | 344.8, 8.2 |
| da | DFTB3 | | 223.1, 0.0 | | 352.3, 5.1 |
| dg | MP2 | 68.5, 3.3 | 245.9, 0.0 | 116.8, 4.3 | 351.2, 8.0 |
| dg | M062X | 62.0, 3.5 | 242.5, 0.0 | 113.6, 4.6 | 351.4, 8.1 |
| dg | AM1/d | | 244.7, 0.0 | | 354.1, 6.6 |
| dg | PM6 | | 241.0, 0.0 | | 351.4, 7.8 |
| dg | DFTB3 | | 228.8, 0.0 | | 2.6, 5.3 |
| dc | MP2 | 67.5, 6.5 | 195.2, 0.0 | 102.8, 8.1 | 353.5, 14.9 |
| dc | M062X | 63.9, 6.5 | 194.3, 0.0 | 103.7, 8.2 | 350.7, 15.1 |
| dc | AM1/d | | 225.1, 0.0 | | 351.4, 13.5 |
| dc | PM6 | 19.7, 10.7 | 232.0, 0.0 | 38.2, 11.0 | 348.9, 11.5 |
| dc | DFTB3 | | 193.7, 0.0 | | 349.0, 11.3 |
| dt | MP2 | 68.1, 6.2 | 235.8, 0.0 | 104.9, 7.0 | 346.6, 13.2 |
| dt | M062X | 62.8, 6.0 | 235.3, 0.0 | 103.1, 7.0 | 346.1, 12.9 |
| dt | AM1/d | 31.3, 9.8 | 238.9, 0.0 | 44.3, 9.9 | 352.6, 11.1 |
| dt | PM6 | 19.1, 8.8 | 238.9, 0.0 | 40.3, 9.2 | 345.0, 10.0 |
| dt | DFTB3 | | 225.0, 0.0 | | 346.3, 9.2 |

Table 12. Glycosidic torsion χ (°) and relative energies ΔE (kcal/mol) of the minima (MIN) and transition states (TS) of RNA nucleosides. Each cell lists χ, ΔE.

| Nucleoside | Method | MIN1 | MIN2 | TS1 | TS2 |
|---|---|---|---|---|---|
| ra | MP2 | 93.8, 2.8 | 192.8, 0.0 | 133.5, 4.4 | 19.0, 12.9 |
| ra | M062X | 95.4, 3.4 | 193.1, 0.0 | 132.7, 4.7 | 16.1, 12.6 |
| ra | AM1/d | 101.2, 3.5 | 205.8, 0.0 | 121.4, 3.7 | 19.3, 12.1 |
| ra | PM6 | 102.2, 2.4 | 199.8, 0.0 | 126.3, 2.7 | 18.8, 13.6 |
| ra | DFTB3 | 103.6, 2.4 | 194.5, 0.0 | 119.9, 2.5 | 21.5, 8.1 |
| rg | MP2 | 91.6, 3.6 | 196.5, 0.0 | 133.1, 5.4 | 11.8, 11.3 |
| rg | M062X | 94.0, 4.2 | 196.8, 0.0 | 132.9, 5.7 | 7.6, 11.1 |
| rg | AM1/d | 100.2, 4.1 | 206.8, 0.0 | 122.7, 4.3 | 19.6, 10.3 |
| rg | PM6 | 102.9, 3.1 | 202.5, 0.0 | 125.1, 3.3 | 16.1, 12.5 |
| rg | DFTB3 | 103.1, 3.1 | 199.2, 0.0 | 121.9, 3.3 | 15.3, 7.6 |
| rc | MP2 | 97.2, 7.6 | 193.4, 0.0 | 135.7, 10.3 | 10.2, 21.2 |
| rc | M062X | 97.0, 7.9 | 193.2, 0.0 | 136.9, 10.3 | 7.7, 21.0 |
| rc | AM1/d | | 195.8, 0.0 | | 17.3, 21.2 |
| rc | PM6 | 111.8, 4.6 | 195.7, 0.0 | 116.8, 4.6 | 17.0, 18.1 |
| rc | DFTB3 | 103.7, 6.0 | 191.9, 0.0 | 124.9, 6.4 | 14.9, 16.6 |
| ru | MP2 | 95.7, 5.9 | 195.3, 0.0 | 139.9, 8.5 | 8.3, 17.3 |
| ru | M062X | 96.1, 6.3 | 194.7, 0.0 | 139.7, 8.7 | 4.2, 17.2 |
| ru | AM1/d | | 203.0, 0.0 | | 18.3, 17.4 |
| ru | PM6 | 102.7, 3.7 | 198.7, 0.0 | 123.1, 3.9 | 15.3, 15.8 |
| ru | DFTB3 | 101.7, 4.8 | 194.9, 0.0 | 127.5, 5.3 | 18.9, 14.0 |

1D and 2D profiles of RNA phosphoryl transfer reactions

Figure 13 and Table 13 describe the RNA phosphoryl transfer model reactions shown in Figure 12.
The late transition state and close barrier observed in the MP2, M062X, and B3LYP 2D energy profiles suggest that both of the model reactions undergo an associative asynchronous concerted mechanism. [51,55] The AM1/d energy profiles are in good agreement with MP2; the AM1/d and MP2 TS barriers differ by 2.29 kcal/mol on average. The DFTB3-OPhyd profiles resemble the ab initio results; however, formally, DFTB3-OPhyd predicts an associative stepwise mechanism with a slightly bound intermediate. The agreement of the DFTB3, DFTB2, PM7, PM6, and AM1 energy profiles with MP2 is poor. The DFTB3 energy profiles show both dissociative and associative mechanisms. Although DFTB2 predicts associative concerted mechanisms, in agreement with the MP2 results, the DFTB2 reaction paths are noticeably different from MP2; the DFTB2 TS is 8 kcal/mol lower than the MP2 TS. The PM7, PM6, and AM1 energy profiles show associative stepwise mechanisms passing through an intermediate bound by more than 7 kcal/mol.

Figure 12. Structures of transition states for the RNA phosphoryl transfer model reactions a) CH3OEP and b) CH3CH2OSP, and c) a schematic for the native reaction in RNA. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Table 14 compares stationary points for several RNA model reaction profiles. The MP2 reference calculations of the associative RNA model reactions involving poor leaving groups predict either no stable intermediate or a shallow

Figure 13. 2D contours of the RNA phosphoryl transfer model reactions CH3OEP and CH3CH2OSP. The relative potential energies are calculated with respect to the reactant energy of each model reaction, in kcal/mol.

Table 13. Bond lengths of the P-O2′ forming bond R1 (Å) and the P-O5′ breaking bond R2 (Å), and relative energies ΔE (kcal/mol) of the stationary points of the RNA phosphoryl transfer model reactions CH3OEP and CH3CH2OSP. Each cell lists R1, R2, ΔE.

| Reaction | Method | TS1 | MIN2 | TS2 | MIN3 |
|---|---|---|---|---|---|
| CH3OEP | MP2 | | | 1.81, 2.56, 43.49 | |
| CH3OEP | M062X | | | 1.78, 2.53, 43.59 | |
| CH3OEP | B3LYP | | | 1.84, 2.46, 43.04 | |
| CH3OEP | AM1/d | | | 1.88, 2.33, 42.22 | |
| CH3OEP | DFTB3-OP | 2.17, 1.82, 34.59 | 1.92, 1.85, 33.64 | 1.84, 2.27, 37.76 | |
| CH3OEP | DFTB3 Mech1 [a] | 2.67, 2.24, 38.58 | 2.60, 2.55, 36.31 | 2.18, 2.62, 41.32 | |
| CH3OEP | DFTB3 Mech2 [a] | | | 1.91, 2.06, 48.65 | |
| CH3OEP | DFTB2 | | | 2.15, 2.25, 31.05 | |
| CH3OEP | PM7 | 2.52, 1.72, 43.62 | 1.81, 1.77, 32.83 | 1.74, 2.52, 47.61 | |
| CH3OEP | PM6 | 2.77, 1.79, 28.03 | 1.94, 1.85, 17.48 | 1.84, 2.86, 38.62 | |
| CH3OEP | AM1 | 2.89, 1.70, 23.46 | 1.80, 1.76, 6.53 | 1.73, 2.94, 26.44 | |
| CH3CH2OSP | MP2 | | | 1.84, 2.48, 29.88 | |
| CH3CH2OSP | M062X | | | 1.80, 2.49, 30.78 | |
| CH3CH2OSP | B3LYP | | | 1.86, 2.38, 28.87 | |
| CH3CH2OSP | AM1/d | | | 1.88, 2.37, 26.58 | |
| CH3CH2OSP | DFTB3-OP | 2.15, 1.82, 18.79 | 1.95, 1.84, 18.33 | 1.85, 2.29, 23.11 | |
| CH3CH2OSP | DFTB3 Mech1 [a] | 2.66, 2.24, 24.36 | 2.60, 2.54, 22.35 | 2.17, 2.62, 27.59 | 1.79, 2.74, 23.51 |
| CH3CH2OSP | DFTB3 Mech2 [a] | | | 1.96, 2.09, 34.71 | |
| CH3CH2OSP | DFTB2 | | | 1.86, 2.55, 21.71 | |
| CH3CH2OSP | PM7 | 2.51, 1.71, 22.29 | 1.84, 1.76, 15.13 | 1.75, 2.53, 29.61 | |
| CH3CH2OSP | PM6 | 2.83, 1.78, 10.14 | 2.04, 1.83, 0.56 | 1.85, 2.87, 21.74 | |
| CH3CH2OSP | AM1 | 2.90, 1.69, 14.91 | 1.81, 1.75, -2.53 | 1.73, 2.95, 19.86 | |

[a] Mech1 and Mech2 refer to dissociative and associative mechanisms, respectively.
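The average difference of 2.29 kcal/mol between the AM1/d and MP2 transition state barriers quoted in the discussion can be checked against Table 13 with a few lines of Python. The sketch assumes that the quoted average is the mean absolute difference over the two model reactions, and it uses the single transition state energy listed for each method; the dictionary layout and function name are illustrative only.

```python
# Transition state relative energies (kcal/mol) copied from Table 13.
TS_BARRIERS = {
    "CH3OEP": {"MP2": 43.49, "AM1/d": 42.22},
    "CH3CH2OSP": {"MP2": 29.88, "AM1/d": 26.58},
}


def mean_abs_barrier_difference(barriers, model, reference="MP2"):
    """Mean absolute difference between model and reference TS energies."""
    diffs = [abs(entry[model] - entry[reference]) for entry in barriers.values()]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    value = mean_abs_barrier_difference(TS_BARRIERS, "AM1/d")
    print(f"AM1/d vs MP2: {value:.3f} kcal/mol")
```

The two individual differences are 1.27 and 3.30 kcal/mol, whose mean of 2.285 kcal/mol is consistent with the quoted 2.29 kcal/mol.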