Computation of Large Systems with an Economic Basis Set: Structures and Reactivity Indices of Nucleic Acid Base Pairs from Density Functional Theory


W. J. FAN,1 R. Q. ZHANG,1 SHUBIN LIU2,3

1 Center of Super-Diamond and Advanced Films (COSDAF) & Department of Physics and Materials Science, City University of Hong Kong, Hong Kong SAR, People's Republic of China
2 Division of Research Computing, Information Technology Services, University of North Carolina, Chapel Hill, North Carolina 27599-3455
3 College of Chemistry and Chemical Engineering, Hunan Normal University, Changsha, Hunan 410081, People's Republic of China

Received 24 June 2006; Revised 23 November 2006; Accepted 4 December 2006
DOI 10.1002/jcc.20670
Published online 31 January 2007 in Wiley InterScience (www.interscience.wiley.com).

Abstract: We show here that an economic basis set can describe nucleic acid base pairs involving hydrogen bond interactions in density functional calculations. The economic basis set, in which polarization functions are added only to the strongly electronegative oxygen and nitrogen atoms, predicts reliable geometric structures and dipole moments of nucleic acid base pairs, comparable to those obtained with the 6-31G* basis set in B3LYP calculations. By combining single-point calculations with the standard basis set on the geometric structures optimized with the economic basis set, the present approach predicts accurate natural bond orbital charges, binding energies, electronegativity, hardness, softness, and electrophilicity index. The principle for basis selection presented in this study can be regarded as a general guideline for the computation of large biological systems with considerably high accuracy and low computational expense.

© 2007 Wiley Periodicals, Inc.
J Comput Chem 28: 967–974, 2007

Key words: density functional theory; nucleic acid; economic basis set; reactivity indices

Introduction

Nucleic acids have been a subject of great interest in the literature, both experimentally [1–10] and theoretically [11–30], largely because of their considerable importance in chemistry and biology. Enormous efforts have been made to improve our understanding of the structural, dynamical, and functional properties of nucleic acid base pairs. The pioneering work of Watson and Crick [11] revealed that adenine (A) pairs with thymine (T) and guanine (G) pairs with cytosine (C). The A–T and G–C base pairs are stabilized by two and three hydrogen bonds, respectively. Accurate gas-phase data are of great interest, since it is very difficult to extract the intrinsic base–base interactions from condensed-phase and/or crystal data. Because sufficient gas-phase experimental data are lacking, ab initio quantum-chemical methods are a basic tool for investigating nucleic acids in this regard [18].

Hobza and coworkers have reported numerous theoretical studies on the nucleic acid bases [17–23], and their early works have been summarized in a recent review [17], focusing mainly on the optimization of geometries and calculations of interaction energies at the minima of nucleic acid base pairs. Recently, they have investigated the stabilization energy of H-bonded and stacked structures in crystal geometries of G–C and A–T pairs using the complete basis set (CBS) limit at the MP2 and CCSD(T) levels [23]. Using MP2 with the 6-31G*(0.25) and modified aug-cc-pVDZ basis sets, Toczyłowski and Cybulski have carried out theoretical studies on 32 nucleic acid H-bonded base pairs, showing that the electrostatic and exchange energy components for the rotation of the monomers were most important in the overall interaction energy and that the exchange energy was the most anisotropic component [24]. Gorb et al.
looked into the double-proton-transfer process in A–T and G–C base pairs at the B3LYP/6-31G(d) and MP2/6-31G(d) levels of theory and argued that H-bonded bases possess nonplanar structures because the nitrogen atoms adopt sp3 hybridization and soft intermolecular vibrations exist in the molecular systems [25]. Mallajosyula et al. have inspected H-bonding patterns in A–T and G–C base pairs for B-DNA, using density functional theory (DFT) and the 6-31+G(d) basis set. They found that the H-bonding patterns are highly nonlocal and cooperative, and that the H-bond pattern from the crystal geometry is significantly different from that of the structure optimized in the gas phase [26]. Using DFT, Sahu et al. have examined the cytosine dimer structure from both keto and enol tautomers and investigated energetic properties, harmonic vibrational frequencies, and binding energies, concluding that, of the four different isomers, the planar cytosine dimer with C2h symmetry is thermodynamically the most stable and has the highest binding energy [27]. Tsolakidis and Kaxiras have recently analyzed the optical response of gas-phase DNA bases and base pairs in both their normal and tautomeric forms with time-dependent DFT and proposed that the difference between the absorption spectra of the normal and tautomeric forms can be used for their identification [28].

Most of the aforementioned calculations focus on geometry optimization of the isolated nucleic acid base pairs using medium-sized basis sets and ab initio/DFT approaches, followed by single-point calculations with a higher level basis set and treatment of electron correlation [29]. Because of its computational cost, such a methodology remains largely impractical for systems with more than two nucleic acid bases, though it has proven valid in the study of nucleic acid base pair dimers. Sherer et al. [29] have calculated interaction enthalpies for six base pairs at different levels of theory and basis set, revealing that mPWPW91/MIDI!, which has a relatively low computational cost, performed most satisfactorily, as judged by comparison with the available experimental data. They also reported that the reparameterized semiempirical model PM3BP was able to save a great amount of computational expense without loss of accuracy [29]. Very recently, using the PM3BP approach, Giese et al. [30] have analyzed the H-bonding pattern in nucleic acid base dimers and trimers and showed that the PM3BP Hamiltonian can produce an accuracy comparable to that of DFT approaches at 3 orders of magnitude lower computational cost.

Although various experimental and theoretical investigations have been carried out on nucleic acid base pair systems, our understanding of the structures and properties of such systems is still far from complete. A satisfactory description of nucleic acid base pair systems often requires high level basis sets, which leads to heavy use of computational resources and increased computational expense. It is highly desirable to find economic basis sets that sustain the accuracy while keeping the computational cost manageable. In this work, the Watson–Crick base pairs A–T and G–C as well as the free nucleic bases were investigated using DFT and different composite basis sets designed from the standard 3-21G or 6-31G basis set. Comparisons of the structural and electronic properties of these molecular systems demonstrate that the economic basis set can reproduce reliable structures and properties of the nucleic acid base pairs. By combining single-point calculations with a larger standard basis set on the geometric structure optimized with the economic basis set, the present approach predicts accurate natural bond orbital (NBO) charges, binding energies, and numerous DFT reactivity indices. We expect that the methodology demonstrated here can be even more effective in the calculation of larger biological systems.

Correspondence to: R. Q. Zhang or S. Liu; e-mail: aprqz@cityu.edu.hk or shubin@email.unc.edu
Contract/grant sponsor: Research Grants Council of the Hong Kong Special Administrative Region, China; contract/grant number: CityU 103305
Contract/grant sponsor: Major State Research Development Program of China; contract/grant number: 2004CB719903
Fan, Zhang, and Liu • Vol. 28, No. 5
Computational Details

For the computation of a heteroatomic system, economic basis sets have been proposed by taking into consideration the different roles of the different basis functions adopted in the basis set [31–37]. To describe an atom effectively, the number and type of basis functions may be selected according to its nature and environment. The more negative charge the atom carries, the higher the level (angular momentum type) of basis functions it requires. Specifically, an atom of smaller electronegativity, such as carbon, completely or partially loses its outer electrons when it participates in forming a molecule. In this case, no polarization or diffuse functions are needed to describe the atom, and the number of basis functions can be reduced accordingly. Atoms of larger electronegativity, such as oxygen, accept electrons in the formation of a molecule, and thus more basis functions with higher angular momentum are needed to properly describe the changed shape and asymmetry of the electron density. The idea of an economic basis set has been applied successfully to explore the structures and properties of a few systems, such as weakly bonded systems, the alkyl-alkene series, and metallic compounds [31–37]. We have recently employed the economic basis set to examine the structure, Mulliken charges, and the highest occupied molecular orbital (HOMO)–lowest unoccupied molecular orbital (LUMO) gap of nucleic acid base pairs [38]. In this paper, we further apply the idea to investigate binding energies, reactivity indices, and other properties of these systems.

To verify the reliability and effectiveness of our approach, concepts from DFT [39,40], such as the chemical potential ($\mu$), hardness ($\eta$), etc., were calculated and compared for the systems concerned. In DFT,

$$\mu = -\chi = \left(\frac{\partial E}{\partial N}\right)_{\upsilon} \quad\text{and}\quad \eta = \frac{1}{2}\left(\frac{\partial^{2} E}{\partial N^{2}}\right)_{\upsilon} = \frac{1}{2}\left(\frac{\partial \mu}{\partial N}\right)_{\upsilon},$$

where $E$ is the total energy, $N$ the total number of electrons, and $\upsilon$ the external potential of the system.
$\mu$ is the negative of the electronegativity ($\chi$) defined by Iczkowski and Margrave [41]. According to Mulliken [42], one has $\chi = \frac{1}{2}(I + A)$, and according to Parr and Pearson [43], $\eta = \frac{1}{2}(I - A)$, where $I$ and $A$ are the first ionization potential and electron affinity, respectively. In the finite-difference approximation, $I$ and $A$ can be approximated by the negatives of the HOMO and LUMO energies ($I \approx -\varepsilon_{\mathrm{HOMO}}$, $A \approx -\varepsilon_{\mathrm{LUMO}}$), respectively, giving

$$\mu = -\chi = \frac{1}{2}\left(\varepsilon_{\mathrm{HOMO}} + \varepsilon_{\mathrm{LUMO}}\right) \quad\text{and}\quad \eta = \frac{1}{2}\left(\varepsilon_{\mathrm{LUMO}} - \varepsilon_{\mathrm{HOMO}}\right).$$

Chemical softness ($S$) is the reciprocal of the chemical hardness [40], $S = 1/\eta$. Parr et al. [44] have recently proposed a new DFT reactivity concept called the electrophilicity index $\omega$. In terms of $\mu$ and $\eta$, $\omega = \mu^{2}/2\eta$, which measures the capacity of an electrophile to accept the maximal number of electrons from a neighboring reservoir of electron sea. Notice that here $\eta = (I - A)$, that is, without the factor of $\frac{1}{2}$ used in the hardness defined above.

All DFT calculations were performed using the B3LYP (Becke three-parameter and Lee–Yang–Parr) exchange-correlation functional, which combines the Becke three-parameter

exchange functional with the gradient-corrected correlation functional of Lee, Yang, and Parr [45,46]. All calculations were carried out with the Gaussian03 package [47].

Results and Discussion

Based on the methodology described earlier and previous studies [37,38,40], a composite basis set, 6-31G#, has been designed, using 6-31G* for N and O atoms and 6-31G for the others. To examine the effect of the basis set on the H-bond regions, preliminary tests were performed for the A–T base pair. These tests show that when 6-31G* is applied to the N, O, and H-bonded H atoms and 6-31G is used for C and the remaining H atoms, the structure obtained agrees very well with that optimized with 6-31G#. It can thus be concluded that adding polarization functions to the H-bonded H atoms has little impact on the optimized structure. Furthermore, to demonstrate the importance of polarization functions for strongly electronegative atoms, we have designed another composite basis set, 6-31G#′, in which 3-21G is used for C and H atoms and 6-31G* for O and N atoms. Table 1 presents selected bond lengths and bond angles related to the H-bonds of the A–T and G–C base pairs obtained with different basis sets. It is seen from the table that the optimized structures from the two economic basis sets agree very well with each other: the largest deviations in bond length and angle are just 0.03 Å and 0.78°, respectively. This indicates that a high level basis set with polarization functions on atoms of large electronegativity is very important for the proper description of nucleic acid base pair structures, and that the basis set has little effect on atoms of small electronegativity such as C and H. The structures optimized with 6-31G# are chosen for the rest of the present work, since they agree slightly better with those from the standard 6-31G* basis set, which has been reported to be reliable for the structural optimization of nucleic acid base pairs [25].

Table 1. Optimized H-Bond Lengths (Å) and Angles (°) for A–T and G–C Determined with Different Basis Sets in B3LYP Calculations.

                                 6-31G#′   6-31G#    6-31G     6-31G*
Bond length
  A–T   N1(A)···H3(T)            1.80      1.83      1.72      1.83
        H6(A)···O4(T)            1.91      1.93      1.88      1.93
        N1(A)···N3(T)            2.85      2.87      2.87      2.88
        N6(A)···O4(T)            2.93      2.95      2.95      2.95
  G–C   O6(G)···H4(C)            1.90      1.92      1.86      1.93
        H1(G)···N3(C)            1.89      1.91      1.85      1.92
        H2(G)···O2(C)            1.76      1.79      1.71      1.78
        O6(G)···N4(C)            2.80      2.82      2.75      2.82
        N1(G)···N3(C)            2.93      2.94      2.89      2.95
        N2(G)···O2(C)            2.92      2.95      2.88      2.94
Bond angle
  A–T   ∠N3–H3(T)···N1(A)        180.00    179.64    179.69    179.72
        ∠N6–H6(A)···O4(T)        175.05    175.35    173.59    174.74
  G–C   ∠N4–H4(C)···O6(G)        178.75    178.24    179.01    179.72
        ∠N2–H2(G)···O2(C)        177.63    177.20    177.01    178.32
        ∠N1–H1(G)···N3(C)        176.61    175.83    177.11    177.11

6-31G#′: 6-31G* for O and N, 3-21G for C and H; 6-31G#: 6-31G* for O and N, 6-31G for C and H.

Figure 1. The geometric structures of the base pairs A–T and G–C optimized at the B3LYP/6-31G# level. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

The optimized geometric configurations for A–T and G–C are exhibited in Figure 1. The A–T base pair includes two hydrogen bonds and G–C has three. The bonds most sensitive to the basis set are the H-bonds connecting the two bases. From Table 1, we find that including polarization functions in the basis set changes the bond lengths significantly. Noticeable differences in bond length are observed between the 6-31G and 6-31G* results, with a discrepancy as large as 0.11 Å in the H-bond length. However, the deviation of the H-bond lengths obtained with the 6-31G# economic basis set from the 6-31G* results does not exceed 0.01 Å, showing that the addition of d polarization functions to the O and N atoms results in observable improvements. The addition of polarization functions also improves the H-bond angles. The bond angles from 6-31G# agree well with the corresponding data from 6-31G*. Although the 6-31G basis set gives slightly better angular parameters than

6-31G#, the economic basis set can satisfactorily describe the bond angles as well. Overall, the 6-31G# basis set is able to generate reliable geometric structures in B3LYP calculations.

It is well known that the dipole moment of a system is very sensitive to the chosen method and basis set. Table 2 lists the calculated dipole moments of the systems considered here, and the sensitivity of the dipole moment to the choice of basis set is evident. When the composite basis set is used, with polarization functions added only to the O and N atoms, the calculated dipole moments agree better with those from the 6-31G* basis set than do those from 6-31G. The results confirm that an accurate dipole moment is accessible with the present composite basis set.

Table 2. Comparison of the Dipole Moments (debye).

Method           A         T         A–T       G         C         G–C
B3LYP/6-31G*     2.3684    4.1422    1.6476    6.4815    6.3804    6.1384
B3LYP/6-31G      2.4788    4.6234    1.7655    7.2053    6.9989    6.7106
B3LYP/6-31G#     2.3970    4.1940    1.6809    6.4114    6.4448    6.1483

When two atoms form a bond with each other, charge transfer takes place, leading to polarization of the electron density and enhancement of the H-bonds. To estimate the amount of charge transfer due to the formation of H-bonds, we have calculated the NBO charges for the donor and acceptor sites in the free bases as well as in the base pairs. The 6-31G basis set was chosen for the NBO charge analysis. Single-point calculations using 6-31G were conducted on the structures optimized with the 6-31G# and 6-31G* basis sets, respectively. Table 3 lists the NBO charges of the donor and acceptor sites in the H-bonded base pairs and the free bases with different basis sets. The largest deviation of the charges predicted from the structures optimized with the economic basis set is 0.002 au, compared with those from B3LYP/6-31G//B3LYP/6-31G*, which can be attributed to the reliable structures optimized with the composite basis set.
When B3LYP/6-31G is used for structure optimization, the largest deviation of the charge is up to 0.014 au, due to the poor geometry determined by B3LYP/6-31G. We find that for all five H-bonds there is a considerable amount of charge transfer between the donor and acceptor during the formation of the H-bonds in the pair. Take N–H···O in the A–T pair as an example: the charge on the O atom changes from −0.576 au (in free T) to −0.630 au (in the A–T base pair). For the H atom, the charge is 0.430 au in free A and becomes 0.451 au in the A–T pair.

Table 3. NBO Charges (au) for Donor and Acceptor Sites in the Free Bases and the H-Bonded Base Pairs Determined in B3LYP Calculations with Different Basis Sets.

                                  Charge on H-bonded base pair    Charge on free base
A–T
  I: N–H···O                      N, H, O                         N, H, O
    B3LYP/6-31G//B3LYP/6-31G*     −0.797, 0.451, −0.630           −0.805, 0.430, −0.576
    B3LYP/6-31G                   −0.795, 0.454, −0.642           −0.804, 0.431, −0.584
    B3LYP/6-31G//B3LYP/6-31G#     −0.798, 0.452, −0.632           −0.807, 0.431, −0.577
  II: N–H···N                     N, H, N                         N, H, N
    B3LYP/6-31G//B3LYP/6-31G*     −0.663, 0.479, −0.594           −0.665, 0.457, −0.536
    B3LYP/6-31G                   −0.659, 0.480, −0.598           −0.659, 0.460, −0.538
    B3LYP/6-31G//B3LYP/6-31G#     −0.663, 0.480, −0.594           −0.665, 0.458, −0.535
G–C
  I(a): N–H···O                   N, H, O                         N, H, O
    B3LYP/6-31G//B3LYP/6-31G*     −0.776, 0.462, −0.650           −0.808, 0.434, −0.570
    B3LYP/6-31G                   −0.774, 0.465, −0.664           −0.808, 0.423, −0.581
    B3LYP/6-31G//B3LYP/6-31G#     −0.777, 0.462, −0.652           −0.810, 0.435, −0.570
  II: N–H···N                     N, H, N                         N, H, N
    B3LYP/6-31G//B3LYP/6-31G*     −0.646, 0.467, −0.635           −0.651, 0.444, −0.575
    B3LYP/6-31G                   −0.636, 0.470, −0.644           −0.651, 0.441, −0.575
    B3LYP/6-31G//B3LYP/6-31G#     −0.646, 0.468, −0.635           −0.652, 0.445, −0.573
  III: N–H···O                    N, H, O                         N, H, O
    B3LYP/6-31G//B3LYP/6-31G*     −0.830, 0.448, −0.654           −0.840, 0.411, −0.600
    B3LYP/6-31G                   −0.829, 0.452, −0.666           −0.837, 0.422, −0.606
    B3LYP/6-31G//B3LYP/6-31G#     −0.832, 0.447, −0.655           −0.842, 0.410, −0.601

(a) The upper N–H···O bond in the G–C base pair shown in Figure 1.

Figure 2 shows the HOMO of the A–T base pair obtained from B3LYP/6-31G# and B3LYP/6-31G* calculations. The economic basis set gives a reliable description of the HOMO of the base pair, comparable to that of 6-31G*. Shown in Figure 3 are the HOMO and LUMO of the A–T and G–C base pairs, whose structures are from B3LYP/6-31G#. In the A–T base pair the HOMO is localized on the A base, whereas the LUMO is mostly located on the T base. Upon electronic excitation within the base pair, electron density is therefore expected to transfer from A to T and accumulate on T. The G–C pair behaves similarly: electrons initially localized on G (in the HOMO) are transferred to C (in the LUMO) along the H-bonds in the course of electronic excitation.

Figure 2. Comparisons of the HOMOs using 6-31G# and 6-31G* in B3LYP calculations. The values in parentheses are the HOMO energies, and the unit is eV. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

Figure 3. Plots of the HOMO and LUMO wave functions for the A–T and G–C base pairs. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

To assess the performance of our approach in predicting the H-bond interaction energy, we have also calculated binding energies using different basis sets. Previous studies have shown that the B3LYP method can yield accurate H-bond energies when large basis sets are employed. In this paper, single-point energy calculations were performed using 6-31+G** (that is, 6-31+G(d,p)) and 6-31G* on the structures optimized with 6-31G#. The results are shown in Table 4, together with the experimental data and data from the literature. Zero-point energy corrections at the B3LYP/6-31G# level are included for the calculations based on the structures obtained with B3LYP/6-31G#. It is seen from Table 4 that the basis set has a strong influence on the calculated binding energy, and larger basis sets containing polarization and/or diffuse functions give more accurate results.

Table 4. Binding Energies (kcal/mol) for A–T and G–C with Different Basis Sets in B3LYP Calculations.

Method                           A–T      G–C
6-31G#                           15.2     29.2
6-31G*                           15.0     28.5
6-31+G(d,p)//6-31G(d,p) (a)      11.7     25.0
6-31G*//6-31G#                   15.2     29.4
6-31+G(d,p)//6-31G#              11.7     25.4
Experiment (a)                   13.0     21.0

Zero-point energies (ZPE) at the B3LYP/6-31G# level were included for the calculations at 6-31G#, 6-31G*//6-31G#, and 6-31+G(d,p)//6-31G#. ZPE at the B3LYP/6-31G* level was included for the calculations at 6-31G*.
(a) Reference 48.

At the B3LYP/6-31G# level, the calculated binding energies are 15.2 and 29.2 kcal/mol for A–T and G–C, respectively, which deviate considerably from the experimental findings. When 6-31G* is used for the single-point energy calculation on the structure from 6-31G#, no visible improvement is found. However, when the single-point calculations are performed with 6-31+G(d,p) on the same 6-31G# structures, the binding energies decrease notably, by 3.5 kcal/mol for A–T and 3.8 kcal/mol for G–C relative to the 6-31G# result. The H-bond binding energies calculated with 6-31+G(d,p)//6-31G# in B3LYP calculations are 11.7 and 25.4 kcal/mol for A–T and G–C, respectively, and both values agree well with those reported for 6-31+G(d,p)//6-31G(d,p) in B3LYP calculations [48]. Although our results still deviate considerably from the experimental findings, the binding energies calculated by our approach are about 2 kcal/mol lower than the best values reported by Hobza and coworkers [22] using

the CBS extrapolation at the RI-MP2 level. Based on what we have observed here, we believe that accurate binding energies are best obtained from a single-point calculation with the 6-31+G(d,p) basis set on the structure optimized with our economic basis set.

Table 5. Comparison of Some DFT Concepts for the A–T and G–C Base Pairs [HOMO, LUMO, electronegativity (χ), hardness (η), and electrophilicity (ω) in eV; softness (S) in eV⁻¹].

Basis set                       HOMO      LUMO      χ        η        S        ω
A–T pair
  B3LYP/6-31G*                  −5.777    −0.900    3.338    2.438    0.410    1.143
  B3LYP/6-31G                   −5.966    −1.128    3.547    2.419    0.413    1.301
  B3LYP/6-31G#                  −5.802    −0.987    3.395    2.408    0.415    1.197
  B3LYP/6-31G*//B3LYP/6-31G     −5.795    −1.018    3.407    2.388    0.419    1.215
  B3LYP/6-31G*//B3LYP/6-31G#    −5.779    −0.958    3.369    2.410    0.415    1.177
G–C pair
  B3LYP/6-31G*                  −5.027    −1.213    3.120    1.907    0.524    1.276
  B3LYP/6-31G                   −5.155    −1.461    3.308    1.847    0.541    1.481
  B3LYP/6-31G#                  −5.100    −1.305    3.202    1.898    0.527    1.351
  B3LYP/6-31G*//B3LYP/6-31G     −5.015    −1.307    3.161    1.854    0.539    1.347
  B3LYP/6-31G*//B3LYP/6-31G#    −5.062    −1.274    3.168    1.894    0.528    1.325

The calculated values of the DFT global reactivity descriptors, namely the chemical potential, hardness/softness, and electrophilicity index, obtained with different basis sets are presented in Table 5. It has been shown before [49,50] that the basis set has only a slight impact on the HOMO, whereas different basis sets can produce very different LUMO energies. Also, the addition of diffuse functions significantly decreases the LUMO energy. Orbital eigenvalues calculated by HF/6-31G** often agree well with experimental findings. Since the orbital energies calculated with 6-31G* are consistent with those from 6-31G**, we conducted single-point energy calculations using 6-31G* to compute the reactivity indices, whose results are shown in Table 5. The table shows that 6-31G# gives better values of the DFT reactivity indices than 6-31G, although a discrepancy remains between the 6-31G# and 6-31G* results.
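The Table 5 entries follow directly from the frontier-orbital formulas given in the Computational Details. The sketch below is an illustration rather than the authors' own script: the function name `reactivity_indices` and the dictionary keys are hypothetical, and the orbital energies are taken as negative numbers in eV, which is what the positive χ values in the table require. It uses η = ½(ε_LUMO − ε_HOMO) for the hardness and softness, and η = (I − A) = ε_LUMO − ε_HOMO inside ω = μ²/2η, as the text specifies.

```python
def reactivity_indices(e_homo, e_lumo):
    """Global DFT reactivity indices from HOMO/LUMO energies (eV).

    Hypothetical helper for illustration; assumes I = -e_homo and
    A = -e_lumo (frontier-orbital approximation).
    """
    mu = 0.5 * (e_homo + e_lumo)      # chemical potential
    chi = -mu                         # electronegativity
    eta = 0.5 * (e_lumo - e_homo)     # hardness, (I - A)/2
    softness = 1.0 / eta              # S = 1/eta, in 1/eV
    # Electrophilicity: omega = mu^2 / (2*eta) with eta = (I - A) here,
    # i.e. WITHOUT the factor 1/2 used for the hardness above.
    omega = mu ** 2 / (2.0 * (e_lumo - e_homo))
    return {"chi": chi, "eta": eta, "S": softness, "omega": omega}


# B3LYP/6-31G* orbital energies from Table 5 (signs taken as negative).
at_pair = reactivity_indices(-5.777, -0.900)
gc_pair = reactivity_indices(-5.027, -1.213)

if __name__ == "__main__":
    for label, idx in (("A-T", at_pair), ("G-C", gc_pair)):
        print(label, {k: round(v, 3) for k, v in idx.items()})
```

With these inputs the computed χ, η, S, and ω agree with the B3LYP/6-31G* rows of Table 5 to within rounding of the last printed digit, which confirms that the tabulated ω uses the (I − A) form of η.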
The chemical potential, hardness, softness, and electrophilicity data calculated with the higher-level basis set 6-31G* on the structures optimized with 6-31G# agree well with the corresponding B3LYP/6-31G* data. However, when 6-31G* single-point calculations are performed on the structures optimized with 6-31G, the deviation from the B3LYP/6-31G* values is larger. Hence, a single-point calculation with 6-31G* on the structure optimized with the economic basis set can produce accurate results for the chemical potential, hardness, softness, and electrophilicity index of the nucleic acid base pair system. To the best of our knowledge, the DFT reactivity indices of nucleic acid base pairs have not been reported before in the literature. Table 5 shows that (i) the A-T base pair has a larger η than the G-C pair, indicating that in the gas phase the former should be more stable than the latter, and (ii) the G-C base pair has a larger electrophilicity index, indicating a better capability to accept electrons than the other pair, in agreement with the experimental finding.

The basis set effect analysis above has demonstrated that adding polarization functions to only a few atoms in particular chemical environments of the nucleic acid base system can yield accurate molecular and electronic structures. This provides a scientific basis for reducing the total number of basis functions and hence improving the computational efficiency, making it possible to calculate large biological molecules with good accuracy and relatively low computational cost. Table 6 lists the number of basis functions and the SCF cycles on a Pentium IV 3.0 GHz/256 MB personal computer during the optimizations of the base pairs A-T and G-C as well as the free bases with different basis sets. It shows that, compared with 6-31G*, the composite basis set reduces the number of basis functions by 20-25%. As to the SCF cycles, except for A and C, 6-31G# requires clearly fewer SCF cycles than 6-31G*, and about 30-50% of the CPU time per SCF cycle is saved for the molecules considered here. A similar trend is found for the optimization cycles (not listed), with about 25-50% less CPU time per optimization cycle. Thus, the economic basis set is effective in reducing the CPU time both per SCF cycle and per optimization cycle. When the economic basis set is used to optimize the structures of the nucleic acid base system, the overall CPU time is significantly reduced compared with the standard 6-31G* basis set. It is anticipated that this composite basis set will be even more economical and efficient in calculations on larger nucleic acid base systems, such as base tetramers, pentamers, stacked structures of G-C and A-T, and their methyl derivatives.

Table 6. The Number of Basis Functions and Total SCF Cycles for the Base Pairs A-T and G-C, Together with the Free Bases, in B3LYP Calculations with Various Basis Sets.

| Molecule | Basis functions, 6-31G* | Basis functions, 6-31G | Basis functions, 6-31G# | SCF cycles^a, 6-31G* | SCF cycles^a, 6-31G | SCF cycles^a, 6-31G# |
| A | 160 | 100 | 125 | 71 (0.31) | 75 (0.13) | 74 (0.22) |
| C | 130 | 82 | 102 | 46 (0.18) | 71 (0.06) | 71 (0.09) |
| G | 175 | 109 | 139 | 644 (0.38) | 93 (0.16) | 207 (0.24) |
| T | 147 | 93 | 113 | 759 (0.32) | 70 (0.12) | 722 (0.17) |
| A-T | 307 | 193 | 238 | 934 (1.28) | 530 (0.48) | 677 (0.72) |
| G-C | 305 | 191 | 241 | 394 (1.29) | 200 (0.52) | 243 (0.93) |

^a The data in parentheses are the CPU time (min per SCF cycle).

Conclusions

The present economic basis set, which uses polarization functions only on highly electronegative atoms such as O and N, can predict reliable geometric structures and electronic properties for large biological systems involving hydrogen-bond interactions, with much CPU time saved and the basis function size reduced. Combined with a single-point calculation using a standard basis set such as 6-31G* on the structure optimized with the economic basis set, the present approach is able to predict accurate NBO charges, binding energies, electronegativity, hardness, softness, and electrophilicity index.
Satisfactory agreement of the present results with those from standard basis sets has been achieved. The principle of basis set selection presented in this work can be regarded as a general guideline for computations on large biological systems involving nucleic acid base pair interactions and may be extended to other biologically important systems such as proteins, DNA, and RNA.

References

1. Nowak, M. J.; Lapinski, L.; Fulara, J. Spectrochim Acta A 1989, 45, 229.
2. Gehring, K.; Leroy, J. L.; Gueron, M. Nature 1993, 363, 561.
3. Kettani, A.; Kumar, R. A.; Patel, D. J. J Mol Biol 1995, 254, 638.
4. Berger, I.; Egli, M.; Rich, A. Proc Natl Acad Sci USA 1996, 93, 12116.
5. Nonin, S.; Leroy, J. L. J Mol Biol 1996, 261, 399.
6. Wahl, M. C.; Sundaralingam, M. Biopolymers 1997, 44, 45.
7. Strahan, G. D.; Keniry, M. A.; Shafer, R. H. Biophys J 1998, 75, 968.
8. Nir, E.; Pluetzer, C.; Kleinermanns, K.; de Vries, M. Eur Phys J D 2002, 20, 317.
9. Nir, E.; Pluetzer, C.; Kleinermanns, K.; de Vries, M. Phys Chem Chem Phys 2003, 5, 4780.
10. Bouř, P.; Andrushchenko, V.; Kabeláč, M.; Maharaj, V.; Wieser, H. J Phys Chem B 2005, 109, 20579.
11. Watson, J. D.; Crick, F. H. C. Nature 1953, 171, 737.
12. Aida, M. J Comput Chem 1988, 9, 362.
13. Alhambra, C.; Luque, F. J.; Gago, F.; Orozco, M. J Phys Chem B 1997, 101, 3846.
14. Singh, S. B.; Kollman, P. A. J Am Chem Soc 1999, 121, 3267.
15. Li, X.; Cai, Z.; Sevilla, M. D. J Phys Chem A 2002, 106, 1596.
16. Xiao, X.; Cushman, M. J Am Chem Soc 2005, 127, 9960.
17. Hobza, P.; Šponer, J. Chem Rev 1999, 99, 3247.
18. Šponer, J.; Hobza, P. J Phys Chem A 2000, 104, 4592 and references therein.
19. Hobza, P.; Šponer, J.; Cubero, E.; Orozco, M.; Luque, F. J. J Phys Chem B 2000, 104, 6286.
20. Jurečka, P.; Hobza, P. J Am Chem Soc 2003, 125, 15608.
21. Jurečka, P.; Šponer, J.; Hobza, P. J Phys Chem B 2004, 108, 5466.
22. Šponer, J.; Jurečka, P.; Hobza, P. J Am Chem Soc 2004, 126, 101420.
23. Dabkowska, I.; Gonzalez, H. V.; Jurečka, P.; Hobza, P. J Phys Chem A 2005, 109, 1131.
24. Toczyłowski, R. R.; Cybulski, S. M. J Phys Chem A 2003, 107, 418.
25. Gorb, L.; Podolyan, Y.; Dziekonski, P.; Sokalski, W. A.; Leszczynski, J. J Am Chem Soc 2004, 126, 10119.
26. Mallajosyula, S. S.; Datta, A.; Pati, S. K. Synth Met 2005, 155, 398.
27. Sahu, P. K.; Mishra, R. K.; Lee, S.-L. J Phys Chem A 2005, 109, 2887.
28. Tsolakidis, A.; Kaxiras, E. J Phys Chem A 2005, 109, 2373.
29. Sherer, E. C.; York, D. M.; Cramer, J. C. J Comput Chem 2002, 24, 57.
30. Giese, T. J.; Sherer, E. C.; Cramer, C. J.; York, D. M. J Chem Theory Comput 2005, 1, 1275.
31. Zhang, R. Q.; Lifshitz, C. J Phys Chem 1996, 100, 960.
32. Zhang, R. Q.; Xie, X. G.; Liu, S. X.; Lee, C. S.; Lee, S. T. Chem Phys Lett 2000, 330, 484.
33. Zhang, R. Q.; Wong, N. B.; Lee, S. T.; Zhu, R. S.; Han, K. L. Chem Phys Lett 2000, 319, 213.
34. Zhang, R. Q.; Huang, J. H.; Bu, Y. X.; Han, K. L.; Lee, S. T.; He, G. Z. Sci China Ser B 2000, 43, 375.
35. Zhang, R. Q.; Chu, T. S.; Lee, S. T. J Chem Phys 2001, 114, 5531.
36. Zhang, R. Q.; Lu, W. C.; Cheung, H. F.; Lee, S. T. J Phys Chem B 2002, 106, 625.
37. Zhang, R. Q.; Lu, W. C.; Lee, C. S.; Hung, L. S.; Lee, S. T. J Chem Phys 2002, 116, 8827.
38. Fan, W. J.; Zhang, R. Q. J Theor Comp Chem 2006, 5, 411.
39. Pearson, R. G. Chemical Hardness: Applications from Molecules to Solids; VCH-Wiley: Weinheim, 1997.
40. Parr, R. G.; Yang, W. Density Functional Theory of Atoms and Molecules; Oxford University Press: Oxford, 1989.
41. Iczkowski, R. P.; Margrave, J. L. J Am Chem Soc 1961, 83, 3547.
42. Mulliken, R. S. J Chem Phys 1934, 2, 782.
43. Parr, R. G.; Pearson, R. G. J Am Chem Soc 1983, 105, 7512.
44. Parr, R. G.; Szentpaly, L. V.; Liu, S. J Am Chem Soc 1999, 121, 1922.
45. Lee, C.; Yang, W.; Parr, R. G. Phys Rev B 1988, 37, 785.
46. Becke, A. D. J Chem Phys 1993, 98, 5648.
47. Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Montgomery, J. A., Jr.; Vreven, T.; Kudin, K. N.; Burant, J. C.; Millam, J. M.; Iyengar, S. S.; Tomasi, J.; Barone, V.; Mennucci, B.; Cossi, M.; Scalmani, G.; Rega, N.; Petersson, G. A.; Nakatsuji, H.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Klene, M.; Li, X.; Knox, J. E.; Hratchian, H. P.; Cross, J. B.; Adamo, C.; Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O.; Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Ayala, P. Y.; Morokuma, K.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Zakrzewski, V. G.; Dapprich, S.; Daniels, A. D.; Strain, M. C.; Farkas, O.; Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Foresman, J. B.; Ortiz, J. V.; Cui, Q.; Baboul, A. G.; Clifford, S.; Cioslowski, J.; Stefanov, B. B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Martin, R. L.; Fox, D. J.; Keith, T.; Al-Laham, M. A.; Peng, C. Y.; Nanayakkara, A.; Challacombe, M.; Gill, P. M. W.; Johnson, B.; Chen, W.; Wong, M. W.; Gonzalez, C.; Pople, J. A. Gaussian 03, Revision B.05; Gaussian, Inc.: Pittsburgh, PA, 2003.
48. Herbert, H. E.; Halls, M. D.; Hratchian, H. P.; Raghavachari, K. J Phys Chem B 2006, 110, 3336.
49. Zhang, R. Q.; Chu, T. S.; Lee, C. S.; Lee, S. T. J Phys Chem B 2000, 104, 6761.
50. Zhang, R. Q.; Chan, K. S.; Zhu, R. S.; Han, K. L. Phys Rev B 2001, 63, 085419.