Cooperativity and Specificity of Cys2His2 Zinc Finger Protein-DNA Interactions: A Molecular Dynamics Simulation Study


J. Phys. Chem. B 2010, 114, 7662-7671

Juyong Lee, Jin-Soo Kim, and Chaok Seok*
Department of Chemistry, Seoul National University, Seoul, Republic of Korea
Received: February 25, 2010; Revised Manuscript Received: April 20, 2010

Cys2His2 zinc fingers are among the most frequently observed DNA-binding motifs in eukaryotes and have been widely used as a framework for designing new DNA-binding proteins. In this work, the binding affinity and conformational change of the Zif268-DNA complex were reproduced with MD simulations and MM-PBSA analysis. Careful energy decomposition analysis yielded two new findings on zinc finger protein-DNA interactions. First, a dramatic increase in the binding affinity was observed when the third zinc finger is added, indicating a cooperative nature. This cooperativity is shown to be a consequence of a small but distinctive conformational change of DNA, which enables a tight fit of the protein into the major groove of DNA. Second, the specificity of the amino acid-nucleotide recognitions observed in the crystal structure is explained as originating from the ability of specific side chains and bases to adopt the optimal geometries for favorable interactions between polar groups. The success of the current approach implies that similar methods could be applied to protein-DNA interactions involving longer polyfingers or different linkers between fingers, providing insights for the design of novel zinc finger proteins.

Introduction

Cys2His2 zinc finger proteins constitute one of the most common classes of regulatory proteins found in eukaryotes. These proteins consist of highly conserved ββα fold zinc finger domains in which each zinc atom is coordinated by two cysteines and two histidines, stabilizing the structure.
Three residues located at or near the α-helix make direct contact with three nucleotide bases on the major groove side of DNA, as revealed by X-ray crystal structures [1]. Because of the simple nature of the zinc finger-DNA interactions, zinc finger proteins have been used as a framework for engineering proteins that recognize predetermined DNA sequences, which can be conjugated with other functional protein domains for specific gene activation/repression, DNA modification, gene editing, etc. [2-5] Such methods may be further utilized in a number of therapeutic applications [6-10]. There are two key requirements for zinc finger proteins to have wider applications in vivo: one is specificity for the desired sequence, and the other is an affinity strong enough to overcome competition with other DNA-binding proteins. To engineer a novel zinc finger protein, a modular assembly method is typically employed. For example, a protein that recognizes a unique 18 base-pair target site can be constructed by linking six known fingers [11]. However, the binding affinity is known not to be simply additive in the number of fingers. For example, according to experiments with two-, three-, four-, and five-finger proteins, the binding constant reaches a plateau after the three-finger protein [12]. In this work, we calculated the binding affinity for one-finger, two-finger, and three-finger protein-DNA complexes based on the crystal structure of the Zif268-DNA complex, by employing molecular dynamics (MD) simulations and MM-PBSA (molecular mechanics-Poisson-Boltzmann surface area) analysis.

* To whom correspondence should be addressed. E-mail: chaok@snu.ac.kr. Phone: 82-2-880-9197. Fax: 82-2-871-8119.

Our simulation reproduces the known experimental binding affinity for the three-finger protein-DNA complex. Our results further show that the binding affinity changes nonlinearly as the number of fingers increases.
A careful analysis attributes the nonlinearity to the strong enthalpy change that arises from a small conformational change in DNA when the three-finger protein is bound. The DNA bound to Zif268 is slightly unwound and has an enlarged major groove compared to canonical B-DNA. It was therefore assumed that the strain in DNA would increase for polyfinger proteins longer than three fingers when connected by the canonical linker and would thus be responsible for the lack of gain in binding affinity [13]. On the basis of this hypothesis, a six-finger protein with 6000-fold stronger affinity was designed by using a longer linker [14]. However, based on the later crystal structure of a complex with two separate three-finger proteins bound to DNA in tandem, the high binding affinity of the six-finger protein with a longer linker is probably due to a smaller entropy loss [15]. In this paper, we show that, contrary to previous assumptions, the conformational change of DNA is essential for the high binding affinity of the Zif268-DNA complex. Interestingly, the binding affinity increases dramatically when the three-finger protein is bound compared to the one- or two-finger proteins, and only the three-finger protein can induce this conformational change of DNA. The polar interaction energy (Coulomb energy plus polar solvation free energy) between protein and DNA is the major source of the strong binding of the three-finger protein. Thus, the tighter binding of the three-finger-DNA complex is a consequence of the DNA conformational change.

We also introduce a new analysis scheme that combines an atom-pair energy decomposition with a nucleobase-mutation analysis. By careful investigation of strong interatomic interactions, key functional groups related to specific recognition are identified. It is also confirmed that specific amino acid-nucleotide pairs have optimal geometries for favorable polar interactions. This analysis is conceptually similar to the well-known computational alanine scanning analysis [16], but our current scheme has the advantage of being able to investigate contributions at the atomic level of detail as well as at the residue level.

DOI: 10.1021/jp1017289. © 2010 American Chemical Society. Published on Web 05/14/2010.

The MM-PBSA technique we employ is more efficient than traditional free energy calculation methods such as free energy perturbation (FEP) or thermodynamic integration (TI), which require sampling of intermediate states. This method allows us to analyze protein-DNA interactions easily by decomposition into free energy components, including entropy, and also into substructure interactions. So far, a majority of studies using the MM-PBSA approach have dealt with the stability of single biomolecules [17,18], protein-ligand binding [19-21], or protein-protein interactions [22]. To the best of our knowledge, only a limited number of studies have been published on the protein-nucleic acid binding problem [23-25]. The relatively small number of MM-PBSA studies on protein-nucleic acid complexes is probably due to the difficulty of treating electrostatics and solvation effects accurately with an implicit solvent model [26]. A number of studies have been published on the prediction of protein-DNA interactions based on physical energy models or statistical models derived from known protein-DNA complex structures [27-36]. These methods are very useful for predicting protein-DNA binding affinity and specificity and are also computationally efficient. However, for simplicity, such methods often introduce drastic approximations for the effects of conformational change, solvation, and/or entropy. Statistical models are also limited by the size of the database of protein-DNA complex structures.
Therefore, investigating protein-DNA interactions in an ab initio manner, as in the current study, is worthwhile as a complement to the above empirical approaches.

Methods

All-Atom, Explicit Water Molecular Dynamics Simulations for Seven Systems. The initial structures for the molecular dynamics (MD) simulations presented in this paper are all based on the 1.6 Å resolution X-ray crystal structure of the Zif268-DNA complex, whose Protein Data Bank (PDB) ID is 1AAY [13]. Independent MD simulations were performed for each of the following seven systems: three unbound zinc finger proteins (one-finger, two-finger, and three-finger proteins) derived from the Zif268 protein, three zinc finger protein-DNA complexes (one-finger-DNA, two-finger-DNA, and three-finger-DNA complexes), and the unbound DNA. Throughout the paper, the one-finger protein refers to the zinc finger motif that consists of amino acids Arg3 to Lys33 of Zif268 (also denoted finger 1); the two-finger protein refers to a chain that consists of finger 1 and finger 2, where finger 2 consists of Pro34 to Lys61 of Zif268; and the three-finger protein refers to the full Zif268, which includes finger 3, consisting of Pro62 to Arg87. The crystal structure of the Zif268-DNA complex and the positions of the recognition sites of each finger, positions -1, 3, and 6, are shown in Figure 1.

Figure 1. (a) Structure of the native Zif268-DNA complex. Each finger is drawn in a different color: green, purple, and blue for finger 1, finger 2, and finger 3, respectively. Zinc ions are shown as red spheres. Side chains at the recognition sites are shown explicitly. (b) Sequence and secondary structure of the three fingers of native Zif268. Residues that make base contacts are shown in bold and color-coded by position: position -1 in red, 2 in yellow, 3 in green, and 6 in dark blue. The definition of the helical position is written below the sequence. Cysteine and histidine residues coordinated to zinc are shown in pale blue.

All the MD simulations were carried out using the AMBER 10 package. The ff03 force field [37] with the Barcelona modifications [38], which improve the description of the α/γ conformers of nucleic acids, was employed. It is known that zinc ion binding plays an important role in the stability and dynamics of zinc finger proteins, which may affect the protein-DNA interactions [39-41]. To maintain the tetrahedral structures of the zinc binding sites of the Cys2His2 coordination, the cationic pseudoatom representation [42] was employed instead of a simple nonbonded representation. In the cationic pseudoatom representation, the zinc ion is replaced by a neutral zinc atom centered in a tetrahedron formed by four massless pseudoatoms, each with charge +0.5e. The +2e charge of the zinc ion is restored for the subsequent MM-PBSA analysis. The zinc coordination geometry obtained in this way was confirmed to remain stable, within about 0.5 Å of the crystal structure, throughout the MD simulations for all three zinc binding sites.

Every biomolecule was solvated with TIP3P water molecules in a truncated octahedral box under periodic boundary conditions. The minimum distance from the solute to the boundaries of the simulation box was set to 10 Å. Sodium or chloride ions were added to neutralize each system, depending on the charge of the solute. The solvation box was generated and the counterions were placed using the tleap program. The particle mesh Ewald method [43] was employed to treat the long-range electrostatics. The nonbonded cutoff was set to 10 Å. Before performing MD simulations, the force field energy of each starting structure was minimized by progressively relaxing atoms in the following four steps: relaxation of (1) zinc ions and their accompanying cationic pseudoatoms, (2) Cys2His2 sites including zinc ions, (3) the whole solute, and (4) the whole system including water molecules.
Harmonic restraint potentials were imposed on the atoms that were not targets of relaxation, with a force constant of 100 kcal/(mol·Å²) in step 1 and 10 kcal/(mol·Å²) in steps 2 and 3. In minimization steps 1, 2, and 3, the steepest descent and conjugate gradient minimizations were performed in tandem for 1500 cycles. A longer minimization of 10 000 cycles was carried out without restraints in step 4. After this initial relaxation, each system was heated to 300 K while applying harmonic restraints with a force constant of 2 kcal/(mol·Å²) on the solute atoms. An unrestrained 1.1 ns MD simulation at 300 K and 1 atm was subsequently performed to equilibrate the system and to adjust the density of the solvation box. The lengths of bonds involving hydrogen atoms were constrained by SHAKE [44] in order to use a longer time step of 2 fs. A Langevin thermostat with a collision frequency of 2.5 ps^-1 and a weak-coupling barostat [45] with a relaxation time of 2.0 ps were employed. Production MD simulation was then carried out for 30 ns, except for the free DNA, for which a 20 ns simulation was enough to obtain a well-converged trajectory. During the production runs, MD trajectories were recorded every 5 ps.

Estimation of the Binding Free Energy and Energy Decomposition Analysis. The MM-PBSA methodology [18] was employed to estimate the binding free energy. The free energy at temperature T corresponding to a state represented by a snapshot of a solvated molecule is expressed as

$$ G_{\mathrm{molecule}} = H_{\mathrm{int}} + H_{\mathrm{vdw}} + H_{\mathrm{Coul}} + G_{\mathrm{PB\,(or\,GB)}} + G_{\mathrm{SA}} - TS $$

The first three terms, H_int, H_vdw, and H_Coul, represent the internal strain energy, van der Waals energy, and Coulomb electrostatic energy of the solute, respectively. G_PB (or GB) is an estimate of the polar solvation free energy under a mean field approximation such as the Poisson-Boltzmann (PB) or Generalized Born (GB) solvation model. This term was calculated with three different solvation models, GB^OBC(I) [46], GB^OBC(II) [47], and PB [48], to assess the accuracies of the different models. The ionic strength was set to 65 mM to mimic the experimental conditions [49]. The term G_SA stands for the nonpolar solvation free energy, including the free energy of cavity formation in the solvent medium, and is approximated as proportional to the solvent-accessible surface area, SA, i.e., G_SA = γ·SA + b [50]. The surface area SA was calculated using the LCPO method [19], and the coefficients were set as follows: γ = 0.0072 kcal/(mol·Å²) and b = 0.0 kcal/mol for the GB models, and γ = 0.00542 kcal/(mol·Å²) and b = 0.92 kcal/mol for PB [51]. The entropy, S, consists of the translational, rotational, and vibrational entropy of the solute and was calculated with the nmode program. The vibrational modes were obtained under the harmonic approximation after energy minimization in a distance-dependent dielectric environment, with ε = 4r. The binding free energy of the n-finger-DNA complex was then calculated as

$$ \Delta G_{n\text{-finger}} = \Delta H_{n\text{-finger}} - T\Delta S_{n\text{-finger}} = \langle G_{n\text{-finger complex}} \rangle - \langle G_{\mathrm{DNA}} \rangle - \langle G_{n\text{-finger protein}} \rangle $$

where ⟨G_molecule⟩ is the average of G_molecule over the snapshots collected every 100 ps. Only one-sixth of the snapshots were used for normal-mode analysis, which is common practice because of the high computational demands. Considering the conformational changes occurring during the simulations, as presented in Results and Discussion, snapshots from 0-10 ns, 15-30 ns, 10-30 ns, and 0-10 ns were used for the one-finger protein, two-finger protein, three-finger protein, and one-finger-DNA complex, respectively. Full trajectories were used for the two-finger-DNA complex, the three-finger-DNA complex, and the free DNA.

The residue pairwise energy decomposition analysis was carried out with the mm_pbsa.pl and mm_pbsa_statistics.pl scripts in the Amber 10 suite. Theoretical descriptions of both methods can be found in the references [16,22]. All the energy components, including the self-energy term of GB and the nonpolar part of the solvation free energy, were decomposed, except for the internal strain energy. The sum of the contributions from the free energy decomposition analysis deviates from the total free energy by less than 1 kcal/mol because of the loss of precision when adding the values written in the output files, but this does not affect the conclusions of this study. The atom pairwise energy decomposition analysis was performed by modifying the Amber 10 source code to print out atom pair contributions.
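The free energy bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Amber mm_pbsa.pl implementation: the `Snapshot` container, the function names, and any numbers passed in are hypothetical, and the sketch averages every term over the same snapshots, whereas the study evaluated the entropy on only one-sixth of them.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Snapshot:
    """Energy terms of one solvated snapshot, all in kcal/mol (hypothetical container)."""
    h_int: float    # internal strain energy
    h_vdw: float    # van der Waals energy
    h_coul: float   # Coulomb electrostatic energy
    g_polar: float  # polar solvation free energy (PB or GB)
    g_sa: float     # nonpolar solvation free energy
    t_s: float      # temperature times entropy, T*S


def nonpolar_solvation(sasa, gamma=0.00542, b=0.92):
    """G_SA = gamma * SA + b; the defaults are the PB coefficients quoted in the text."""
    return gamma * sasa + b


def snapshot_free_energy(s):
    """G_molecule = H_int + H_vdw + H_Coul + G_polar + G_SA - T*S for one snapshot."""
    return s.h_int + s.h_vdw + s.h_coul + s.g_polar + s.g_sa - s.t_s


def mean_free_energy(snapshots):
    """Average G_molecule over the snapshots of one trajectory."""
    return mean(snapshot_free_energy(s) for s in snapshots)


def binding_free_energy(complex_snaps, dna_snaps, protein_snaps):
    """Delta G = <G_complex> - <G_DNA> - <G_protein>.

    Passing snapshots from three independent simulations corresponds to a
    separate-trajectory estimate; passing the unbound species cut out of the
    complex trajectory corresponds to a single-trajectory estimate.
    """
    return (mean_free_energy(complex_snaps)
            - mean_free_energy(dna_snaps)
            - mean_free_energy(protein_snaps))
```

Keeping the three averages separate makes it easy to swap the source of the unbound snapshots without changing the rest of the calculation.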
Since only the solvation energy calculated from the Generalized Born model can be directly decomposed into individual atom pair contributions, the GB^OBC(I) model was employed for this analysis. It was confirmed that the differences between the residue pair interaction energies obtained by the GB and PB models are nearly constant across residue pairs, with only small variances.

Calculation of DNA Helical Parameters and the Student's t Test. DNA helical parameters were calculated for the DNA conformations in the three complexes and in the free DNA by using the 3DNA program [52] along the MD trajectories. The Student's t test was performed to test the statistical hypothesis that the mean values of the two distributions of a DNA helical parameter obtained from two different simulations are equal. From the mean values (µ_1 and µ_2) and the standard deviations (σ_1 and σ_2) of the two distributions, the t-statistic is calculated as

$$ t = \frac{\mu_1 - \mu_2}{\left( \sigma_1^2/n_1 + \sigma_2^2/n_2 \right)^{1/2}} $$

where n_1 and n_2 are the numbers of observations. Considering the number of data points in this study, the mean values can be considered significantly different at the 99.95% confidence level if the absolute value of the t-statistic is over 5.0.

Results and Discussion

Convergence of the MD Simulations and Conformational Changes of Unbound Zinc Finger Proteins. To check the convergence of the MD simulations, the root-mean-square deviations (RMSDs) of heavy atoms are plotted in Figure 2. The RMSD values of the two-finger-DNA and three-finger-DNA complexes (red lines in Figure 2b and 2c) and the unbound DNA (Figure 2d) remain stable around 2 Å during the entire simulation time. However, the RMSD of the one-finger-DNA complex increases steadily (red line in Figure 2a) because the one-finger protein and DNA slowly move apart, indicating that the one-finger protein is unable to form a stable complex with DNA.
This can be verified by the concerted change of the RMSD (red line) and the distance between DNA and the one-finger protein (green line) in Figure 2a. Therefore, only the snapshots sampled from the first 10 ns, which represent a proper complex form, were used for the MM-PBSA calculation for the one-finger-DNA complex. The RMSD of the unbound one-finger protein remains around 2 Å, although it increases after 18 ns due to fluctuations in the terminal residues (blue line in Figure 2a). In contrast, the unbound two-finger and three-finger proteins undergo much larger conformational changes from the initial bound conformations (blue lines in Figure 2b and 2c) and stabilize at an RMSD of around 7 Å. The average structure for the 15-30 ns portion of the two-finger protein trajectory and that for the 10-30 ns portion of the three-finger protein trajectory show interesting conformational changes in the TGQKP linker region between finger 1 and finger 2, as illustrated in Figure 3. In both proteins, the large conformational change of the linker results in close contact between the helix of finger 1 and the sheet region of finger 2. The polar interaction between R27 of finger 1 and Q36 of finger 2 is regarded as a key interaction responsible for this change. The second linker, connecting finger 2 and finger 3 of the three-finger protein, does not show any significant conformational change. This difference between the two linkers can be understood from the fact that finger 3 has an alanine, A64, at the position equivalent to Q36 of finger 2, so a close interaction with arginine is not possible. From these observations, it is suggested that a mutation of Q36 may affect the binding affinity of the complex by changing the stability of the unbound zinc finger protein. This kind of role for Q36 has not yet been discussed in the literature.

Figure 2. Time series of the RMSD values of the molecules investigated in the present paper, measured from the initial structures extracted from the Zif268-DNA complex. Note the different scales of the y-axes in different plots. The red lines in plots a, b, and c correspond to the RMSDs of the one-finger-DNA, two-finger-DNA, and three-finger-DNA complexes, respectively. The blue lines in a, b, and c represent the RMSDs of the unbound one-finger, two-finger, and three-finger proteins, respectively. Plot d shows the RMSD of the free DNA with no bound protein. The green line in plot a shows the distance between the centers of mass of the one-finger protein and DNA. The nearly identical behavior of the red and green lines in plot a indicates that the one-finger-DNA complex starts to dissociate.

Figure 3. Conformational changes of (a) the two-finger and (b) the three-finger proteins in the free form. The MD average structures for the unbound proteins and the crystal structures for the bound proteins are drawn in purple and cyan, respectively. The structures of finger 1, on the left, are superimposed for comparison. In both proteins, large deviations from the crystal structure occur around Lys33-Pro34, the linker region connecting fingers 1 and 2, while no severe change is observed in the linker between fingers 2 and 3. Close contacts between Arg27 of finger 1 and Gln36 of finger 2, displayed explicitly, can be observed. This interaction is identified as responsible for the reorientation of the finger domains in the unbound state.

Estimation of the Binding Affinity of the Zif268-DNA Complex. The binding affinity values for the Zif268-DNA complex calculated with the three continuum solvent models, PB, GB^OBC(I), and GB^OBC(II), are compared with the experimental value [49] in Table 1.
The binding affinity was also calculated by extracting the free energy of the unbound species from the trajectory of the complex (referred to as the single-trajectory method), as in many applications of the MM-PBSA methodology, and is presented in the table together with the values obtained from separate simulations of the unbound species (referred to as the separate-trajectory method). Of the four methods, the separate-trajectory approach with the PB solvation model produces the best result, -10.1 kcal/mol, which agrees with the experimental value of -13.4 kcal/mol within the standard error.

TABLE 1: Experimental Binding Affinity and the Values Calculated by Using Four Different Methods

  method                                  binding free energy (kcal/mol)
  experiment                              -13.4
  calculation (a)
    single-trajectory (b), PB             -38.8 ± 1.2
    separate-trajectory (c), PB           -10.1 ± 3.4
    separate-trajectory (c), GB^OBC(I)     12.6 ± 3.3
    separate-trajectory (c), GB^OBC(II)  -166.3 ± 4.2

(a) Calculated binding free energies and the standard errors for four different methods. (b) The free energies of unbound species were calculated from the snapshots extracted from the single complex trajectory. (c) The free energies of unbound species were calculated from the separate trajectories of the unbound protein and DNA.

Taking into account that the conformational entropy loss is not included in the current calculations, it would be fair to consider that the current approach gives a somewhat overestimated binding affinity. It is nevertheless notable that this PB result is close to experiment without any postprocessing, such as the linear response approach [53] that has frequently been employed in conjunction with MM-PBSA or MM-GBSA calculations [18,54,55]. The two GB models employed in this study are known to give a reliable approximation to PB for globular proteins and protein-protein complexes at a much lower computational cost [47]. However, according to the current calculation, neither of the GB models gives a reliable estimate: GB^OBC(I) gives a positive binding free energy, and GB^OBC(II) produces far too negative a value.
A similar result was also reported in the MM-GBSA study of protein-RNA complex formation (ref 55). The binding free energy estimated by the single-trajectory method with PB is lower than that from the separate-trajectory method with PB by 28.7 kcal/mol. This large free energy difference corresponds to the adaptation energy, the free energy contribution due to conformational change of the Zif268 protein

7666 J. Phys. Chem. B, Vol. 114, No. 22, 2010 Lee et al.

and DNA induced during the binding process. As described in the previous section, the relative orientations of the finger modules in the unbound finger proteins are quite different from those in the complexes, and the current result reveals that such conformational adaptation contributes significantly to the binding affinity. The DNA undergoes a relatively smaller conformational change into $\mathrm{B}_{\text{enlarged groove}}$-DNA upon binding in our simulation, which was also observed in the crystal structure of the complex (ref 13) and will be analyzed in more detail below.

TABLE 2: Results of the Student's t-Tests on the Major Groove Width for the Three Complexes with Different Number of Fingers

complex             base pair step   t-value   Δwidth (a) (Å)   width_complex (b) (Å)
one-finger-DNA      3 GT/AC            1.89       0.3              19.2
                    4 TG/CA            3.87      -0.6              19.7
                    5 GG/CC            5.36      -0.9              20.2
                    6 GG/CC            2.82       0.5              20.9
                    7 GC/GC            1.16      -0.2              20.4
two-finger-DNA      3 GT/AC            0.64       0.1              19.0
                    4 TG/CA            1.98      -0.3              20.0
                    5 GG/CC            0.68       0.1              21.2
                    6 GG/CC            6.75       1.1              21.5
                    7 GC/GC            1.34       0.2              20.8
three-finger-DNA    3 GT/AC           17.15       2.6              21.5
                    4 TG/CA            1.53       0.2              20.5
                    5 GG/CC           12.16      -1.7              19.4
                    6 GG/CC            5.29       0.8              21.2
                    7 GC/GC           13.74       2.2              22.8

a (Major groove width of bound DNA) - (major groove width of unbound DNA). b Major groove width of bound DNA.

Conformational Change of DNA upon Binding with Different Number of Zinc Fingers. In the crystal structure of the Zif268-DNA complex, DNA adopts a conformation with a deeper and wider groove, termed $\mathrm{B}_{\text{enlarged groove}}$-DNA (ref 13). We have examined how the conformational change to the enlarged groove structure is affected by the number of binding fingers, focusing on two major helical parameters (refs 1, 13), the width of the major groove and the helical twist. To confirm the statistical significance of the differences in the helical parameters from different simulations, a Student's t test was carried out.
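As an illustration of the kind of test described above, the sketch below computes a two-sample Student's t statistic with pooled variance. The text does not state which variant of the test was used or how correlated MD snapshots were handled, so the equal-variance form and the independence of samples are assumptions here, and the groove-width samples are synthetic numbers generated only for the demonstration.

```python
import math
import random
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance.

    Assumes independent samples with equal variances; this is a textbook
    form, not necessarily the exact procedure used in the paper.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = ((n_a - 1) * variance(sample_a)
              + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))


# Synthetic major groove widths in angstroms (NOT data from the simulations).
rng = random.Random(0)
unbound = [rng.gauss(18.9, 1.0) for _ in range(200)]
bound = [rng.gauss(21.5, 1.0) for _ in range(200)]

t_value = student_t(bound, unbound)
print(f"t = {t_value:.2f}")
```

A large positive t-value here would indicate that the bound-state groove is wider than the unbound one by much more than the sampling noise.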
The t test results for the major groove width and the helical twist are summarized in Tables 2 and 3, respectively. From the t-values and the magnitudes of the changes in the helical parameters in Tables 2 and 3, it is apparent that a distinctive conformational change occurs only when the three-finger protein is bound, and that the one-finger and two-finger proteins cannot induce such a conformational change in DNA. According to Table 2, four out of five base pair steps of the three-finger-DNA complex show large t-values over 5, which means that the differences in the major groove width between bound DNA and unbound DNA are statistically significant at the 99.95% confidence level at those base pair steps. Three of the four steps are significantly enlarged. On the other hand, only one base pair step has a t-value greater than 5 in each of the one-finger and two-finger complexes, indicating that they can induce only a local change in DNA conformation. In our simulation, the unbound DNA is slightly unwound, with an average twist angle of 33°, less than the 36° of canonical B-DNA. This undertwist of the unbound DNA can be attributed to a sequence-dependent effect. Binding of the one-finger or two-finger protein does not affect the helical twist significantly, changing it by only 0.5° on average, as can be seen from Table 3. However, the three-finger complex shows further unwinding of the helix by 1.5° on average.
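The Table 2 counts quoted above (four significant steps for the three-finger complex, one each for the others) can be reproduced with a short filter. This is a sketch using the transcribed Table 2 values; the threshold of 5 is the one used in the text, while the variable and function names are illustrative.

```python
# Table 2 entries: base pair step -> (t-value, change in major groove width in angstroms).
TABLE_2 = {
    "one-finger-DNA": {3: (1.89, 0.3), 4: (3.87, -0.6), 5: (5.36, -0.9),
                       6: (2.82, 0.5), 7: (1.16, -0.2)},
    "two-finger-DNA": {3: (0.64, 0.1), 4: (1.98, -0.3), 5: (0.68, 0.1),
                       6: (6.75, 1.1), 7: (1.34, 0.2)},
    "three-finger-DNA": {3: (17.15, 2.6), 4: (1.53, 0.2), 5: (12.16, -1.7),
                         6: (5.29, 0.8), 7: (13.74, 2.2)},
}
T_THRESHOLD = 5.0  # cutoff used in the text


def significant_steps(rows, threshold=T_THRESHOLD):
    """Return {step: width change} for steps whose t-value exceeds the threshold."""
    return {step: delta for step, (t, delta) in rows.items() if t > threshold}


summary = {name: significant_steps(rows) for name, rows in TABLE_2.items()}

for name, steps in summary.items():
    widened = sum(1 for delta in steps.values() if delta > 0)
    print(f"{name}: {len(steps)} significant step(s), {widened} widened")
```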
The most evident change occurs at the first, the third, and the eighth base pair steps, and the first and the third steps are also the most unwound steps in the high-resolution crystal structure (ref 13).

TABLE 3: Results of Student's t-Tests on the Helical Twist for the Three Complexes with Different Number of Fingers

complex             base pair step   t-value   Δtwist (a) (deg)   twist_complex (b) (deg)
one-finger-DNA      1 GC/GC            2.71      -1.5               33.1
                    2 CG/CG            0.22      -0.1               37.6
                    3 GT/AC            0.05       0.0               34.1
                    4 TG/CA            0.08       0.0               30.2
                    5 GG/CC            4.59       2.2               34.7
                    6 GG/CC            2.45       0.9               32.9
                    7 GC/GC            0.14      -0.1               33.9
                    8 CG/CG            1.75      -0.9               34.7
                    9 GT/AC            5.32       3.5               29.7
                    average                       0.5               33.4
two-finger-DNA      1 GC/GC            1.19      -0.7               33.9
                    2 CG/CG            1.58       0.8               38.5
                    3 GT/AC            0.28       0.1               34.2
                    4 TG/CA            2.39       1.2               31.4
                    5 GG/CC            4.91       2.1               34.6
                    6 GG/CC            3.78      -1.4               30.6
                    7 GC/GC            2.12       0.9               34.9
                    8 CG/CG            5.19      -2.7               33.0
                    9 GT/AC            5.68       3.6               29.8
                    average                       0.5               33.4
three-finger-DNA    1 GC/GC           17.40     -10.7               23.9
                    2 CG/CG            1.33       0.6               38.3
                    3 GT/AC            8.65      -3.8               30.3
                    4 TG/CA            2.51       1.2               31.3
                    5 GG/CC            3.19       1.5               34.0
                    6 GG/CC            1.50       0.5               32.5
                    7 GC/GC            1.24      -0.6               33.4
                    8 CG/CG            8.31      -5.7               29.9
                    9 GT/AC            5.56       3.6               29.8
                    average                      -1.5               31.5

a (Helical twist of bound DNA) - (helical twist of unbound DNA). b Helical twist of bound DNA.

TABLE 4: The Enthalpy and Entropy Contributions to the Calculated Binding Free Energy for the Three Zinc Finger-DNA Complexes with Different Number of Fingers

                                                                                                        one-finger (a)   two-finger (a)   three-finger (a)
$\Delta H_{n\text{-finger}}$                                                                            -36.0 ± 3.7      -44.3 ± 3.9      -86.0 ± 3.4
$-T\Delta S_{n\text{-finger}}$                                                                           37.1 ± 1.4       51.7 ± 1.2       75.8 ± 0.5
$\Delta G_{n\text{-finger}} = \Delta H_{n\text{-finger}} - T\Delta S_{n\text{-finger}}$                   1.1 ± 4.0        7.4 ± 4.1      -10.1 ± 3.4

a Ensemble average over the snapshots in kcal/mol with the standard error of the mean.

Cooperative Nature of the Zinc Finger-DNA Binding. Since the experimental binding affinity of the Zif268-DNA complex and the conformational change of DNA upon binding can be reproduced with our MM-PBSA approach, we now proceed with a detailed analysis of the effects of the number of fingers and contributions from different free energy components.
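The free energy row of Table 4 follows from the enthalpy and entropy rows. The sketch below recombines them and, as an assumption not stated in the paper, propagates the standard errors in quadrature as if the two terms were independent. With the rounded table entries this gives errors matching the reported 4.0, 4.1, and 3.4 kcal/mol, and a three-finger sum of -10.2 rather than the tabulated -10.1, which is presumably a rounding difference.

```python
import math

# Table 4 entries in kcal/mol: (ensemble average, standard error of the mean).
TABLE_4 = {
    "one-finger": {"dH": (-36.0, 3.7), "minus_TdS": (37.1, 1.4)},
    "two-finger": {"dH": (-44.3, 3.9), "minus_TdS": (51.7, 1.2)},
    "three-finger": {"dH": (-86.0, 3.4), "minus_TdS": (75.8, 0.5)},
}


def binding_free_energy(entry):
    """Return (dG, error) with dG = dH + (-T dS).

    Combining the two errors in quadrature is an assumption of this sketch.
    """
    (dh, dh_err), (mtds, mtds_err) = entry["dH"], entry["minus_TdS"]
    return dh + mtds, math.hypot(dh_err, mtds_err)


free_energies = {name: binding_free_energy(entry) for name, entry in TABLE_4.items()}

for name, (dg, err) in free_energies.items():
    print(f"{name:12s} dG = {dg:6.1f} +/- {err:.1f} kcal/mol")
```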
The binding free energies of the one-finger-DNA and two-finger-DNA complexes are estimated to be positive, 1.1 and 7.4 kcal/mol, respectively, as presented in Table 4, which means that binding of one- or two-finger proteins to DNA is not favored, in agreement with the experimental observation that such zinc finger-DNA complexes have lower binding affinity (ref 12). The one-finger-DNA complex started to dissociate in our MD simulation trajectory, indicating that this complex is also kinetically very unstable. However, when finger 3 is attached to the two-finger protein, the protein binds tightly to DNA with an estimated binding free energy of -10.1 kcal/mol. To understand this cooperative binding behavior of the three-finger protein with DNA, enthalpy and entropy contributions are also examined in Table 4. When the number of binding

fingers is increased from one to two, the loss in the binding entropy, $-T\Delta S_{12} = -T[\Delta S_{\text{2-finger}} - \Delta S_{\text{1-finger}}] = +14.6$ kcal/mol, is larger than the gain in the binding enthalpy, $\Delta H_{12} = \Delta H_{\text{2-finger}} - \Delta H_{\text{1-finger}} = -8.3$ kcal/mol, resulting in an unstable two-finger-DNA complex. However, when the third finger is attached, the binding affinity increases dramatically, mainly because of the increased enthalpy contribution: the binding enthalpy and entropy changes for the addition of the third finger are $\Delta H_{23} = \Delta H_{\text{3-finger}} - \Delta H_{\text{2-finger}} = -41.7$ kcal/mol and $-T\Delta S_{23} = -T[\Delta S_{\text{3-finger}} - \Delta S_{\text{2-finger}}] = +24.1$ kcal/mol, respectively. Moreover, the calculated enthalpy contribution to the three-finger-DNA binding, -86.0 kcal/mol, is 1.94 times that to the two-finger-DNA binding, -44.3 kcal/mol, while the number of side chains interacting with DNA in the three-finger-DNA complex is approximately 1.5 times that in the two-finger-DNA complex. This strong increase in the enthalpy contribution can therefore be regarded as responsible for the cooperative binding of the three-finger protein to DNA.

TABLE 5: Difference, $\Delta\Delta H$, between the Binding Enthalpy Changes Due to One-Finger Addition to the One-Finger Complex, $\Delta H_{12} = \Delta H_{\text{2-finger}} - \Delta H_{\text{1-finger}}$, and to the Two-Finger Complex, $\Delta H_{23} = \Delta H_{\text{3-finger}} - \Delta H_{\text{2-finger}}$

$\Delta\Delta H = \Delta H_{23} - \Delta H_{12}$ (kcal/mol)

substructure interaction   ΔΔH_vdW   ΔΔH_polar (a)   ΔΔH_SA   ΔΔH_int   sum
within finger                -8.2       3.5            1.7      1.8      -1.2
within DNA                    8.1      -0.9            3.4      8.0      18.6
finger-DNA                   -1.5     -40.8           -8.0      N/A     -50.3
total                        -1.6     -38.2           -2.9      9.8     -32.9 (b)

a Difference between changes in polar interactions, $\Delta\Delta H_{\text{polar}} = \Delta\Delta H_{\text{Coul}} + \Delta\Delta H_{\text{PB}}$. b This value differs from that calculated from Table 4, -33.4 kcal/mol, by 0.5 kcal/mol; the discrepancy is due to the accumulation of round-off errors during summation of the large number of pair interactions written in the files.
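The stepwise quantities quoted in this paragraph are simple differences of the Table 4 entries. A minimal sketch, with illustrative names, recomputes them along with the difference of the two enthalpy increments and the enthalpy ratio:

```python
# Binding enthalpy and entropy terms from Table 4 (kcal/mol), keyed by number of fingers.
DH = {1: -36.0, 2: -44.3, 3: -86.0}
MINUS_TDS = {1: 37.1, 2: 51.7, 3: 75.8}


def increment(values, n_from, n_to):
    """Change in a quantity when going from n_from to n_to fingers."""
    return values[n_to] - values[n_from]


dH_12 = increment(DH, 1, 2)                # enthalpy gain, second finger
dH_23 = increment(DH, 2, 3)                # enthalpy gain, third finger
mTdS_12 = increment(MINUS_TDS, 1, 2)       # entropy penalty, second finger
mTdS_23 = increment(MINUS_TDS, 2, 3)       # entropy penalty, third finger

ddH = dH_23 - dH_12                        # extra enthalpy gain of finger 3 over finger 2
enthalpy_ratio = DH[3] / DH[2]             # three-finger vs two-finger enthalpy

print(f"dH_12 = {dH_12:.1f}, -TdS_12 = {mTdS_12:+.1f}")
print(f"dH_23 = {dH_23:.1f}, -TdS_23 = {mTdS_23:+.1f}")
print(f"ddH = {ddH:.1f}, ratio = {enthalpy_ratio:.2f}")
```

The enthalpy gain outweighs the entropy penalty only for the third finger, which is the numerical content of the cooperativity argument.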
To obtain a more detailed picture of the source of the cooperativity in zinc finger-DNA binding, the enthalpy contribution is decomposed in Table 5 into contributions from interactions within the finger protein, within DNA, and between the finger protein and DNA. The difference between the binding enthalpy changes for the addition of the second finger, $\Delta H_{12} = \Delta H_{\text{2-finger}} - \Delta H_{\text{1-finger}}$, and for the addition of the third finger, $\Delta H_{23} = \Delta H_{\text{3-finger}} - \Delta H_{\text{2-finger}}$, referred to as $\Delta\Delta H = \Delta H_{23} - \Delta H_{12}$, is examined. This value accounts for the difference between the contribution of finger 3 to the binding enthalpy of the three-finger complex and that of finger 2 to the binding enthalpy of the two-finger complex. The contribution of each substructure interaction to $\Delta\Delta H$ is decomposed again into energy components: van der Waals energy, polar interaction energy (Coulomb energy plus polar solvation free energy), nonpolar solvation free energy, and internal energy, as shown in Table 5. The Coulomb and polar solvation terms have large absolute values and opposite signs, and the sum of the two terms represents the net effect of the polar interaction. The data used to produce Table 5 are provided in Supporting Information Tables S1-S5. According to Table 5, the total $\Delta\Delta H$ is -32.9 kcal/mol, which means that the enhancement in the binding enthalpy by addition of the third finger is larger than that by the second finger by 32.9 kcal/mol. When $\Delta\Delta H$ is decomposed into contributions from different substructure interactions, the largest contribution comes from the finger-DNA interaction, -50.3 kcal/mol. Interaction within DNA disfavors addition of the third finger compared to that of the second finger by 18.6 kcal/mol. Interaction within the zinc finger proteins contributes very little to $\Delta\Delta H$. The energy components responsible for these contributions are discussed below. The polar interaction energy between finger and DNA, -40.8 kcal/mol, is the component that contributes by far the most to $\Delta\Delta H$ in Table 5.
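The internal consistency of Table 5 (row sums, column totals, and the overall total of -32.9 kcal/mol) can be verified from the individual entries. The sketch below uses the transcribed values; treating the N/A internal-energy entry for the finger-DNA row as contributing nothing to the sums is an assumption of this check.

```python
# Table 5 entries (kcal/mol). None marks the N/A internal-energy entry.
COMPONENTS = ("vdw", "polar", "SA", "int")
TABLE_5 = {
    "within finger": {"vdw": -8.2, "polar": 3.5, "SA": 1.7, "int": 1.8},
    "within DNA": {"vdw": 8.1, "polar": -0.9, "SA": 3.4, "int": 8.0},
    "finger-DNA": {"vdw": -1.5, "polar": -40.8, "SA": -8.0, "int": None},
}


def total(values):
    """Sum the defined entries, skipping N/A, rounded to one decimal like the table."""
    return round(sum(v for v in values if v is not None), 1)


# Sum across energy components for each substructure interaction.
row_sums = {name: total(row.values()) for name, row in TABLE_5.items()}
# Sum across substructure interactions for each energy component.
column_totals = {c: total(row[c] for row in TABLE_5.values()) for c in COMPONENTS}
grand_total = total(row_sums.values())

print(row_sums)
print(column_totals)
print(grand_total)
```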
This large value cannot be attributed solely to the cooperativity effect because there are other factors, such as the protein and DNA sequences and the position of binding. The amino acid sequence of finger 3 is very similar to that of finger 1, in particular in the key residues for binding, but is dissimilar to that of finger 2. Finger 3 recognizes the same base sequence (dGdCdG) as finger 1, but finger 2 recognizes a different base sequence (dTdGdG). The difference in the binding position in the DNA chain can also affect $\Delta\Delta H$ because finger 3 binds to the terminal and finger 2 binds to the middle of the DNA chain. The contribution of these effects may be estimated as the difference between the finger 3-DNA (or finger 1-DNA) and finger 2-DNA interaction energies in the three-finger-DNA complex. It is then estimated to be -19.8 to -20.6 kcal/mol. Considering that only the three-finger protein induces a small but distinctive conformational change in DNA, we suggest that this conformational change is closely connected to the cooperative binding of the three-finger protein with DNA. The large cooperativity effect of 20 kcal/mol from the polar interaction between a zinc finger protein and DNA is consistent with this proposal. Since this value is obtained after eliminating the sequence and position dependence and the intraprotein and intra-DNA contributions, its origin can be solely attributed to a change in interaction geometry, or in other words, a tighter interaction between finger and DNA assisted by enlargement of the major groove and unwinding of the DNA helix. The next largest contributions to $\Delta\Delta H$ come from changes in the nonpolar solvation free energy due to zinc finger-DNA interactions, -8.0 kcal/mol, and in the van der Waals interaction within the finger, -8.2 kcal/mol, as shown in Table 5.
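The roughly 20 kcal/mol cooperativity estimate quoted above appears to follow from subtracting the sequence and position contribution from the finger-DNA polar term of Table 5. The worked arithmetic below is a sketch under that assumption; the subscript labels "seq/pos" and "coop" are introduced here for illustration and are not the paper's notation.

```latex
\begin{align*}
\Delta\Delta H_{\text{polar}}^{\text{finger-DNA}} &= -40.8\ \text{kcal/mol} \\
\Delta\Delta H_{\text{seq/pos}} &\approx -19.8\ \text{to}\ -20.6\ \text{kcal/mol} \\
\Delta\Delta H_{\text{coop}} &\approx \Delta\Delta H_{\text{polar}}^{\text{finger-DNA}} - \Delta\Delta H_{\text{seq/pos}}
  = -21.0\ \text{to}\ -20.2\ \text{kcal/mol}
\end{align*}
```

Both ends of the range round to about -20 kcal/mol, consistent with the value given in the text.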
The negative value of $\Delta\Delta H$ for the nonpolar solvation free energy due to zinc finger-DNA interactions indicates that more protein and DNA atoms become inaccessible to the solvent in the three-finger complex because of tighter binding. This result is also consistent with the fact that only the three-finger protein induces a DNA conformational change. The contribution from the van der Waals interaction within the finger is partly due to a sequence effect, which is estimated to be -3.1 to -5.1 kcal/mol if the same method is used as for estimating the sequence and position effect in the polar interaction between finger and DNA. The remaining contribution of 3-5 kcal/mol may be explained by the fact that less van der Waals contact is lost when the third finger binds, because finger 1 and finger 2 make closer contact than finger 2 and finger 3 in the unbound protein simulations. The two largest unfavorable contributions to $\Delta\Delta H$ come from changes in the van der Waals interaction within DNA, 8.1 kcal/mol, and in the internal energy of DNA, 8.0 kcal/mol, as shown in Table 5. These values can also be explained in terms of the conformational change of DNA in the three-finger complex: the van der Waals interaction becomes weaker because of larger distances between pairing nucleotides, and the internal strain energy increases in the noncanonical enlarged B-DNA structure. In summary, the cooperativity in Zif268-DNA binding, or the large enhancement of binding affinity by addition of the third finger, may be explained as a consequence of a small but distinctive conformational change of DNA upon binding of the three-finger protein. The enhancement in finger-DNA polar

interaction energy due to tighter binding of the finger domains to the wider major groove of DNA overwhelms the small energy cost due to the conformational change of DNA.

TABLE 6: The 10 Strongest Attractive Pair Interactions between Zif268 and DNA Estimated from Energy Decomposition of the MM-PBSA Result

amino acid-nucleotide pair   ΔH (kcal/mol)   position (a)
Arg74-dG4                    -24.3 ± 0.1     finger 3, -1
Arg46-dG7                    -21.9 ± 0.1     finger 2, -1
Arg24-dG8                    -20.9 ± 0.1     finger 1, 6
Arg55-dC8                    -18.5 ± 0.1     -
Arg18-dG10                   -17.7 ± 0.2     finger 1, -1
Arg14-dG7                    -16.2 ± 0.1     -
Arg80-dG2                    -15.1 ± 0.2     finger 3, 6
Lys79-dC6                    -14.4 ± 0.1     -
Arg42-dG4                    -13.2 ± 0.1     -
Arg3-dG8                     -11.6 ± 0.3     -

a Position of the recognition site of the amino acid expressed as the finger number followed by the recognition site number. Nonspecific sites are marked with bars.

Figure 4. A schematic picture of the strongest amino acid-nucleotide interaction for each amino acid at the recognition sites of the three fingers. The amino acids are labeled according to the calculated interaction strength: residues with interactions stronger than -10 kcal/mol are drawn in large type, and those with interactions in the -4 to -10 kcal/mol range are drawn in medium type. Bases are colored in order to distinguish the three base-pair subsites.

Pairwise Decomposition of Amino Acid-Nucleotide Interaction Energy: Role of Each Recognition Site. One of the most important questions concerning Cys2His2 zinc finger proteins is the basis of binding specificity. Since the first crystal structure of the Zif268-DNA complex was revealed (ref 1), a number of plausible suggestions about the roles of specific residues have been made based on atomic contacts observed in the crystal structures. To our knowledge, however, there has been no quantitative study of the contribution of each amino acid residue based on an all-atom physical energy model.
Here we ask quantitative questions such as how strong each amino acid-nucleotide interaction is and what the atomistic basis of sequence specificity for each recognition site is. We have decomposed the interaction energy in a pairwise manner to evaluate the roles of the recognition sites, positions -1, 2, 3, and 6 of the helix, in each finger of Zif268. A schematic representation of strong interactions between amino acids of the recognition sites and the DNA bases is shown in Figure 4. The 10 strongest amino acid-nucleotide pairs are listed in Table 6. It is obvious that the positively charged residues, arginine and lysine, make the strongest attractive interactions. All of the arginine-nucleotide pairs that make direct base or phosphate contacts identified in the high-resolution crystal structure (ref 13), except for Arg70-dA1, were calculated to form strong attractive interactions of <-11 kcal/mol. In particular, the five arginine residues occupying the -1 and 6 positions of the zinc finger helices interact with guanines relatively strongly compared to other charged amino acid-nucleotide pairs that are not in the recognition sites, implying that these residues play a key role both in maintaining the complex and in recognizing guanine bases. To understand the atomic basis of these strong interactions, the 10 strongest amino acid-nucleotide interactions, shown in Table 6, were further decomposed into atom-pair interactions, which are grouped into interactions between the following substructures: amino acid side chain-nucleotide, amino acid backbone-nucleotide, nucleotide backbone-amino acid, and nucleotide base-amino acid. The energy components for the amino acid-nucleotide pairs in the recognition sites and those for the other five pairs, displayed in Supporting Information Tables S6 and S7, respectively, show an interesting contrast.
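The split of the Table 6 pairs into recognition-site and nonspecific interactions can be made explicit with a small grouping step. This sketch uses the transcribed Table 6 values; the tuple layout and names are illustrative only.

```python
# Table 6 rows: (amino acid, nucleotide, interaction energy in kcal/mol, position).
# position is (finger number, recognition site) or None for nonspecific sites.
TABLE_6 = [
    ("Arg74", "dG4", -24.3, (3, -1)),
    ("Arg46", "dG7", -21.9, (2, -1)),
    ("Arg24", "dG8", -20.9, (1, 6)),
    ("Arg55", "dC8", -18.5, None),
    ("Arg18", "dG10", -17.7, (1, -1)),
    ("Arg14", "dG7", -16.2, None),
    ("Arg80", "dG2", -15.1, (3, 6)),
    ("Lys79", "dC6", -14.4, None),
    ("Arg42", "dG4", -13.2, None),
    ("Arg3", "dG8", -11.6, None),
]

recognition = [row for row in TABLE_6 if row[3] is not None]
nonspecific = [row for row in TABLE_6 if row[3] is None]

# Check the pattern described in the text: recognition-site pairs are Arg-guanine
# contacts at helix positions -1 or 6.
all_arg_guanine = all(aa.startswith("Arg") and nt.startswith("dG")
                      for aa, nt, _, _ in recognition)
sites_used = sorted({site for _, _, _, (_, site) in recognition})

print(len(recognition), len(nonspecific), all_arg_guanine, sites_used)
```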
The net interaction energy between the nucleotide backbone and the amino acid is surprisingly weak (>-1 kcal/mol) for the five strongest interactions in the recognition sites, despite the negative charge on the backbone phosphate group and the positive charge on the arginine side chain. The polar solvation free energy cancels the Coulomb energy almost completely in this case. On the other hand, the net nucleotide base-amino acid interaction in the recognition sites is strong (<-5 kcal/mol) because the solvation energy is insufficient to cancel the Coulomb attraction. All the strong interaction pairs in the nonrecognition sites show the opposite trend, i.e., the nucleotide backbone-amino acid interaction is strong (<-5 kcal/mol), and the nucleotide base-amino acid interaction is weak (>-1 kcal/mol). This analysis confirms that arginine residues in the recognition sites do indeed contribute strongly to the recognition of specific guanine bases among the numerous negative charges on the DNA backbone, by appropriately positioning the positive side chains toward the target bases with the help of hydrogen bonds and dipole-dipole interactions. To explain the specificity of Arg-dG recognition in more detail on an atomic basis, the atom-pair decomposition results were further examined. Supporting Information Table S8 shows the top 20 attractive and top 20 repulsive atom-pair interactions between the Arg18 residue and the dG10 base as an example. The strongest atom-pair interaction is that between arginine CZ and guanine N2. Guanine N2 is involved in 18 of the top 20 attractive interactions. The remaining two are interactions between two hydrogen atoms in the guanidinium group of arginine and O6 and N7 of guanine. According to this analysis, hydrogen bonds contribute less than dipole-dipole interactions, which are more numerous. The interaction geometry for Arg18-dG10 is shown in Supporting Information Figure S1.
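The contrast between recognition-site and nonrecognition-site pairs described above amounts to a simple classification by two net energies, each the sum of a Coulomb term and a polar solvation term. The sketch below encodes the thresholds quoted in the text (-5 and -1 kcal/mol). The function names, the category labels, and the example energies are hypothetical and chosen only to illustrate the logic; the actual component values are in Supporting Information Tables S6 and S7 and are not reproduced here.

```python
STRONG = -5.0  # kcal/mol, "strong" threshold quoted in the text
WEAK = -1.0    # kcal/mol, "weak" threshold quoted in the text


def net_polar(coulomb, polar_solvation):
    """Net polar interaction: Coulomb energy plus polar solvation free energy."""
    return coulomb + polar_solvation


def contact_type(backbone_net, base_net):
    """Classify a pair by which nucleotide substructure carries the attraction.

    The labels are illustrative, not terminology from the paper.
    """
    if base_net < STRONG and backbone_net > WEAK:
        return "base-directed"
    if backbone_net < STRONG and base_net > WEAK:
        return "backbone-directed"
    return "mixed"


# HYPOTHETICAL energies (kcal/mol), for illustration only.
example_backbone = net_polar(coulomb=-60.0, polar_solvation=59.5)  # nearly cancelled
example_base = net_polar(coulomb=-30.0, polar_solvation=22.0)      # partly cancelled

label = contact_type(example_backbone, example_base)
print(example_backbone, example_base, label)
```

In this picture a recognition-site arginine would fall in the first category and a nonspecific phosphate contact in the second.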
The above observation indicates that guanine N2 is the key atom responsible for the Arg-dG interaction. However, in order to explain the Arg-dG specificity properly, the relative strength of the Arg-dG interaction compared to the interaction of Arg with other types of nucleotides must be considered. In the case of dA, there is no polar atom at the site equivalent to N2, so the polar interaction is expected to be much weaker. In the case of dC, the following computational mutation experiment was

performed because it is not easy to draw a conclusion by simple reasoning when the purine ring is replaced with a pyrimidine ring. The G-C pair in the corresponding position of DNA was mutated to C-G, and the atom-pair interaction for Arg-dC was analyzed. The top 20 attractive and the top 20 repulsive atom-pair interactions between the Arg18 residue and dC10, which was mutated from dG10, are shown in Supporting Information Table S9. It can be seen that the attractive interactions between opposite charges (cytosine N4 and atoms in the arginine guanidinium group) are stronger than the corresponding interactions in Arg-dG, but the repulsive interactions between like charges also become stronger, which nearly cancels the gain from the attractive interactions. In addition, cytosine has no polar atoms at the corresponding positions that can make hydrogen bonds with arginine. In the case of dT, the polar interaction is expected to be much weaker because of the less polar base ring. Therefore, it can be concluded that the specific Arg-dG recognition comes from the appropriate size and positioning of the polar atoms in the guanine base, which maximize attractive interactions with arginine. To understand the contributions of the interactions at the other recognition sites, the strengths of the most favorably interacting amino acid-nucleotide pairs at recognition sites 2 and 3 are displayed in Table 7.

TABLE 7: The Strongest Attractive Nucleotide-Amino Acid Interaction at the Recognition Sites 2 and 3 of Each Finger

amino acid-nucleotide pair   ΔH (kcal/mol)   position (a)
Asp20-dA11                    -6.9 ± 0.1     finger 1, 2
Glu21-dC9                     -1.6 ± 0.1     finger 1, 3
Asp48-dC8                     -8.6 ± 0.1     finger 2, 2
His49-dG6                     -4.6 ± 0.1     finger 2, 3
Asp76-dA5                     -6.8 ± 0.1     finger 3, 2
Glu77-dC3                    -11.5 ± 0.1     finger 3, 3

a Position of the recognition site of the amino acid expressed as the finger number followed by the recognition site number.
Aspartate occupies position 2, and glutamate or histidine occupies position 3. The contribution of the acidic, negatively charged amino acids, aspartate and glutamate, to DNA recognition was previously discussed in terms of favorable hydrogen bonds mediated by crystallographic water molecules and electrostatic interactions with nearby positively charged amino acids, based on the high-resolution crystal structure (ref 13). Our simulation result, given in Table 7, also confirms that the acidic amino acids make a significant contribution to Zif268-DNA binding. In our calculations, explicit water molecules are included in the MD simulation steps, and implicit solvation models, PB or GB, are used in the free energy analysis. The favorable role of the solvent can be clearly recognized from our analysis, as explained below, although the implicit solvation models may underestimate the absolute strength of the solvation energy. The most favorable interactions in recognition sites 2 and 3 are the amino acid side chain-nucleotide and the nucleotide base-amino acid interactions, according to Supporting Information Tables S10 and S11, as in recognition sites -1 and 6, which involve positively charged amino acids. The repulsive Coulomb interaction between the negatively charged amino acid side chain in position 2 or 3 and the contacting nucleotide is more than compensated for by the solvation energy, resulting in a net attraction. The origin of the favorable nucleotide base-amino acid interaction differs somewhat between the Asp-dA interactions and the Asp-dC or Glu-dC interactions. The polar solvation free energy contributes more to the favorable adenine base-amino acid interaction for the Asp20-dA11 and Asp76-dA5 pairs, and the Coulomb energy dominates in the cytosine base-amino acid interaction for the Asp48-dC8, Glu21-dC9, and Glu77-dC3 pairs.
Specific recognition of the Glu-dC pair was identified together with Arg-dG recognition in a recent study based on known binding affinities and statistics on the frequencies of amino acid-base contacts present in protein-DNA complex structures (refs 27, 56). Our analysis above also shows that the Glu-dC pair has an attractive Coulomb interaction between the nucleotide base and the negatively charged amino acid. To explain the atomic basis of this phenomenon, the atom-pair interaction energies of the Glu21-dC9 pair were examined as an example, shown in Supporting Information Table S12. A similar analysis for Glu77-dC3 is also shown in Supporting Information Table S13. The strongest attraction for these Glu-dC recognition pairs is the interaction between glutamate CD and cytosine N4. In addition to this pair, cytosine N4 also makes strong favorable interactions with other glutamate atoms. It is interesting to note that positively charged hydrogen atoms in glutamate form favorable interactions with the negatively charged N4 through the solvation energy. The cytosine C4 and the carboxyl oxygen atoms, OE1 and OE2, also make strong attractive interactions. The above-mentioned atoms are highly polar, and the geometry in the crystal structure establishes a favorable dipole-dipole interaction, as shown in Supporting Information Figure S2. A similar pattern of interactions is also observed in the Asp-dC pair, as displayed in Supporting Information Table S13 and Figure S3. If the cytosine contacting glutamate or aspartate were substituted by thymine, the polar interaction described above would become much weaker because the O4 oxygen, which replaces N4, H41, and H42, has less negative charge than N4. In addition, the attractive interactions between positive hydrogen atoms and negative carboxyl oxygen atoms would disappear.
To identify the effect of replacing cytosine with a purine base, the G-C pair contacting the glutamate in the recognition site was mutated into C-G in the complex at the position of DNA recognized by Glu21. In contrast to the Glu-dC pair, the mutated Glu-dG pair shows a net repulsive interaction between the amino acid and the nucleotide base. The strongest atom-pair attraction is between glutamate CD and guanine N2, as shown in Supporting Information Table S15. However, its strength, -18.9 kcal/mol, is much weaker than that of the strongest interaction for Glu-dC, -25.7 kcal/mol, because of the longer distance between glutamate CD and guanine N2. (See Supporting Information Figure S4.) If an adenine base were present, the interaction geometry might allow favorable polar interactions involving N6, as evidenced by the existence of Asp-dA recognition pairs. In summary, the specificity of the Glu-dC interaction originates from the favorable Coulomb interaction and solvation energy due to the large charges on cytosine N4 and C4, which more than compensate for the repulsive interaction involving the negative phosphate group. Effect of the Ionic Strength on the Binding Affinity of the Zif268-DNA Complex. It would be of interest to examine the effect of the ionic strength on the binding affinity of the zinc finger protein-DNA complex because the polar electrostatic interaction is the key energy component that influences the binding cooperativity and specificity, as discussed above. We have calculated the binding affinity of Zif268-DNA under different ionic strengths, and the results are summarized in Table 8. Only the PB calculations were repeated with the different