
Coarse-graining with the relative entropy

M. Scott Shell*
Department of Chemical Engineering, University of California Santa Barbara
Santa Barbara, CA
*shell@engineering.ucsb.edu

December 20, 2015

SUMMARY

We discuss a general approach to multiscale modeling based on the idea of variational minimization of the information loss upon coarse-graining, which is naturally measured by an equilibrium, thermodynamic quantity called the relative entropy. This formalism offers a particularly broad statistical-mechanical framework that can be used to design both theoretical (analytical) and computational (simulation) models of complex systems. We provide several connections between the relative entropy and general properties of coarse-grained models that offer distinct interpretations of its relevance and role in the multiscale picture. We then discuss conceptual, theoretical, and numerical aspects of relative entropy minimization, and illustrate its use in case studies spanning water, proteins & peptides, and liquid-state dynamics. We also discuss recent extensions of the relative entropy approach that offer new ways to select degrees of freedom in coarse-grained molecular models and to parameterize behavior in nonequilibrium, dynamic settings.

I. INTRODUCTION

Recent years have seen the emergence of many different multiscale modeling techniques and methodologies that seek, through the development of accurate coarse-grained (CG) models, to expand the application of molecular simulation techniques to increasingly larger and more complex systems [1-6]. The so-called bottom-up approach, in which CG models are parameterized on the basis of more detailed models, has seen particular interest due to the attractiveness of first-principles-type predictions extending from quantum chemical or classical all-atom physics, without the need for macroscopic empirical or constitutive relations. Indeed, a variety of techniques now exist for generating coarse particle-based models for simulation purposes, including the Boltzmann Inversion, Iterative Boltzmann Inversion [7-9], Reverse Monte Carlo [10], and Force Matching (or "Multiscale Coarse-Graining") approaches [11-14]. An excellent review of these methods, and more broadly of the field, was recently published by Noid [15], to which the reader is enthusiastically referred.

Beyond predictive and practical reasons for coarse-graining, another growing motivation in this area has been the development and understanding of fundamental statistical mechanical theory associated with the coarse-graining process. Indeed, coarse-graining can be viewed in terms of ideas that have long held preeminent roles in molecular thermodynamics, such as reaction coordinates, free energy landscapes, microstates versus macrostates, and integrating out various degrees of freedom. This review details efforts from our group that have sought to develop a basic theoretical thermodynamic framework for coarse-graining along such lines [16-25]. The central feature of our approach has been a quantity called the relative entropy that describes the quality of a CG model that attempts to reproduce the properties of a given all-atom (AA) or otherwise higher-resolution one. At the simplest level, it is given by

$$S_{\mathrm{rel}} = \sum_i p_{AA}(i) \ln \frac{p_{AA}(i)}{p_{CG}(i)} \qquad (1)$$

Here, $p_{AA}(i)$ and $p_{CG}(i)$ give the ensemble, equilibrium probabilities for configuration $i$ in the corresponding all-atom and coarse-grained systems, and the sum is performed over all molecular microstates. For systems with continuous degrees of freedom (e.g., atomic positions $\mathbf{r}$), this expression becomes

$$S_{\mathrm{rel}} = \int p_{AA}(\mathbf{r}) \ln \frac{p_{AA}(\mathbf{r})}{p_{CG}(\mathbf{r})}\, d\mathbf{r} \qquad (2)$$

One immediately notices that the relative entropy expression resembles the usual Gibbsian formula for the thermal entropy, $S = -k_B \sum_i p(i) \ln p(i)$, except that it compares probabilities between two systems. Both quantities measure a kind of information loss. Gibbs' entropy measures how much is unknown about the molecular state of the system from a macroscopic perspective, i.e., there are effectively $\Omega$ degenerate microstates, with $S = k_B \ln \Omega$. In contrast, the relative entropy measures the information loss in moving from an all-atom to a CG resolution with a simpler representation and potentially fewer degrees of freedom. Our hypothesis has been that the relative entropy provides a natural and potentially universal way to score the quality of a CG model in terms of the information loss incurred upon coarse-graining. Indeed, the relative entropy has long been studied in statistics and information theory, where it is known as the Kullback-Leibler divergence [26], as a way to quantify the relevance of one model or probability distribution to another. In the context here, higher values of $S_{\mathrm{rel}}$ indicate a poorer CG model (greater information loss), while lower ones describe an increasingly accurate representation of the AA ensemble. The minimum value of the relative entropy is zero, in which case the CG model perfectly recapitulates the probability of every configuration and thus captures averages and distributions of every structural property in the original system.
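As a minimal numerical illustration of Eqns. (1) and (2) (our own sketch, not part of the original text), the discrete relative entropy can be computed directly from two normalized microstate distributions; the distributions below are hypothetical placeholders.

```python
import numpy as np

def relative_entropy(p_aa, p_cg):
    """Discrete relative entropy (KL divergence) of Eqn. (1), in nats."""
    p_aa = np.asarray(p_aa, dtype=float)
    p_cg = np.asarray(p_cg, dtype=float)
    mask = p_aa > 0.0          # terms with p_AA(i) = 0 contribute nothing
    return float(np.sum(p_aa[mask] * np.log(p_aa[mask] / p_cg[mask])))

# hypothetical microstate probabilities, for illustration only
p_aa = [0.5, 0.3, 0.15, 0.05]
p_cg = [0.4, 0.3, 0.20, 0.10]
print(relative_entropy(p_aa, p_cg))   # zero iff the two distributions coincide
```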

While the relative entropy is well known in statistical science, historically it has played a smaller role in molecular modeling, although there are several notable early applications in this area. For example, Cilloco proposed a method for determining pair potentials in condensed-phase matter by minimizing a relative entropy associated with the pair correlation function, as an approach to recover effective molecular interactions from scattering data [27]. Hummer et al. used the relative entropy concept to parameterize a statistical mechanical model of hydrophobic solvation in water [28,29]. Wu and Kofke examined the convergence, bias, and asymmetry of simulation free energy methods and found the relative entropy to be a robust indicator of statistical errors [30,31]. Qian showed that the relative entropy is associated with the free energy rise upon perturbing a system away from equilibrium [32,33]. When taken together, these early efforts point to a much broader role for the relative entropy in both modeling and statistical thermodynamics.

Indeed, recent years have uncovered deep connections of the relative entropy to many established theoretical concepts and multiscale simulation algorithms. Work from our own group has used the relative entropy to guide coarse-graining algorithms that parameterize the force fields of CG models of water [16,17,23,24] and peptides [21,25], as well as to optimize theoretical models of protein folding [20] and liquid-state dynamics [22]. Others have similarly used relative entropy strategies to optimize various kinds of CG models, including single-site models for confined water [34], models for bulk water that include explicit tetrahedral interactions [35], implicit and polarizable explicit potentials for aqueous solutions [36,37], theoretically motivated models of polymer melts [38], cluster expansions for binary alloys [39], and mean-field-motivated models [40]. Indeed, several general relative entropy coarse-graining algorithms have now been proposed [41,42], and one has been incorporated into the publicly available coarse-graining suite VOTCA [43]. More fundamentally, the relative entropy has emerged as an important concept in nonequilibrium thermodynamics [32,44-47], and some have argued that it may provide a more natural starting point for statistical thermodynamics (at and beyond equilibrium) than the usual Gibbs-Jaynes entropy [48]. Efforts have also sought to extend relative entropy concepts to the time domain; it has already been used to test the quality of reaction coordinates [49], to coarsen Markov state models [50,51], and to measure the general fitness of coarsened dynamic equations [52-54].

In this chapter, we describe our work that has sought to use information-loss concepts and the relative entropy as a thermodynamic framework for multiscale modeling, with a particular focus on the bottom-up development of CG molecular models. We first give a number of results that show the relevance of the relative entropy to established thermodynamic ideas and that support its role in measuring the fidelity of CG models. Because Eqn. (2) compares AA and CG models with identical degrees of freedom r, we also discuss the important issue of how to make the relative entropy appropriate to coarse models with fewer degrees of freedom. We then describe methodologies for optimizing CG force fields in practical coarse-graining settings, as well as selected case studies that demonstrate such efforts. Finally, we comment on future directions for relative-entropy-based coarse-graining that may broaden its role in the development of multiscale models.

II. FUNDAMENTALS

II.1 Basics and notation of the coarse-graining problem

The bottom-up paradigm seeks to use various first-principles-type physicochemical molecular models (e.g., ab initio or classical molecular dynamics) to generate coarse-grained (CG) models with reduced numbers of degrees of freedom. Let us lay out some preliminaries for such techniques. We presume that a CG model is developed to mimic the behavior of a classical all-atom (AA) system, and that both models are off-lattice in nature with $n$ and $N \le n$ sites, respectively. However, for what follows, the notations CG and AA could easily indicate any pair of models at low and high resolutions, respectively, regardless of their absolute scales or details. For the relative entropy technique, there is also no requirement that either or both models be off-lattice. We do require that the CG and AA models have well-defined equilibrium microstate ensembles.

Let the interaction potential for the AA system be $U_{AA}$, and that for the CG one $U_{CG}$. In the canonical ensemble at constant temperature T, the configurational distributions follow

$$p_{AA}(\mathbf{r}) = V^{-n} \exp[\beta A_{AA} - \beta U_{AA}(\mathbf{r})], \qquad p_{CG}(\mathbf{R}) = V^{-N} \exp[\beta A_{CG} - \beta U_{CG}(\mathbf{R})] \qquad (3)$$

where $\beta = 1/k_B T$ as usual, $\mathbf{r}$ and $\mathbf{R}$ give the 3n- and 3N-dimensional atomic coordinate vectors of the AA and CG systems, respectively, and the $A_i$ give the associated Helmholtz free energies of the ensembles:

$$e^{-\beta A_{AA}} = V^{-n} \int e^{-\beta U_{AA}(\mathbf{r})}\, d\mathbf{r}, \qquad e^{-\beta A_{CG}} = V^{-N} \int e^{-\beta U_{CG}(\mathbf{R})}\, d\mathbf{R} \qquad (4)$$

In both Eqns. (3) and (4), the factors related to the system volume V are included to ensure dimensional consistency, such that the $A_i$ represent excess Helmholtz free energies. It should also be noted that the canonical ensemble is a convenient choice for coarse-graining but does not represent the only possibility; indeed, algorithms have been formulated in other ensembles [55], and the relative entropy framework described later has no specific ensemble requirement.

Figure 1: Several distinct coarse-graining approaches for the polyalanine tripeptide (panels: the all-atom system; coarse-grained hydrogens; coarse-grained functional groups; coarse-grained amino acid residues). Each representation demonstrates how atoms might be grouped into effective CG pseudoatom sites. The number and assignment of atoms to sites constitutes the specification of the mapping function, M.

In common practice, the CG model consists of sites or pseudoatoms that represent groups of atoms in the AA system. These sites may be defined by a common chemistry, like methyl or carbonyl groups, or they could contain many functional units, as is common in polymer modeling. Indeed, there are many different choices for pseudoatom resolution and composition, and this constitutes an important part of the coarse-graining problem itself. Figure 1 shows some examples for polyalanine. It should be noted that a CG model need not have a lower resolution than the AA one; the CG system may be coarse in the sense that it uses a simpler or smoother force field, even if its degrees of freedom number the same.

The development of a pseudoatom description of an AA system requires the definition of a mapping function M that translates a set of 3n atomic coordinates r to a unique CG configuration R,

$$\mathbf{R} = \mathbf{M}(\mathbf{r}) \qquad (5)$$

Notice that there may be more than one atomistic configuration mapping to any given coarse-grained one, since the number of AA degrees of freedom and the size of its microstate space typically exceed those of the CG model. Commonly, pseudoatom sites are defined as center-of-mass coordinates of groups of atoms in the all-atom representation, as

$$\mathbf{R}_I = \left(\sum_{i \in I} m_i \mathbf{r}_i\right)\left(\sum_{i \in I} m_i\right)^{-1} \qquad (6)$$

where the sums are performed over all AA sites i contained within a CG pseudoatom I, and where $m_i$ gives the mass of AA site i. The center-of-mass prescription is justified, and perhaps even required, for at least two reasons [13,15]: it ensures that the CG system behaves normally with uniform translations in space, and it gives consistent AA and CG momentum distributions. Eqn. (6) gives rise to CG coordinates that can always be expressed as linear combinations of atomic ones, such that the mapping function is more specifically a mapping matrix, with $\mathbf{R} = \mathbf{M}\mathbf{r}$, in which the matrix M has dimensions $(3N, 3n)$. Note that there is no requirement that every atom in the AA system be mapped to a pseudoatom in the CG model; in fact, one might explicitly want to omit AA degrees of freedom that are physically irrelevant to large-scale behavior.
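To make Eqn. (6) concrete, here is a minimal sketch of a center-of-mass mapping (the function name, array layout, and example data are our own assumptions): given AA coordinates, masses, and an assignment of atoms to pseudoatoms, it returns the CG configuration R = M(r).

```python
import numpy as np

def com_map(r_aa, masses, site_of_atom, n_sites):
    """Center-of-mass mapping of Eqn. (6): r_aa is (n, 3), masses is (n,),
    and site_of_atom[i] gives the CG pseudoatom index I for AA site i."""
    R = np.zeros((n_sites, 3))
    mtot = np.zeros(n_sites)
    for i, I in enumerate(site_of_atom):
        R[I] += masses[i] * r_aa[i]
        mtot[I] += masses[i]
    return R / mtot[:, None]

# hypothetical example: 4 atoms grouped into 2 CG sites
r = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
m = np.array([12., 1., 12., 16.])
print(com_map(r, m, site_of_atom=[0, 0, 1, 1], n_sites=2))
```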

To then develop the details of a CG model (namely, to specify the potential function $U_{CG}$), most modern efforts in bottom-up coarse-graining begin with a probabilistic consistency criterion [13]. The argument is that a perfect CG model, insofar as statistical thermodynamics is concerned, will exactly replicate the multidimensional potential of mean force (PMF) of the AA system along the CG degrees of freedom. The PMF results from a partial integration of the AA partition function to project the more detailed configuration space r onto the coarse coordinates R. In the canonical ensemble, the all-atom PMF is given by

$$W(\mathbf{R}) = -k_B T \ln\left[V^{N-n} \int e^{-\beta U_{AA}(\mathbf{r})}\, \delta[\mathbf{M}(\mathbf{r}) - \mathbf{R}]\, d\mathbf{r}\right] \qquad (7)$$

where again the factors of V ensure dimensional consistency but otherwise play no special role. The projected AA configurational distribution is given by

$$p_{AA}(\mathbf{R}) = V^{-N} \exp[\beta A_{AA} - \beta W(\mathbf{R})] \qquad (8)$$

where as usual

$$A_{AA} = -k_B T \ln\left[V^{-N} \int e^{-\beta W(\mathbf{R})}\, d\mathbf{R}\right] = -k_B T \ln\left[V^{-n} \int e^{-\beta U_{AA}(\mathbf{r})}\, d\mathbf{r}\right] \qquad (9)$$

In principle, W depends on both temperature and the system volume through the integration limits, and is formally a free energy, not a potential energy. It also tends to be highly multibody in nature, due to the complex coupling of the remaining degrees of freedom to those that are integrated out. The PMF provides the ideal CG force field $U_{CG}$ because it guarantees that the coarse model will sample the same configurational distribution; namely,

$$U_{CG}(\mathbf{R}) = W(\mathbf{R}) \quad \text{implies} \quad p_{CG}(\mathbf{R}) = p_{AA}(\mathbf{R}) \qquad (10)$$

Note that additive shifts in $U_{CG}$ have no effect on the success of the coarse-graining procedure, as they are removed in the normalization of the configurational probability distributions. In the remainder of this chapter, we therefore ignore any constant terms that might contribute to $U_{CG}$.
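In its simplest one-dimensional form, the PMF of Eqn. (7) can be estimated by Boltzmann inversion of a histogram of mapped configurations, $W(R) = -k_B T \ln p_{AA}(R) + \mathrm{const}$. The sketch below is our own illustration, with synthetic samples standing in for mapped AA configurations; consistent with the remark above, the additive constant is arbitrary and is fixed by shifting the minimum to zero.

```python
import numpy as np

kT = 1.0                                  # energies in units of k_B*T
rng = np.random.default_rng(1)
# synthetic stand-in for mapped AA samples R = M(r) along one CG coordinate
samples = rng.normal(loc=0.0, scale=0.5, size=200_000)

hist, edges = np.histogram(samples, bins=100, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
mask = hist > 0                           # avoid log(0) in empty bins
pmf = -kT * np.log(hist[mask])            # W(R) up to an additive constant
pmf -= pmf.min()                          # fix the arbitrary zero of energy
print(centers[mask][np.argmin(pmf)])      # PMF minimum, here near R = 0
```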

In almost all cases, Eqn. (10) is not practically realizable, because W is by nature a highly multibody interaction that is not well described by pair or otherwise computationally efficient force field terms. Thus most coarse-graining methods instead focus on developing approximations to W(R), with differences pertaining to the nature of the assumptions and closures. A related complication is the issue of transferability: the PMF is state-dependent, and simple CG potentials at one set of conditions are not generally optimal at another. Here we do not discuss at length either the issue of multibody PMF interactions or transferability, but we note that many in the field have studied these issues [56-67].

II.2 Development and interpretation of the relative entropy

The relative entropy offers a powerful tool in bottom-up coarse-graining by providing a universal way to score, or quantify, the appropriateness of a putative CG model in capturing the equilibrium behavior of a given AA system. In this perspective, better CG models always have lower values of $S_{\mathrm{rel}}$, with perfect models at a value of zero, such that minimization of the relative entropy gives a systematic CG-model-optimization strategy. Before discussing the details of such tasks, we first provide several ways to understand the significance of the relative entropy in the bottom-up context and its relevance to model scoring. For the purposes of illustration, we describe four distinct interpretations of $S_{\mathrm{rel}}$ that emerge for cases in which the AA and CG systems have exactly the same degrees of freedom and microstates, which are discrete. In the next section, we extend this analysis to address off-lattice situations and CG models with reduced degrees of freedom.

One approach to the development of the relative entropy stems from a log-likelihood comparison of the AA and CG models. Consider the following thought experiment: the CG model is probed by drawing n random configurations i from it, according to the microstate ensemble distribution $p_{CG}(i)$. Let n(i) represent the number of times that configuration i is picked, with $\sum_i n(i) = n$.

On average, for large n, one expects $n(i) \approx n\, p_{CG}(i)$. However, a more precise analysis characterizes the likelihood of any measured distribution n(i), given by the multinomial

$$L = \frac{n!}{\prod_i n(i)!}\prod_i p_{CG}(i)^{n(i)} \approx \prod_i \left(\frac{n}{n(i)}\right)^{n(i)} p_{CG}(i)^{n(i)} \qquad (11)$$

in which the second expression uses Stirling's approximation. If the CG model is a good representation of the AA one, we would expect a high likelihood that the frequencies of each configuration i drawn from the CG ensemble would mimic the average predicted by the AA ensemble. The log likelihood that $n(i) = n\, p_{AA}(i)$ is then

$$\ln L = n \sum_i p_{AA}(i) \ln\frac{p_{CG}(i)}{p_{AA}(i)} = -n S_{\mathrm{rel}} \qquad (12)$$

We see that the log likelihood decreases linearly with the number of test configurations n, but the relative entropy measures the rate at which it falls off and serves as an intrinsic measure of the relevance of the CG model.
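Eqn. (12) can be checked numerically (a sketch of our own, with hypothetical distributions): draw n configurations with frequencies following $p_{AA}$ and evaluate the exact multinomial log-likelihood under $p_{CG}$; per draw, it approaches $-S_{\mathrm{rel}}$ for large n.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(0)
p_aa = np.array([0.5, 0.3, 0.15, 0.05])
p_cg = np.array([0.4, 0.35, 0.15, 0.10])
s_rel = float(np.sum(p_aa * np.log(p_aa / p_cg)))

n = 100_000
counts = rng.multinomial(n, p_aa)           # frequencies drawn per the AA model
log_l = (lgamma(n + 1)                      # exact multinomial log-likelihood
         - sum(lgamma(c + 1) for c in counts)
         + float(np.sum(counts * np.log(p_cg))))
print(-log_l / n, s_rel)                    # the two agree for large n, Eqn. (12)
```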

In an information-theoretic perspective, $S_{\mathrm{rel}}$ is a measure of the information loss when representing data described by $p_{AA}$ with the model $p_{CG}$ (e.g., in bits if the logarithm is base-two). It is also related to the additional bits that would be needed to store information about the AA distribution with a compression code optimized for the CG one. A simple one-dimensional illustration is provided in Figure 2.

Figure 2: One-dimensional illustration of the relationship of the relative entropy to configurational probability distributions (the two panels compare all-atom and coarse-grained distributions P(x) with differing overlap and correspondingly different values of $S_{\mathrm{rel}}$). Both the all-atom and coarse-grained systems have a single degree of freedom x in this case, and $S_{\mathrm{rel}}$ measures the overlap of the distributions p(x). In reality, the distributions are highly multidimensional, such that the number of x axes is proportional to the number of sites.

This approach also demonstrates that the relative entropy is strictly zero or positive. The likelihood is one if and only if $S_{\mathrm{rel}} = 0$ and $p_{CG}(i) = p_{AA}(i)$ for all i, i.e., if the CG model perfectly reproduces the AA one. Otherwise, it is always asymptotically zero in the $n \to \infty$ limit. This property ultimately follows from Jensen's inequality and the concavity of the logarithm, but a simpler argument uses the identity $\ln x \le x - 1$ to show that

$$-S_{\mathrm{rel}} = \sum_i p_{AA}(i)\ln\frac{p_{CG}(i)}{p_{AA}(i)} \le \sum_i p_{AA}(i)\left[\frac{p_{CG}(i)}{p_{AA}(i)} - 1\right] = \sum_i p_{CG}(i) - \sum_i p_{AA}(i) = 0 \qquad (13)$$

A second approach to the relative entropy connects it to a nonequilibrium free energy [32]. Here we describe this perspective in the canonical ensemble, such that the equilibrium distribution for a system of interest is given by the usual Boltzmann expression:

$$p(i) = \frac{e^{-\beta E(i)}}{\sum_j e^{-\beta E(j)}} = e^{\beta A - \beta E(i)} \qquad (14)$$

in which A is the ensemble free energy. Consider the case that the system is prepared in a nonequilibrium ensemble described by an arbitrary set of probabilities $p_{NE}$ that does not follow (14). The average energy and entropy of the nonequilibrium ensemble can be expressed as

$$\langle E\rangle_{NE} = \sum_i E(i)\, p_{NE}(i) = A - k_B T \sum_i p_{NE}(i) \ln p(i) \qquad (15)$$

$$S_{NE} = -k_B \sum_i p_{NE}(i) \ln p_{NE}(i) \qquad (16)$$

In turn, the nonequilibrium free energy follows

$$A_{NE} = \langle E\rangle_{NE} - T S_{NE} = A + k_B T \sum_i p_{NE}(i) \ln\frac{p_{NE}(i)}{p(i)} \qquad (17)$$

Rearranging the last of these expressions gives the instructive relation

$$\frac{A_{NE} - A}{k_B T} = \sum_i p_{NE}(i) \ln\frac{p_{NE}(i)}{p(i)} = S_{\mathrm{rel}} \qquad (18)$$

In other words, the relative entropy measures the additional decrease in free energy, in thermal energy units, that the system obtains as it approaches its true equilibrium state. Equivalently, this gives the minimum dimensionless thermodynamic work that would need to be applied to the system to force it to adopt the ensemble $p_{NE}$. In the context of coarse-graining, the system of interest is the CG one and the nonequilibrium state is given by the AA ensemble, such that the relative entropy measures the minimum work required to make the CG model perform exactly. Similar ideas have been explored in greater rigor by Sivak and Crooks, and the reader is encouraged to consult these excellent papers [46,47].

A more practical interpretation of the relative entropy involves an analogy with simulation reweighting protocols. Imagine that a molecular dynamics simulation of a system with potential energy function $U_0$ is performed and its trajectory recorded. Of course, simple averages of a simulation configurational property X can be computed using

$$\langle X\rangle_0 = \frac{1}{n}\sum_{\text{frames } i} X_i \qquad (19)$$

where n is the number of trajectory frames and $X_i$ gives the value of X for frame i. If each frame is uncorrelated, then one expects the statistical error in $\langle X\rangle$ to scale as $n^{-1/2}$. In reweighting, it is desirable to examine the behavior of such averages with small perturbations to the potential. Classic thermodynamic perturbation theory shows that one can perform a reweighted average from the original simulation to a new potential $U_1$:

$$\langle X\rangle_1 = \sum_{\text{frames } i} w_i X_i \qquad (20)$$

Here the sum is performed over the original trajectory (state 0) and the weight of the ith snapshot in state 1 is determined by

$$w_i = \frac{e^{-\beta(U_{1,i} - U_{0,i})}}{\sum_j e^{-\beta(U_{1,j} - U_{0,j})}} = \frac{p_1(i)/p_0(i)}{\sum_j p_1(j)/p_0(j)} = \frac{e^{-\beta\Delta U_i}}{\sum_j e^{-\beta\Delta U_j}} \qquad (21)$$

where $\Delta U_i$ gives the change in energy for configuration i moving from state 0 to 1. The middle expression provides a more formal statement in terms of ensemble probabilities. Clearly, $w_i = 1/n$ if $U_1 = U_0$, and each frame contributes equally to the average. On the contrary, as $U_1$ deviates, some frames begin to make significantly larger contributions at the expense of others. Eventually, very few frames will have nonzero $w_i$ and influence the average. At this point, statistical errors in the reweighted average become very large, as the average is dominated by a few rare configurations that do not reflect a well-sampled distribution. This practical limit on the reweighting approach is well known in simulations [68].

To quantify the success of the reweighting procedure, one can define an effective number of trajectory frames that make a contribution to the reweighted average. A natural way to calculate $n_{\mathrm{eff}}$ is

$$\ln n_{\mathrm{eff}} = -\sum_{\text{frames } i} w_i \ln w_i \qquad (22)$$

Note that Eqn. (22) naturally possesses the no-reweighting limit, in that $n_{\mathrm{eff}} \to n$ when $\Delta U \to 0$ and $w_i \to 1/n$ for all i. Also note that $n_{\mathrm{eff}} \to 1$ when $\Delta U$ is large and $w_i$ becomes nonzero for only a single configuration. A statistical efficiency in the reweighting problem can then be defined as the fraction of the original trajectory that contributes to the perturbed average, $n_{\mathrm{eff}}/n$. Using (22), the logarithm of this fraction is

$$\ln\frac{n_{\mathrm{eff}}}{n} = -\sum_{\text{frames } i} w_i \ln(n w_i) = -\sum_{\text{frames } i} w_i \ln\frac{p_1(i)}{p_0(i)} + \ln\left[\frac{1}{n}\sum_{\text{frames } i}\frac{p_1(i)}{p_0(i)}\right] \qquad (23)$$

Eqns. (19) and (20) allow us to interpret the sums as ensemble averages (i.e., the expected behavior over all such reweighting efforts), giving finally

$$\ln\frac{n_{\mathrm{eff}}}{n} = -\sum_i p_1(i)\ln\frac{p_1(i)}{p_0(i)} + \ln(1) = -S_{\mathrm{rel}} \qquad (24)$$

Note that the sums no longer correspond to trajectory frames but to the entire configurational ensemble (i.e., all configurations of the system, each appearing once). Eqn. (24) shows that the reweighting statistical efficiency is directly related to the relative entropy; it decreases as $S_{\mathrm{rel}}$ grows. In the context of coarse-graining, the AA model can be considered to be the reweighted ensemble, while the CG one gives the reference trajectory that is to be perturbed. Thus, $S_{\mathrm{rel}}$ measures the statistical difficulty with which the CG model can be reweighted back to the underlying AA system and used to predict its properties. Wu and Kofke first proposed the connection between the relative entropy and errors in simulation reweighting and free energy calculations [30,31]; in particular, their detailed tests on particle insertion/deletion methods show that the relative entropy well predicts errors in computed chemical potentials. Interestingly, very similar ideas involving the relative entropy have been applied to the relationship between forward and reverse work distributions in nonequilibrium scenarios and their connections to underlying equilibrium free energies [44,45].
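A short sketch (our own construction, with a synthetic Gaussian perturbation standing in for the per-frame energy changes $\Delta U_i$) of Eqns. (21) and (22): it computes the reweighting weights and the statistical efficiency $n_{\mathrm{eff}}/n$.

```python
import numpy as np

rng = np.random.default_rng(2)
beta = 1.0
n = 10_000
# synthetic per-frame energy changes dU_i = U_1,i - U_0,i for the recorded frames
dU = rng.normal(loc=0.0, scale=1.5, size=n)

logw = -beta * dU
logw -= logw.max()                        # stabilize the exponentials
w = np.exp(logw)
w /= w.sum()                              # normalized weights, Eqn. (21)

w_pos = w[w > 0.0]                        # guard against log(0)
n_eff = np.exp(-np.sum(w_pos * np.log(w_pos)))   # Eqn. (22)
print(n_eff / n)                          # statistical efficiency of reweighting
```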

A final interpretation shows that the relative entropy can be related to errors due to the CG model, in terms of averages and macroscopic observables. Define the error in an observable X as

$$\varepsilon_X = \sum_i p_{AA}(i) X(i) - \sum_i p_{CG}(i) X(i) = \langle X\rangle_{AA} - \langle X\rangle_{CG} \qquad (25)$$

where X(i) gives the value of X for configuration i in the ensemble and is measurable in both the CG and AA systems. Under a few general conditions on the nature of X (e.g., that it be bounded), the so-called Csiszár-Kullback-Pinsker inequality then restricts the error to lie within the bounds

$$|\varepsilon_X| \le c_X \sqrt{2 S_{\mathrm{rel}}} \qquad (26)$$

where $c_X$ is an X-dependent constant. Rather remarkably, the bound in (26) is general and applies to a very broad range of observables X. Unfortunately, the bound is weak, as the actual value of $c_X$ can be large. More recently, however, Dupuis et al. proposed a new bound, based on a novel variational derivation, that is much tighter [69]. This bound is asymptotically valid in the limit $S_{\mathrm{rel}} \to 0$ and is given by

$$|\varepsilon_X| \le \sqrt{\mathrm{var}_{CG}(X)\, 2 S_{\mathrm{rel}}} = \sqrt{\left(\langle X^2\rangle_{CG} - \langle X\rangle_{CG}^2\right) 2 S_{\mathrm{rel}}} \qquad (27)$$

where it is seen that the constant of proportionality can be related to the variance of X in the CG ensemble.
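The variance bound of Eqn. (27) is easy to probe numerically; the toy discrete distributions and observable below are hypothetical, chosen so that the two models are close and the asymptotic bound applies.

```python
import numpy as np

p_aa = np.array([0.28, 0.22, 0.30, 0.20])
p_cg = np.array([0.25, 0.25, 0.25, 0.25])
x = np.array([0.0, 1.0, 2.0, 3.0])              # some observable X(i)

s_rel = np.sum(p_aa * np.log(p_aa / p_cg))
err = abs(np.dot(p_aa, x) - np.dot(p_cg, x))    # |<X>_AA - <X>_CG|, Eqn. (25)
var_cg = np.dot(p_cg, x**2) - np.dot(p_cg, x)**2
bound = np.sqrt(2.0 * s_rel * var_cg)           # Eqn. (27)
print(err, bound)       # err <= bound (asymptotically, for small S_rel)
```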

In summary, at least four distinct interpretations of the relative entropy illustrate its role in a coarse-graining context: (1) it gives the likelihood that a CG model will reproduce the correct AA configurational distribution; (2) it measures the non-dimensionalized minimum thermodynamic work that would need to be applied to the CG model in order to shift configurational populations to the correct AA distribution; (3) it predicts the statistical efficiency, or practical difficulty, required to reweight sampled CG trajectories to the AA system; and (4) it bounds the errors in observables computed through averages in the CG ensemble. Certainly there may be more interpretations beyond these, but the broad role that the relative entropy plays in these selected examples suggests a fundamental metric of model fidelity in terms of statistical thermodynamic properties.

II.3 Application: quantifying errors in theoretical models with the relative entropy

Several of our early applications of relative entropy theory were to simple theoretical models as instructive case studies. For example, we studied the behavior of $S_{\mathrm{rel}}$ for coarse-graining the classic 2D lattice gas from explicit nearest-neighbor interactions to the case in which this system is described by a mean-field site occupancy [18,19]. Here, the system exists in the grand canonical ensemble at constant chemical potential μ and temperature T, such that the particle number n, and hence the site occupancy, fluctuates. While there is no particle coarse-graining in this case (the degrees of freedom remain the same), the relative entropy quantifies the effectiveness of the mean field as a thermodynamic closure. We considered both cases in which the mean occupancy is determined by the usual self-consistency criterion and in which it is determined by explicit minimization of $S_{\mathrm{rel}}$ at each state point, which provides a lower bound for any mean-field treatment of this model.

Figure 3 shows the behavior of the relative entropy in (T, μ) state space. As one might expect, it shrinks as the temperature increases and correlations weaken, where one intuitively would expect the mean-field closure to be a good approximation. Interestingly, however, $S_{\mathrm{rel}}$ also decreases at lower temperatures, such that it is most pronounced in an intermediate temperature range. This behavior signals the onset of critical correlations: $S_{\mathrm{rel}}$ consistently grows as one approaches the critical temperature and critical chemical potential, and it nearly diverges as the system approaches the critical temperature along the $\mu = \mu_c$ isoline. In this case, therefore, the relative entropy signals the well-known failure of the mean-field description in capturing behavior near the critical point.

A second interesting result from this lattice gas study is that $S_{\mathrm{rel}}$ quantitatively predicts mean-field errors in thermodynamic quantities over a broad range of state space; namely, it strongly correlates with the difference in particle number fluctuations between the original and mean-field systems.

It is even possible to develop analytical scaling laws for the magnitude of these errors using modest statistical-mechanical approximations; one finds that errors in the nearest-neighbor correlation function and in the global particle number variance are proportional to $T S_{\mathrm{rel}}$ to first order. Explicit calculation of the errors using simulations supports the analytical results, as shown in Figure 4. Thus, in this particular case, it is possible to show that smaller values of the relative entropy directly indicate reduced errors in thermodynamic properties of interest.

Fig. 3: The relative entropy (per site, $s_{\mathrm{rel}}$) as a function of inverse temperature β for coarse-graining the lattice gas from explicit nearest-neighbor interactions to a mean-field representation. Here, the system is maintained at constant temperature and chemical potential μ. As the system approaches its critical point, characterized by the critical chemical potential ($\mu_c$) and inverse temperature ($\beta_c$), the per-site relative entropy increases sharply, a reflection of the classic failure of the mean-field closure in the critical regime. The arrow indicates increasing $\mu - \mu_c$, and two curves are shown for each case. The dashed curves show cases in which the mean field is determined by the usual variational or self-consistency closure, whereas the solid curves determine the field through explicit relative entropy minimization and thus reflect the lowest possible values for any mean-field representation. Reproduced with permission from [18]. Copyright 2010, American Physical Society.

Figure 4: Errors in particle number fluctuations for the lattice gas described in Figure 3 (panels plot the errors in the local fluctuation $\langle n_0 n_1\rangle - \langle n\rangle^2$ and in the global fluctuation $\langle n^2\rangle - \langle n\rangle^2$ against $s_{\mathrm{rel}}/\beta$, for several values of $\mu - \mu_c$). Here the error is defined as the difference in the variance in particle number (at a given β and μ) between the original lattice gas with explicit interactions and the version described by a mean-field closure. The top panel shows local errors for the variance in the number of nearest-neighbor particles, while the bottom panel gives the error in global fluctuations over the total particle number. The dotted lines represent analytic relations that can be developed using simple perturbation and other statistical-mechanical approximations, as described in [18]. Reproduced with permission from [18]. Copyright 2010, American Physical Society.

In a distinct study, we used the relative entropy to extract analytical protein folding landscapes from detailed structure predictions [20]; this in turn helped identify which structures were most likely to be native, without any knowledge of the true folds. For each amino acid sequence in a blind test of over 80 proteins, we collected about 250 structure predictions from various webservers and evaluated the average energies of those structures with short molecular dynamics simulations using an all-atom force field with an implicit solvent. Our hypothesis was that the most native-like structures are likely to sit close to a global energy minimum, such that movements away from it in the configurational landscape, on average, increase the energy. Therefore, for each protein structure prediction, we fit its local energy landscape (namely, the energies of nearby structures and their relative configuration-space distances) to a simple analytical funnel model characterized by a depth, steepness, and random-energy ruggedness (Figure 5).

The relative entropy then provided a way to measure how funnel-like each prediction appeared, and the results showed that $S_{\mathrm{rel}}$ had a much stronger correlation with near-nativeness than either minimized or average energies for the majority of protein test cases. In some cases, the minimum-$S_{\mathrm{rel}}$ structure was indeed the best out of the entire ~250 predictions for a given protein. The success of this approach likely originates in the ability of the relative entropy model fit to synthesize distance-energy relationships among all structures, by projection onto a coarse landscape, in contrast to ranking each by its single-structure energy alone. In this case, coarsening all-atom structures into a highly simplified and functionally constrained analytical model enabled one to filter out ruggedness and other landscape features not of interest.

Figure 5: Parameterization of an analytical protein folding funnel model using an ensemble of protein structure predictions (panels plot potential energy U against configuration-space distance D, with funnel minimum $E_0$). The energies of a large number of structure predictions for a particular amino acid sequence lie on a rugged energy landscape (left). A much simpler landscape model describes a funnel-shaped energy distribution with parameters that capture the minimum energy and structure, slope, and ruggedness in terms of a random energy model (middle). Each protein structure is considered in turn as a putative location of the funnel minimum, and then the theoretical model is fit to relative energies and distances of other structures. In turn, the value of $S_{\mathrm{rel}}$ for each structure provides a measure of how well the funnel model describes its local energy landscape.

The relative entropy (bottom right) has a stronger correlation with distance to the native structure than energetic measures (top right and middle right), and is effective as a scoring and selection tool in protein structure prediction. Reproduced with permission from [20]. Copyright 2011, Biophysical Society.

III. CLASSICAL COARSE-GRAINED SYSTEMS

III.1 The relative entropy upon removing degrees of freedom

The interpretations in Section II have yet to consider real coarse-graining, in which the CG model has fewer degrees of freedom than the AA system and thus the configurational probability distributions exist in spaces of different dimensionalities. The subtleties associated with this scenario are best addressed in the off-lattice case, in which a classical system of n atoms and 3n configurational coordinates r is coarse-grained to one of N atoms with 3N positions R. The continuous probability densities $p_{AA}(\mathbf{r})$ and $p_{CG}(\mathbf{R})$ then necessarily have dimensions of $V^{-n}$ and $V^{-N}$, where V is the system volume. If n = N, no special treatment for the relative entropy is needed, and the integral version in Eqn. (2) is appropriate. On the other hand, if n > N, then a common configuration space is needed to compare the two configurational distributions. The simplest and most obvious choice is the CG space R, which both models have in common. This requires a projection of the AA probabilities per

$$p_{AA}(\mathbf{R}) = \int p_{AA}(\mathbf{r})\, \delta[\mathbf{R} - \mathbf{M}(\mathbf{r})]\, d\mathbf{r} \qquad (28)$$

Here, the delta function filters for atomic configurations mapping to the same CG configuration R = M(r). In effect, probabilities are integrated along $(3n - 3N)$-dimensional hypervolumes, each corresponding to a unique CG configuration. Then, the relative entropy takes the form

$$S_{\mathrm{rel}} = \int p_{AA}(\mathbf{R}) \ln\frac{p_{AA}(\mathbf{R})}{p_{CG}(\mathbf{R})}\, d\mathbf{R} \qquad (29)$$

This form clearly allows for comparison of the two models when they contain different degrees of freedom. Indeed, note that the choice $U_{CG}(\mathbf{R}) = W(\mathbf{R})$ in the canonical ensemble demands that $S_{\mathrm{rel}} = 0$ through Eqn. (10), which is consistent with the notion that the PMF gives the ideal CG potential.

Thus, it is easy to see that absolute minimization of the relative entropy to a value of zero will demand that the CG force field be equal to the PMF.

It is possible, and quite informative, to re-express the integral of (29) in terms of the AA configuration space. Inserting Eqn. (28),

$$S_{\mathrm{rel}} = \iint p_{AA}(\mathbf{r})\,\delta[\mathbf{R} - \mathbf{M}(\mathbf{r})] \ln\frac{p_{AA}(\mathbf{R})}{p_{CG}(\mathbf{R})}\, d\mathbf{r}\, d\mathbf{R} = \int p_{AA}(\mathbf{r}) \ln\frac{p_{AA}(\mathbf{M}(\mathbf{r}))}{p_{CG}(\mathbf{M}(\mathbf{r}))}\, d\mathbf{r} = \int p_{AA}(\mathbf{r}) \ln\frac{p_{AA}(\mathbf{r})}{p_{CG}(\mathbf{r})}\, d\mathbf{r} \qquad (30)$$

The last equality involved the introduction of a new quantity $p_{CG}(\mathbf{r})$ to convert the distributions inside the logarithm in R-space to distributions in r. Specifically, we defined

$$p_{CG}(\mathbf{r}) \equiv p_{CG}(\mathbf{R})\, \frac{p_{AA}(\mathbf{r})}{\int p_{AA}(\mathbf{r}')\,\delta[\mathbf{R} - \mathbf{M}(\mathbf{r}')]\, d\mathbf{r}'} = p_{CG}(\mathbf{R})\, \frac{p_{AA}(\mathbf{r})}{p_{AA}(\mathbf{R})} = p_{CG}(\mathbf{R})\, \phi(\mathbf{r}; \mathbf{R}), \quad \text{with } \mathbf{R} = \mathbf{M}(\mathbf{r}) \qquad (31)$$

The interpretation of $p_{CG}(\mathbf{r})$ is that it gives the probability for atomic configuration r as predicted by the coarse model. Since the CG system lacks all of the degrees of freedom of the more detailed AA configuration space, an assumption must be made about how it predicts more detailed configurations r. Eqn. (31) shows naturally that $p_{CG}(\mathbf{r})$ should scale with the probability of the corresponding coarse-grained configuration $p_{CG}(\mathbf{R})$. Within CG configuration R, however, there are many AA sub-states; the factor on the RHS of (31) shows that the probability of each sub-state is modulated by the original population in the AA ensemble. Another way of expressing the relationship is shown in the last equality, using a bridge function $\phi(\mathbf{r}; \mathbf{R})$ that describes how AA configurations are distributed within CG microstates.

In our early work, we proposed an alternative definition for $\phi(\mathbf{r}; \mathbf{R})$ that led to a slightly different form for the relative entropy, which here we call $S_{\mathrm{rel}}'$ to distinguish it from the convention of Eqn. (29). This was motivated by the desire to remove any AA properties from $p_{CG}(\mathbf{r})$ so that it is entirely specified by the CG model. In this case, one assumes an alternate form for $\phi(\mathbf{r}; \mathbf{R})$ such that any configuration within the AA subspace of R is equally likely, rather than proportional to $p_{AA}(\mathbf{r})$. The alternative function follows

$$\phi(\mathbf{r}; \mathbf{R}) = \frac{1}{\Omega_{\mathrm{map}}(\mathbf{R})} \quad \text{and} \quad p_{CG}(\mathbf{r}) = \frac{p_{CG}(\mathbf{R})}{\Omega_{\mathrm{map}}(\mathbf{R})}, \quad \text{with } \mathbf{R} = \mathbf{M}(\mathbf{r}) \qquad (32)$$

where

$$\Omega_{\mathrm{map}}(\mathbf{R}) = \int \delta[\mathbf{R} - \mathbf{M}(\mathbf{r})]\, d\mathbf{r} = V^{n-N} \qquad (33)$$

Here $\Omega_{\mathrm{map}}(\mathbf{R})$ gives the volume of AA configuration space mapping to the same CG microstate. In simple terms, it measures the degeneracy of the mapping: the number of AA states that map to the same CG one. Formally, $\Omega_{\mathrm{map}}$ is an R-dependent quantity. However, one might anticipate for systems with periodic boundary conditions that the integral is the same for every coarse configuration, and equal to $\Omega_{\mathrm{map}} = V^{n-N}$, as illustrated in the last equality above. We make this assumption henceforward, although it is important to note that complexities may arise for nonperiodic systems, as discussed by Rudzinski and Noid [70].

By adopting the ansatz of Eqn. (32), the alternative relative entropy $S_{\mathrm{rel}}'$ becomes

$$S_{\mathrm{rel}}' = \int p_{AA}(\mathbf{r}) \ln\frac{p_{AA}(\mathbf{r})\, \Omega_{\mathrm{map}}(\mathbf{M}(\mathbf{r}))}{p_{CG}(\mathbf{M}(\mathbf{r}))}\, d\mathbf{r} \qquad (34)$$

With rearrangement, we arrive at the key result that

$$S_{\mathrm{rel}}' = \int p_{AA}(\mathbf{R}) \ln\frac{p_{AA}(\mathbf{R})}{p_{CG}(\mathbf{R})}\, d\mathbf{R} + \int p_{AA}(\mathbf{r}) \ln\frac{p_{AA}(\mathbf{r})\, \Omega_{\mathrm{map}}(\mathbf{M}(\mathbf{r}))}{p_{AA}(\mathbf{M}(\mathbf{r}))}\, d\mathbf{r} = S_{\mathrm{rel}} + S_{\mathrm{map}} \qquad (35)$$

where $S_{\mathrm{rel}}$ gives the first convention for the relative entropy (Eqn. (29)), and $S_{\mathrm{map}}$ is a new term that accounts for its difference with $S_{\mathrm{rel}}'$. In fact, $S_{\mathrm{map}}$ can be interpreted as a special kind of relative entropy associated with mapping the AA distribution not to any particular CG model, but to a version of the AA ensemble itself that is filtered through the lens of the coarser degrees of freedom R.

Namely, we can express the last term in (35) as

$$S_{\mathrm{map}} = \int p_{AA}(\mathbf{r}) \ln\frac{p_{AA}(\mathbf{r})}{\bar{p}_{AA}(\mathbf{r})}\, d\mathbf{r} \qquad (36)$$

Here, $\bar{p}_{AA}$ represents a coarsened or pixelated AA distribution, and is given by

$$\bar{p}_{AA}(\mathbf{r}) = \frac{p_{AA}(\mathbf{R})}{\Omega_{\mathrm{map}}(\mathbf{R})} = \frac{\int p_{AA}(\mathbf{r}')\,\delta[\mathbf{R} - \mathbf{M}(\mathbf{r}')]\, d\mathbf{r}'}{\int \delta[\mathbf{R} - \mathbf{M}(\mathbf{r}')]\, d\mathbf{r}'}, \quad \text{with } \mathbf{R} = \mathbf{M}(\mathbf{r}) \qquad (37)$$

where the effect of $\Omega_{\mathrm{map}}$ in the denominator is to redistribute the integrated probability for state R equally over all component AA configurations r. Eqn. (36) thus shows that the mapping entropy, and hence the difference between $S_{\mathrm{rel}}'$ and $S_{\mathrm{rel}}$, compares the true AA distribution to one in which the probabilities are smeared out and reapportioned equally to all detailed configurations r in each coarse configuration R. As a result, it quantifies the effect of the mapping alone on information loss, without further regard to the particulars of the CG model, such as its force field. One interesting feature is that $S_{\mathrm{rel}}$ vanishes when the CG force field becomes ideal, $U_{CG} = W$, regardless of the mapping. On the other hand, $S_{\mathrm{map}}$, and hence $S_{\mathrm{rel}}'$, generally remain positive in such cases, and presumably measure the quality of the mapping itself. These ideas and relationships were originally identified and discussed by Rudzinski and Noid [70]. It is worthwhile noting that $S_{\mathrm{map}}$ as defined in Eqn. (36) differs by an additive geometric factor from their work, and it is also distinct from, and more general than, our earlier definition of the mapping entropy [16].

Figure 6: Illustration of the mapping volume $\Omega_{\mathrm{map}}$. When reducing degrees of freedom, coarse-grained configurations R are degenerate in the all-atom space r, and the potential of mean force W integrates over these degeneracies. The volume of AA configuration space that maps to the same CG configuration is $\Omega_{\mathrm{map}}$. In general, this quantity depends on the particular coarse-grained configuration in non-periodic systems, but in periodic ones it should be equal to the configuration-independent quantity $V^{n-N}$.

Both $S_{\mathrm{rel}}$ and $S_{\mathrm{rel}}'$ provide suitable starting points for measuring the fitness of off-lattice, reduced-degree-of-freedom CG systems. Which is the most appropriate? It turns out that the choice may be immaterial in practical settings. The difference between the two, $S_{\mathrm{map}}$, depends only on the choice of the mapping, that is, on the degrees of freedom and resolution of the CG model; it is entirely independent of the CG energy function $U_{CG}$. Because the primary objective of many applications is to determine effective CG potentials for pre-determined CG model architectures, both $S_{\mathrm{rel}}$ and $S_{\mathrm{rel}}'$ will then equally discriminate between different candidate parameterizations; minimization of either will return identical results, because $S_{\mathrm{map}}$ will remain constant. On the other hand, the relative entropy provides useful information as to the effect of resolution itself. In this case, it may be most natural to examine $S_{\mathrm{map}}$ directly, which characterizes the effectiveness of projecting the AA model onto various CG mappings/representations, regardless of the details of their interaction potentials. That issue is considered in greater detail below.

III.2 The relative entropy in the canonical ensemble

While the relative entropy formalism can be used in any thermodynamic ensemble, the canonical one prescribes specific forms that are particularly useful in analysis and practical applications. In this case, the configurational distributions are governed by the usual Boltzmann expressions of Eqn. (3), and substitution into Eqn. (29) gives

$$S_{\mathrm{rel}} = \beta\langle U_{CG} - U_{AA}\rangle_{AA} - \beta(A_{CG} - A_{AA}) - S_{\mathrm{map}}$$
$$S_{\mathrm{rel}}' = \beta\langle U_{CG} - U_{AA}\rangle_{AA} - \beta(A_{CG} - A_{AA}) \qquad (38)$$

Importantly, the relative entropy involves both an average energy difference, which is taken in the AA ensemble, and the free energy difference of the two systems. Here the average CG energy, similar to other expectations of CG quantities in the AA ensemble, follows the form

$$\langle U_{CG}\rangle_{AA} = \int p_{AA}(\mathbf{r})\, U_{CG}(\mathbf{M}(\mathbf{r}))\, d\mathbf{r} = \int p_{AA}(\mathbf{R})\, U_{CG}(\mathbf{R})\, d\mathbf{R} \qquad (39)$$

Note that the excess free energy difference is given by

$$\beta(A_{CG} - A_{AA}) = -\ln\left[\frac{V^{-N}\int e^{-\beta U_{CG}(\mathbf{R})}\, d\mathbf{R}}{V^{-n}\int e^{-\beta U_{AA}(\mathbf{r})}\, d\mathbf{r}}\right] \qquad (40)$$

One can simplify Eqn. (40) further using free energy perturbation:

$$\frac{V^{-N}\int e^{-\beta U_{CG}(\mathbf{R})}\, d\mathbf{R}}{V^{-n}\int e^{-\beta U_{AA}(\mathbf{r})}\, d\mathbf{r}} = \frac{\int e^{-\beta[U_{CG}(\mathbf{M}(\mathbf{r})) - U_{AA}(\mathbf{r})]}\, e^{-\beta U_{AA}(\mathbf{r})}\, d\mathbf{r}}{\int e^{-\beta U_{AA}(\mathbf{r})}\, d\mathbf{r}} = \langle e^{-\beta(U_{CG} - U_{AA})}\rangle_{AA} \qquad (41)$$

In combination with Eqn. (38), we then obtain the important result

$$S_{\mathrm{rel}}' = \beta\langle U_{CG} - U_{AA}\rangle_{AA} + \ln\langle e^{-\beta(U_{CG} - U_{AA})}\rangle_{AA} = \ln\langle e^{\Delta - \langle\Delta\rangle_{AA}}\rangle_{AA} \qquad (42)$$

in which we have defined a dimensionless potential energy difference between the two models that depends on the AA configuration,

$$\Delta(\mathbf{r}) \equiv \beta[U_{AA}(\mathbf{r}) - U_{CG}(\mathbf{R})], \quad \text{with } \mathbf{R} = \mathbf{M}(\mathbf{r}) \qquad (43)$$

Eqn. (42) shows that the relative entropy measures differences in the potential energy landscapes of the AA and CG models. Additive shifts in either energy function are removed by the quantity $\langle\Delta\rangle_{AA}$, which is consistent with the insensitivity of configurational probabilities to the zero of energy.

On first look, Eqn. (42) seems particularly appealing for practical calculation of the relative entropy, since it involves a simple average. One would simply generate a reference trajectory from the AA system and reprocess it to calculate CG energies, using the calculated Δ values to evaluate $S_{\mathrm{rel}}'$. In reality, this approach is impeded by the averaging of the exponential, which introduces significant statistical errors because rare, large values of Δ that are not frequently sampled in the reference trajectory make substantial contributions to the average:

$$S_{\mathrm{rel}}' = \ln \int p_{AA}(\Delta)\, e^{\Delta - \langle\Delta\rangle_{AA}}\, d\Delta, \quad \text{with} \quad \langle\Delta\rangle_{AA} = \int p_{AA}(\Delta)\, \Delta\, d\Delta \qquad (44)$$

In the first expression, $p_{AA}(\Delta)$ gives the distribution of Δ values as sampled from the AA ensemble. The value of the integrand that contributes most to $S_{\mathrm{rel}}'$ satisfies $d\ln p_{AA}(\Delta)/d\Delta = -1$, which clearly is offset from the most probable values of Δ near the distribution peak, where this derivative vanishes. To a first approximation, one can show that the expected error in the relative entropy as calculated by Eqn. (42) scales exponentially with its own value:

$$\varepsilon_{S_{\mathrm{rel}}'} \sim e^{S_{\mathrm{rel}}'} \qquad (45)$$

Eqn. (42) also introduces a substantial bias due to the asymmetric nature of the integral in Δ. Thus, its averaging approach is likely only to be accurate when Δ values are small, $S_{\mathrm{rel}}'$ is close to zero, and the AA and CG models are similar. In such cases, one might expect that $p_{AA}(\Delta)$ is Gaussian-distributed with variance $\sigma^2$. Substitution into Eqn. (44) gives

$$S_{\mathrm{rel}}' = \ln \int (2\pi\sigma^2)^{-1/2}\, e^{-(\Delta - \langle\Delta\rangle_{AA})^2/2\sigma^2}\, e^{\Delta - \langle\Delta\rangle_{AA}}\, d\Delta = \frac{\sigma^2}{2} = \frac{\langle\Delta^2\rangle_{AA} - \langle\Delta\rangle_{AA}^2}{2} = \frac{\mathrm{var}_{AA}(\Delta)}{2} \qquad (46)$$

where, in the last equalities, we again use the Gaussian approximation for the distribution of Δ. In this limit, Eqn. (46) suggests that the relative entropy measures fluctuations in Δ that capture differences in the AA and CG potential energy landscapes. Naturally, it relates to the second moment of energy landscape differences, since additive shifts to either $U_{AA}$ or $U_{CG}$ have no effect on configurational probabilities.
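The two estimators of Eqns. (42) and (46) can be compared directly on sampled Δ values. In the sketch below (our own, with synthetic Gaussian Δ data standing in for reprocessed trajectory energies), both converge to var(Δ)/2, while for broad or strongly non-Gaussian Δ distributions the exponential average becomes noisy, per Eqn. (45).

```python
import numpy as np

rng = np.random.default_rng(3)
# synthetic stand-in for Delta(r) = beta*(U_AA - U_CG) sampled in the AA ensemble
delta = rng.normal(loc=2.0, scale=1.0, size=50_000)

d = delta - delta.mean()
# exponential-average estimator of Eqn. (42); noisy when var(Delta) is large
s_exp = np.log(np.mean(np.exp(d)))
# Gaussian (second-cumulant) approximation of Eqn. (46)
s_gauss = 0.5 * np.var(delta)
print(s_exp, s_gauss)   # both near 0.5 here, since sigma = 1
```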

The validity of the Gaussian assumption for $S_{\mathrm{rel}}'$ can be addressed through a cumulant expansion of the exact expression in Eqn. (44):

$$S_{\mathrm{rel}}' = \ln\left[1 + \langle\Delta - \langle\Delta\rangle_{AA}\rangle_{AA} + \tfrac{1}{2}\langle(\Delta - \langle\Delta\rangle_{AA})^2\rangle_{AA} + \cdots\right] = \frac{\langle\Delta^2\rangle_{AA} - \langle\Delta\rangle_{AA}^2}{2} + \cdots \qquad (47)$$

Here we see that the Gaussian result in Eqn. (46) corresponds to a truncation of third- and higher-order Δ-moment contributions to the relative entropy. Indeed, Eqn. (47) suggests a systematic expansion in terms of moments with increasing associated statistical error. The statistical error associated with the first term (the Gaussian approximation) can be shown to scale as

$$\varepsilon_{S_{\mathrm{rel}}'} \sim S_{\mathrm{rel}}' \qquad (48)$$

In turn, errors transition from a linear to an exponential scaling with the relative entropy, as per (45), as one includes successively higher moment contributions to $S_{\mathrm{rel}}'$.

IV. MODEL OPTIMIZATION BY RELATIVE ENTROPY MINIMIZATION

IV.1 General considerations and matched averages & distributions

The relative entropy gives an inverse measure of the fitness of a CG model based merely on ensemble probabilities, and hence can be used to guide many different aspects of coarse-graining problems for a wide variety of model types. The most common task in bottom-up coarse-graining involves determination of a CG force field, $U_{CG}(\mathbf{R})$, for a given model architecture or mapping. Often a particular form of $U_{CG}$ is presumed (e.g., Lennard-Jones interactions), and free parameters in the form of energetic coefficients, length scales, etc. must be chosen; examples include σ and ε values in nonbonded potentials, force constants and equilibrium values in bond and angle potentials, and atomic partial charges in electrostatic interactions. We denote the collection of all such parameters as the vector λ. On the other hand, the force field may include more flexible functionalities, such as splines or tabulated potentials for bonded, nonbonded, and angular interactions. In this case, the free parameters λ correspond to a collection of spline knots or discrete potential values, but it is also useful to characterize this situation as containing a free function to be optimized (most often a univariate function).

Minimization of $S_{\mathrm{rel}}$ provides a systematic method for parameterizing such force fields. In the canonical ensemble, an optimal CG parameter set will zero the derivative of Eqn. (38). For a single parameter λ, this gives the optimality condition

$$\frac{\partial S_{\mathrm{rel}}}{\partial\lambda} = 0 \;\Longrightarrow\; \left\langle\frac{\partial U_{CG}}{\partial\lambda}\right\rangle_{AA} = \left\langle\frac{\partial U_{CG}}{\partial\lambda}\right\rangle_{CG} \qquad (49)$$

In other words, the average variation of the CG potential energy with λ should be the same when viewed in either ensemble. It is worth recalling that the AA average of Eqn. (49) projects the AA ensemble into CG configurational space, à la Eqn. (39). One can determine whether or not Eqn. (49) corresponds to a relative entropy minimum, rather than another stationary point, by checking the second derivative:

$$\frac{\partial^2 S_{\mathrm{rel}}}{\partial\lambda^2} = \beta\left\langle\frac{\partial^2 U_{CG}}{\partial\lambda^2}\right\rangle_{AA} - \beta\left\langle\frac{\partial^2 U_{CG}}{\partial\lambda^2}\right\rangle_{CG} + \beta^2\left\langle\left(\frac{\partial U_{CG}}{\partial\lambda}\right)^2\right\rangle_{CG} - \beta^2\left\langle\frac{\partial U_{CG}}{\partial\lambda}\right\rangle_{CG}^2 \qquad (50)$$

A special case emerges when $U_{CG}$ is linear in a parameter λ. Let us suppose that

$$U_{CG}(\mathbf{R}) = \lambda X(\mathbf{R}) + \cdots$$

where X(R) is a pre-specified function of one or more configurational coordinates. Typically, X involves one or more structural parameters, like a bond length or pair distance between atoms, and in turn relates to a component energy term. For example, $X = \sum_{i<j} r_{ij}^{-12}$ for the repulsive part of the Lennard-Jones potential, in which case λ gives the combination of parameters $4\varepsilon\sigma^{12}$. In any case, minimization of the relative entropy gives

$$\langle X\rangle_{AA} = \langle X\rangle_{CG} \qquad (51)$$

with

$$\frac{\partial^2 S_{\mathrm{rel}}}{\partial\lambda^2} = \beta^2\langle X^2\rangle_{CG} - \beta^2\langle X\rangle_{CG}^2 = \beta^2\, \mathrm{var}_{CG}(X) \qquad (52)$$

such that the second derivative is directly related to the fluctuations of X in the CG ensemble and thus is never negative. As a result, the relative entropy contains only a single global minimum, ensuring that gradient-based optimization methods will converge to the global solution, in principle. This is particularly encouraging because many standard force-field terms involve such linear parameters, or can be transformed so that they do (such as Lennard-Jones interactions).
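For a single linear parameter, Eqns. (49) and (52) suggest a Newton-type update, $\lambda \leftarrow \lambda - (\langle X\rangle_{AA} - \langle X\rangle_{CG})/(\beta\,\mathrm{var}_{CG}(X))$. The sketch below is our own toy construction: a one-dimensional "CG model" $U_{CG}(x) = \lambda x^2$ whose ensemble averages are analytic, iterated against a fixed reference average $\langle X\rangle_{AA}$. In a real application, the CG averages would come from short simulations at the current λ, and a damped step would guard against overshoot.

```python
import numpy as np

beta = 1.0
rng = np.random.default_rng(4)

# toy "AA" data: samples of a 1D coordinate; the basis function is X(x) = x^2
x_aa = rng.normal(0.0, 0.8, size=100_000)
X_aa = np.mean(x_aa**2)                   # <X>_AA, the fixed reference average

# CG model: U_CG(x) = lam * x^2, a Gaussian ensemble with analytic averages
lam = 0.5                                 # initial guess for the parameter
for _ in range(20):
    X_cg = 1.0 / (2.0 * beta * lam)       # <x^2>_CG at the current lam
    var_cg = 2.0 * X_cg**2                # var_CG(x^2), Gaussian identity
    grad = beta * (X_aa - X_cg)           # dS_rel/d(lam), per Eqn. (49)
    hess = beta**2 * var_cg               # d2S_rel/d(lam)^2, per Eqn. (52)
    lam -= grad / hess                    # Newton step toward <X>_AA = <X>_CG

print(lam, 1.0 / (2.0 * beta * X_aa))     # converges to the matching value
```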

In many instances the CG potential can be completely decomposed into a sum of several such linear terms:

$$U_{CG}(\mathbf{R}) = \lambda_1 X_1(\mathbf{R}) + \lambda_2 X_2(\mathbf{R}) + \cdots + \lambda_M X_M(\mathbf{R}) \qquad (53)$$

where M gives the number of basis functions X in the potential. Another special case then emerges when every λ is optimized. Applying (51) to each term shows that

$$\lambda_1\langle X_1\rangle_{AA} + \lambda_2\langle X_2\rangle_{AA} + \cdots = \lambda_1\langle X_1\rangle_{CG} + \lambda_2\langle X_2\rangle_{CG} + \cdots \;\Longrightarrow\; \langle U_{CG}\rangle_{AA} = \langle U_{CG}\rangle_{CG} \qquad (54)$$

That is, this case guarantees that the average CG energy is the same in both ensembles. By virtue of Eqn. (38), that fact makes the minimized value of the relative entropy equal to

$$S_{\mathrm{rel,min}} = \beta\langle U_{CG}\rangle_{CG} - \beta\langle U_{AA}\rangle_{AA} - \beta(A_{CG} - A_{AA}) - S_{\mathrm{map}} = \frac{S_{CG} - S_{AA}}{k_B} - S_{\mathrm{map}}$$
$$S_{\mathrm{rel,min}}' = \frac{S_{CG} - S_{AA}}{k_B} \qquad (55)$$

where $S_{AA}$ and $S_{CG}$ give the usual configurational entropies of each ensemble (in dimensional units), and in which the CG parameter set is optimal. In this special case, then, the relative entropy can be related to a difference in properties intrinsic to the two ensembles. This result holds so long as every energy scale in the CG model is subject to optimization, even if there are nonlinear parameters that are also optimized, and it leads to two very interesting implications. First, it suggests that the thermal entropy difference serves as a lower bound for $S_{\mathrm{rel}}'$ for the force field embodied by (53); any partial minimization, in which one or more λ's are held constant, will necessarily attain a higher value of $S_{\mathrm{rel}}'$. Second, it shows that the CG model will have a higher entropy than the AA one, since the relative entropy is positive. In fact, this is a quite general conclusion that is emphasized below in the discussion of model design.

There is also a nice analogy between Eqn. (53) and classical thermodynamics. If we consider all of the X's to be extensive mechanical quantities and the λ's to be conjugate intensive thermodynamic fields, then from a macroscopic point of view the thermodynamic potential is

$$U_{CG} = \lambda_1 X_1 + \lambda_2 X_2 + \cdots + \lambda_M X_M + T S_{CG} \qquad (56)$$

where $S_{CG}$ is the usual thermal entropy of the CG system. In this point of view, relative entropy minimization finds values of the conjugate fields that are the most appropriate to the CG system. In effect, it renormalizes these fields when downgrading the resolution from the reference AA system.

Eqns. (49) and (51), which stem from optimization with respect to a parameter, can be generalized to functional minimization with respect to an arbitrary term in the force field. Let u(x) represent an additive component function in the force field, where again X(R) gives one or more structural parameters that depend on the complete coordinate set. For example, X may report the pair distance between two sites, such that u is a pair potential. Functional minimization of $S_{\mathrm{rel}}$ with respect to u then gives

$$\frac{\delta S_{\mathrm{rel}}}{\delta u(x)} = \beta\langle\delta[x - X(\mathbf{R})]\rangle_{AA} - \beta\langle\delta[x - X(\mathbf{R})]\rangle_{CG} \qquad (57)$$

Evaluation of the delta functions converts the averages to probability distributions, so that at the $S_{\mathrm{rel}}$ minimum we have

$$\int \delta[x - X(\mathbf{M}(\mathbf{r}))]\, p_{AA}(\mathbf{r})\, d\mathbf{r} = \int \delta[x - X(\mathbf{R})]\, p_{CG}(\mathbf{R})\, d\mathbf{R} \;\Longrightarrow\; p_{AA}(x) = p_{CG}(x) \qquad (58)$$

Thus the inclusion of free functions like u in $U_{CG}$, with subsequent $S_{\mathrm{rel}}$ optimization, ensures that the CG distributions of the corresponding structural parameters X will completely replicate the AA reference distributions. In practice, functions like u(x) are represented by splines, such that the actual minimization proceeds using the approach outlined initially, with a parameter λ for each of the knot points (which may be many). However, this analysis shows that optimized splines will in effect reproduce the corresponding structural correlations. Another, more formal but less practical, analogy expresses u(x) as an expansion, with

$$u(x) = \lambda_1 x + \lambda_2 x^2 + \lambda_3 x^3 + \cdots \qquad (59)$$

Then by Eqn. (51), optimization of each λ results in

$$\langle X^n\rangle_{AA} = \langle X^n\rangle_{CG} \quad \text{for } n = 1, 2, 3, \ldots \qquad (60)$$

and the CG model will replicate as many moments of the AA distribution as there are expansion terms and powers in u(x). An infinite expansion then recovers every moment, leading to (58).

IV.2 Application: coarse-graining to understand liquid-state dynamics

We found that the relative entropy provides an interesting interpretation for the emergence of dynamic scaling laws in simple liquids [22]. There has been tremendous interest in developing predictive models for liquid-state diffusion constants (and other kinetic coefficients) in terms of simple thermodynamic parameters and scaling laws. In particular, it has been found that the diffusion constant in many liquids behaves similarly to that in soft spheres, in which particles interact through a pair potential of the form $u(r) \sim r^{-n}$, where n is a repulsive exponent. Soft-sphere fluids are special in this regard because all reduced thermodynamic and dynamic properties can be expressed as functions of the single thermodynamic state variable $\varphi = \rho^{n/3} T^{-1}$ that combines the effects of number density and temperature. For example, the reduced diffusivity, $D^* = (\rho^{1/3} m^{1/2} T^{-1/2})\, D$, follows this scaling. For systems that behave like soft spheres, then, an effective repulsive exponent n exists that can collapse reduced properties to a single functional dependence on φ.

We found that relative entropy minimization gives effective soft-sphere exponents that capture a temperature-density diffusivity scaling law, $D^* = f(\rho^{n/3} T^{-1})$, for Lennard-Jones binary mixtures [22]. In this case, the coarse model involves a simplified potential energy function, that of soft spheres, and the minimization procedure determines the exponent n and the energy coefficients in the CG pair potentials. The exponents, which are purely thermodynamic in origin, then successfully collapse diffusivity data for a range of pressures and temperatures onto a master curve (Figure 7), as might be expected if the systems behave similarly to soft spheres.

Indeed, the calculated values of the relative entropy are small, less than unity on a per-particle basis, and they only notably increase in the low-temperature regime where the collapse also begins to break down; thus, the behavior of $S_{\mathrm{rel}}$ is consistent with the emergence of soft-sphere behavior and indeed seems to signal it.

More interestingly, theory suggests that $S_{\mathrm{rel}}$ should in fact directly capture how well the soft-sphere diffusivity mimics the original Lennard-Jones one. Many simple liquids exhibit so-called Rosenfeld scaling of the diffusion constant, $\ln D^* \sim c_R\, s/k_B$, where $c_R$ is a constant and s is the per-particle excess thermal entropy [71,72]. Thus it is not unreasonable to expect that the diffusion constants for the two models follow

$$\ln(D_{SS}^*) - \ln(D_{LJ}^*) = \frac{c_R (s_{SS} - s_{LJ})}{k_B} = \frac{c_R\, S_{\mathrm{rel}}'}{N} \qquad (61)$$

in which we used Eqn. (55) to replace the entropy differences with the relative entropy. Eqn. (61) describes the error in the coarse-grained diffusion constant and shows that it grows exponentially with $S_{\mathrm{rel}}'$. It reinforces the notion that coarse models with lower values of $S_{\mathrm{rel}}$ in turn reduce errors in properties, here demonstrated, perhaps surprisingly, for a dynamic quantity. Note that, because $S_{\mathrm{rel}}' \ge 0$, the optimized soft-sphere model should have faster dynamics, a reflection of the speedup upon smoothing the underlying energy landscape with a simpler potential. In summary, relative entropy minimization in this example shows that optimization to specific functional forms for coarse-grained potentials, like the soft-sphere potential, can facilitate a kind of perturbation approach to properties, by quantifying the relevance of systems with simpler, more transparent, or easier-to-describe emergent physics.

[Figure 7: two panels plotting $\ln D^*_B$ versus $1/T$ (top) and versus $\rho^{\langle n \rangle/3}/T$ (bottom) at several pressures (P = 5, 10, ...).]

Figure 7: Coarse-graining to understand dynamical correlation laws. The diffusion constant of species B in a binary A-B mixture of Lennard-Jones particles depends in principle on two state parameters (top panel). Relative entropy minimization is used to determine an effective soft-sphere potential with repulsive exponent $n$. Because the reduced diffusion constant in soft spheres is a function of the single combined variable $\rho^{n/3} T^{-1}$, the exponent $n$ extracted from relative entropy minimization can then be used to collapse all of the data of the Lennard-Jones system onto a single master curve (bottom). Reproduced with permission from [22]. Copyright 2012, AIP Publishing LLC.

IV.3 Relationship to other coarse-graining methods

Relative entropy minimization shares many behaviors and ideas with other strategies in coarse-graining, both conceptual and numerical [15,19,70]. A first basic result occurs for the case in which the CG potential is entirely unconstrained, i.e., not forced to adopt any functional form for the force field, basis space, or interaction order (e.g., two-body). In this circumstance, the optimal $U_\text{CG}$ functionally minimizes $S_\text{rel}$. Using the approach of Eqn. (58), we then have

$\dfrac{\delta S_\text{rel}}{\delta U_\text{CG}(R)} = 0 \;\Rightarrow\; \displaystyle\int \delta[R - M(r)]\, p_\text{AA}(r)\, dr = p_\text{CG}(R) \;\Rightarrow\; p_\text{AA}(R) = p_\text{CG}(R)$  (62)

Thus functional minimization demands that the full CG microstate distribution mimic that of the projected AA ensemble. Using Eqns. (3) and (8), we find that such minimization returns the perfect CG force field as given by the multidimensional atomistic PMF, with

$U_\text{CG}(R) \to W(R), \qquad S_\text{rel} - S_\text{map} \to 0, \qquad S_\text{rel} \to S_\text{map}$  (63)

It may seem reassuring that relative entropy minimization returns this ideal limit in the unconstrained case, although ultimately this is not too surprising given that $S_\text{rel}$ measures the distance between the CG and AA configurational distributions and any such approach would necessarily attain the same limit. It is interesting to note that while the configurational part $S_\text{rel} - S_\text{map}$ vanishes for perfect CG models, $S_\text{rel}$ itself remains positive and equal to the mapping entropy, a lower bound that is influenced by the mapping design of the CG model. We discuss this point in greater detail below because it seems that $S_\text{map}$ may provide insight into the behaviors of various mappings.

The result above gives a formal connection between relative entropy minimization and the force-matching approach pioneered by Voth and co-workers [11-14]. We require the following theoretical result: if $U_\text{CG}(R) = W(R)$, then the forces in the CG ensemble follow

$\mathbf{F}_\text{CG}(R) = -\nabla_R W(R) = \beta^{-1} V^{N-n} e^{\beta W(R)} \nabla_R \displaystyle\int \delta[R - M(r)]\, e^{-\beta U_\text{AA}(r)}\, dr = \dfrac{1}{p_\text{AA}(R)} \displaystyle\int \delta[R - M(r)]\, \mathbf{F}(r)\, p_\text{AA}(r)\, dr = \langle \mathbf{F} \rangle_\text{AA}(R)$  (64)

where $\mathbf{F}$ gives the net forces projected onto CG sites, i.e., summed from the component AA forces:

$\mathbf{F}_I = \displaystyle\sum_{i \,\in\, \text{atoms for } I} \left( -\dfrac{\partial U_\text{AA}}{\partial \mathbf{r}_i} \right)$  (65)

The derivation of Eqn. (64) involves special treatment of the delta function derivative and can be approached in several ways, but its physical interpretation remains the same. Namely, the perfect CG force field will predict forces on a configuration $R$ that are equal to the average net forces due to component AA sites, where the average is taken over all AA configurations $r$ mapping to $R$ with relative populations $p_\text{AA}(r)$.
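A minimal sketch of the force projection in Eqn. (65), assuming a simple mapping in which each atom belongs to exactly one CG site (the array names and toy data are illustrative only):

```python
import numpy as np

def map_forces(f_atoms, site_of_atom, n_sites):
    """Net force on each CG site: the sum of the AA forces -dU_AA/dr_i over
    its member atoms (Eqn. 65).

    f_atoms      : (N_atoms, 3) array of all-atom forces
    site_of_atom : (N_atoms,) integer array assigning each atom to a CG site I
    """
    F = np.zeros((n_sites, 3))
    np.add.at(F, site_of_atom, f_atoms)   # accumulate atom forces into their sites
    return F

# Toy check: 6 atoms mapped onto 2 CG sites of 3 atoms each
rng = np.random.default_rng(0)
f = rng.normal(size=(6, 3))
print(map_forces(f, np.array([0, 0, 0, 1, 1, 1]), 2))
```

Averaging such mapped forces over all AA frames that project to the same $R$ gives the conditional average $\langle \mathbf{F} \rangle_\text{AA}(R)$ that force matching targets.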

The force-matching approach uses Eqn. (64) to optimize a force field $U_\text{CG}$ by minimizing force residuals between the CG model and averages from a reference AA trajectory. Equivalently, the approach minimizes the distance between CG and projected AA force vectors in a high-dimensional space [13,73]. If the CG potential is unconstrained, then this analysis shows that both force matching and relative entropy minimization will arrive at the same $U_\text{CG}$, namely the atomistic PMF $W$. However, in most settings potentials are constrained to be pair additive and the two approaches are not likely to return identical results. Noid and coworkers have nicely illustrated the relationship and tradeoffs between the approaches [70,73-75]. For example, they show that the force-matching technique embeds three-body and higher-order correlations in spline-based CG pair potentials, whereas $S_\text{rel}$ minimization will directly match the two-body, distance-dependent ones per Eqn. (58). Importantly, they demonstrate a formal connection between the frameworks that is based on a metric for the information content in a given CG configuration,

$\Phi(R) \equiv \ln \dfrac{p_\text{AA}(R)}{p_\text{CG}(R)}$  (66)

In this perspective, the relative entropy approach minimizes the average value of the information metric in the reference system, $\langle \Phi \rangle_\text{AA} = S_\text{rel}$, while force matching minimizes its average square gradient, $\langle |\nabla \Phi|^2 \rangle_\text{AA}$. Using simple, low-dimensional models, they find that the two approaches give similar results, particularly for harmonic potentials, but that relative entropy CG models improve on reproduction of peaks in the AA configurational probability distribution, while force-matched models better capture tails. Many more compelling details are given in Ref. [70]. A recent paper by Plecháč and co-workers also investigates the connection between the relative entropy and force matching approaches, and establishes links with thermodynamic integration [76].
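A one-dimensional toy sketch of the information metric in Eqn. (66), not taken from Ref. [70]: for two Gaussians (hypothetical widths below), $\langle \Phi \rangle_\text{AA}$ has a closed form against which a numerical quadrature can be checked, and the force-matching-like objective is the mean squared gradient of $\Phi$.

```python
import numpy as np

# Phi(x) = ln[p_AA(x)/p_CG(x)] for two zero-mean Gaussians
x = np.linspace(-8.0, 8.0, 20001)
dx = x[1] - x[0]
s_aa, s_cg = 1.0, 1.5          # hypothetical widths of the AA and CG distributions
p_aa = np.exp(-x**2 / (2 * s_aa**2)) / np.sqrt(2 * np.pi * s_aa**2)
p_cg = np.exp(-x**2 / (2 * s_cg**2)) / np.sqrt(2 * np.pi * s_cg**2)
phi = np.log(p_aa / p_cg)

s_rel = np.sum(p_aa * phi) * dx                            # <Phi>_AA = S_rel
s_rel_exact = np.log(s_cg / s_aa) + s_aa**2 / (2 * s_cg**2) - 0.5
fm_obj = np.sum(p_aa * np.gradient(phi, dx)**2) * dx       # <(dPhi/dx)^2>_AA
print(s_rel, s_rel_exact, fm_obj)
```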

In practical settings that presume specific functionalities for $U_\text{CG}$, relative entropy minimization behaves similarly to so-called structure-based coarse-graining techniques that seek to reproduce correlations or low-dimensional probability distributions in various structural coordinates. It should be emphasized that the relative entropy approach is not a structure-based method per se, as only equilibrium probabilities enter its construction and $S_\text{rel}$ makes no presumption about the relationship of structural parameters to them. It is only when particular forms of $U_\text{CG}$ are presupposed that a connection then emerges, as a result of the projection of the CG model on a particular basis space that controls variations in $U_\text{CG}$.

The most prominent connection to structure-based coarse-graining stems from the Iterative Boltzmann Inversion (IBI) [7-9] and Inverse Monte Carlo (IMC) [10] methods, both of which seek to construct a pair potential $u(r)$ so that the CG model recapitulates a radial distribution function along the same distance coordinate, i.e., $g_\text{CG}(r) = g_\text{AA}(r)$. Here, $r$ is assumed to be a CG pair distance, i.e., dependent on the CG site configuration $R$ so that it is measurable in both the AA and CG ensembles. Typically, the potential $u(r)$ and distributions $g(r)$ are finely tabulated on a distance grid. An initial guess for $u(r)$ is proposed, $u_0(r) = -k_B T \ln g_\text{AA}(r)$, and then a trial CG simulation is performed to measure $g_\text{CG}(r)$. IBI then updates $u(r)$ according to

$u(r) \leftarrow u(r) + k_B T \ln \dfrac{g_\text{CG}(r)}{g_\text{AA}(r)}$  (67)

Such iterations of CG simulations and potential updates proceed until $g_\text{CG}(r) = g_\text{AA}(r)$ within a numerical tolerance. IMC works similarly, except that the update corrects for measured cross-effects of the pair potential on $g(r)$ at different distance values. In any case, $S_\text{rel}$ minimization will produce an identical result to IBI and IMC so long as the system involves a pair potential represented by a sufficiently detailed spline or lookup table. This was first observed by Murtola et al., who found that the iterative IMC equations could be re-expressed as a minimization problem using Newton-Raphson descent [77].
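A minimal sketch of the IBI loop of Eqn. (67); `run_cg_simulation` is a hypothetical placeholder for any CG engine that returns $g_\text{CG}(r)$ for a tabulated potential, and the floors guard the logarithm against empty histogram bins.

```python
import numpy as np

def initial_guess(g_aa, kT=1.0, floor=1e-8):
    """Boltzmann-inverted starting potential u_0(r) = -kT ln g_AA(r)."""
    return -kT * np.log(np.maximum(g_aa, floor))

def ibi_update(u, g_cg, g_aa, kT=1.0, floor=1e-8):
    """One Iterative Boltzmann Inversion step (Eqn. 67):
    u(r) <- u(r) + kT * ln[g_CG(r)/g_AA(r)], all on a common r grid."""
    return u + kT * np.log(np.maximum(g_cg, floor) / np.maximum(g_aa, floor))

# Schematic driver (run_cg_simulation is not defined here):
# u = initial_guess(g_aa)
# for it in range(50):
#     g_cg = run_cg_simulation(u)           # trial CG simulation
#     if np.max(np.abs(g_cg - g_aa)) < 1e-2:
#         break
#     u = ibi_update(u, g_cg, g_aa)
```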

Namely, relative entropy minimization through Eqn. (58) with $X = r$ gives

$p_\text{AA}(r) = p_\text{CG}(r) \;\Rightarrow\; g_\text{AA}(r) = g_\text{CG}(r)$  (68)

where the RHS results after suitable normalization with respect to the volume and relevant particle species numbers. (A more detailed derivation is in Refs. [16,19].) It is important to note that the $u(r)$ giving rise to a particular $g_\text{CG}(r)$ is unique, which was originally proved by Henderson [78] but which can also be shown using relative entropy arguments alone [16,70]. As a result, $S_\text{rel}$ minimization will then find the same optimal CG force field as the IBI and IMC methods, although the numerical algorithm is likely to be distinct because the former can be framed as a minimization problem. In general, whenever a flexible force field component $u(X)$ is included, where $X$ is a molecular distance or angle, the relative entropy approach will behave similarly to structure-based CG methods. It behaves distinctly, however, if $u(X)$ has a specific functional form that is not a spline or tabulated potential.

The relative entropy approach also bears some similarity to energy-matching methods that are, for example, frequently used to develop classical force fields from electronic structure calculations. These approaches might minimize an energy residual between the coarse and reference models, given by

$E = \langle (U_\text{CG} - U_\text{AA})^2 \rangle_\text{AA} = \displaystyle\int [U_\text{CG}(M(r)) - U_\text{AA}(r)]^2\, p_\text{AA}(r)\, dr$  (69)

We expect that the CG potential can be shifted globally by an amount $C$, since the zero of the energy is unimportant to dynamics or equilibrium properties. The shift that minimizes the error satisfies $C = \langle U_\text{CG} - U_\text{AA} \rangle_\text{AA}$. As a result, minimization of the residual in Eqn. (69) is exactly equivalent to minimizing the Gaussian approximation to the relative entropy in Eqn. (46). The relationship is

$\dfrac{\beta^2 E}{2} = \dfrac{1}{2}\,\text{var}_\text{AA}(\Delta) \approx S_\text{rel}$  (70)

where $\Delta = \beta U_\text{CG} - \beta U_\text{AA}$ is defined as before, per Eqn. (43). The Gaussian approximation to $S_\text{rel}$ works well when either the temperature is high or the energetic differences between the AA and CG models are small (i.e., $\Delta$ is small), and so under these conditions relative entropy minimization gives results similar to energy matching.
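A small sketch of the Gaussian estimate in Eqn. (70), assuming one has paired per-frame energies from a mapped AA trajectory (the synthetic data below are purely illustrative); note that the variance is automatically insensitive to the global shift $C$.

```python
import numpy as np

def gaussian_s_rel(u_aa, u_cg, beta=1.0):
    """Gaussian estimate of the relative entropy (Eqn. 70):
    S_rel ~ (1/2) var_AA(Delta), with Delta = beta*(U_CG - U_AA)
    evaluated over frames of the mapped AA trajectory."""
    delta = beta * (np.asarray(u_cg) - np.asarray(u_aa))
    return 0.5 * np.var(delta)

# Hypothetical paired energies for illustration
rng = np.random.default_rng(1)
u_aa = rng.normal(0.0, 1.0, size=5000)
u_cg = u_aa + rng.normal(0.2, 0.3, size=5000)   # CG energies deviate mildly
print(f"Gaussian S_rel estimate: {gaussian_s_rel(u_aa, u_cg):.4f}")
```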

An interesting consequence is evident when (70) is minimized with respect to a force field parameter $\lambda$, giving for the optimal model

$\dfrac{\partial S_\text{rel}}{\partial \lambda} = 0 \;\Rightarrow\; \langle U_\text{AA} - U_\text{CG} \rangle_\text{AA} \left\langle \dfrac{\partial U_\text{CG}}{\partial \lambda} \right\rangle_\text{AA} = \left\langle (U_\text{AA} - U_\text{CG}) \dfrac{\partial U_\text{CG}}{\partial \lambda} \right\rangle_\text{AA}$  (71)

This implies that energy matching will find a value of $\lambda$ that decorrelates the response of the potential, $\partial U_\text{CG}/\partial \lambda$, from energy differences in the two models. Intuitively, this seems appropriate, as any correlation would imply a variation in $\lambda$ that could improve the fit.

Finally, the relative entropy approach has a close relationship with classical variational mean-field theory. In the latter, one approximates a statistical mechanical model that has complex interactions, here indicated by potential $U_\text{AA}$, with an approximate but analytically tractable one, $U_\text{CG}$. In this case, the CG model has the same degrees of freedom, with $S_\text{map} = 0$. The conventional variational approach actually uses an inverse relative entropy that exchanges the roles of the CG and AA systems,

$S_\text{var} = \displaystyle\int p_\text{CG}(R) \ln \dfrac{p_\text{CG}(R)}{p_\text{AA}(R)}\, dR$  (72)

In the canonical ensemble, similar to that shown for Eqn. (38), this variational relative entropy becomes

$S_\text{var} = \beta \langle U_\text{AA} - U_\text{CG} \rangle_\text{CG} - \beta (A_\text{AA} - A_\text{CG})$  (73)

Like $S_\text{rel}$, $S_\text{var}$ can never be negative. In fact, it is equivalent to the venerable Gibbs-Bogoliubov-Feynman bound that is minimized to determine the form of, or free parameters in, a mean-field model, here given by the CG system. A key difference is that the energetic average in (73) is evaluated in the CG ensemble, which makes determination of $S_\text{var}$ easier than $S_\text{rel}$. This is essential in calibrating the mean-field model, as the reference AA one is not directly tractable. For example, minimization of $S_\text{var}$ with respect to parameter $\lambda$ in $U_\text{CG}$ gives

$\dfrac{\partial S_\text{var}}{\partial \lambda} = 0 \;\Rightarrow\; \langle U_\text{AA} - U_\text{CG} \rangle_\text{CG} \left\langle \dfrac{\partial U_\text{CG}}{\partial \lambda} \right\rangle_\text{CG} = \left\langle (U_\text{AA} - U_\text{CG}) \dfrac{\partial U_\text{CG}}{\partial \lambda} \right\rangle_\text{CG}$  (74)

which is similar to the energy matching result except that all averages are now in the CG ensemble.
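The Gibbs-Bogoliubov-Feynman connection can be made tangible with a one-dimensional toy, not taken from the references: fit a harmonic trial potential to a quartic "reference" by minimizing the variational upper bound $A_\text{CG} + \langle U_\text{AA} - U_\text{CG} \rangle_\text{CG}$, which is equivalent to minimizing $S_\text{var}$ in Eqn. (73).

```python
import numpy as np

# Reference U_AA(x) = x^4; trial U_CG(x) = 0.5*k*x^2 with variational parameter k
beta = 1.0
x = np.linspace(-6.0, 6.0, 4001)
dx = x[1] - x[0]
u_aa = x**4

def gbf_bound(k):
    """Variational upper bound A_CG + <U_AA - U_CG>_CG for spring constant k."""
    u_cg = 0.5 * k * x**2
    z_cg = np.sum(np.exp(-beta * u_cg)) * dx       # trial partition function
    p_cg = np.exp(-beta * u_cg) / z_cg             # trial (CG) distribution
    a_cg = -np.log(z_cg) / beta
    return a_cg + np.sum(p_cg * (u_aa - u_cg)) * dx

ks = np.linspace(0.5, 12.0, 200)
k_best = ks[np.argmin([gbf_bound(k) for k in ks])]
print(f"optimal spring constant k ~ {k_best:.2f}")   # analytic optimum: k = sqrt(12)
```

The self-consistency noted around Eqn. (74) appears here as the fact that the optimal $k$ depends on averages taken in the trial ensemble itself.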

Eqn. (74) shows that determination of $\lambda$ implicitly requires the evaluation of averages over $p_\text{CG}$, and involves a self-consistency criterion in the CG ensemble that is modulated by the form of $U_\text{AA}$. From an information-theoretic perspective, therefore, the variational mean-field approach finds the CG model for which the AA system is most informative, rather than the other way around. Because of the asymmetry of the relative entropy expression, a CG model optimized in this way is distinct from the more difficult optimization described by Eqn. (49). We have the relationship

$S_\text{rel} + S_\text{var} = \beta \langle U_\text{CG} - U_\text{AA} \rangle_\text{AA} - \beta \langle U_\text{CG} - U_\text{AA} \rangle_\text{CG} = \langle \Delta \rangle_\text{AA} - \langle \Delta \rangle_\text{CG} \approx 2 S_\text{rel}$  (75)

The last equality uses the Gaussian approximation to the relative entropy, Eqn. (46), and shows that weakly coarse-grained models ($\Delta$ small) give rise to similar behaviors, with $S_\text{rel} \approx S_\text{var}$. However, in general the two approaches give distinct results.

V. NUMERICAL ALGORITHMS FOR CG FORCE FIELD PARAMETERIZATION

V.1 General approaches

Numerical minimization of the relative entropy offers a way to parameterize the force fields of CG models from reference AA simulations. Typically the reference simulation involves a small but representative AA system deemed to capture all relevant physics of interest, and data are collected in the form of a trajectory of configurational coordinates and energies. Selection of the AA system itself constitutes an important problem as well, because it will influence the quality and transferability of the CG model. Given an AA reference, let the vector $\boldsymbol{\lambda} = (\lambda_1, \lambda_2, \ldots)$ denote all parameters in $U_\text{CG}$ that are to be optimized. In practice, these might be spline knots, Lennard-Jones coefficients, force constants, and equilibrium bond lengths and angles, among other things.

Figure 8: Relative entropy optimization by gradient-based methods. Coarse-graining can proceed by following the gradient of $S_\text{rel}$ in a high-dimensional space that includes all parameters in the CG force field, $\boldsymbol{\lambda}$. Because it depends only on simulation averages of $\partial U_\text{CG}/\partial \boldsymbol{\lambda}$, the gradient can be computed through a series of reference and trial simulations of both the AA and CG systems. If all parameters are linear in the CG force field (a case that spans a wide range of potential types, including splines), the relative entropy contains only a single, global minimum because its curvature is strictly zero or positive. Reproduced with permission from [21]. Copyright 2011, American Chemical Society.

Direct calculation of $S_\text{rel}$ and its dependence on $\boldsymbol{\lambda}$ is generally difficult due to the free energy difference in Eqn. (38). Not only do free energies require specialized simulation techniques (e.g., thermodynamic integration, histogram-based methods), but one would also need a continuous path interpolating between the AA and CG models free of thermodynamic pathologies that could complicate application of such methods (e.g., elimination of bonded degrees of freedom). However, the gradient of $S_\text{rel}$ is actually easier to access and can be used efficiently to direct the minimization problem. A particularly effective approach uses the Newton-Raphson iterative scheme to minimize $S_\text{rel}$ [16,19]. Here the CG model parameters are updated at discrete iterations $k$ until they converge to the relative entropy minimum. The update follows

$\boldsymbol{\lambda}_{k+1} = \boldsymbol{\lambda}_k - \chi\, \mathbf{H}^{-1} \nabla S_\text{rel}$  (76)

where $\chi \in (0,1]$ is a mixing parameter that can be used to ensure stability. The gradient is given by

$\nabla S_\text{rel} = \beta \left\langle \dfrac{\partial U_\text{CG}}{\partial \boldsymbol{\lambda}} \right\rangle_\text{AA} - \beta \left\langle \dfrac{\partial U_\text{CG}}{\partial \boldsymbol{\lambda}} \right\rangle_\text{CG}$  (77)

and the Hessian matrix of second derivatives follows

$H_{ij} = \beta \left\langle \dfrac{\partial^2 U_\text{CG}}{\partial \lambda_i \partial \lambda_j} \right\rangle_\text{AA} - \beta \left\langle \dfrac{\partial^2 U_\text{CG}}{\partial \lambda_i \partial \lambda_j} \right\rangle_\text{CG} + \beta^2 \left\langle \dfrac{\partial U_\text{CG}}{\partial \lambda_i} \dfrac{\partial U_\text{CG}}{\partial \lambda_j} \right\rangle_\text{CG} - \beta^2 \left\langle \dfrac{\partial U_\text{CG}}{\partial \lambda_i} \right\rangle_\text{CG} \left\langle \dfrac{\partial U_\text{CG}}{\partial \lambda_j} \right\rangle_\text{CG}$  (78)

Note that both Eqns. (77) and (78) require derivatives of the CG potential, which are readily evaluated as a part of the pairwise interaction loop. Moreover, these expressions lack explicit free energies and involve only averages that are readily evaluated using simulations. However, two types of averages appear and are treated differently in practice. The AA averages are determined by reprocessing the saved reference trajectory, projecting each frame into the CG coordinate space through the mapping and then evaluating the corresponding $U_\text{CG}$ derivatives. In practice, it is usually better to completely pre-project the AA trajectory into a coarsened version so that the reprocessing is fast. Alternatively, if $U_\text{CG}$ contains only one- or two-dimensional component functions, one might instead calculate AA averages using finely discretized histograms built from the reference trajectory. For example, if the force field is built from a pair potential $u(r)$, such that $U_\text{CG}(R) = \sum_{i<j} u(|\mathbf{R}_i - \mathbf{R}_j|)$, then it is possible to express

$\left\langle \dfrac{\partial U_\text{CG}}{\partial \lambda} \right\rangle_\text{AA} = \displaystyle\sum_r \langle H(r) \rangle_\text{AA} \dfrac{\partial u(r)}{\partial \lambda}$  (79)

where the histogram $H(r)$ counts the number of CG site pairs separated by distance $r$ and, in the above equation, is averaged over all frames of the coarsened AA trajectory. The same approach works for multiple atom types (one simply tabulates a separate histogram for each unique pair potential) and is trivially generalizable to bond, angle, and dihedral potentials.
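A minimal sketch of one Newton-Raphson iteration, Eqns. (76)-(78), assuming the per-frame derivative arrays have already been tabulated from the (mapped) AA and CG trajectories; the helper names and the small diagonal ridge are illustrative choices, in the spirit of the conditioning strategies discussed below.

```python
import numpy as np

def srel_gradient_hessian(dU_aa, dU_cg, d2U_aa=None, d2U_cg=None, beta=1.0):
    """Gradient (Eqn. 77) and Hessian (Eqn. 78) of S_rel from sampled first
    derivatives dU/dlambda, arrays of shape (n_frames, n_params). The
    second-derivative terms are optional (they vanish for linear parameters)."""
    dU_aa, dU_cg = np.asarray(dU_aa), np.asarray(dU_cg)
    grad = beta * (dU_aa.mean(axis=0) - dU_cg.mean(axis=0))
    # fluctuation (covariance) part of the Hessian, from the CG ensemble
    hess = beta**2 * np.cov(dU_cg, rowvar=False, bias=True)
    if d2U_aa is not None and d2U_cg is not None:
        hess = hess + beta * (np.mean(d2U_aa, axis=0) - np.mean(d2U_cg, axis=0))
    return grad, hess

def newton_step(lam, grad, hess, chi=0.5, ridge=1e-6):
    """Damped update of Eqn. (76), with a small diagonal increment to
    condition H against noisy or zero eigenmodes."""
    H = hess + ridge * np.eye(len(lam))
    return lam - chi * np.linalg.solve(H, grad)
```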

The CG averages are more complicated because that ensemble changes with each step forward in $\boldsymbol{\lambda}$. One possibility is to perform a (presumably fast) CG simulation at each iteration. The stochastic nature of these finite-length molecular simulations introduces errors in the calculated $S_\text{rel}$ derivatives, which generally necessitates regularization to ensure stable convergence. This may include Hessian conditioning techniques such as diagonal increments [79] and removal of error-prone eigenmodes, or more practical strategies such as backtracking and capping of parameter increments. Alternatively, stochastic minimization algorithms can be used, as demonstrated nicely by Bilionis and Zabaras [41]. Approaches are also needed to address parameters that are ill-sampled by the system in the sense that they have little effect on the average energies and their derivatives. The most common examples are spline knots associated with high energies that are rarely visited, e.g., knots that appear in the highly repulsive core of interparticle pair potentials. Such knot points can either be interpolated/extrapolated or simply removed, but it is important to exclude them from explicit optimization because they can contribute zero eigenvalues to $\mathbf{H}$ that prevent its inversion.

Rather than perform a trial CG simulation at each iteration, a far better approach is to reuse old CG simulations through reweighting techniques, which removes many of the stochasticity-associated problems [21]. The idea is to perform a long simulation of a guess CG model with parameters $\boldsymbol{\lambda}_0$ to generate a reference CG trajectory. Then, CG averages at subsequent iterations are approximated by reweighting the guess trajectory through perturbation expressions, a well-established practice in free energy techniques [68]. For example,

$\dfrac{\partial S_\text{rel}}{\partial \lambda} = \beta \left\langle \dfrac{\partial U_\text{CG}}{\partial \lambda} \right\rangle_\text{AA} - \beta \dfrac{\left\langle (\partial U_\text{CG}/\partial \lambda)\, w \right\rangle_{\text{CG},\lambda_0}}{\langle w \rangle_{\text{CG},\lambda_0}}$  (80)

where for each trajectory frame the reweighting weight is related to the difference in CG energy between the original and current force fields,

$w(R) \propto \exp[\beta U_\text{CG}(R; \boldsymbol{\lambda}_0) - \beta U_\text{CG}(R; \boldsymbol{\lambda})] = \exp[-\beta \Delta U_\text{CG}(R)]$  (81)
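A small sketch of the reweighting of Eqns. (80)-(81), with the usual log-space guard against overflow. The effective-frame formula below is one common (Kish-style) choice and is only assumed here; the exact form of Eqn. (22) in the text may differ.

```python
import numpy as np

def reweighted_average(obs, du_cg, beta=1.0):
    """<obs>_CG at new parameters, estimated by reweighting frames saved at
    lambda_0 (Eqns. 80-81). du_cg = U_CG(R; lambda) - U_CG(R; lambda_0) per
    frame; obs is the per-frame observable (e.g., dU_CG/dlambda)."""
    logw = -beta * np.asarray(du_cg)
    logw -= logw.max()                      # guard against overflow
    w = np.exp(logw)
    return np.sum(w * np.asarray(obs)) / np.sum(w)

def effective_frames(du_cg, beta=1.0):
    """n_eff = (sum w)^2 / sum w^2; when this falls to ~10-20% of the frame
    count, the CG reference trajectory should be regenerated."""
    logw = -beta * np.asarray(du_cg)
    logw -= logw.max()
    w = np.exp(logw)
    return np.sum(w)**2 / np.sum(w**2)
```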

Eqns. (80) and (81) imply a reprocessing of the saved CG reference trajectory at each minimization step. Because the saved trajectory typically only contains statistically decorrelated configurations, reprocessing is far less computationally expensive than a comparable brute-force CG simulation at the new parameter values. This has the effect of significantly speeding up the minimization procedure. More importantly, the approach eliminates stochasticity during minimization since only a single CG simulation is performed, once for the initial guess model, and so calculation of all averages becomes deterministic. In this reweighting approach, one is actually minimizing the difference between the relative entropies of the current and guess CG parameter sets [21],

$\Delta S_\text{rel} = S_{\text{rel},\boldsymbol{\lambda}} - S_{\text{rel},\boldsymbol{\lambda}_0} = \beta \langle \Delta U_\text{CG} \rangle_\text{AA} + \ln \langle w \rangle_{\text{CG},\lambda_0}$  (82)

where the second equality uses Eqn. (38) and free energy perturbation to replace the free energy difference. A little algebra shows that the derivative of (82) returns the expression in Eqn. (80). Importantly, $\Delta S_\text{rel}$ shares the same minimum as $S_\text{rel}$ but can be calculated without special free energy techniques, using only averages. By accessing directly the absolute quantity $\Delta S_\text{rel}$, rather than merely its derivatives, it then also becomes possible to apply more robust minimization techniques, such as the conjugate gradient approach, in place of Newton-Raphson iterations. In practice, the accuracy of (82) depends on how close the current CG parameter set is to the reference one; if during the minimization it moves too far away, a new CG reference trajectory can be generated and saved, and the minimization then restarted. A new trajectory might be deemed needed when the effective number of frames, $n_\text{eff}$ from Eqn. (22), falls to 10-20% of the actual number of frames in the original CG trajectory.

Local minimization techniques like those just described will attain a global $S_\text{rel}$ minimum if the initial parameter values are well-chosen and the Hessian remains positive definite.
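A minimal sketch of Eqn. (82), assuming the CG energy difference $\Delta U_\text{CG}$ has been tabulated per frame over both the mapped AA trajectory and the saved CG reference trajectory; the resulting scalar objective could be handed to any generic minimizer (e.g., a conjugate gradient routine).

```python
import numpy as np

def delta_s_rel(du_cg_aa, du_cg_ref, beta=1.0):
    """DeltaS_rel of Eqn. (82):
    beta*<DeltaU_CG>_AA + ln <exp(-beta*DeltaU_CG)>_{CG,lambda0}.

    du_cg_aa  : DeltaU_CG over frames of the mapped AA trajectory
    du_cg_ref : DeltaU_CG over frames of the saved CG reference trajectory"""
    logw = -beta * np.asarray(du_cg_ref)
    shift = logw.max()                                   # numerical stabilization
    log_mean_w = shift + np.log(np.mean(np.exp(logw - shift)))
    return beta * np.mean(du_cg_aa) + log_mean_w
```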

In the special case that all of the parameters are linear in the force field (the situation for many common functionalities, including splines and tabulated potentials), the curvature of the relative entropy surface is always positive and it contains only a single minimum, per the analysis in Eqn. (52). Still, the convergence time will be influenced by the starting parameter set. Usually, it is not too difficult to use characteristic length and energy scales to estimate reasonable initial values for parameters. A more systematic approach is to use the high-temperature Gaussian approximation to $S_\text{rel}$ introduced in Eqn. (46), which is particularly convenient because it contains only AA averages. Minimizing it with respect to a parameter $\lambda$ gives

$\left\langle (U_\text{AA} - U_\text{CG}) \dfrac{\partial U_\text{CG}}{\partial \lambda} \right\rangle_\text{AA} - \langle U_\text{AA} - U_\text{CG} \rangle_\text{AA} \left\langle \dfrac{\partial U_\text{CG}}{\partial \lambda} \right\rangle_\text{AA} = 0$  (83)

which is an implicit solution for $\lambda_0$ that drives to zero the covariance of the energy differences of the AA and CG models with the $U_\text{CG}$ derivative. Alternatively, if $\lambda$ is a linear parameter such that $U_\text{CG}(R) = \lambda X(R) + \cdots$, then a closed-form solution for the guess value $\lambda_0$ is possible:

$\lambda_0 = \dfrac{\langle U_\text{AA} X \rangle_\text{AA} - \langle U_\text{AA} \rangle_\text{AA} \langle X \rangle_\text{AA}}{\langle X^2 \rangle_\text{AA} - \langle X \rangle_\text{AA}^2} = \dfrac{\text{cov}_\text{AA}(U_\text{AA}, X)}{\text{var}_\text{AA}(X)}$  (84)

which is simply an energy-matching estimate for the parameter.

V.2 Application: development of CG water and peptide models

We used numerical relative entropy minimization to develop two kinds of single-site CG water models, one based on pair spline potentials and another based on a pair potential that is the superposition of a Lennard-Jones functionality with a Gaussian (LJG) [80,81]. The latter has been motivated by the so-called core-softened picture of water, which attempts to explain many of this fluid's unique properties in terms of two competing pair interaction distances, which in turn are reflected as two minima (or a shoulder and a minimum) in the pair potential. Using an all-atom water model as reference, $S_\text{rel}$ minimization gave realistic parameters for the LJG potential at different state conditions, and in fact found forms that are very similar to much more flexible spline pair potentials (Figure 9).
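As a sketch of the general LJG functional form just described, the snippet below evaluates a Lennard-Jones-plus-Gaussian pair potential; all parameter values here are hypothetical placeholders, not the relative-entropy-optimized water parameters of Refs. [80,81].

```python
import numpy as np

def u_ljg(r, eps=1.0, sigma=1.0, B=0.5, r0=1.5, w=0.3):
    """Lennard-Jones-plus-Gaussian (LJG) pair potential of the general form
    u(r) = 4*eps*[(sigma/r)^12 - (sigma/r)^6] + B*exp(-(r - r0)^2 / (2*w^2)).
    Placeholder parameters; the sign and magnitude of B control whether the
    Gaussian appears as a repulsive hump or a second attractive well."""
    lj = 4.0 * eps * ((sigma / r)**12 - (sigma / r)**6)
    gauss = B * np.exp(-(r - r0)**2 / (2.0 * w**2))
    return lj + gauss

r = np.linspace(0.90, 3.0, 500)
u = u_ljg(r)
print(f"global minimum at r = {r[np.argmin(u)]:.2f}")
```

With suitable parameters, such a form develops the two competing interaction distances (two minima, or a shoulder plus a minimum) that define the core-softened picture.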

Interestingly, the density dependence of the potential is significant, echoing findings by other authors [58,62], and it is actually critical to capturing well-known water anomalies in the thermal expansion coefficient and diffusion constant. Indeed, this work suggests that the manner in which the core-softened potential varies with state contributes to the anomalies as much as the presence itself of the soft core in the pair potential. We later found that the optimized LJG water models are also able to capture well a range of properties related to hydrophobic interactions between model associating spherical hydrophobes. For example, they show how the hydrophobic force law evolves from a weak, entropy-dominated, oscillatory interaction between small hydrophobes to a strong, enthalpic, monotonic force between macroscopic surfaces [23]. Moreover, even though core-softened models lack explicit hydrogen bonds or other angular-dependent forces, we showed that they can capture many aspects of tetrahedral coordination and the role of tetrahedrality in hydrophobic interactions [24].

[Figure 9: two panels of $u(r)$ (kcal/mol) versus $r$ (Å) at several temperatures $T$; top panel titled "Lennard-Jones plus Gaussian potential", bottom panel titled "spline potential".]

Figure 9: Relative-entropy-optimized pair potentials for the single-site SPC/E model [82] of water. The top panel shows results for a particular functional form, the so-called Lennard-Jones-Gaussian potential, while the bottom one corresponds to a more flexible spline pair potential. The existence of two minima is a hallmark of the core-softened picture of water thermodynamics. Reproduced from Ref. [17] with permission from the PCCP Owner Societies.

We also used numerical relative entropy minimization to develop several kinds of CG peptide models from detailed AA reference systems [21,25]. Such models are highly attractive and useful for the study of peptide self-assembly and aggregation problems, but the development of CG peptides is complicated for at least two reasons: it is unclear how best to design the mapping or even the number of sites to use per amino acid, and the number of distinct component bond, angle, torsional, and nonbonded spline potentials (and their spline knot parameters) grows substantially with the number and arrangement of pseudoatom types. For example, a four-bead-per-amino-acid model of polyalanine (Figure 10) contains ~440 parameters in bond lengths, bond force constants, and (coarse) spline knots. From an implementation perspective, several features were critical to successful relative entropy minimization in models of such complexity: the reweighting formalism of Eqns. (80)-(82); Hessian conditioning to stabilize the inversion of $\mathbf{H}$; and replica exchange sampling of the CG system to overcome trapping in artificial metastable states that emerged during the course of parameter optimization.

Interestingly, we explored one-, two-, three-, and four-bead-per-amino-acid models of polyalanine, but the improvement in the performance of the CG model was not systematic. To characterize these models, we compared CG to AA distributions for macromolecular properties like the radius of gyration and helicity, and also examined folding curves that probe temperature transferability. In some respects, the triple-bead model seemed to perform worse than the double-bead one, while the four-bead model seemed to perform unexpectedly well, recapitulating many AA structural distributions over a wide range of temperatures (Figure 10). We suspect that the success of the latter model is due to higher resolution in the backbone and a chemically reasonable mapping; however, a rigorous understanding of the effects of resolution and mapping here remains incomplete. We offer some thoughts towards mapping design in Section VI.1.

Figure 10: A four-site-per-amino-acid CG model of polyalanine. (Top left) Sites are grouped by chemistry. (Top right) Although it is parameterized at a single temperature, the model exhibits good transferability over a wide temperature range, in terms of the comparison of helix and hairpin fractions to reference all-atom simulations. (Bottom) Slightly above the folding temperature, the peptide adopts multiple conformational states that are clear in the free energy surface as a function of helicity and radius of gyration, which the CG model captures well. Reproduced with permission from [25]. Copyright 2015, AIP Publishing LLC.

VI. EMERGING APPLICATIONS OF RELATIVE ENTROPY THEORY

VI.1 Optimal CG mappings

One of the foremost challenges in modern coarse-graining is the issue of how to design CG models, i.e., the nature of CG pseudoatoms and their composition in terms of component AA sites [67]. Certainly many algorithms now exist to determine effective CG potentials given a predetermined mapping and thus a pre-constructed mapping function M, but the design of the


More information

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that

More information

Computer Simulation of Peptide Adsorption

Computer Simulation of Peptide Adsorption Computer Simulation of Peptide Adsorption M P Allen Department of Physics University of Warwick Leipzig, 28 November 2013 1 Peptides Leipzig, 28 November 2013 Outline Lattice Peptide Monte Carlo 1 Lattice

More information

On the Representability Problem and the Physical Meaning of Coarsegrained

On the Representability Problem and the Physical Meaning of Coarsegrained On the epresentability Problem and the Physical Meaning of Coarsegrained Models Jacob W. Wagner, James F. Dama, Aleksander E. P. Durumeric, and Gregory A. Voth a Department of Chemistry, James Franck Institute,

More information

DETECTION theory deals primarily with techniques for

DETECTION theory deals primarily with techniques for ADVANCED SIGNAL PROCESSING SE Optimum Detection of Deterministic and Random Signals Stefan Tertinek Graz University of Technology turtle@sbox.tugraz.at Abstract This paper introduces various methods for

More information

9.1 System in contact with a heat reservoir

9.1 System in contact with a heat reservoir Chapter 9 Canonical ensemble 9. System in contact with a heat reservoir We consider a small system A characterized by E, V and N in thermal interaction with a heat reservoir A 2 characterized by E 2, V

More information