Stat 316: Stochastic Processes on Graphs
The Sherrington-Kirkpatrick model
Andrea Montanari
Lecture - 4/-4/00

The Sherrington-Kirkpatrick (SK) model was introduced by David Sherrington and Scott Kirkpatrick in 1975 as a "simple solvable" (in their words) model for spin-glasses. Spin-glasses are a type of magnetic alloy, and "solvable" meant that the asymptotic free entropy density could be computed exactly. It turns out that the original SK solution was incorrect and in fact inconsistent (the authors knew this). A consistent conjecture for the asymptotic free energy per spin was put forward by Giorgio Parisi in 1982, and derived through the non-rigorous replica method. It took 24 years to prove this conjecture. The final proof is due to Michel Talagrand (2006) and is a real tour de force. In these two lectures we will prove that the asymptotic free entropy density exists and that it is upper bounded by the Parisi formula. The first result is due to Francesco Guerra and Fabio Toninelli [GT02], and the second to Guerra [Gue03]. Both are based on an interpolation trick that was an authentic breakthrough, eventually leading to Talagrand's proof.

1  Definitions and the Parisi formula

Let {J_ij}_{i,j∈[n]} be a collection of i.i.d. N(0, 1/(2n)) random variables. The SK model is the random measure over x ∈ {+1,−1}^n defined by

  \mu_{J,\beta,B}(x) = \frac{1}{Z_n(\beta,B)} \exp\Big\{ \beta \sum_{i,j=1}^{n} J_{ij} x_i x_j + B \sum_{i=1}^{n} x_i \Big\}.   (1)

Here Z_n(β,B) is defined by the normalization condition \sum_x \mu_{J,\beta,B}(x) = 1. It is of course a random variable. An alternative but sometimes useful formulation of the model consists in saying that

  \mu_{\beta,B}(x) = \frac{1}{Z_n(\beta,B)} \exp\{ H(x) \},   (2)

where H(x) is a Gaussian process indexed by x ∈ {+1,−1}^n with mean and covariance

  \mathbb{E}\, H(x) = n B M_x, \qquad \mathrm{Cov}(H(x), H(y)) = \frac{n\beta^2}{2}\, Q_{x,y}^2.   (3)

Here M_x and Q_{x,y} denote the empirical magnetization and empirical overlap, namely

  M_x \equiv \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad Q_{x,y} \equiv \frac{1}{n} \sum_{i=1}^{n} x_i y_i.   (4)
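To make the definitions concrete, here is a minimal numerical sketch (the helper name `sk_partition` and the parameter values are ours, not from the lecture): for very small n one can draw the couplings J_ij ~ N(0, 1/(2n)) and compute Z_n(β,B) and the Gibbs measure of Eq. (1) by exhaustive enumeration of the 2^n configurations.

```python
import itertools
import math
import random

def sk_partition(J, beta, B):
    """Partition function Z_n(beta, B) of Eq. (1), by brute-force
    enumeration of all 2^n configurations x in {+1, -1}^n."""
    n = len(J)
    Z = 0.0
    for x in itertools.product((1, -1), repeat=n):
        energy = beta * sum(J[i][j] * x[i] * x[j]
                            for i in range(n) for j in range(n))
        energy += B * sum(x)
        Z += math.exp(energy)
    return Z

random.seed(0)
n, beta, B = 4, 1.0, 0.3
# i.i.d. couplings J_ij ~ N(0, 1/(2n)), one for every ordered pair (i, j)
J = [[random.gauss(0.0, math.sqrt(1.0 / (2 * n))) for _ in range(n)]
     for _ in range(n)]

Z = sk_partition(J, beta, B)
phi_n = math.log(Z) / n   # free entropy per spin for this realization of J

# sanity check: the Gibbs probabilities mu(x) = exp{...} / Z sum to one
total = sum(
    math.exp(beta * sum(J[i][j] * x[i] * x[j]
                        for i in range(n) for j in range(n)) + B * sum(x)) / Z
    for x in itertools.product((1, -1), repeat=n))
print(phi_n, total)
```

Of course this is only feasible for tiny n; the whole point of the theory below is to control the n → ∞ behavior without enumeration.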
Throughout these lectures we will be interested in the free entropy density

  \phi_n(\beta,B) \equiv \frac{1}{n} \log Z_n(\beta,B).   (5)

The Parisi formula provides a surprising prediction for this quantity.

Definition 1. Let D be the space of non-decreasing functions x : [0,1] → [0,1]. The Parisi functional is the function P : D × R × R → R defined by

  P[x; \beta, B] = \log 2 + f(0, B; x) - \frac{\beta^2}{2} \int_0^1 q\, x(q)\, dq,   (6)
where f : [0,1] × R × D → R, (q, y, x) ↦ f(q, y; x) is the unique solution of the partial differential equation

  \frac{\partial f}{\partial q} + \frac{1}{2} \frac{\partial^2 f}{\partial y^2} + \frac{x(q)}{2} \Big( \frac{\partial f}{\partial y} \Big)^2 = 0,   (7)

with boundary condition f(1, y; x) = \log\cosh(\beta y).

The relation between the Parisi formula and the SK free entropy is given by the following theorem.

Theorem 2. Let φ_n(β,B) be the free entropy density of a Sherrington-Kirkpatrick model with n variables. Then, almost surely,

  \lim_{n\to\infty} \phi_n(\beta,B) = P_*(\beta,B) \equiv \inf_{x \in D} P[x; \beta, B].   (8)

At first sight, this result appears analogous to the one we proved in the first lecture for the Curie-Weiss model, which we reproduce here for convenience:

  \lim_{n\to\infty} \phi^{CW}_n(\beta,B) = \sup_{m \in [-1,1]} \varphi_{\beta,B}(m),   (9)

  \varphi_{\beta,B}(m) \equiv H\Big( \frac{1+m}{2} \Big) + \frac{\beta}{2}\, m^2 + B m.   (10)

Notice however two important and surprising differences: (i) the supremum in the last expression is replaced by an infimum in Eq. (8); (ii) the optimum is taken over a single real parameter in the Curie-Weiss model, and over a function in the SK case.

2  Existence of the limit

In this section we will prove that the limit n → ∞ exists almost surely.

Theorem 3. Let φ_n(β,B) be the free entropy density of a Sherrington-Kirkpatrick model with n variables. Then, almost surely,

  \lim_{n\to\infty} \phi_n(\beta,B) = \phi_\infty(\beta,B),   (11)

for some non-random quantity φ_∞(β,B).

Proof: Let φ^av_n(β,B) = E φ_n(β,B) be the expected free entropy density. (We will often drop the dependence on β, B in the following.) The proof proceeds in two steps. First one shows that φ_n concentrates around φ^av_n, so that it is sufficient to show that the latter (deterministic) sequence converges. This is proved by showing that the sequence {n φ^av_n}_{n≥0} is superadditive, and applying Fekete's lemma.

Concentration follows from Gaussian isoperimetry. Recall indeed that if F : R^k → R is Lipschitz continuous with modulus L (i.e. if |F(a) − F(b)| ≤ L ‖a − b‖) and Z = (Z_1, …, Z_k) is a vector of i.i.d. N(0, σ²) random variables, then

  \mathbb{P}\{ |F(Z) - \mathbb{E} F(Z)| \ge u \} \le 2\, e^{-u^2/(2 L^2 \sigma^2)}.   (12)
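As a quick illustration of this inequality (a toy numerical sketch, not part of the proof; the parameters are ours), take F(z) = (z_1 + ⋯ + z_k)/k, which is Lipschitz with modulus L = 1/√k. The empirical deviation frequency should then sit well below the Gaussian bound:

```python
import math
import random

random.seed(1)
k, u, trials = 100, 0.2, 5000
sigma = 1.0
L = 1.0 / math.sqrt(k)   # Lipschitz modulus of the empirical mean

# F(Z) = mean of k i.i.d. N(0, sigma^2) variables, so E F(Z) = 0
hits = 0
for _ in range(trials):
    m = sum(random.gauss(0.0, sigma) for _ in range(k)) / k
    if abs(m) >= u:
        hits += 1

empirical = hits / trials
bound = 2.0 * math.exp(-u ** 2 / (2 * L ** 2 * sigma ** 2))
print(empirical, bound)   # observed frequency vs. the isoperimetric bound
```

Here the exact deviation probability is about 0.046 while the bound evaluates to about 0.27: the inequality is far from tight for this F, which is typical, but it is dimension-free in the right way for what follows.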
It is easy to check that φ_n is a Lipschitz continuous function, with modulus β, of the n² random variables {J_ij}, whence

  \mathbb{P}\{ |\phi_n - \phi^{av}_n| \ge u \} \le 2\, e^{-n u^2/\beta^2}.   (13)

By Borel-Cantelli it is therefore sufficient to prove that φ^av_n has a limit.
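The Lipschitz claim can be checked directly for small n: differentiating Eq. (5) gives ∂φ_n/∂J_ij = (β/n)⟨x_i x_j⟩, where ⟨·⟩ is the Gibbs average, so the Euclidean norm of the gradient is at most β. A brute-force sketch (the helper name `phi_and_grad` is ours):

```python
import itertools
import math
import random

def phi_and_grad(J, beta, B):
    """phi_n and its gradient d(phi_n)/dJ_ij = (beta/n) <x_i x_j>,
    computed by exhaustive enumeration (small n only)."""
    n = len(J)
    Z = 0.0
    corr = [[0.0] * n for _ in range(n)]   # accumulates w(x) * x_i * x_j
    for x in itertools.product((1, -1), repeat=n):
        w = math.exp(beta * sum(J[i][j] * x[i] * x[j]
                                for i in range(n) for j in range(n))
                     + B * sum(x))
        Z += w
        for i in range(n):
            for j in range(n):
                corr[i][j] += w * x[i] * x[j]
    grad = [[beta / n * corr[i][j] / Z for j in range(n)] for i in range(n)]
    return math.log(Z) / n, grad

random.seed(2)
n, beta, B = 4, 1.2, 0.1
J = [[random.gauss(0.0, math.sqrt(1.0 / (2 * n))) for _ in range(n)]
     for _ in range(n)]

phi, grad = phi_and_grad(J, beta, B)
grad_norm = math.sqrt(sum(g * g for row in grad for g in row))

# finite-difference check of one gradient entry
eps = 1e-6
Jp = [row[:] for row in J]; Jp[0][1] += eps
Jm = [row[:] for row in J]; Jm[0][1] -= eps
fd = (phi_and_grad(Jp, beta, B)[0] - phi_and_grad(Jm, beta, B)[0]) / (2 * eps)
print(grad_norm, beta, fd, grad[0][1])
```

Since |⟨x_i x_j⟩| ≤ 1, we get ‖∇φ_n‖ ≤ √(n² · β²/n²) = β, which is exactly the modulus used in Eq. (13) (with σ² = 1/(2n)).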
As mentioned, existence of the limit follows from the superadditivity of the sequence {n φ^av_n}_{n≥0}, i.e. from the observation that, for any n_1, n_2 ∈ N, letting n = n_1 + n_2, we have

  n\, \phi^{av}_n \ge n_1\, \phi^{av}_{n_1} + n_2\, \phi^{av}_{n_2}.   (14)

In order to prove this inequality, define, for t ∈ [0,1], the Gaussian process on the hypercube {H_t(x)}_{x∈{+1,−1}^n} with mean and covariance

  \mathbb{E}\, H_t(x) = n B M = n_1 B M_1 + n_2 B M_2,   (15)

  \mathrm{Cov}(H_t(x), H_t(y)) = \frac{\beta^2}{2} \big\{ t\, n Q^2 + (1-t) \big[ n_1 Q_1^2 + n_2 Q_2^2 \big] \big\},   (16)

where, letting [n] = V_1 ∪ V_2 with V_1 = {1, …, n_1}, V_2 = {n_1+1, …, n_1+n_2 = n}, we defined, for a ∈ {1,2},

  M_a \equiv \frac{1}{n_a} \sum_{i \in V_a} x_i,   (17)

  Q_a \equiv \frac{1}{n_a} \sum_{i \in V_a} x_i y_i.   (18)

Further M = (n_1/n) M_1 + (n_2/n) M_2 and Q = (n_1/n) Q_1 + (n_2/n) Q_2 are the magnetization and overlap defined in Eq. (4). Define

  \Phi(t) \equiv \mathbb{E} \log \Big\{ \sum_x e^{H_t(x)} \Big\}.   (19)

It is obvious that Φ(1) = n φ^av_n. Further, for t = 0, one can represent the Gaussian process as

  H_0(x) = \beta \sum_{i,j \in V_1} J^{(1)}_{ij} x_i x_j + B \sum_{i \in V_1} x_i + \beta \sum_{i,j \in V_2} J^{(2)}_{ij} x_i x_j + B \sum_{i \in V_2} x_i,   (20)

with J^{(1)}_{ij} ~ N(0, 1/(2n_1)) i.i.d. and J^{(2)}_{ij} ~ N(0, 1/(2n_2)) i.i.d. It follows that Φ(0) = n_1 φ^av_{n_1} + n_2 φ^av_{n_2}. The superadditivity is proved by showing that the first derivative Φ'(t) is non-negative. In order to compute this derivative, it is convenient to use the following lemma.

Lemma 4. For t ∈ [0,1], let {X_k}_{k∈S} be a finite collection of jointly normal random variables, with covariance C_{kl}(t) = E_t[X_k X_l] − E_t[X_k] E_t[X_l], and mean a_k = E_t[X_k] independent of t. Then, for any polynomially bounded function F : R^S → R,

  \frac{d}{dt}\, \mathbb{E}_t\{F(X)\} = \frac{1}{2} \sum_{k,l \in S} \frac{dC_{kl}}{dt}\, \mathbb{E}_t\Big\{ \frac{\partial^2 F}{\partial x_k\, \partial x_l}(X) \Big\}.   (21)

In order to use this formula for computing Φ'(t), we use

  \frac{\partial^2}{\partial H(x)\, \partial H(y)} \log \Big\{ \sum_x e^{H(x)} \Big\} = \mu(x)\, \mathbb{I}_{x=y} - \mu(x)\, \mu(y),   (22)

and

  \frac{d}{dt}\, \mathrm{Cov}(H_t(x), H_t(y)) = \frac{n\beta^2}{2} \Big\{ Q(x,y)^2 - \frac{n_1}{n}\, Q_1(x,y)^2 - \frac{n_2}{n}\, Q_2(x,y)^2 \Big\}.   (23)
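Lemma 4 can be sanity-checked numerically on a toy instance (our own example, not from the lecture): take S = {1,2}, unit variances, covariance C_12(t) = 0.3 + 0.4 t, and F(x_1, x_2) = (x_1 x_2)². Both sides of Eq. (21) can then be evaluated by Gauss-Hermite quadrature and compared:

```python
import numpy as np

def expect_pair(F, c, nq=40):
    """E[F(X1, X2)] for centered jointly Gaussian (X1, X2) with unit
    variances and correlation c, via 2-d Gauss-Hermite quadrature."""
    t, w = np.polynomial.hermite.hermgauss(nq)
    z = np.sqrt(2.0) * t              # quadrature nodes for N(0, 1)
    wz = w / np.sqrt(np.pi)           # matching weights
    Z1, Z2 = np.meshgrid(z, z, indexing="ij")
    W = np.outer(wz, wz)
    X1 = Z1                           # Cholesky factor of [[1, c], [c, 1]]
    X2 = c * Z1 + np.sqrt(1.0 - c * c) * Z2
    return float(np.sum(W * F(X1, X2)))

F = lambda x1, x2: (x1 * x2) ** 2
d2F = lambda x1, x2: 4.0 * x1 * x2    # d^2 F / (dx1 dx2)

c = lambda t: 0.3 + 0.4 * t           # time-dependent covariance C_12(t)
dc = 0.4                              # dC_12/dt  (C_11, C_22 are constant)

t0, h = 0.5, 1e-4
# left-hand side of Eq. (21): finite difference of E_t{F(X)} in t
lhs = (expect_pair(F, c(t0 + h)) - expect_pair(F, c(t0 - h))) / (2 * h)
# right-hand side: the (1,2) and (2,1) terms each contribute
# (1/2) * dc * E{d2F}; the diagonal terms vanish since C_11, C_22 are fixed
rhs = dc * expect_pair(d2F, c(t0))
print(lhs, rhs)
```

For this F, Isserlis' formula gives E[(X_1 X_2)²] = 1 + 2 C_12², so both sides equal 4 C_12(t_0) · dC_12/dt = 0.8 at t_0 = 0.5, and the quadrature reproduces this.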
The diagonal terms x = y in Eq. (21) do not contribute to the derivative, since Q(x,x) = Q_1(x,x) = Q_2(x,x) = 1 makes the right-hand side of (23) vanish. Combining Eqs. (21)-(23) we therefore get

  \Phi'(t) = \frac{n\beta^2}{4}\, \mathbb{E}\, \mu\otimes\mu \Big\{ \frac{n_1}{n}\, Q_1^2 + \frac{n_2}{n}\, Q_2^2 - Q^2 \Big\},

which is non-negative by convexity of x ↦ x², since Q = (n_1/n) Q_1 + (n_2/n) Q_2. Hence Φ'(t) ≥ 0 for all t ∈ [0,1], which proves Eq. (14).

3  RSB bounds

Theorem 5. Let φ^av_n(β,B) be the expected free entropy density of a Sherrington-Kirkpatrick model with n variables. Then

  \phi^{av}_n(\beta,B) \le P_*(\beta,B) \equiv \inf_{x \in D} P[x; \beta, B].   (24)

Proof: Since β, B are fixed throughout the proof, we will regard P : D → R uniquely as a function of x : [0,1] → [0,1]. An important simplifying remark is that P is Lipschitz continuous with respect to the L¹ norm. More precisely, for x, x' ∈ D, we have

  |P[x] - P[x']| \le \frac{\beta^2}{2} \int_0^1 |x(q) - x'(q)|\, dq.   (25)

We refer to [Gue03] for a proof of this statement. It is therefore sufficient to prove φ^av_n(β,B) ≤ P[x] for x in a dense subset of D. The proof proceeds by considering the set ∪_K D_K, where D_K is the subset of non-decreasing, right-continuous functions x : [0,1] → [0,1] which take at most K+2 distinct values in [0,1) (plus, possibly, x(1) = 1). A function x ∈ D_K is parameterized by two vectors 0 ≤ q_0 ≤ q_1 ≤ ⋯ ≤ q_K ≤ q_{K+1} = 1 and 0 = m_0 ≤ m_1 ≤ ⋯ ≤ m_K ≤ m_{K+1} = 1, by letting, for q ∈ [0,1),

  x(q) = \sum_{i=1}^{K+1} m_i\, \mathbb{I}\{ q \in [q_{i-1}, q_i) \}.   (26)

(The notation here is slightly different from [Gue03].) Of particular interest is the case K = 0 (replica symmetric, RS), where

  x(q) = \begin{cases} 0 & \text{if } 0 \le q < q_0, \\ 1 & \text{if } q_0 \le q. \end{cases}   (27)

Also important in the analysis of other models is the case K = 1 (one-step replica symmetry breaking, 1RSB):

  x(q) = \begin{cases} 0 & \text{if } 0 \le q < q_0, \\ m & \text{if } q_0 \le q < q_1, \\ 1 & \text{if } q_1 \le q. \end{cases}   (28)

The function x(q) actually achieving the inf P[x] has the interpretation of the cumulative distribution of the overlap Q_{x,y} for two configurations drawn independently from the random measure, (x,y) ~ μ_J ⊗ μ_J. The partial differential equation (7) is easily solved for x ∈ D_K, yielding a more explicit expression for P[x] = P_K(m, q). We get

  P_K(m, q) = \mathbb{E} \log Y_0 - \frac{\beta^2}{4} \sum_{i=1}^{K+1} m_i\, (q_i^2 - q_{i-1}^2),   (29)
where Y_0 is the random variable constructed as follows. Define, for X_0, …, X_{K+1} i.i.d. N(0,1),

  Y_{K+1} = 2 \cosh\Big( B + \beta \sum_{l=0}^{K+1} \sqrt{q_l - q_{l-1}}\, X_l \Big),   (30)

where it is understood that q_{−1} = 0. Then, recursively, for l = K, K−1, …, 0,

  Y_l = \big\{ \mathbb{E}_{l+1}\big( Y_{l+1}^{m_{l+1}} \big) \big\}^{1/m_{l+1}},   (31)

where E_{l+1} denotes expectation with respect to X_{l+1}. In particular, in the replica symmetric case (K = 0), we have the following function of the only remaining parameter q_0:

  P_0(q_0) = \mathbb{E} \log 2\cosh\big( B + \beta \sqrt{q_0}\, X \big) + \frac{\beta^2}{4}\, (1 - q_0)^2.   (32)

We now need to prove that, for any K, and for any choice of the parameters {m_i}, {q_i}, we have φ^av_n ≤ P_K(m, q). This is done by interpolation. Define, for t ∈ [0,1],

  H_t(x) = \beta \sqrt{t} \sum_{i,j=1}^{n} J_{ij} x_i x_j + B \sum_{i=1}^{n} x_i + \beta \sqrt{1-t} \sum_{l=0}^{K+1} \sqrt{q_l - q_{l-1}} \sum_{i=1}^{n} G^l_i x_i,   (33)

with the {G^l_i} i.i.d. N(0,1) random variables. We then construct the partition functions {Z_l(t)}_{0 ≤ l ≤ K+1} by letting

  Z_{K+1}(t) \equiv \sum_x e^{H_t(x)},   (34)

and then, recursively, for l ∈ {0, …, K},

  Z_l(t) \equiv \big\{ \mathbb{E}_{l+1}\big( Z_{l+1}(t)^{m_{l+1}} \big) \big\}^{1/m_{l+1}},   (35)

where E_{l+1} denotes expectation with respect to {G^{l+1}_i}_{1 ≤ i ≤ n}. Finally, we define

  \Phi(t) \equiv \frac{1}{n}\, \mathbb{E}_0 \log Z_0(t).   (36)

Here E_0 is expectation with respect to both {G^0_i}_{1 ≤ i ≤ n} and {J_ij}_{i,j ∈ [n]}. It is easy to see that Φ(1) = φ^av_n, since at t = 1 the Hamiltonian H_1(x) = H(x) is just the SK energy function (and the recursion (35) becomes trivial). Further, Φ(0) = E log Y_0 (with Y_0 defined as above), since

  Z_{K+1}(0) = \prod_{i=1}^{n} \Big\{ 2\cosh\Big( B + \beta \sum_{l=0}^{K+1} \sqrt{q_l - q_{l-1}}\, G^l_i \Big) \Big\}   (37)

is the product of n i.i.d. copies of Y_{K+1}. The proof is completed by evaluating the derivative of Φ(t) with respect to t. This takes the form

  \Phi'(t) = -\frac{\beta^2}{4} \sum_{l=1}^{K+1} m_l\, (q_l^2 - q_{l-1}^2) - \frac{\beta^2}{4} \sum_{l=0}^{K} (m_{l+1} - m_l)\, \big\langle (Q(x,y) - q_l)^2 \big\rangle_l,   (38)

where ⟨·⟩_l denotes expectation with respect to an appropriate measure on (x,y) ∈ {+1,−1}^n × {+1,−1}^n. Since the argument of this expectation is a perfect square, it follows that

  \Phi(1) - \Phi(0) \le -\frac{\beta^2}{4} \sum_{l=1}^{K+1} m_l\, (q_l^2 - q_{l-1}^2),   (39)
which is our claim. Computing the derivative (38) is not particularly difficult, just a bit laborious. For the sake of simplicity, we will consider the case K = 0 (replica symmetric). In that case, applying the definitions and integrating out the level-1 Gaussian fields, we get

  \Phi(t) = \frac{\beta^2}{2}\, (1-t)(1-q_0) + \frac{1}{n}\, \mathbb{E} \log \Big\{ \sum_x e^{\hat H_t(x)} \Big\},   (40)

  \hat H_t(x) = \beta \sqrt{t} \sum_{i,j=1}^{n} J_{ij} x_i x_j + B \sum_{i=1}^{n} x_i + \beta \sqrt{(1-t)\, q_0} \sum_{i=1}^{n} G_i x_i.   (41)

We then have

  \Phi'(t) = -\frac{\beta^2}{2}\, (1-q_0) + \frac{\beta}{2n\sqrt{t}} \sum_{i,j} \mathbb{E}\{ J_{ij}\, \mu_t(x_i x_j) \} - \frac{\beta \sqrt{q_0}}{2n\sqrt{1-t}} \sum_i \mathbb{E}\{ G_i\, \mu_t(x_i) \},   (42)

where μ_t is the probability measure μ_t(x) ∝ exp{Ĥ_t(x)}, and μ_t(x_i x_j), μ_t(x_i) denote the expectations of x_i x_j and x_i. Using Stein's lemma (Gaussian integration by parts), we get

  \Phi'(t) = -\frac{\beta^2}{2}\, (1-q_0) + \frac{\beta^2}{4n^2} \sum_{i,j} \mathbb{E}\{ 1 - \mu_t(x_i x_j)^2 \} - \frac{\beta^2 q_0}{2n} \sum_i \mathbb{E}\{ 1 - \mu_t(x_i)^2 \}
           = -\frac{\beta^2}{4} - \frac{\beta^2}{4n^2} \sum_{i,j} \mathbb{E}\{ \mu^{(2)}_t(x_i y_i x_j y_j) \} + \frac{\beta^2 q_0}{2n} \sum_i \mathbb{E}\{ \mu^{(2)}_t(x_i y_i) \}
           = -\frac{\beta^2}{4} - \frac{\beta^2}{4}\, \mathbb{E}\{ \mu^{(2)}_t(Q(x,y)^2) \} + \frac{\beta^2 q_0}{2}\, \mathbb{E}\{ \mu^{(2)}_t(Q(x,y)) \},

where μ^{(2)}_t = μ_t ⊗ μ_t is the product distribution over (x, y). By completing the square, we get

  \Phi'(t) = -\frac{\beta^2}{4}\, (1 - q_0^2) - \frac{\beta^2}{4}\, \mathbb{E}\big\{ \mu^{(2)}_t\big( [Q(x,y) - q_0]^2 \big) \big\},

which indeed coincides with Eq. (38) for K = 0.

References

[GT02] Francesco Guerra and Fabio L. Toninelli. The thermodynamic limit in mean field spin glasses. Commun. Math. Phys., 230:71-79, 2002.

[Gue03] Francesco Guerra. Broken replica symmetry bounds in the mean field spin glass model. Commun. Math. Phys., 233:1-12, 2003.