The Sherrington-Kirkpatrick model


Stat 316: Stochastic Processes on Graphs
The Sherrington-Kirkpatrick model
Andrea Montanari
Lecture - 4/-4/00

The Sherrington-Kirkpatrick (SK) model was introduced by David Sherrington and Scott Kirkpatrick in 1975 as a "simple solvable" (in their words) model for spin glasses. Spin glasses are a type of magnetic alloy, and "solvable" meant that the asymptotic free entropy density could be computed exactly. It turns out that the original SK solution was incorrect and in fact inconsistent (as the authors knew). A consistent conjecture for the asymptotic free energy per spin was put forward by Giorgio Parisi in 1982 and derived through the non-rigorous replica method. It took 24 years to prove this conjecture. The final proof is due to Michel Talagrand (2006) and is a real tour de force. In these two lectures we will prove that the asymptotic free entropy density exists and that it is upper bounded by the Parisi formula. The first result is due to Francesco Guerra and Fabio Toninelli [GT02], and the second to Guerra [Gue03]. Both are based on an interpolation trick that was an authentic breakthrough, eventually leading to Talagrand's proof.

1 Definitions and the Parisi formula

Let {J_ij}_{i,j∈[n]} be a collection of i.i.d. N(0, 1/(2n)) random variables. The SK model is the random measure over x ∈ {+1,−1}^n defined by

    μ_{J,β,B}(x) = Z_n(β,B)^{-1} exp{ β Σ_{i,j} J_ij x_i x_j + B Σ_i x_i }.   (1)

Here Z_n(β,B) is defined by the normalization condition Σ_x μ_{J,β,B}(x) = 1; it is of course a random variable. An alternative but sometimes useful formulation of the model consists in saying that

    μ_{β,B}(x) = Z_n(β,B)^{-1} exp{ H(x) },   (2)

where H(x) is a Gaussian process indexed by x ∈ {+1,−1}^n with mean and covariance

    E H(x) = n B M_x,    Cov(H(x), H(y)) = (n β²/2) Q_{x,y}².   (3)

Here M_x and Q_{x,y} denote the empirical magnetization and empirical overlap, namely

    M_x ≡ (1/n) Σ_{i=1}^n x_i,    Q_{x,y} ≡ (1/n) Σ_{i=1}^n x_i y_i.   (4)

Throughout these lectures we will be interested in the free entropy density; the Parisi formula provides a surprising prediction for this quantity.
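To make these definitions concrete, here is a small numerical sketch (added for illustration, not part of the original notes): for small n one can enumerate all 2^n configurations and compute Z_n(β,B), hence the free entropy density φ_n = (1/n) log Z_n, exactly. The function name and the parameter values are my own choices; the disorder convention J_ij i.i.d. N(0, 1/(2n)) over all ordered pairs (i,j) follows Eq. (1).

```python
import itertools, math, random

def free_entropy_density(J, beta, B, n):
    """phi_n = (1/n) log Z_n, where Z_n = sum_x exp{beta sum_{i,j} J_ij x_i x_j + B sum_i x_i},
    computed by exhaustive enumeration over x in {+1,-1}^n (log-sum-exp for stability)."""
    energies = []
    for x in itertools.product([1, -1], repeat=n):
        energies.append(beta * sum(J[i][j] * x[i] * x[j]
                                   for i in range(n) for j in range(n)) + B * sum(x))
    m = max(energies)
    return (m + math.log(sum(math.exp(e - m) for e in energies))) / n

random.seed(0)
n, beta, B = 8, 0.5, 0.1
# Disorder: J_ij i.i.d. N(0, 1/(2n)) over all ordered pairs, as in Eq. (1).
J = [[random.gauss(0.0, 1.0 / math.sqrt(2 * n)) for _ in range(n)] for _ in range(n)]
print(free_entropy_density(J, beta, B, n))
# Sanity check: at beta = 0 the spins decouple and phi_n = log(2 cosh B) exactly.
print(free_entropy_density(J, 0.0, B, n), math.log(2.0 * math.cosh(B)))
```

The β = 0 check is useful because the decoupled value log(2 cosh B) is exact for every n, independently of the disorder.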
    φ_n(β, B) ≡ (1/n) log Z_n(β, B).   (5)

Definition 1. Let D be the space of non-decreasing functions x : [0,1] → [0,1]. The Parisi functional is the function P : D × R × R → R defined by

    P[x; β, B] = log 2 + f(0, B; x) − (β²/2) ∫_0^1 q x(q) dq,   (6)

where f : [0,1] × R × D → R, (q, y, x) ↦ f(q, y; x), is the unique solution of the partial differential equation

    ∂f/∂q + (1/2) ∂²f/∂y² + (x(q)/2) (∂f/∂y)² = 0,   (7)

with boundary condition f(1, y; x) = log cosh(βy).

The relation between the Parisi formula and the SK free entropy is given by the following theorem.

Theorem 2. Let φ_n(β,B) be the free entropy density of a Sherrington-Kirkpatrick model with n variables. Then, almost surely,

    lim_{n→∞} φ_n(β,B) = P_*(β,B) ≡ inf_{x∈D} P[x; β, B].   (8)

At first sight, this result appears analogous to the one we proved in the first lecture for the Curie-Weiss model, which we reproduce here for convenience:

    lim_{n→∞} φ^CW_n(β,B) = sup_{m∈[−1,1]} ϕ_{β,B}(m),   (9)

    ϕ_{β,B}(m) ≡ H((1+m)/2) + (β/2) m² + B m,   (10)

where H denotes the binary entropy function. Notice however two important and surprising differences: (i) the supremum in the last expression is replaced by an infimum in Eq. (8); (ii) the optimum is taken over a single real parameter in the Curie-Weiss model, and over a function in the SK case.

2 Existence of the limit

In this section we will prove that the limit n → ∞ exists almost surely.

Theorem 3. Let φ_n(β,B) be the free entropy density of a Sherrington-Kirkpatrick model with n variables. Then, almost surely,

    lim_{n→∞} φ_n(β,B) = φ_∞(β,B),   (11)

for some non-random quantity φ_∞(β,B).

Proof. Let φ^av_n(β,B) = E φ_n(β,B) be the expected free entropy density. (We will often drop the dependence on β, B in what follows.) The proof proceeds in two steps. First one shows that φ_n concentrates around φ^av_n, so that it is sufficient to show that the latter (deterministic) sequence converges. This is proved by showing that the sequence {n φ^av_n}_{n≥1} is superadditive, and applying Fekete's lemma. Concentration follows from Gaussian isoperimetry. Recall indeed that if F : R^k → R is Lipschitz continuous with modulus L (i.e. if |F(a) − F(b)| ≤ L ‖a − b‖) and Z = (Z_1, …, Z_k) is a vector of i.i.d. N(0, σ²) random variables, then

    P{ |F(Z) − E F(Z)| ≥ u } ≤ 2 e^{−u²/(2 L² σ²)}.   (12)
It is easy to check that φ_n is a Lipschitz continuous function, with modulus β, of the n² random variables J_ij, whence

    P{ |φ_n − φ^av_n| ≥ u } ≤ 2 e^{−n u²/β²}.   (13)

By Borel-Cantelli, it is therefore sufficient to prove that φ^av_n has a limit.
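As an added numerical illustration (not in the original notes), the concentration in Eq. (13) can be observed directly for very small n: sample many disorder realizations, compute φ_n by enumeration, and compare the empirical standard deviation with the sub-Gaussian scale β/√(2n) implied by the Lipschitz bound (via the Gaussian Poincaré inequality, Var(φ_n) ≤ β²/(2n)). The helper phi_n and all parameter choices below are mine.

```python
import itertools, math, random

def phi_n(J, beta, B, n):
    """(1/n) log Z_n by brute-force enumeration (feasible only for small n)."""
    energies = [beta * sum(J[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
                + B * sum(x)
                for x in itertools.product([1, -1], repeat=n)]
    m = max(energies)
    return (m + math.log(sum(math.exp(e - m) for e in energies))) / n

random.seed(1)
n, beta, B, trials = 6, 1.0, 0.0, 200
samples = []
for _ in range(trials):
    # Fresh disorder J_ij ~ N(0, 1/(2n)) for each trial.
    J = [[random.gauss(0.0, 1.0 / math.sqrt(2 * n)) for _ in range(n)] for _ in range(n)]
    samples.append(phi_n(J, beta, B, n))

mean = sum(samples) / trials
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / trials)
# The Poincare bound gives std <= beta / sqrt(2n) ~ 0.289 here; the empirical
# fluctuation should sit comfortably below that already at n = 6.
print(mean, std)
```

Even at n = 6 the fluctuations are well below the theoretical scale, and they shrink further as n grows, consistent with the e^{−nu²/β²} tail in (13).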

As mentioned, existence of the limit follows from the superadditivity of the sequence {n φ^av_n}_{n≥1}, i.e. from the observation that, for any n₁, n₂ ∈ N, letting n = n₁ + n₂, we have

    n φ^av_n ≥ n₁ φ^av_{n₁} + n₂ φ^av_{n₂}.   (14)

In order to prove this inequality define, for t ∈ [0,1], the Gaussian process {H_t(x)}_{x∈{+1,−1}^n} on the hypercube, with mean and covariance

    E H_t(x) = n B M = n₁ B M₁ + n₂ B M₂,   (15)

    Cov(H_t(x), H_t(y)) = (β²/2) { t n Q² + (1−t) [ n₁ Q₁² + n₂ Q₂² ] },   (16)

where, letting [n] = V₁ ∪ V₂ with V₁ = {1, …, n₁}, V₂ = {n₁+1, …, n₁+n₂ = n}, we defined, for a ∈ {1,2},

    M_a ≡ (1/n_a) Σ_{i∈V_a} x_i,   (17)

    Q_a ≡ (1/n_a) Σ_{i∈V_a} x_i y_i.   (18)

Further, M = (n₁/n) M₁ + (n₂/n) M₂ and Q = (n₁/n) Q₁ + (n₂/n) Q₂ are the magnetization and overlap defined in Eq. (4). Define

    Φ(t) ≡ E log Σ_x e^{H_t(x)}.   (19)

It is obvious that Φ(1) = n φ^av_n. Further, for t = 0, one can represent the Gaussian process as

    H_0(x) = β Σ_{i,j∈V₁} J¹_ij x_i x_j + B Σ_{i∈V₁} x_i + β Σ_{i,j∈V₂} J²_ij x_i x_j + B Σ_{i∈V₂} x_i,   (20)

with J¹_ij ~ N(0, 1/(2n₁)) i.i.d. and J²_ij ~ N(0, 1/(2n₂)) i.i.d. It follows that Φ(0) = n₁ φ^av_{n₁} + n₂ φ^av_{n₂}. Superadditivity is proved by showing that the first derivative Φ′(t) is non-negative. In order to compute this derivative, it is convenient to use the following lemma.

Lemma 4. For t ∈ [0,1], let {X_k}_{k∈S} be a finite collection of jointly Gaussian random variables whose law depends on t through the covariance C_{kl}(t) = E_t[X_k X_l] − E_t[X_k] E_t[X_l], the means a_k = E_t[X_k] being independent of t. Then, for any polynomially bounded function F : R^S → R,

    d/dt E_t{F(X)} = (1/2) Σ_{k,l∈S} (dC_{kl}/dt) E_t{ ∂²F/(∂x_k ∂x_l)(X) }.   (21)

In order to use this formula for computing Φ′(t), we use

    ∂²/(∂H(x) ∂H(y)) log Σ_z e^{H(z)} = μ(x) I_{x=y} − μ(x) μ(y),   (22)

and

    d/dt Cov(H_t(x), H_t(y)) = (β²/2) { n ( (n₁/n) Q₁(x,y) + (n₂/n) Q₂(x,y) )² − n₁ Q₁(x,y)² − n₂ Q₂(x,y)² }.   (23)
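Lemma 4 is the classical Gaussian interpolation identity, a consequence of Gaussian integration by parts. As a quick sanity check (added here, not in the original notes), one can verify the scalar case S = {1}: if X_t ~ N(0, C(t)), then d/dt E_t{F(X)} = (1/2) C′(t) E_t{F″(X)}. The test function F(x) = x⁴ and the linear covariance path below are arbitrary choices of mine.

```python
import numpy as np

def gaussian_mean(F, c, n_nodes=80):
    """E[F(X)] for X ~ N(0, c), via Gauss-Hermite quadrature."""
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    return float(np.sum(w * F(np.sqrt(2.0 * c) * t)) / np.sqrt(np.pi))

# Scalar instance of Lemma 4: X_t ~ N(0, C(t)) with C(t) = (1-t) c0 + t c1.
c0, c1, t0 = 0.5, 2.0, 0.3
C = lambda t: (1.0 - t) * c0 + t * c1
dC = c1 - c0                      # dC/dt, constant in t

F  = lambda x: x ** 4             # polynomially bounded test function
F2 = lambda x: 12.0 * x ** 2      # second derivative F''

h = 1e-5
lhs = (gaussian_mean(F, C(t0 + h)) - gaussian_mean(F, C(t0 - h))) / (2.0 * h)
rhs = 0.5 * dC * gaussian_mean(F2, C(t0))
print(lhs, rhs)
```

Both sides equal 6 C(t₀) C′(t₀) here, since E X⁴ = 3 C² for X ~ N(0, C); the finite-difference derivative matches the right-hand side of (21) to high precision.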

Therefore the first term in Eq. (22) does not give any contribution to the derivative, and

    (1/2) Σ_{x,y} [ d/dt Cov(H_t(x), H_t(y)) ] ( μ(x) I_{x=y} − μ(x) μ(y) )
        = (β² n/4) Σ_{x,y} μ(x) μ(y) { (n₁/n) Q₁(x,y)² + (n₂/n) Q₂(x,y)² − ( (n₁/n) Q₁(x,y) + (n₂/n) Q₂(x,y) )² },

which is non-negative by convexity of x ↦ x² (the diagonal terms x = y cancel because Q(x,x) = Q₁(x,x) = Q₂(x,x) = 1). The derivative Φ′(t) is obtained by taking the expectation of the above, and hence is non-negative as well.

3 RSB bounds

Theorem 5. Let φ^av_n(β,B) be the expected free entropy density of a Sherrington-Kirkpatrick model with n variables. Then

    φ^av_n(β,B) ≤ P_*(β,B) ≡ inf_{x∈D} P[x; β, B].   (24)

Proof. Since β, B are fixed throughout the proof, we will regard P : D → R as a function of x : [0,1] → [0,1] only. An important simplifying remark is that P is Lipschitz continuous with respect to the L¹ norm. More precisely, for x, x′ ∈ D, we have

    | P[x] − P[x′] | ≤ (β²/2) ∫_0^1 | x(q) − x′(q) | dq.   (25)

We refer to [Gue03] for a proof of this statement. It is therefore sufficient to prove φ^av_n(β,B) ≤ P[x] for x in a dense subset of D. The proof proceeds by considering the set ∪_{K≥0} D_K, where D_K is the subset of non-decreasing, right-continuous functions x : [0,1] → [0,1] which take at most K+2 distinct values in [0,1) (plus, possibly, x(1) = 1). A function x ∈ D_K is parameterized by two vectors 0 ≤ q₀ ≤ q₁ ≤ ⋯ ≤ q_K ≤ q_{K+1} = 1 and 0 = m₀ ≤ m₁ ≤ ⋯ ≤ m_{K+1} = 1, by letting, for q ∈ [0,1),

    x(q) = Σ_{i=0}^{K+1} m_i I{ q ∈ [q_{i−1}, q_i) },   (26)

where it is understood that q_{−1} = 0. (The notation here is slightly different from [Gue03].) Of particular interest is the case K = 0 (replica symmetric, RS), where

    x(q) = 0 if 0 ≤ q < q₀,   x(q) = 1 if q₀ ≤ q.   (27)

Also important in the analysis of other models is the case K = 1 (one-step replica symmetry breaking, 1RSB):

    x(q) = 0 if 0 ≤ q < q₀,   x(q) = m₁ if q₀ ≤ q < q₁,   x(q) = 1 if q₁ ≤ q.   (28)

The function x(q) actually achieving inf_x P[x] has the interpretation of the cumulative distribution function of the overlap Q_{x,y} for two configurations drawn independently from the random measure μ_J ⊗ μ_J. The partial differential equation (7) is easily solved for x ∈ D_K, yielding a more explicit expression for P[x] = P_K(m, q). We get

    P_K(m, q) = E log Y₀ − (β²/4) Σ_{i=1}^{K+1} m_i ( q_i² − q_{i−1}² ),   (29)

where Y₀ is the random variable constructed as follows. Define, for X₀, …, X_{K+1} i.i.d. N(0,1),

    Y_{K+1} = 2 cosh( B + β Σ_{l=0}^{K+1} √(q_l − q_{l−1}) X_l ),   (30)

where it is understood that q_{−1} = 0. Then, recursively, for l = K, K−1, …, 0,

    Y_l = { E_{l+1}( Y_{l+1}^{m_{l+1}} ) }^{1/m_{l+1}},   (31)

where E_{l+1} denotes expectation with respect to X_{l+1}. In particular, in the replica-symmetric case (K = 0), we have the following function of the only remaining parameter q₀:

    P₀(q₀) = E log 2 cosh( B + β √q₀ X ) + (β²/4) (1 − q₀)².   (32)

We now need to prove that, for any K, and for any choice of the parameters {m_i}, {q_i}, we have φ^av_n ≤ P_K(m, q). This is done by interpolation. Define, for t ∈ [0,1],

    H_t(x) = β √t Σ_{i,j} J_ij x_i x_j + B Σ_i x_i + β √(1−t) Σ_{l=0}^{K+1} √(q_l − q_{l−1}) Σ_{i=1}^n G^l_i x_i,   (33)

with the {G^l_i} i.i.d. N(0,1) random variables. We then construct the partition functions {Z_l(t)}_{0≤l≤K+1} by letting

    Z_{K+1}(t) ≡ Σ_x e^{H_t(x)},   (34)

and then, recursively, for l ∈ {0, …, K},

    Z_l(t) ≡ { E_{l+1}( Z_{l+1}(t)^{m_{l+1}} ) }^{1/m_{l+1}},   (35)

where E_{l+1} denotes expectation with respect to {G^{l+1}_i}_{1≤i≤n}. Finally, we define

    Φ(t) ≡ (1/n) E₀ log Z₀(t).   (36)

Here E₀ is expectation with respect to both {G⁰_i}_{1≤i≤n} and {J_ij}_{i,j∈[n]}. It is easy to see that Φ(1) = φ^av_n, since at t = 1 the Hamiltonian H₁(x) = H(x) is just the SK energy function. Further, Φ(0) = E log Y₀ (with Y₀ defined as above), since

    Z_{K+1}(0) = Π_{i=1}^n 2 cosh( B + β Σ_{l=0}^{K+1} √(q_l − q_{l−1}) G^l_i )   (37)

is the product of n i.i.d. copies of Y_{K+1}. The proof is completed by evaluating the derivative of Φ(t) with respect to t. This takes the form

    Φ′(t) = − (β²/4) Σ_{l=1}^{K+1} m_l ( q_l² − q_{l−1}² ) − (β²/4) Σ_{l=0}^{K} ( m_{l+1} − m_l ) ⟨ ( Q(x,y) − q_l )² ⟩_l,   (38)

where ⟨·⟩_l denotes expectation with respect to an appropriate measure on (x,y) ∈ {+1,−1}^n × {+1,−1}^n. Since the argument of this expectation is a perfect square, it follows that

    Φ(1) ≤ Φ(0) − (β²/4) Σ_{l=1}^{K+1} m_l ( q_l² − q_{l−1}² ),   (39)

which is our claim.

Computing the derivative (38) is not particularly difficult, just a bit laborious. For the sake of simplicity, we will consider the case K = 0 (replica symmetric). In that case, applying the definitions, we get

    Φ(t) = (β²/2)(1−t)(1−q₀) + (1/n) E log Σ_x e^{Ĥ_t(x)},   (40)

where

    Ĥ_t(x) = β √t Σ_{i,j} J_ij x_i x_j + B Σ_i x_i + β √((1−t) q₀) Σ_{i=1}^n G_i x_i.   (41)

We then have

    Φ′(t) = − (β²/2)(1−q₀) + (β/(2n√t)) Σ_{i,j} E{ J_ij μ_t(x_i x_j) } − (β √q₀/(2n√(1−t))) Σ_i E{ G_i μ_t(x_i) },   (42)

where μ_t is the probability measure μ_t(x) ∝ exp{ Ĥ_t(x) }, and μ_t(x_i x_j), μ_t(x_i) denote the expectations of x_i x_j and x_i. Using Stein's lemma, we get

    Φ′(t) = − (β²/2)(1−q₀) + (β²/(4n²)) Σ_{i,j} E{ μ_t(x_i² x_j²) − μ_t(x_i x_j)² } − (β² q₀/(2n)) Σ_i E{ μ_t(x_i²) − μ_t(x_i)² }
          = − β²/4 − (β²/(4n²)) Σ_{i,j} E{ μ^(2)_t(x_i y_i x_j y_j) } + (β² q₀/(2n)) Σ_i E{ μ^(2)_t(x_i y_i) }
          = − β²/4 − (β²/4) E{ μ^(2)_t( Q(x,y)² ) } + (β² q₀/2) E{ μ^(2)_t( Q(x,y) ) },

where μ^(2)_t = μ_t ⊗ μ_t is the product distribution over (x, y). By completing the square, we get

    Φ′(t) = − (β²/4)(1 − q₀²) − (β²/4) E{ μ^(2)_t( [ Q(x,y) − q₀ ]² ) },

which indeed coincides with Eq. (38) for K = 0.

References

[GT02] Francesco Guerra and Fabio L. Toninelli. The thermodynamic limit in mean field spin glass models. Commun. Math. Phys., 230:71-79, 2002.

[Gue03] Francesco Guerra. Broken replica symmetry bounds in the mean field spin glass model. Commun. Math. Phys., 233:1-12, 2003.