
Displacement convexity of the relative entropy in the discrete hypercube
LAMA, Université Paris Est Marne-la-Vallée
Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry, Roscoff, June 2012


General problem
Let $(X, d)$ be a complete separable metric space.

Definition [Relative entropy]. Let $\mu, \nu$ be two Borel probability measures on $X$. Then
$$H(\nu \mid \mu) := \int \log \frac{d\nu}{d\mu} \, d\nu$$
if $\nu$ is absolutely continuous with respect to $\mu$ (otherwise we set $H(\nu \mid \mu) = +\infty$).

The map $\nu \mapsto H(\nu \mid \mu)$ is always convex in the usual sense:
$$H((1-t)\nu_0 + t\nu_1 \mid \mu) \le (1-t) H(\nu_0 \mid \mu) + t H(\nu_1 \mid \mu), \quad t \in [0, 1].$$

Problem: Examine the convexity of $H$ along other types of paths $(\nu_t)_{t \in [0,1]}$ interpolating between $\nu_0$ and $\nu_1$.
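As a small numerical illustration of the usual convexity of the relative entropy, the sketch below works on a three-point space. The measures `mu`, `nu0`, `nu1` and the helper names are arbitrary choices made for this example, not objects from the talk.

```python
import numpy as np

def relative_entropy(nu, mu):
    """H(nu | mu) on a finite set, with +inf when nu is not absolutely continuous w.r.t. mu."""
    nu = np.asarray(nu, dtype=float)
    mu = np.asarray(mu, dtype=float)
    if np.any((mu == 0) & (nu > 0)):
        return float("inf")
    support = nu > 0  # convention 0 * log 0 = 0
    return float(np.sum(nu[support] * np.log(nu[support] / mu[support])))

# Illustrative measures on a three-point space (assumed values).
mu = np.array([0.5, 0.3, 0.2])
nu0 = np.array([0.7, 0.2, 0.1])
nu1 = np.array([0.1, 0.1, 0.8])

def convexity_gap(t):
    """Chord value minus entropy of the mixture; nonnegative when H is convex."""
    mixture = (1 - t) * nu0 + t * nu1
    chord = (1 - t) * relative_entropy(nu0, mu) + t * relative_entropy(nu1, mu)
    return chord - relative_entropy(mixture, mu)

gaps = [convexity_gap(t) for t in np.linspace(0.0, 1.0, 11)]
print(min(gaps) >= -1e-12)
```

The gap vanishes at $t = 0$ and $t = 1$ and is positive in between for these measures.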

Outline
Based on joint work in progress with C. Roberto, P.-M. Samson and P. Tetali.
I. Displacement convexity of the entropy in a continuous setting. Link with curvature and functional inequalities.
II. Displacement convexity in a discrete setting. General framework and the example of the discrete hypercube.

I. Displacement convexity of the entropy in a continuous setting.

Geodesic spaces
A metric space $(E, d)$ is said to be geodesic if for all $x_0, x_1 \in E$ there is at least one path $\gamma : [0, 1] \to E$ such that $\gamma(0) = x_0$, $\gamma(1) = x_1$ and
$$d(\gamma(s), \gamma(t)) = |t - s| \, d(x_0, x_1), \quad \forall t, s \in [0, 1].$$
Such a path is called a constant speed geodesic between $x_0$ and $x_1$.

Geodesics in the Wasserstein space
Let $\mathcal{P}_p(X)$, $p \ge 1$, be the set of Borel probability measures on $X$ having a finite $p$-th moment.

Definition [$L^p$-Wasserstein distance]. Let $\nu_0, \nu_1 \in \mathcal{P}_p(X)$;
$$W_p^p(\nu_0, \nu_1) = \inf_{\pi \in P(\nu_0, \nu_1)} \iint d^p(x_0, x_1) \, \pi(dx_0 dx_1),$$
where $P(\nu_0, \nu_1)$ is the set of couplings of $\nu_0$ and $\nu_1$.

Displacement convexity of the entropy and the Bakry-Emery criterion
Theorem. Let $(M, g)$ be a complete connected Riemannian manifold and suppose that $\mu \in \mathcal{P}(M)$ is absolutely continuous with $\mu(dx) = e^{-V(x)} dx$. The following are equivalent:
1. $\mu$ verifies the $CD(K, \infty)$ condition for some $K \in \mathbb{R}$: $\mathrm{Ric} + \mathrm{Hess}\, V \ge K g$.
2. The relative entropy functional is $K$-displacement convex with respect to the $W_2$ metric: for all $\nu_0, \nu_1 \in \mathcal{P}_2(M)$, absolutely continuous with respect to $\mu$, there is a $W_2$-geodesic $(\nu_t)_{t \in [0,1]}$ connecting $\nu_0$ to $\nu_1$ such that
$$H(\nu_t \mid \mu) \le (1-t) H(\nu_0 \mid \mu) + t H(\nu_1 \mid \mu) - \frac{K t(1-t)}{2} W_2^2(\nu_0, \nu_1), \quad \forall t \in [0, 1].$$

McCann, Cordero-McCann-Schmuckenschläger, Sturm-Von Renesse, Lott-Villani, Sturm.


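Consequences
1. This equivalence lays the ground for a possible definition of the condition $CD(K, \infty)$ on geodesic spaces (Lott-Sturm-Villani).
2. Brunn-Minkowski type inequalities. Suppose $K \ge 0$, then
$$\mu([A, B]_t) \ge \mu(A)^{1-t} \mu(B)^t \, e^{\frac{K t(1-t)}{2} d^2(A, B)}, \quad \forall t \in [0, 1],$$
where $[A, B]_t = \{x \in M ; \exists (a, b) \in A \times B, \ d(x, a) = t \, d(a, b) \text{ and } d(x, b) = (1-t) \, d(a, b)\}$, so that $[A, B]_0 = A$ and $[A, B]_1 = B$.
3. Talagrand's inequality. Suppose $K > 0$, then
$$W_2^2(\nu_0, \mu) \le \frac{2}{K} H(\nu_0 \mid \mu), \quad \forall \nu_0 \in \mathcal{P}_2(M).$$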


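5. Log-Sobolev inequality. Suppose $K > 0$, then
$$H(\nu_0 \mid \mu) \le \frac{2}{K} I(\nu_0 \mid \mu), \quad \forall \nu_0.$$
6. Prekopa-Leindler type inequalities. Suppose $K \in \mathbb{R}$, and fix $t \in (0, 1)$. If $f, g, h : M \to \mathbb{R}$ are such that
$$h(z) \ge (1-t) f(x) + t g(y) - \frac{K t(1-t)}{2} d^2(x, y), \quad \forall x, y \in M, \ \forall z \in [x, y]_t,$$
then
$$\int e^{h(z)} \, \mu(dz) \ge \left( \int e^{f(x)} \, \mu(dx) \right)^{1-t} \left( \int e^{g(y)} \, \mu(dy) \right)^{t}.$$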

II. Displacement convexity of the entropy in a discrete setting.


Extension to discrete setting
Question: Is it possible to extend this theory to discrete settings, for example finite graphs?

Two obstructions:
- Talagrand's inequality $W_2^2(\nu_0, \mu) \le C \, H(\nu_0 \mid \mu)$, $\forall \nu_0$, with $C$ a constant, does not hold in a discrete setting unless $\mu$ is a Dirac mass.
- $W_2$-geodesics do not exist in a discrete setting. Namely, if $\nu_t$ is a $W_2$-geodesic between $\delta_x$ and $\delta_y$, then it is easy to see that $\nu_t$ is supported in $[x, y]_t$. But in a discrete space, $[x, y]_t$ can be empty. For example, if $x, y$ are neighbors in a graph, then $[x, y]_t = \emptyset$ for all $t \in (0, 1)$.


Recent results
Ollivier, Ollivier-Villani, Bonciocat-Sturm, Maas, Erbar-Maas, Mielke, Hillion, Léonard, Lehec, Joulin, Veysseire...

Ollivier-Villani: Brunn-Minkowski type inequality on the discrete hypercube $\Omega_n = \{0, 1\}^n$ equipped with the Hamming distance $d(x, y) = \sum_{i=1}^n \mathbf{1}_{x_i \ne y_i}$:
$$|[A, B]_{1/2}| \ge |A|^{1/2} |B|^{1/2} \, e^{\frac{1}{16n} d^2(A, B)}, \quad \forall A, B \subset \Omega_n.$$
This is a consequence of the following property: for all $\nu_0, \nu_1$ there exists $\nu_{1/2}$ such that
$$H(\nu_{1/2} \mid \mu) \le \frac{1}{2} H(\nu_0 \mid \mu) + \frac{1}{2} H(\nu_1 \mid \mu) - \frac{1}{16n} W_1^2(\nu_0, \nu_1),$$
where $\mu$ is the uniform measure on $\Omega_n$.


Recent results
Maas-Erbar: Displacement convexity of the entropy on the discrete hypercube $\Omega_n = \{0, 1\}^n$:
$$H(\nu_t \mid \mu) \le (1-t) H(\nu_0 \mid \mu) + t H(\nu_1 \mid \mu) - \frac{t(1-t)}{n} \mathcal{W}_2^2(\nu_0, \nu_1).$$
The (finite dimensional) space $\mathcal{P}(\Omega_n)$ is equipped with a Riemannian metric. This metric is constructed so that the continuous-time simple random walk becomes a gradient flow of the function $H(\,\cdot \mid \mu)$ (with respect to the Riemannian structure). [This construction is very general and holds for every reversible Markov kernel on a finite graph.] The pseudo-Wasserstein distance $\mathcal{W}_2$ corresponds to the geodesic distance on $\mathcal{P}(\Omega_n)$, and $\nu_t$ is a geodesic connecting $\nu_0$ to $\nu_1$.


Our approach
(A similar approach was developed independently by E. Hillion.)

We seek to prove convexity of the entropy along (pseudo) $W_1$-geodesics $\nu_t$ connecting $\nu_0$ to $\nu_1$: there is some $K \ge 0$ such that for all $\nu_0, \nu_1$ there is some pseudo $W_1$-geodesic $\nu_t$ connecting $\nu_0$ to $\nu_1$ such that for all $t \in [0, 1]$
$$H(\nu_t \mid \mu) \le (1-t) H(\nu_0 \mid \mu) + t H(\nu_1 \mid \mu) - \frac{K t(1-t)}{2} \times (\text{transport term}).$$
A good candidate for the transport term on the right-hand side is $W_1^2(\nu_0, \nu_1)$.

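Our main example - the discrete hypercube $\Omega_n$
Theorem [GRST]. Let $\mu$ be a probability measure on $\{0, 1\}$ and denote by $\mu^n$ its $n$-fold tensor product. The following displacement convexity properties of the entropy hold true. For all $\nu_0, \nu_1 \in \mathcal{P}(\Omega_n)$, there is a coupling $\pi \in P(\nu_0, \nu_1)$ such that for all $t \in [0, 1]$, it holds
$$(1) \quad H(\nu_t^\pi \mid \mu^n) \le (1-t) H(\nu_0 \mid \mu^n) + t H(\nu_1 \mid \mu^n) - \frac{2t(1-t)}{n} W_1^2(\nu_0, \nu_1)$$
and
$$(2) \quad H(\nu_t^\pi \mid \mu^n) \le (1-t) H(\nu_0 \mid \mu^n) + t H(\nu_1 \mid \mu^n) - \frac{t(1-t)}{2} \widetilde{W}_2^2(\nu_0, \nu_1),$$
where
$$\nu_t^\pi = \sum_{x_0, x_1 \in \Omega_n} \pi(x_0, x_1) \sum_{z \in [\![x_0, x_1]\!]} t^{d(x_0, z)} (1-t)^{d(z, x_1)} \delta_z,$$
$z \in [\![x_0, x_1]\!]$ means that $z$ belongs to some geodesic joining $x_0$ to $x_1$, and $\widetilde{W}_2$ denotes Marton's weak $W_2$ distance.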


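A first consequence - Transport inequalities and concentration
$\mu^n$ verifies the following transport-entropy inequalities:
$$W_1^2(\nu_0, \mu^n) \le \frac{n}{2} H(\nu_0 \mid \mu^n), \quad \forall \nu_0 \in \mathcal{P}(\Omega_n),$$
and, with $\widetilde{W}_2$ denoting Marton's weak $W_2$ distance,
$$\widetilde{W}_2^2(\nu_0, \mu^n) \le 2 H(\nu_0 \mid \mu^n), \quad \forall \nu_0 \in \mathcal{P}(\Omega_n).$$
They imply, respectively, the following well-known concentration inequalities:
$$\mu^n(f \ge \mu^n(f) + t) \le e^{-\frac{2}{n} t^2}, \quad \forall t \ge 0,$$
for every function $f : \Omega_n \to \mathbb{R}$ that is 1-Lipschitz with respect to the Hamming distance, and
$$\mu^n(f \ge \mu^n(f) + t) \le e^{-\frac{t^2}{4}}, \quad \forall t \ge 0,$$
for every $f : [0, 1]^n \to \mathbb{R}$ that is convex and 1-Lipschitz for the Euclidean distance on $\mathbb{R}^n$.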
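The passage from displacement convexity to the $W_1$ transport-entropy inequality can be sketched as follows. This is the standard argument, presumably the one intended here: it uses the hypercube inequality with transport term $\frac{2t(1-t)}{n} W_1^2(\nu_0, \nu_1)$, the choice $\nu_1 = \mu^n$, and the nonnegativity of the relative entropy.

```latex
% Sketch: take \nu_1 = \mu^n, for which H(\mu^n \mid \mu^n) = 0.
\begin{align*}
0 \le H(\nu_t^\pi \mid \mu^n)
  &\le (1-t)\, H(\nu_0 \mid \mu^n) - \frac{2t(1-t)}{n}\, W_1^2(\nu_0, \mu^n), \\
\text{hence}\quad \frac{2t}{n}\, W_1^2(\nu_0, \mu^n)
  &\le H(\nu_0 \mid \mu^n) \quad \text{for } t \in (0,1), \\
\text{and letting } t \to 1:\quad W_1^2(\nu_0, \mu^n)
  &\le \frac{n}{2}\, H(\nu_0 \mid \mu^n).
\end{align*}
```

The same steps applied to the inequality with transport term $\frac{t(1-t)}{2} \widetilde{W}_2^2(\nu_0, \nu_1)$ give the constant 2 in the second transport-entropy inequality.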

A general framework
In the sequel, $G = (V, E)$ will be a finite connected graph equipped with the classical graph distance $d$. If $\gamma = (x_0, x_1, \ldots, x_n)$ (with $n = d(x, y)$) is a geodesic between $x$ and $y$, we write $\gamma(k) = x_k$.

$W_1$-geodesics
Definition [$L^1$-Wasserstein distance]. For all $\nu_0, \nu_1 \in \mathcal{P}(V)$,
$$W_1(\nu_0, \nu_1) = \inf_{\pi \in P(\nu_0, \nu_1)} \iint d(x_0, x_1) \, \pi(dx_0 dx_1).$$
Important example: Let $V = \{0, 1, \ldots, n\}$; consider the Binomial distribution $B(n, t)$ defined by
$$B(n, t) = \sum_{k=0}^n \binom{n}{k} t^k (1-t)^{n-k} \delta_k \in \mathcal{P}(V).$$
Then $(B(n, t))_{t \in [0,1]}$ is a $W_1$ constant speed geodesic connecting $\delta_0$ to $\delta_n$.
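The constant speed property of the Binomial path can be checked numerically. The sketch below assumes that $V = \{0, \ldots, n\}$ carries the distance $d(i, j) = |i - j|$ (the path graph), in which case $W_1$ between two laws equals the sum of the absolute differences of their cumulative distribution functions. The function names are illustrative.

```python
from math import comb
import numpy as np

def binomial_law(n, t):
    """Weights of B(n, t) on {0, ..., n}."""
    return np.array([comb(n, k) * t**k * (1 - t)**(n - k) for k in range(n + 1)])

def w1_on_path(p, q):
    """W_1 for d(i, j) = |i - j|: sum over k < n of |F_p(k) - F_q(k)|."""
    cdf_gap = np.cumsum(p) - np.cumsum(q)
    return float(np.sum(np.abs(cdf_gap[:-1])))

n = 6
times = np.linspace(0.0, 1.0, 5)
# Constant speed geodesic: W_1(B(n, s), B(n, t)) = |t - s| * d(0, n) = |t - s| * n.
errors = [abs(w1_on_path(binomial_law(n, s), binomial_law(n, t)) - abs(t - s) * n)
          for s in times for t in times]
print(max(errors) < 1e-9)
```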


Construction of (pseudo) $W_1$-geodesics
Let $\pi$ be a coupling between $\nu_0$ and $\nu_1$ and $(X_0, X_1)$ a random variable with law $\pi$. For all $x_0, x_1 \in V$ let $G(x_0, x_1)$ be the set of geodesics connecting $x_0$ to $x_1$ and consider the following random variables:
- $\Gamma^{x_0, x_1}$ is a random variable uniformly distributed over $G(x_0, x_1)$ and independent of $(X_0, X_1)$.
- for all $t \in [0, 1]$, $N_t^{x_0, x_1}$ is a random variable following a $B(n, t)$ distribution with $n = d(x_0, x_1)$, independent of $\Gamma^{x_0, x_1}$ and of $(X_0, X_1)$.

Finally, set $X_t = \Gamma^{X_0, X_1}(N_t^{X_0, X_1})$ and $\nu_t^\pi = \mathrm{Law}(X_t)$.

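Explicit formula for $\nu_t^\pi$
Proposition. For all $\nu_0, \nu_1 \in \mathcal{P}(V)$ and $t \in [0, 1]$,
$$\nu_t^\pi = \sum_{x_0, x_1} \pi(x_0, x_1) \sum_{z \in V} \binom{d(x_0, x_1)}{d(x_0, z)} t^{d(x_0, z)} (1-t)^{d(z, x_1)} \frac{|G(x_0, z, x_1)|}{|G(x_0, x_1)|} \delta_z,$$
where $G(x_0, x_1)$ is the set of geodesics connecting $x_0$ to $x_1$ and $G(x_0, z, x_1)$ is the set of geodesics connecting $x_0$ to $x_1$ and crossing the vertex $z$.

In particular, on the cube, if $z \in [\![x_0, x_1]\!]$ (that is, $z$ belongs to some geodesic joining $x_0$ to $x_1$), then
$$|G(x_0, x_1)| = d(x_0, x_1)! \quad \text{and} \quad |G(x_0, z, x_1)| = d(x_0, z)! \, d(z, x_1)!,$$
and so we recover
$$\nu_t^\pi = \sum_{x_0, x_1 \in \Omega_n} \pi(x_0, x_1) \sum_{z \in [\![x_0, x_1]\!]} t^{d(x_0, z)} (1-t)^{d(z, x_1)} \delta_z.$$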
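The cube formula lends itself to a direct computation. The sketch below evaluates $\nu_t^\pi$ on $\{0,1\}^3$ for an arbitrary illustrative coupling `pi` (a hypothetical choice, stored as a dictionary of masses) and checks that the result is a probability measure with the expected marginals at $t = 0$ and $t = 1$.

```python
import itertools

def hamming(x, y):
    """Hamming distance between two binary tuples."""
    return sum(a != b for a, b in zip(x, y))

def interpolate(pi, t, n):
    """nu_t^pi on {0,1}^n for a coupling given as {(x0, x1): mass}."""
    cube = list(itertools.product((0, 1), repeat=n))
    nu_t = dict.fromkeys(cube, 0.0)
    for (x0, x1), mass in pi.items():
        for z in cube:
            k, m = hamming(x0, z), hamming(z, x1)
            # z lies on some geodesic from x0 to x1 iff d(x0,z) + d(z,x1) = d(x0,x1)
            if k + m == hamming(x0, x1):
                nu_t[z] += mass * t**k * (1 - t)**m
    return nu_t

# Illustrative coupling on the 3-dimensional cube (assumed values).
pi = {((0, 0, 0), (1, 1, 0)): 0.5,
      ((1, 0, 1), (1, 0, 1)): 0.25,
      ((0, 1, 1), (1, 0, 0)): 0.25}

nu_start = interpolate(pi, 0.0, 3)   # first marginal of pi
nu_half = interpolate(pi, 0.5, 3)
nu_end = interpolate(pi, 1.0, 3)     # second marginal of pi
print(abs(sum(nu_half.values()) - 1.0) < 1e-12)
```

The total mass is 1 because, for each pair $(x_0, x_1)$, there are $\binom{d}{k}$ points $z$ at distance $k$ from $x_0$ on geodesics to $x_1$, so the inner sum is a Binomial expansion.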


Marton's weak $W_2$ distance
Let $\nu_0, \nu_1 \in \mathcal{P}(V)$. If $\pi \in P(\nu_0, \nu_1)$, consider its conditional disintegration with respect to the first coordinate:
$$\pi(dx_0 dx_1) = \nu_0(dx_0) \, p(x_0, dx_1),$$
where $p : V \to \mathcal{P}(V)$ is a probability kernel.

Definition [Marton's transport cost].
$$T_2(\nu_1 \mid \nu_0) = \inf_{\pi \in P(\nu_0, \nu_1)} \int \left( \int d(x_0, x_1) \, p(x_0, dx_1) \right)^2 \nu_0(dx_0).$$

Marton's weak $W_2$ distance
Proposition. On any space $X$ equipped with the trivial distance $\mathbf{1}_{x \ne y}$, the following holds:
$$T_2(\nu_1 \mid \nu_0) = \int \left[ 1 - \frac{f_1}{f_0} \right]_+^2 f_0 \, d\mu,$$
where $\nu_0 = f_0 \mu$, $\nu_1 = f_1 \mu$. This formula will be useful on the two point space $\{0, 1\}$ or more generally on the complete graph $K_n$.
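On the two point space the closed formula can be compared with a brute-force minimisation over couplings. The sketch below assumes $\nu_0$ and $\nu_1$ have full support and parametrises couplings by the single number $s = \pi(0, 0)$; the names and the grid size are illustrative.

```python
import numpy as np

def marton_cost_formula(nu0, nu1):
    """Sum over x of [1 - nu1(x)/nu0(x)]_+^2 nu0(x) (the reference measure mu cancels)."""
    return float(np.sum(np.maximum(1.0 - nu1 / nu0, 0.0) ** 2 * nu0))

def marton_cost_bruteforce(nu0, nu1, grid=2001):
    """Minimise the Marton cost over couplings of nu0, nu1 on {0, 1}."""
    a0, a1 = nu0
    b0, _ = nu1
    # Admissible range of s = pi(0, 0); the other entries are determined by s.
    s = np.linspace(max(0.0, b0 - a1), min(a0, b0), grid)
    stay0 = s / a0               # p(0, {0})
    stay1 = (a1 - b0 + s) / a1   # p(1, {1})
    # For the trivial distance, the mean displacement from x is 1 - p(x, {x}).
    cost = a0 * (1.0 - stay0) ** 2 + a1 * (1.0 - stay1) ** 2
    return float(cost.min())

nu0 = np.array([0.7, 0.3])
nu1 = np.array([0.2, 0.8])
print(abs(marton_cost_formula(nu0, nu1) - marton_cost_bruteforce(nu0, nu1)) < 1e-12)
```

The minimum is attained at the largest admissible $s$, that is, by the coupling that leaves as much mass in place as possible.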


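The two point space
On the two point space,
$$\nu_t^\pi = (1-t) \nu_0 + t \nu_1$$
whatever the coupling $\pi$ is. Differentiating twice the function $H(t) := H((1-t)\nu_0 + t\nu_1 \mid \mu)$, it is easy to check that
$$H''(t) \ge \int \left[ 1 - \frac{f_1}{f_0} \right]_+^2 f_0 \, d\mu + \int \left[ 1 - \frac{f_0}{f_1} \right]_+^2 f_1 \, d\mu = \widetilde{W}_2^2(\nu_0, \nu_1),$$
where $\widetilde{W}_2$ denotes Marton's weak $W_2$ distance. So:

Proposition. For every probability measure $\mu$ on $\Omega_1 = \{0, 1\}$, it holds
$$H(\nu_t^\pi \mid \mu) \le (1-t) H(\nu_0 \mid \mu) + t H(\nu_1 \mid \mu) - \frac{t(1-t)}{2} \widetilde{W}_2^2(\nu_0, \nu_1).$$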
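The two point inequality can be tested numerically on random examples. The sketch below assumes full-support measures, writes the Marton term directly in terms of $\nu_0, \nu_1$ (the density $\mu$ cancels in $[1 - f_1/f_0]_+^2 f_0 \, d\mu$), and uses a fixed random seed; all names are illustrative.

```python
import numpy as np

def entropy(nu, mu):
    """H(nu | mu) for full-support measures on a finite set."""
    return float(np.sum(nu * np.log(nu / mu)))

def marton_w2_squared(nu0, nu1):
    """Sum of the two one-sided Marton costs for the trivial distance."""
    forward = np.sum(np.maximum(nu0 - nu1, 0.0) ** 2 / nu0)
    backward = np.sum(np.maximum(nu1 - nu0, 0.0) ** 2 / nu1)
    return float(forward + backward)

def displacement_gap(nu0, nu1, mu, t):
    """Right-hand side minus left-hand side of the two point inequality."""
    nu_t = (1 - t) * nu0 + t * nu1
    chord = (1 - t) * entropy(nu0, mu) + t * entropy(nu1, mu)
    return chord - 0.5 * t * (1 - t) * marton_w2_squared(nu0, nu1) - entropy(nu_t, mu)

rng = np.random.default_rng(0)
worst = np.inf
for _ in range(200):
    a, b, p = rng.uniform(0.05, 0.95, size=3)
    nu0, nu1, mu = (np.array([1 - v, v]) for v in (a, b, p))
    for t in np.linspace(0.0, 1.0, 21):
        worst = min(worst, displacement_gap(nu0, nu1, mu, t))
print(worst > -1e-12)
```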

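Tensorisation
Theorem [GRST]. Assume that a probability measure $\mu$ on $V$ verifies the following property: there is $K \ge 0$ such that for all $\nu_0, \nu_1 \in \mathcal{P}(V)$, there is some $\pi \in P(\nu_0, \nu_1)$ such that for all $t \in [0, 1]$, it holds
$$H(\nu_t^\pi \mid \mu) \le (1-t) H(\nu_0 \mid \mu) + t H(\nu_1 \mid \mu) - \frac{K t(1-t)}{2} \widetilde{W}_2^2(\nu_0, \nu_1).$$
Then the product measure $\mu^n$ verifies the following property: for all $\nu_0, \nu_1 \in \mathcal{P}(V^n)$, there is some $\hat{\pi} \in P(\nu_0, \nu_1)$ such that for all $t \in [0, 1]$, it holds
$$H(\nu_t^{\hat{\pi}} \mid \mu^n) \le (1-t) H(\nu_0 \mid \mu^n) + t H(\nu_1 \mid \mu^n) - \frac{K t(1-t)}{2} \widetilde{W}_{2,n}^2(\nu_0, \nu_1),$$
with the same constant $K$ as above, where $\widetilde{W}_2$ is Marton's weak $W_2$ distance and $\widetilde{W}_{2,n}$ is the tensorised Marton distance.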


Tensorisation
Ingredients of the proof:
1. Tensorisation of the pseudo $W_1$-geodesic. Denote by $\nu_t^{x,y}$ the $W_1$-geodesic connecting $\delta_x$ to $\delta_y$, $x, y \in V^n$. Then
$$\nu_t^{x,y} = \bigotimes_{i=1}^n \nu_t^{x_i, y_i}.$$
This is strongly connected to the choice of the Binomial distributions.
2. Knothe-Rosenblatt construction ($n = 2$). Write
$$\nu_0(dx_1 dx_2) = \nu_0^1(dx_1) \, \nu_0^2(dx_2 \mid x_1) \quad \text{and} \quad \nu_1(dy_1 dy_2) = \nu_1^1(dy_1) \, \nu_1^2(dy_2 \mid y_1).$$
Suppose that $\pi^1 \in P(\nu_0^1, \nu_1^1)$ and $\pi^2(\,\cdot \mid x_1, y_1) \in P(\nu_0^2(\,\cdot \mid x_1), \nu_1^2(\,\cdot \mid y_1))$.
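The product structure of the pseudo $W_1$-geodesic between two Dirac masses can be verified on the hypercube, taking $V = \{0, 1\}$. The sketch below compares the law $\sum_z t^{d(x,z)} (1-t)^{d(z,y)} \delta_z$, summed over the points $z$ lying on a geodesic from $x$ to $y$, with the product of its one-dimensional counterparts. The points `x`, `y` and the time `t` are arbitrary illustrative choices.

```python
import itertools
from math import prod

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def coordinate_law(xi, yi, t):
    """nu_t^{x_i, y_i} on {0, 1}: Bernoulli interpolation if xi != yi, Dirac mass otherwise."""
    law = [0.0, 0.0]
    law[xi] += 1 - t
    law[yi] += t
    return law

def cube_law(x, y, t):
    """nu_t^{x, y} from the explicit formula on the cube."""
    law = {}
    for z in itertools.product((0, 1), repeat=len(x)):
        k, m = hamming(x, z), hamming(z, y)
        law[z] = t**k * (1 - t)**m if k + m == hamming(x, y) else 0.0
    return law

def product_law(x, y, t):
    """Tensor product of the coordinate laws."""
    return {z: prod(coordinate_law(x[i], y[i], t)[z[i]] for i in range(len(x)))
            for z in itertools.product((0, 1), repeat=len(x))}

x, y, t = (0, 1, 1, 0), (1, 1, 0, 0), 0.3
direct, tensor = cube_law(x, y, t), product_law(x, y, t)
max_gap = max(abs(direct[z] - tensor[z]) for z in direct)
print(max_gap < 1e-12)
```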

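Consequences in terms of functional inequalities
Theorem [GRST]. Suppose that a probability measure $\mu$ on $V$ verifies the following property: there is $K \ge 0$ such that for all $\nu_0, \nu_1 \in \mathcal{P}(V)$ there is $\pi \in P(\nu_0, \nu_1)$ such that for all $t \in [0, 1]$, it holds
$$H(\nu_t^\pi \mid \mu) \le (1-t) H(\nu_0 \mid \mu) + t H(\nu_1 \mid \mu) - \frac{K t(1-t)}{2} \left( I_2(\pi) + \bar{I}_2(\pi) \right).$$
Then, for all $n \in \mathbb{N}$, the following HWI inequality holds: for all $\nu_0 \in \mathcal{P}(V^n)$ and $\pi \in P(\nu_0, \mu^n)$,
$$H(\nu_0 \mid \mu^n) \le \sqrt{ \sum_{x \in V^n} \sum_{i=1}^n \left( \sum_{z \in N_i(x)} \left[ \log \frac{\nu_0(x)}{\mu^n(x)} - \log \frac{\nu_0(z)}{\mu^n(z)} \right]_+ \right)^2 \nu_0(x) } \ \sqrt{ I_2^{(n)}(\pi) } - \frac{K}{2} \left( I_2^{(n)}(\pi) + \bar{I}_2^{(n)}(\pi) \right),$$
where $N_i(x) = \{ y \in V^n ; x_i \ne y_i \text{ and } d(x, y) = 1 \}$.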

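A weak Log-Sobolev inequality
Theorem [GRST]. Suppose that a probability measure $\mu$ on $V$ verifies the following property: there is $K \ge 0$ such that for all $\nu_0, \nu_1 \in \mathcal{P}(V)$ there is $\pi \in P(\nu_0, \nu_1)$ such that for all $t \in [0, 1]$, it holds
$$H(\nu_t^\pi \mid \mu) \le (1-t) H(\nu_0 \mid \mu) + t H(\nu_1 \mid \mu) - \frac{K t(1-t)}{2} \widetilde{W}_2^2(\nu_0, \nu_1),$$
where $\widetilde{W}_2$ is Marton's weak $W_2$ distance. Then, for all $n \in \mathbb{N}$, the following log-Sobolev type inequality holds: for all $\nu_0 \in \mathcal{P}(V^n)$,
$$H(\nu_0 \mid \mu^n) \le \frac{2}{K} \sum_{x \in V^n} \sum_{i=1}^n \left( \sum_{z \in N_i(x)} \left[ \log \frac{\nu_0(x)}{\mu^n(x)} - \log \frac{\nu_0(z)}{\mu^n(z)} \right]_+ \right)^2 \nu_0(x),$$
where $N_i(x) = \{ y \in V^n ; x_i \ne y_i \text{ and } d(x, y) = 1 \}$.

This LSI applied to the hypercube gives back the Log-Sobolev inequality for the Gaussian measure, using the central limit theorem.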

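Discrete forms of the Prekopa-Leindler inequality
Theorem [GRST]. Suppose that a probability measure $\mu$ on $V$ verifies the following property: there is $K \ge 0$ such that for all $\nu_0, \nu_1 \in \mathcal{P}(V)$ there is $\pi \in P(\nu_0, \nu_1)$ such that for all $t \in [0, 1]$, it holds
$$H(\nu_t^\pi \mid \mu) \le (1-t) H(\nu_0 \mid \mu) + t H(\nu_1 \mid \mu) - \frac{K t(1-t)}{2} \widetilde{W}_2^2(\nu_0, \nu_1),$$
where $\widetilde{W}_2$ is Marton's weak $W_2$ distance. Then, for all $n \in \mathbb{N}$ and for every triple of functions $f, g, h : V^n \to \mathbb{R}$ such that for some $t \in (0, 1)$ and for all $m \in \mathcal{P}(V^n)$,
$$\iint h(z) \, \nu_t^{x,y}(dz) \, m(dy) \ge (1-t) f(x) + t \int g(y) \, m(dy) - \frac{K t(1-t)}{2} \sum_{i=1}^n \left( \int d(x_i, y_i) \, m(dy) \right)^2, \quad \forall x \in V^n,$$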

Discrete forms of the Prekopa-Leindler inequality
The proof follows easily from the following general duality formula:
$$\log \left( \int e^h \, d\mu \right) = \sup_{\nu \in \mathcal{P}(X)} \left\{ \int h \, d\nu - H(\nu \mid \mu) \right\}.$$
Consequences: PL implies a form of the log-Sobolev inequality and a transport-entropy inequality involving Marton's transport costs.
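The duality formula is easy to test on a finite space, where the supremum is attained at the measure with density proportional to $e^h$ with respect to $\mu$ (a standard fact). The sketch below uses a random five-point example with a fixed seed; the helper names are illustrative.

```python
import numpy as np

def log_laplace(h, mu):
    """log of the integral of e^h with respect to mu."""
    return float(np.log(np.sum(np.exp(h) * mu)))

def dual_objective(nu, h, mu):
    """Integral of h with respect to nu, minus H(nu | mu)."""
    support = nu > 0
    entropy = np.sum(nu[support] * np.log(nu[support] / mu[support]))
    return float(np.sum(h * nu) - entropy)

def gibbs_measure(h, mu):
    """Measure with density proportional to e^h with respect to mu."""
    weights = np.exp(h) * mu
    return weights / weights.sum()

rng = np.random.default_rng(1)
mu = rng.dirichlet(np.ones(5))
h = rng.normal(size=5)

best = dual_objective(gibbs_measure(h, mu), h, mu)
random_values = [dual_objective(rng.dirichlet(np.ones(5)), h, mu) for _ in range(1000)]
print(abs(best - log_laplace(h, mu)) < 1e-10, max(random_values) <= log_laplace(h, mu) + 1e-12)
```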

Thank you for your attention! Congratulations to Alain!