Mixing in Product Spaces. Elchanan Mossel


Poincaré Recurrence Theorem

Theorem (Poincaré, 1890). Let f : X → X be a measure preserving transformation and let E ⊆ X be measurable. Then

P[x ∈ E : there is N(x) with f^n(x) ∉ E for all n > N(x)] = 0.

One of the first results in Ergodic Theory. Long term mixing. This talk is about short term mixing.

Finite Markov Chains

As a first example consider a finite Markov chain. Let M be a k × k doubly stochastic symmetric matrix. Pick X_0 uniformly at random from {1, ..., k}. Given X_i = a, let X_{i+1} = b with probability M_{a,b}.

Theorem (Long Term Mixing for Markov Chains). Suppose that, other than 1, all eigenvalues λ_i of M satisfy |λ_i| ≤ λ < 1. Then for any two sets A, B ⊆ [k] it holds that

|P[X_0 ∈ A, X_t ∈ B] - P[A]P[B]| ≤ λ^t.

Short Term Mixing for Markov Chains

Theorem. |P[X_0 ∈ A, X_1 ∈ B] - P[A]P[B]| is upper bounded by

λ √(P[A](1 - P[A]) P[B](1 - P[B])).

Shows: mixing in one step for large sets.

Proof: Write 1_A = P[A]·1 + f and 1_B = P[B]·1 + g, where f, g ⊥ 1. Then

P[X_0 ∈ A, X_1 ∈ B] = (1/k)(P[A]·1 + f)^T M (P[B]·1 + g) = P[A]P[B] + (1/k) f^T M g,

and

(1/k)|f^T M g| ≤ λ ||f||_2 ||g||_2 = λ √(P[A](1 - P[A]) P[B](1 - P[B])).

Also called the Expander Mixing Lemma. Used a lot in computer science, e.g. in (de)randomization.
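Aside (not from the talk): a minimal numerical check of this bound; the lazy random walk on a cycle and the random sets are my own illustrative choices.

```python
import numpy as np

# Lazy random walk on a k-cycle: doubly stochastic and symmetric.
k = 64
M = np.zeros((k, k))
for a in range(k):
    M[a, a] = 0.5
    M[a, (a + 1) % k] += 0.25
    M[a, (a - 1) % k] += 0.25

eigs = np.sort(np.abs(np.linalg.eigvalsh(M)))
lam = eigs[-2]  # largest absolute eigenvalue other than 1

rng = np.random.default_rng(0)
A = rng.random(k) < 0.5  # random indicator vectors for A, B
B = rng.random(k) < 0.5
pA, pB = A.mean(), B.mean()

# P[X_0 in A, X_1 in B] = (1/k) * 1_A^T M 1_B for uniform X_0.
lhs = abs(A.astype(float) @ M @ B.astype(float) / k - pA * pB)
rhs = lam * np.sqrt(pA * (1 - pA) * pB * (1 - pB))
print(lhs <= rhs, lhs, rhs)
```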

The tensor property

Consider (Y_1, Z_1), ..., (Y_n, Z_n) drawn independently from the distribution of (X_0, X_1). Equivalently, the transition matrix from Y = (Y_1, ..., Y_n) to Z = (Z_1, ..., Z_n) is M^{⊗n}.

The theorem then implies that for any sets A, B ⊆ [k]^n:

|P[Y ∈ A, Z ∈ B] - P[A]P[B]| ≤ λ √(P[A](1 - P[A]) P[B](1 - P[B])).

Follows immediately from tensorization of the spectrum: the eigenvalues of M^{⊗n} are products of eigenvalues of M, so the largest one other than 1 is still λ in absolute value.
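A quick check of the spectral tensorization claim (a sketch; the 2 × 2 chain is an arbitrary toy choice):

```python
import numpy as np

# For doubly stochastic symmetric M, the eigenvalues of the Kronecker
# product M (x) M are products of eigenvalues of M, so the second-largest
# absolute eigenvalue is unchanged.
M = np.array([[0.6, 0.4],
              [0.4, 0.6]])
lam = np.sort(np.abs(np.linalg.eigvalsh(M)))[-2]
lam2 = np.sort(np.abs(np.linalg.eigvalsh(np.kron(M, M))))[-2]
print(lam, lam2)  # both equal 0.2
```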

Log Sobolev inequalities

Entropy, Log Sobolev and hyper-contraction. A similar story can be told using more sophisticated analytic tools. It is easier to work with Markov semi-groups T_t = e^{tL}.

Entropy, Dirichlet form:

Ent(f) = E[f log f] - E[f] log E[f]

E(f, g) = -E[f L g] = -E[g L f] = E(g, f) = -(d/dt)|_{t=0} E[f T_t g].

Definition of Log-Sob:

p-LogSob(C): ∀f, Ent(f^p) ≤ (C p² / (4(p - 1))) E(f^{p-1}, f)   (p ≠ 0, 1)

1-LogSob(C): ∀f, Ent(f) ≤ (C/4) E(f, log f)

0-LogSob(C): ∀f, Var(log f) ≤ (C/2) E(f, -1/f)

Log Sob. Inequalities and Hyper-Contraction

Hyper-Contraction (Gross, Nelson, ...). An r-LogSob inequality with constant C implies

||T_t f||_p ≤ ||f||_q   for   t ≥ (C/4) log((p - 1)/(q - 1)),

for exponents 1 < q < p in the range determined by r. Hence, with p' the dual exponent of p,

E[g(X_0) f(X_t)] = E[g T_t f] ≤ ||g||_{p'} ||T_t f||_p ≤ ||g||_{p'} ||f||_q.

If f = 1_A and g = 1_B, we get:

P[X_0 ∈ A, X_t ∈ B] ≤ ||1_A||_q ||1_B||_{p'} = P[A]^{1/q} P[B]^{1/p'}.

Now optimize over the norms to get a better bound than Cauchy-Schwarz.

Reverse-Hyper-Contraction

Log-Sobolev and Rev. Hyper-Contraction (M-Oleszkiewicz-Sen-13). Let T_t = e^{tL} be a general Markov semi-group satisfying the 2-LogSob or the 1-LogSob inequality with constant C. Then for all q < p < 1, all positive f, g, and all

t ≥ (C/4) log((1 - q)/(1 - p))

it holds that

||T_t f||_q ≥ ||f||_p   ⇒   E[g(X_0) f(X_t)] = E[g T_t f] ≥ ||g||_{q'} ||T_t f||_q ≥ ||g||_{q'} ||f||_p,

where q' is the (reverse-Hölder) dual exponent of q.

Short-Time Implications

Theorem (M-Oleszkiewicz-Sen-13; Short-Time Implications). Let T_t = e^{tL}, where L satisfies the 1- or 2-LogSob inequality with constant C. Let A, B ⊆ Ω^n with P[A] ≥ ε and P[B] ≥ ε. Then:

P[X(0) ∈ A, X(t) ∈ B] ≥ ε^{2/(1 - e^{-2t/C})}

Comments:
1. Works for small sets too.
2. Tensorizes.
3. Some examples where it is (almost) tight.
4. Uses in social choice analysis, queuing theory.
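For intuition, a tiny evaluator of this lower bound (a sketch; the function and parameter names are mine):

```python
import math

def short_time_lower_bound(eps: float, t: float, C: float) -> float:
    """The bound eps ** (2 / (1 - exp(-2 t / C))) from the theorem above."""
    return eps ** (2.0 / (1.0 - math.exp(-2.0 * t / C)))

# As t grows the exponent tends to 2, recovering the independent value eps^2;
# for small t the guaranteed probability is much smaller but still positive.
for t in (0.1, 1.0, 10.0):
    print(t, short_time_lower_bound(eps=0.1, t=t, C=1.0))
```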

Comment: typical application MCMC Long Time Behavior

Log Sobolev inequalities play a major role in analyzing long term mixing of Markov chains, in particular in the analysis of mixing times (Diaconis, Saloff-Coste etc.)

Long Time Behavior: the ε-total variation mixing time of a continuous time finite Markov chain with spectral gap λ and 2-LogSob constant C is bounded by

(1/λ)(log(1/π_*) + log(1/ε))   and by   (1/C)(log log(1/π_*) + log(1/ε)),

where π_* is the minimal stationary probability.
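To see the log versus log log gap numerically, a quick comparison with hypothetical values of λ, C, and π_* (a sketch, not a statement about any particular chain):

```python
import math

def tv_mixing_time_bounds(lam, C, pi_min, eps):
    spectral = (1 / lam) * (math.log(1 / pi_min) + math.log(1 / eps))
    log_sob = (1 / C) * (math.log(math.log(1 / pi_min)) + math.log(1 / eps))
    return spectral, log_sob

# With pi_min = 2^{-100} (e.g. a product space of 100 bits), the log-Sobolev
# bound replaces a factor of ~69 by ~4.2.
print(tv_mixing_time_bounds(lam=0.01, C=0.01, pi_min=2.0**-100, eps=0.01))
```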

What are these lectures about? High Dimensional Phenomena

High dimensional mixing: mixing of product processes on product spaces Ω^n with n large.

Tight bounds: for which processes, given measures a and b, can we find precise upper/lower bounds for

sup { P[X_0 ∈ A, X_t ∈ B] : P[A] = a, P[B] = b }?

Interested in product spaces/processes of dimension n and in the answers as n → ∞. Most important examples / techniques from probability / analysis.

What are these lectures about? Multi-step processes

How to bound P[X_0 ∈ A_0, X_1 ∈ A_1, ..., X_k ∈ A_k] for processes X_0, ..., X_k? Interested in product spaces/processes of dimension n and in the answers as n → ∞. Most important examples / techniques from additive combinatorics.

What are these lectures about? And more

A theory that does both? Applications?

Today: tight bounds

Borell's result. Open Problem: the Boolean cube. The state of affairs - partition into 3 parts or more.

Two Examples: Gaussian, Boolean

Correlated pairs (M-O'Donnell-Regev-Steif-Sudakov-05): Let x, y ∈ {-1, 1}^n be e^{-t} correlated: x is chosen uniformly and y is a T_t-correlated version, i.e. E[x_i y_i] = e^{-t} for each i independently. Let A, B ⊆ {-1, 1}^n with P[A] ≥ ε and P[B] ≥ ε. Then:

P[x ∈ A, y ∈ B] ≥ ε^{2/(1 - e^{-t})}

Easy to prove when A = B...

Gaussian Version: Let x, y ∈ R^n be two standard Gaussian vectors: x ~ N(0, I), y ~ N(0, I), E[x_i y_j] = e^{-t} δ_{i,j}. Let A, B ⊆ R^n with P[A] ≥ ε and P[B] ≥ ε. Then:

P[x ∈ A, y ∈ B] ≥ ε^{2/(1 - e^{-t})}
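A Monte Carlo sketch of the Boolean statement, with A = B = the majority set as my illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, trials = 101, 0.5, 200_000
rho = np.exp(-t)

# x uniform in {-1,1}^n; y_i = x_i with prob (1+rho)/2, else -x_i,
# so that E[x_i y_i] = rho = e^{-t}.
x = rng.choice([-1, 1], size=(trials, n))
y = np.where(rng.random((trials, n)) < (1 + rho) / 2, x, -x)

in_A = x.sum(axis=1) > 0  # majority set; P[A] ~ 1/2 for odd n
in_B = y.sum(axis=1) > 0
eps = min(in_A.mean(), in_B.mean())
joint = (in_A & in_B).mean()
print(joint, ">=", eps ** (2.0 / (1.0 - np.exp(-t))))
```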

Borell's Result and Open Problems

Borell (85): In the Gaussian case the maximum and the minimum of P[x ∈ A, y ∈ B], as a function of P[A] and P[B], are obtained for parallel half-spaces.

We do not know what the optimum is in {-1, 1}^n. In particular:

Open Problem: determine

lim_{n→∞} min { P[X ∈ A, Y ∈ B] : A, B ⊆ {-1, 1}^n, P[A] = P[B] = 1/4 }

and similarly for max.

Partition into 3 or more parts is open even in Gaussian space.
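For parallel half-spaces the joint probability is just a bivariate normal CDF, so Borell's extremal value is easy to evaluate (a SciPy sketch; the measure 1/4 mirrors the open problem above):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

t = 0.5
rho = np.exp(-t)
a = norm.ppf(0.25)  # threshold so the half-space {x_1 <= a} has measure 1/4

# P[x in A, y in B] for parallel half-spaces A = {x_1 <= a}, B = {y_1 <= a}:
# the CDF of a bivariate normal with correlation rho.
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1, rho], [rho, 1]]).cdf([a, a])
print(joint)
```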

If there is time before the break...

A cute proof of a special case of Borell's result. Connections to social choice theory.

Simple Example 1

Cosmic coin problem (M-O'Donnell-05): x ∈ {-1, 1}^n uniform; (y^i)_{i=1}^m conditionally independent given x. Each pair (x, y^i) is ρ-correlated. Problem: How large can P[y^1 ∈ A, ..., y^m ∈ A] be?

Simple Example 2

(y^{I})_{I = {i,j}, 1 ≤ i < j ≤ m} is an exchangeable collection of vectors in {-1, 1}^n. If |I ∩ J| = 1 then y^I, y^J are 1/3-correlated; otherwise they are independent. Why? If n voters rank alternatives uniformly at random, the pairwise preferences between alternatives are given by the collection y.

Full support finite Ω using hyper-contraction

Thm: More General Reverse Hypercontractivity (M-Oleszkiewicz-Sen-13). Let the measure Ψ over a finite Ω^k satisfy

min_{x_1, ..., x_k ∈ Ω} Pr[X_1 = x_1, ..., X_k = x_k] = α > 0

and have equal marginals. Consider the distribution Ψ^{⊗n} and let A_1, ..., A_k ⊆ Ω^n with µ(A_i) ≥ µ. Then:

Pr[X_1 ∈ A_1, ..., X_k ∈ A_k] ≥ µ^{O(1/α)},

where the tuples (X_1(i), ..., X_k(i)) are i.i.d. according to Ψ.

Note: This is a key tool in analyzing the examples above as well as many others.

Notation

X is distributed according to P := P^{⊗n}; the tuples X_i are i.i.d. according to P. The marginals of P are π_j, and the vectors X^{(j)} are distributed according to π_j := π_j^{⊗n}:

X       = (X_1,       X_2,       ..., X_i,       ..., X_n)
X^{(1)} = (X^{(1)}_1, X^{(1)}_2, ..., X^{(1)}_i, ..., X^{(1)}_n)
X^{(2)} = (X^{(2)}_1, X^{(2)}_2, ..., X^{(2)}_i, ..., X^{(2)}_n)
...
X^{(j)} = (X^{(j)}_1, X^{(j)}_2, ..., X^{(j)}_i, ..., X^{(j)}_n)
...
X^{(l)} = (X^{(l)}_1, X^{(l)}_2, ..., X^{(l)}_i, ..., X^{(l)}_n)

Lower Bounds

We are mostly interested in two types of lower bounds:

Set hitting: lower bounds on P[X_1 ∈ A_1, ..., X_k ∈ A_k] in terms of P[A_1], ..., P[A_k].

Same set hitting: lower bounds on P[X_1 ∈ A, ..., X_k ∈ A] in terms of P[A].

Set hitting will require something more - e.g. when X_1 = X_2 = ... = X_k, disjoint sets A_i are never hit simultaneously.

Gaussian Bounds

Borell (85): k = 2 - parallel half-spaces are optimal (also Isaksson-Mossel, Neeman).

By a reverse Brascamp-Lieb inequality (Ledoux, Chen-Dafnis-Paouris 14-15), for A, ..., C ⊆ R^n:

P[U ∈ A, ..., Z ∈ C] ≥ (P[A] ⋯ P[C])^{1/(1-ρ²)},

where ρ is the second eigenvalue of Σ. Doesn't require independence of coordinates.

Full Support Case

Thm: More General Reverse Hypercontractivity (M-Oleszkiewicz-Sen-13). Let the measure Ψ over a finite Ω^k satisfy min_{x_1, ..., x_k ∈ Ω} Pr[X_1 = x_1, ..., X_k = x_k] = α > 0 and have equal marginals. Then for A_1, ..., A_k ⊆ Ω^n with µ(A_i) ≥ µ:

Pr[X_1 ∈ A_1, ..., X_k ∈ A_k] ≥ µ^{O(1/α)},

where the tuples (X_1(i), ..., X_k(i)) are i.i.d. according to Ψ.

Non full support?

What if the support of Ψ is not full? Do we care?

Maybe: this is what additive combinatorics is all about. In particular: additive combinatorics in finite field models (Green, ...). Many other applications in combinatorics and computer science.

Additive combinatorics perspective

Example: Theorem (Finite Field Roth Theorem). Let Y, R be chosen uniformly at random in F_3^n. Then for every µ > 0 there exist c(µ) > 0 and N(µ) such that if n ≥ N(µ) and A ⊆ F_3^n satisfies P[A] ≥ µ, then:

P[Y ∈ A, Y + R ∈ A, Y + 2R ∈ A] ≥ c(µ).

Why is this true?
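A brute-force sketch of the quantity in question, for a small random set in F_3^n (purely illustrative; the theorem concerns n → ∞):

```python
import itertools, random

n = 4
random.seed(0)
points = list(itertools.product(range(3), repeat=n))
A = {p for p in points if random.random() < 0.3}  # random set of density ~0.3

def add(u, v):
    return tuple((a + b) % 3 for a, b in zip(u, v))

# P[Y in A, Y+R in A, Y+2R in A] over uniform Y, R in F_3^n.
hits = sum(1 for y in points for r in points
           if y in A and add(y, r) in A and add(y, add(r, r)) in A)
print(hits / len(points) ** 2, "vs P[A]^3 =", (len(A) / len(points)) ** 3)
```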

Fourier Obstructions

Theorem (Finite Field Roth Theorem - Analysis). Let Y, R be chosen uniformly at random in F_3^n and let A, B, C ⊆ F_3^n. Then

|P[Y ∈ A, Y + R ∈ B, Y + 2R ∈ C] - P[A]P[B]P[C]| ≤ max_{χ ≠ 0} |Â(χ)|.

Only obstruction to uniformity is linear structure.

If A = B = C, a high Fourier coefficient ⇒ we can restrict to an affine subspace with higher density.

Density increase arguments...

Higher Order Arithmetic Obstructions

Furstenberg-Weiss (80s): For longer arithmetic progressions there are obstructions other than Fourier. Gowers: The obstructions can be identified using the Gowers norms. Again - use the obstruction to your benefit.

Thm (Gowers 08; Rödl and Skokan 04, 06): If q is prime and l ≤ q then for every µ > 0 there exist c(µ) > 0, N(µ) such that if n ≥ N(µ) and A ⊆ F_q^n satisfies P[A] ≥ µ, then:

P[Y ∈ A, Y + R ∈ A, ..., Y + (l - 1)R ∈ A] ≥ c(µ),

where Y, R ∈ F_q^n are chosen uniformly at random.

Question: Is the additive structure necessary?

Obstruction to Chaos

Consider the support of Ψ as a graph G with vertex set V = all atoms with non-zero weight, and edges between any two atoms that differ in one coordinate. We say that ρ < 1 if the graph G is connected. More formally:

Definition.

ρ(P, S, T) := sup { Cov[f(X^{(S)}), g(X^{(T)})] : f : Ω^{(S)} → R, g : Ω^{(T)} → R, Var[f(X^{(S)})] = Var[g(X^{(T)})] = 1 }.

The correlation of P is ρ(P) := max_{j ∈ [l]} ρ(P, {j}, [l] \ {j}).
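For two variables (S = {1}, T = {2}), ρ is the classical Hirschfeld-Gebelein-Rényi maximal correlation, computable as the second singular value of the normalized joint-distribution matrix. A sketch, using the F_3 pair y = x or x + 1 (which reappears below) as the test case:

```python
import numpy as np

# P[a, b] = Pr[X = a, Y = b] for X uniform on F_3 and Y = X or X + 1.
P = np.zeros((3, 3))
for x in range(3):
    P[x, x] = 1 / 6
    P[x, (x + 1) % 3] = 1 / 6

px, py = P.sum(axis=1), P.sum(axis=0)
Q = P / np.sqrt(np.outer(px, py))  # normalized joint distribution
s = np.linalg.svd(Q, compute_uv=False)
print(s)  # [1.0, 0.5, 0.5]: rho = 0.5 < 1, matching the connected support
```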

The quest for a unifying theory

Is there one theory that explains both the noisy examples and the additive theory?

Example

Let X be uniform in F_3^n. Let Y_i = X_i or X_i + 1, each with probability 1/2, independently for each coordinate.

Theorem ⇒ P[X ∈ A, Y ∈ A] ≥ c(P[A]).

Motivation comes from understanding parallel repetition. Does not follow from hyper-contraction, nor does it follow from additive techniques...
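A quick simulation of this example (the test set A, vectors with coordinate sum 0 mod 3, is an arbitrary choice of mine):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 20, 100_000

# X uniform in F_3^n; Y_i = X_i or X_i + 1 (mod 3) with probability 1/2 each.
X = rng.integers(0, 3, size=(trials, n))
Y = (X + rng.integers(0, 2, size=(trials, n))) % 3

def in_A(Z):
    return Z.sum(axis=1) % 3 == 0  # set of density ~ 1/3

print(in_A(X).mean(), (in_A(X) & in_A(Y)).mean())
```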

A General Result

Theorem (+ Hazla, Holenstein). Suppose (X, Y) is distributed in a finite Ω² such that:

α = min_a P[X = Y = a] > 0, and P[X = a] = P[Y = a] for all a.

Then for any set A ⊆ Ω^n with P_{X^n}[A] = P_{Y^n}[A] ≥ µ it holds that

P[X ∈ A, Y ∈ A] ≥ c(α, µ) > 0.

Our c is pretty bad: c = 1/exp(exp(exp(1/µ^D))), D = D(α).

Related to the fact that the proof is interesting:
1. Lose in Regularity Lemma type arguments.
2. Lose in Invariance, transforming the problem to a Gaussian problem.

A Markov Chain Theorem and a general process theorem

Theorem (+ Hazla, Holenstein). Let X_i, Y_i, Z_i, ..., W_i be a Markov chain over Ω with min_{x ∈ Ω} Pr[X_i = Y_i = Z_i = ... = W_i = x] = β > 0 and uniform marginals. Let A ⊆ Ω^n, µ(A) = µ > 0. Then

Pr[X ∈ A, Y ∈ A, Z ∈ A, ..., W ∈ A] ≥ f(µ, β) > 0.

Theorem (+ Hazla, Holenstein). Let X_i, Y_i, Z_i, ..., W_i be distributed over Ω^k with min_{x ∈ Ω} Pr[X_i = Y_i = Z_i = ... = W_i = x] = β > 0 and uniform marginals. Suppose further that ρ(X_i, Y_i, ..., W_i) < 1. Let A ⊆ Ω^n, µ(A) = µ > 0. Then

Pr[X ∈ A, Y ∈ A, Z ∈ A, ..., W ∈ A] ≥ f(µ, β) > 0.

The condition ρ < 1

Weaker than full support.

Does not hold in arithmetic setups.

ρ < 1 iff the support of Ψ is connected with respect to changing one coordinate at a time.

Example: (x, y) ∈ F_3² where y = x or x + 1 has ρ < 1 but not full support.

Open Problems

Still searching for a unified theory.

Concrete Example: Suppose Ψ is uniform over

{(0,0,0), (1,1,1), (2,2,2), (0,1,2), (1,2,0), (2,0,1)}.

Here ρ = 1 but the example is not arithmetic. We do not understand it.

Questions??

Thank you!
