Hypercontractivity of spherical averages in Hamming space

Hypercontractivity of spherical averages in Hamming space Yury Polyanskiy Department of EECS MIT yp@mit.edu Allerton Conference, Oct 2, 2014 Yury Polyanskiy Hypercontractivity in Hamming space 1

Hypercontractivity: definition L p -norms of random variables: ( f(x) Lp(µ) f(x) p µ(dx) U p (E[ U p ]) 1 p ) 1 p Let P XY joint distribution. Define T Y X stochastic (Markov) averaging operator T Y X f(x) E[f(Y ) X = x] = T f p f p by Jensen s. T is hypercontractive if T p q = 1 for p < q: T f q f p, T p q sup f T f q f p T f gains integrability! (by smoothing peaks) T is HC for some p T is HC for all p. Yury Polyanskiy Hypercontractivity in Hamming space 2

Hypercontractivity: examples Nelson-Gross: (X, Y ) jointly Gaussian: E[f(Y ) X] 4 f(y ) 2 ρ(x, Y ) 1 3. [ ] Bonami-Gross: (X, Y ) 1 1 δ δ 2 BSS(δ): δ 1 δ E[f(Y ) X] q f(y ) p p 1 (1 2δ)2 q 1 Bonami-Beckner: T p q = 1 T n p q = 1 Ahslwede-Gács: for finite X, Y pretty much any T Y X is HC Yury Polyanskiy Hypercontractivity in Hamming space 3

What is HC good for? Canonical example: Estimate probability of a rectangle P[X A, Y B], where P[X A] P[Y B] = θ 1. Spectral gap (expanders, etc): 1 = σ 1 (T ) > σ 2 (T ) > P[X A, Y B] θ 2 + σ 2 θ(1 θ) σ 2 θ Hypercontractivity: P[X A, Y B] = (1 A, T 1 B ) 1 A q T 1 B q 1 A q 1 B p = θ 1 1 q + 1 p = θ 1+ɛ σ 2 θ (!) Note: Used 0/1 nature of indicator functions. Yury Polyanskiy Hypercontractivity in Hamming space 4

Hypercontractivity and KL-divergence P X P Y X P Y X Y Q X Q Y Apply to typical sets: So we get: 2 nd(q X P X ) P[X n T n [Q X ], Y n T n [Q Y ] ] 2 n 1 q D(Q X P X ) n 1 p D(Q Y P Y ) T Y X p q = 1 = D(Q Y P Y ) p q D(Q X P X ) Q X Yury Polyanskiy Hypercontractivity in Hamming space 5

Hypercontractivity and KL-divergence P X P Y X P Y X Y Q X Q Y Apply to typical sets: So we get: 2 nd(q X P X ) P[X n T n [Q X ], Y n T n [Q Y ] ] 2 n 1 q D(Q X P X ) n 1 p D(Q Y P Y ) T Y X p q = 1 = D(Q Y P Y ) p q D(Q X P X ) Q X = Yury Polyanskiy Hypercontractivity in Hamming space 5

Hypercontractivity and KL-divergence P X P Y X P Y X Y Q X Q Y For fixed P XY, KL-contraction ratio: η KL (P XY ) sup Q X D(Q Y P Y ) D(Q X P X ) = sup U X Y I(U; Y ) I(U; X) Yury Polyanskiy Hypercontractivity in Hamming space 6

Hypercontractivity and KL-divergence P X P Y X P Y X Y Q X Q Y For fixed P XY, KL-contraction ratio: Theorem (Ahlswede-Gács) Roughly: η KL (P XY ) sup Q X D(Q Y P Y ) D(Q X P X ) = η KL (P XY ) = inf sup U X Y { } p q : T Y X p q = 1 I(U; Y ) I(U; X) Q X : D(Q Y P Y ) rd(q X P X ) = T Y X p p r = 1, p 1 Yury Polyanskiy Hypercontractivity in Hamming space 6

Hypercontractivity and KL-divergence [Nair 2014]: Apply HC to typical sets of arbitrary Q XY to get T Y X p q = 1 = 1 p D(Q Y P Y ) 1 q D(Q X P X ) + D(Q Y X P Y X Q X ) Q XY Yury Polyanskiy Hypercontractivity in Hamming space 7

Hypercontractivity and KL-divergence [Nair 2014]: Apply HC to typical sets of arbitrary Q XY to get T Y X p q = 1 1 p D(Q Y P Y ) 1 q D(Q X P X ) + D(Q Y X P Y X Q X ) Q XY Proof: Take g 1 1 = 1 and let f = (g 1 ) 1 p STP: T f q 1. Ahlswede-Gács: L p Nair: Q X (x) = P X (x) T f(x)q T f q q Q Y X (y x) = P Y X (y x) Q X (x) =... same... Q Y X (y x) = P Y X (y x) f(y) T f(x) log T f q 1 p D(Q Y P Y ) 1 q D(Q X P X ) D(Q Y X P Y X Q X ) 0 Yury Polyanskiy Hypercontractivity in Hamming space 7

Hypercontractivity and KL-divergence [Nair 2014]: Apply HC to typical sets of arbitrary Q XY to get T Y X p q = 1 1 p D(Q Y P Y ) 1 q D(Q X P X ) + D(Q Y X P Y X Q X ) Q XY Proof: Take g 1 1 = 1 and let f = (g 1 ) 1 p STP: T f q 1. Ahlswede-Gács: L p Nair: Q X (x) = P X (x) T f(x)q T f q q Q Y X (y x) = P Y X (y x) Q X (x) =... same... Q Y X (y x) = P Y X (y x) f(y) T f(x) log T f q 1 p D(Q Y P Y ) 1 q D(Q X P X ) D(Q Y X P Y X Q X ) 0 Note: For p 1 both have Q Y X P Y X. Yury Polyanskiy Hypercontractivity in Hamming space 7

L p L q norm as measure of dependence T Y X p q measures dependence: Invariant to relabeling Satisfies data-processing Unusual: sticky at 1. Yury Polyanskiy Hypercontractivity in Hamming space 8

L p L q norm as measure of dependence T Y X p q measures dependence: Invariant to relabeling Satisfies data-processing Unusual: sticky at 1. Turns out the last one is a general phenomenon: All operators T sufficiently noisy are p q HC. Yury Polyanskiy Hypercontractivity in Hamming space 8

Semenov-Shneiberg 88 Theorem For any q 2 p there is ɛ = ɛ(p, q) s.t. simultaneously for all P XY : T Y X E Y p q ɛ T Y X p q = 1 where E Y f 1 X E[f(Y )]. Remarks: Also for general q < p with extra condition: T E 2 2 ɛ Also for other operators of the type: T 1 Y = 1 X, (1 X, T ) = (1 X, E ) Orig. result is for P X = P Y, but proof easily generalizes. For finite (X, Y ), suff. to check P XY P X P Y T V ɛ = ɛ (p, q, P X, P Y ) Yury Polyanskiy Hypercontractivity in Hamming space 9

Semenov-Shneiberg 88: Proof Proof for q 2 p: WLOG can take f = 1 + Z with Z 1. Decompose T = E +. STP: ( ) 1 + Z q 1 + Z p Cool fact: universal τ, a, b s.t. 1 + Z q 1 + a Z 2 q if Z q τ 1 + Z p 1 + b Z 2 p if Z p τ Whenever 2 p q b a and Z p τ: 1 + Z) q 1 + a Z 2 q 1 + Z p Yury Polyanskiy Hypercontractivity in Hamming space 10

Semenov-Shneiberg 88: Proof Proof for q 2 p: WLOG can take f = 1 + Z with Z 1. Decompose T = E +. STP: ( ) 1 + Z q 1 + Z p Cool fact: universal τ, a, b s.t. 1 + Z q 1 + a Z 2 q if Z q τ 1 + Z p 1 + b Z 2 p if Z p τ Whenever 2 p q b a and Z p τ: 1 + Z) q 1 + a Z 2 q 1 + Z p Large Z: Another cool fact: universal β p ( ) 1 + Z p 1 + β p ( Z p ), and β p(u) u increasing So (*) holds if p q βp(τ) τ. Yury Polyanskiy Hypercontractivity in Hamming space 10

Spherical averages in Hamming space Hamming space F n 2 with uniform probability Bonami-Gross operator N δ f(x) E[f(x + Z)], Z Bern(δ) n Spherical average: T δ f(x) E[f(x + Z)], Z Unif(S δn ) where S j {y : wt(y) = j}. Theorem (P 13) δ 1 2 and p = 1 + (1 2δ)2 there is a dim-independent C p,δ s.t. T δ f 2 C p,δ f p Remark: Bonami-Gross N δ f 2 1 f p Yury Polyanskiy Hypercontractivity in Hamming space 11

Hypercontractivity of T δ : exact version 1 + (1 2δ)2 p 2 F = (δ, p) : 0 δ 1 p 1 Theorem (P 13) For any compact subset K F there is a constant C K s.t. T δ p 2 C K (δ, p) K, where T δ f = f P Z, P Z = Unif(S δn ). For p = 1, δ = 1/2: T 1 p 2 = θ(n 1 4 ). 2 Note: for Bernoulli noise N δ p 2 = 1 everywhere on closure of F. Yury Polyanskiy Hypercontractivity in Hamming space 12

Proof: Prelim remarks Why is constant not 1? Let 1 even = 1{x : wt(x)-even} Then: T δ 1 even = 1 even (or 1 odd ) So: T δ p q 2 1 p 1 q > 1. Yury Polyanskiy Hypercontractivity in Hamming space 13

Proof: Prelim remarks Why is constant not 1? Let 1 even = 1{x : wt(x)-even} Then: T δ 1 even = 1 even (or 1 odd ) So: T δ p q 2 1 p 1 q > 1. Beautiful methods that failed: Tensorization (T δ not a tensor power) Semigroup method, aka d dδ (T δ not a semigroup) Comparison of Dirichlet forms (not useful for δ 1 n ) Ugly method that worked: Direct computation of eigenvalues of T δ Yury Polyanskiy Hypercontractivity in Hamming space 13

Proof: Orthogonal decomposition T δ and N δ are S n -equivariant f(x) = poly(( 1) x 1,..., ( 1) xn ) = n f a (x), a=0 where f a (x) = degree a part of the poly. By odd/even trick can assume f a = 0 for a n/2. Have: N δ f a = (1 2δ) a f a T δ f a = (?? ) f a Yury Polyanskiy Hypercontractivity in Hamming space 14

Proof: Orthogonal decomposition T δ and N δ are S n -equivariant f(x) = poly(( 1) x 1,..., ( 1) xn ) = n f a (x), a=0 where f a (x) = degree a part of the poly. By odd/even trick can assume f a = 0 for a n/2. Have: N δ f a = (1 2δ) a f a T δ f a = κ δn (a) f a and κ δn (a) Krawtchouk polynomial of degree δn. Yury Polyanskiy Hypercontractivity in Hamming space 14

Proof (cont d): Eigenvalues of N δ vs T δ 0 Asymptotic spectra of T δ and N δ : δ = 0.1 Bernoulli noise N δ Spherical average T δ 0.05 (1/n) log(fourier coef.) 0.1 0.15 0.2 0.25 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 a/n Yury Polyanskiy Hypercontractivity in Hamming space 15

Proof (cont d): Eigenvalues of N δ vs T δ 0 Asymptotic spectra of T δ and N δ : δ = 0.1 Bernoulli noise N δ Spherical average T δ 0.05 (1/n) log(fourier coef.) 0.1 0.15 For any eigenspace a λ a (T δ ) λ a (N δ ) 0.2 0.25 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 a/n Yury Polyanskiy Hypercontractivity in Hamming space 15

Proof (cont d): Eigenvalues of N δ vs T δ 0 Asymptotic spectra of T δ and N δ : δ = 0.1 Bernoulli noise N δ Spherical average T δ 0.05 (1/n) log(fourier coef.) 0.1 0.15 0.2 For any eigenspace a λ a (T δ ) λ a (N δ ) But only for δ 0.17! 0.25 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 a/n Yury Polyanskiy Hypercontractivity in Hamming space 15

Proof (cont d): Failure for δ 0.17 0 Asymptotic spectra of T δ and N δ : δ = 0.25 Fourier projection norm Π a p(δ) >2 0.05 Bernoulli noise N δ Spherical average T δ 0.1 0.15 (1/n) log(fourier coef.) 0.2 0.25 0.3 0.35 0.4 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 a/n Yury Polyanskiy Hypercontractivity in Hamming space 16

Proof (cont d): Failure for δ 0.17 0 Asymptotic spectra of T δ and N δ : δ = 0.25 Fourier projection norm Π a p(δ) >2 0.05 Bernoulli noise N δ Spherical average T δ 0.1 (1/n) log(fourier coef.) 0.15 0.2 0.25 0.3 When δ > 0.17 and a large λ a (T δ ) λ a (N δ ) 0.35 Fortunately: f a 2 f 2 0.4 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 a/n Yury Polyanskiy Hypercontractivity in Hamming space 16

Proof (cont d): Overall argument Know: N δ f 2 2 = a (1 2δ) 2a f a 2 2 f 2 p by Bonami! Need to show: T δ f 2 2 = a κ δn (a) 2 f a 2 2 C f 2 p a a crit : κ δn (a) 2 f a 2 2 C (1 2δ) 2a f a 2 2 a a crit a = C N δ f 2 2 C f 2 p a > a crit : Need extra estimate: κ δn (a) f a 2 e Cn f p = a ξ crit n contributes C f 2 p. Yury Polyanskiy Hypercontractivity in Hamming space 17

Proof (cont d): Overall argument Know: N δ f 2 2 = a (1 2δ) 2a f a 2 2 f 2 p by Bonami! Need to show: T δ f 2 2 = a κ δn (a) 2 f a 2 2 C f 2 p a a crit : For δ < 0.17: a crit = n/2! κ δn (a) 2 f a 2 2 C (1 2δ) 2a f a 2 2 a a crit a a > a crit : Need extra estimate: = C N δ f 2 2 C f 2 p κ δn (a) f a 2 e Cn f p = a ξ crit n contributes C f 2 p. Yury Polyanskiy Hypercontractivity in Hamming space 17

Application to additive-combinatorics Known fact: Subspace L F n 2 s.t. L S δn S δn implies dim L n. Yury Polyanskiy Hypercontractivity in Hamming space 18

Application to additive-combinatorics Known fact: Subspace L F n 2 s.t. L S δn S δn implies dim L n. Corollary Let A F n 2 be a subset s.t. sumset A + A intersects some S δn with average multiplicty > λ A. Then Proof: Statement means A λ ɛ 2 n. 2 n (1 A 1 A, 1 Sδn ) λ A S δn Rearrange (1 A 1 A, 1 S ) = (T δ 1 A, 1A) Apply Hölder and HC. Yury Polyanskiy Hypercontractivity in Hamming space 18

Thank you! References R. Ahlswede and P. Gacs, Spreading of sets in product spaces and hypercontraction of the Markov operator, Ann. Probab., pp. 925 939, 1976. E. M. Semenov and I. Y. Shneiberg, Hypercontractive operators and Khinchin s inequality, Func. Analysis and Appl., vol. 22, no. 3, pp. 244 246, 1988. Y. Polyanskiy, Hypercontractivity of spherical averages in Hamming space, arxiv:1309.3014, Sep. 2013 Y. Polyanskiy, Hypothesis testing via a comparator and hypercontractivity, preprint, Feb. 2013. C. Nair, Equivalent formulations of hypercontractivity using information measures, Zurich Seminar on Comm., 2014 Yury Polyanskiy Hypercontractivity in Hamming space 19

Dobrushin coefficient total variation P Q TV = 1 2 dp dq Dobrushin coefficient η TV = P Y Q Y sup TV P X Q X P X Q X TV Original motivation: mixing of Markov chains = sup x,x P Y X=x P Y X=x TV. Yury Polyanskiy Hypercontractivity in Hamming space 20

Contraction coeffs and mixing of Markov chains Consider a M.C. with invariant dist. P : Contraction: X 0 X 1 X 2 P Xn P TV (η TV ) n χ 2 (P Xn P ) (η χ 2) n χ 2 (P X0 P ) D(P Xn P ) (η KL ) n D(P X0 P ) (Dobrushin) (spectral gap) (log-sobolev) Yury Polyanskiy Hypercontractivity in Hamming space 21

Contraction coeffs and mixing of Markov chains Consider a M.C. with invariant dist. P : Contraction: X 0 X 1 X 2 P Xn P TV (η TV ) n χ 2 (P Xn P ) (η χ 2) n χ 2 (P X0 P ) D(P Xn P ) (η KL ) n D(P X0 P ) (Dobrushin) (spectral gap) (log-sobolev) TODO: add mixing estimate via HC Yury Polyanskiy Hypercontractivity in Hamming space 21

From TV to KL... and further Theorem (Cohen-Iwasa-Rautu-Ruskai-Seneta-Zbăganu 93) η KL η TV Yury Polyanskiy Hypercontractivity in Hamming space 22

From TV to KL... and further Theorem (Cohen-Iwasa-Rautu-Ruskai-Seneta-Zbăganu 93) η KL η TV Fun application: Let T kernel of a finite-state M.C. with invariant distribution P. Let E be indep. M.C.: Ef fdp If T (x, ) E(x, ) TV < 1/2 then η TV = max x,x T (x, ) T (x, ) TV < 1 Then η KL < 1. By Ahslwede-Gács q 1 T f q f p, p = η KL q < q (hypercont) There is a ball around E s.t. for all T : T Lp L q = 1. Yury Polyanskiy Hypercontractivity in Hamming space 22

From TV to KL... and further Theorem (Cohen-Iwasa-Rautu-Ruskai-Seneta-Zbăganu 93) η KL η TV Fun application: Let T kernel of a finite-state M.C. with invariant distribution P. Let E be indep. M.C.: Ef fdp If T (x, ) E(x, ) TV < 1/2 then η TV = max x,x T (x, ) T (x, ) TV < 1 Then η KL < 1. By Ahslwede-Gács q 1 T f q f p, p = η KL q < q (hypercont) There is a ball around E s.t. for all T : T Lp L q = 1. Every (mixing) M.C. eventually (hyper-)contracts L p L q. Yury Polyanskiy Hypercontractivity in Hamming space 22

From TV to KL... and further Theorem (Cohen-Iwasa-Rautu-Ruskai-Seneta-Zbăganu 93) η KL η TV Fun application: Let T kernel of a finite-state M.C. with invariant distribution P. Let E be indep. M.C.: Ef fdp If T (x, ) E(x, ) TV < 1/2 then η TV = max x,x T (x, ) T (x, ) TV < 1 Then η KL < 1. By Ahslwede-Gács q 1 T f q f p, p = η KL q < q (hypercont) There is a ball around E s.t. for all T : T Lp L q = 1. Every (mixing) M.C. eventually (hyper-)contracts L p L q. Euclidean space: planar face of the ball of Fourier multipliers: [Segal 70], [Fefferman-Shapiro 72], [Semenov-Shneiberg 88] Yury Polyanskiy Hypercontractivity in Hamming space 22