A Matrix Expander Chernoff Bound
1 A Matrix Expander Chernoff Bound
Ankit Garg, Yin Tat Lee, Zhao Song, Nikhil Srivastava (UC Berkeley)
2-3 Vanilla Chernoff Bound
Thm [Hoeffding, Chernoff]. If $X_1, \ldots, X_k$ are independent mean-zero random variables with $|X_i| \le 1$, then
$$\Pr\Big[\Big|\tfrac{1}{k}\sum_i X_i\Big| \ge \varepsilon\Big] \le 2\exp(-k\varepsilon^2/4).$$
Two extensions:
1. Dependent random variables
2. Sums of random matrices
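As a quick numerical sanity check (not part of the talk; the sample size, `eps`, and the Rademacher choice below are illustrative), the bound can be compared against simulated sums with numpy:

```python
import numpy as np

# Empirical check of the scalar Chernoff bound: for independent Rademacher
# variables X_i (mean zero, |X_i| <= 1), the tail P(|1/k sum X_i| >= eps)
# should be at most 2*exp(-k*eps^2/4).
rng = np.random.default_rng(0)
k, eps, trials = 200, 0.3, 20000

X = rng.choice([-1.0, 1.0], size=(trials, k))  # k independent Rademachers per trial
means = X.mean(axis=1)
empirical_tail = np.mean(np.abs(means) >= eps)
chernoff_bound = 2 * np.exp(-k * eps**2 / 4)

print(empirical_tail <= chernoff_bound)  # the bound should hold with room to spare
```

The true tail here is far below the bound (the mean of 200 Rademachers has standard deviation about $0.07$, so $\varepsilon = 0.3$ is more than four standard deviations out).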
4-5 Expander Chernoff Bound [AKS 87, G 94]
Thm [Gillman 94]. Suppose $G = (V, E)$ is a regular graph whose transition matrix $P$ has second eigenvalue $\lambda$. Let $f : V \to \mathbb{R}$ be a function with $|f(v)| \le 1$ and $\sum_v f(v) = 0$. Then, if $v_1, \ldots, v_k$ is a stationary random walk:
$$\Pr\Big[\Big|\tfrac{1}{k}\sum_i f(v_i)\Big| \ge \varepsilon\Big] \le 2\exp(-c(1-\lambda)k\varepsilon^2).$$
Implies that a walk of length $k \gg (1-\lambda)^{-1}$ concentrates around its mean.
6-7 Intuition for Dependence on Spectral Gap
[Figure: a dumbbell graph with spectral gap $1 - \lambda = 1/n^2$; one half is labeled $f = +1$, the other $f = -1$.]
A typical random walk takes $\Omega(n^2)$ steps to see both $\pm 1$.
8-9 Derandomization Motivation
(Gillman's theorem restated from the previous slide.)
To generate $k$ independent samples from $[n]$ one needs $k \log n$ bits. If $G$ is 3-regular with constant $\lambda$, the walk needs only $\log n + k \log 3$ bits.
When $k = O(\log n)$ this reduces the randomness quadratically, and one can completely derandomize in polynomial time.
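The randomness accounting above is easy to make concrete (a back-of-the-envelope computation; the universe size $n = 2^{20}$ is an illustrative choice, not from the talk):

```python
import math

# Bits needed for k samples from [n]: k*log2(n) if independent, versus
# log2(n) + k*log2(3) for a length-k walk on a 3-regular expander.
n = 2**20                  # hypothetical universe size
k = int(math.log2(n))      # k = O(log n) samples, as in the slide

independent_bits = k * math.log2(n)          # k log n
walk_bits = math.log2(n) + k * math.log2(3)  # log n + k log 3

print(independent_bits, round(walk_bits, 1))
```

With $n = 2^{20}$ and $k = 20$ this is 400 bits versus roughly 52: the promised quadratic saving.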
10-11 Matrix Chernoff Bound
Thm [Rudelson 97, Ahlswede-Winter 02, Oliveira 08, Tropp 11]. If $X_1, \ldots, X_k$ are independent mean-zero random $d \times d$ Hermitian matrices with $\|X_i\| \le 1$, then
$$\Pr\Big[\Big\|\tfrac{1}{k}\sum_i X_i\Big\| \ge \varepsilon\Big] \le 2d\exp(-k\varepsilon^2/4).$$
The factor $d$ is tight because of the diagonal case. This is a very generic bound (no independence assumptions on the entries), with many applications and martingale extensions (see Tropp).
12 Matrix Expander Chernoff Bound?
Conj [Wigderson-Xiao 05]. Suppose $G = (V, E)$ is a regular graph whose transition matrix $P$ has second eigenvalue $\lambda$. Let $f : V \to \mathbb{C}^{d \times d}$ be a function with $\|f(v)\| \le 1$ and $\sum_v f(v) = 0$. Then, if $v_1, \ldots, v_k$ is a stationary random walk:
$$\Pr\Big[\Big\|\tfrac{1}{k}\sum_i f(v_i)\Big\| \ge \varepsilon\Big] \le 2d\exp(-c(1-\lambda)k\varepsilon^2).$$
Motivated by the derandomized Alon-Roichman theorem.
13 Main Theorem
Thm. Suppose $G = (V, E)$ is a regular graph whose transition matrix $P$ has second eigenvalue $\lambda$. Let $f : V \to \mathbb{C}^{d \times d}$ be a function with $\|f(v)\| \le 1$ and $\sum_v f(v) = 0$. Then, if $v_1, \ldots, v_k$ is a stationary random walk:
$$\Pr\Big[\Big\|\tfrac{1}{k}\sum_i f(v_i)\Big\| \ge \varepsilon\Big] \le 2d\exp(-c(1-\lambda)k\varepsilon^2).$$
Gives a black-box derandomization of any application of matrix Chernoff.
14-20 1. Proof of Chernoff: reduction to the mgf
Thm [Hoeffding, Chernoff]. If $X_1, \ldots, X_k$ are independent mean-zero random variables with $|X_i| \le 1$, then $\Pr[|\tfrac{1}{k}\sum_i X_i| \ge \varepsilon] \le 2\exp(-k\varepsilon^2/4)$.
$$\Pr\Big[\sum_i X_i \ge k\varepsilon\Big] \le e^{-tk\varepsilon}\,\mathbb{E}\exp\Big(t\sum_i X_i\Big) \qquad \text{(Markov)}$$
$$= e^{-tk\varepsilon}\prod_i \mathbb{E}\, e^{tX_i} \qquad \text{(independence)}$$
$$\le e^{-tk\varepsilon}\big(1 + t\,\mathbb{E}X_i + t^2\big)^k \qquad \text{(boundedness)}$$
$$\le e^{-tk\varepsilon + kt^2} \qquad \text{(mean zero)}$$
$$\le e^{-k\varepsilon^2/4} \qquad (\text{taking } t = \varepsilon/2).$$
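The single-step mgf bound in that chain, $\mathbb{E}\,e^{tX} \le 1 + t\,\mathbb{E}X + t^2 \le e^{t^2}$ for mean-zero $|X| \le 1$ and $|t| \le 1$, can be spot-checked numerically (a quick sanity check with numpy; the Rademacher choice, whose mgf is exactly $\cosh t$, is illustrative):

```python
import numpy as np

# For Rademacher X, E[e^{tX}] = cosh(t), so the step reduces to
# cosh(t) <= 1 + t^2 <= e^{t^2} on [0, 1].
t = np.linspace(0.0, 1.0, 101)
mgf = np.cosh(t)                         # E[e^{tX}] for Rademacher X

assert np.all(mgf <= 1 + t**2)           # bounded + mean-zero step
assert np.all(1 + t**2 <= np.exp(t**2))  # 1 + y <= e^y
print("mgf bounds hold on [0, 1]")
```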
21-22 2. Proof of Expander Chernoff
Goal: show $\mathbb{E}\exp\big(t\sum_i f(v_i)\big) \le \exp(c_\lambda k t^2)$.
Issue: $\mathbb{E}\exp\big(t\sum_i f(v_i)\big) \ne \prod_i \mathbb{E}\exp(t f(v_i))$. How to control the mgf without independence?
23-25 Step 1: Write the mgf as a quadratic form
$$\mathbb{E}\, e^{t\sum_{i \le k} f(v_i)} = \sum_{i_0,\ldots,i_k \in V} \Pr[v_0 = i_0]\, P(i_0,i_1)\cdots P(i_{k-1},i_k)\, \exp\Big(t \sum_{1 \le j \le k} f(i_j)\Big)$$
$$= \frac{1}{n}\sum_{i_0,\ldots,i_k \in V} P(i_0,i_1)\, e^{tf(i_1)} \cdots P(i_{k-1},i_k)\, e^{tf(i_k)} = \langle u, (EP)^k u\rangle,$$
where $u = (1/\sqrt{n}, \ldots, 1/\sqrt{n})$ and $E = \mathrm{diag}\big(e^{tf(1)}, \ldots, e^{tf(n)}\big)$.
[Figure: a walk $i_0 \to i_1 \to i_2 \to i_3$ with edge weights $P(i_j, i_{j+1})$ and vertex weights $e^{tf(i_j)}$.]
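This identity can be verified exactly by brute force on a toy graph (a quick numpy check; the 4-cycle and the particular $f$ below are illustrative choices, not from the talk):

```python
import itertools
import numpy as np

# Verify E[e^{t * sum_j f(v_j)}] = <u, (EP)^k u> on the 4-cycle (2-regular).
n, k, t = 4, 4, 0.3
P = np.zeros((n, n))
for i in range(n):                       # random-walk transition matrix of C_4
    P[i, (i - 1) % n] = P[i, (i + 1) % n] = 0.5
f = np.array([1.0, -1.0, 0.5, -0.5])     # |f| <= 1 and sum_v f(v) = 0

u = np.ones(n) / np.sqrt(n)              # unit vector in the stationary direction
E = np.diag(np.exp(t * f))
quad_form = u @ np.linalg.matrix_power(E @ P, k) @ u

# Brute force: sum over all stationary walks v_0, v_1, ..., v_k.
mgf = 0.0
for path in itertools.product(range(n), repeat=k + 1):
    w = 1.0 / n                          # P(v_0 = i_0) = 1/n (stationary start)
    for a, b in zip(path, path[1:]):
        w *= P[a, b]
    mgf += w * np.exp(t * sum(f[v] for v in path[1:]))

assert abs(mgf - quad_form) < 1e-10
print("identity verified")
```

(The equality uses $Pu = u$, which holds since $P$ is doubly stochastic.)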
26-28 Step 2: Bound the quadratic form
Goal: show $\langle u, (EP)^k u\rangle \le \exp(c_\lambda k t^2)$.
Observe: $\|P - J\| \le \lambda$, where $J$ is the transition matrix of the complete graph with self loops.
So for small $\lambda$, we should have $\langle u, (EP)^k u\rangle \approx \langle u, (EJ)^k u\rangle$.
Approach 1: use perturbation theory to show $\|EP\| \le \exp(c_\lambda t^2)$.
Approach 2 (Healy 08): track the projection of the iterates along $u$.
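The observation $\|P - J\| \le \lambda$ is also easy to check on a small example (a numpy illustration; the 5-cycle is an arbitrary non-bipartite choice, where in fact equality holds):

```python
import numpy as np

# On the 5-cycle, ||P - J|| equals the second-largest |eigenvalue| of P,
# where J is the all-1/n matrix (complete graph with self loops).
n = 5
P = np.zeros((n, n))
for i in range(n):
    P[i, (i - 1) % n] = P[i, (i + 1) % n] = 0.5
J = np.full((n, n), 1.0 / n)

eigs = np.sort(np.abs(np.linalg.eigvalsh(P)))[::-1]
lam = eigs[1]                            # second-largest |eigenvalue|
op_norm = np.linalg.norm(P - J, ord=2)   # spectral norm

assert abs(op_norm - lam) < 1e-9
print(round(lam, 3))
```

Subtracting $J$ removes the top eigenvector (the stationary direction $u$) and leaves the rest of the spectrum of $P$ untouched.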
29-33 Simplest case: $\lambda = 0$
Track the iterates $EPEPEPEPu$.
34 Observations
Observe: $P$ shrinks every vector orthogonal to $u$ by a factor $\lambda$.
$\langle u, Eu\rangle = \frac{1}{n}\sum_{v \in V} e^{tf(v)} \le 1 + t^2$ by the mean-zero condition.
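The second observation follows from $e^y \le 1 + y + y^2$ for $|y| \le 1$ together with $\sum_v f(v) = 0$, and can be spot-checked numerically (an illustrative random $f$; any mean-zero $f$ with $|f| \le 1$ works):

```python
import numpy as np

# Check <u, Eu> = (1/n) * sum_v e^{t f(v)} <= 1 + t^2 for |t| <= 1.
rng = np.random.default_rng(1)
f = rng.uniform(-1, 1, size=50)
f -= f.mean()                      # enforce sum_v f(v) = 0
f /= max(1.0, np.abs(f).max())     # keep |f(v)| <= 1 after centering

for t in np.linspace(0, 1, 11):
    assert np.mean(np.exp(t * f)) <= 1 + t**2
print("ok")
```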
35-39 A small dynamical system
For $v = (EP)^j u$, track $m = \langle v, u\rangle$ and $m^\perp = \|v - \langle v, u\rangle u\|$. One step $v \mapsto EPv$ updates these (up to constants) as
$$\begin{pmatrix} m \\ m^\perp \end{pmatrix} \mapsto \begin{pmatrix} e^{t^2} & t \\ \lambda t & \lambda e^{t} \end{pmatrix} \begin{pmatrix} m \\ m^\perp \end{pmatrix};$$
when $\lambda = 0$ the second row vanishes. Any mass that leaves the direction of $u$ gets shrunk by $\lambda$, and the system stays controlled for $t \lesssim \log(1/\lambda)$.
40 Analyzing the dynamical system gives
Thm [Gillman 94]. Suppose $G = (V, E)$ is a regular graph whose transition matrix $P$ has second eigenvalue $\lambda$. Let $f : V \to \mathbb{R}$ be a function with $|f(v)| \le 1$ and $\sum_v f(v) = 0$. Then, if $v_1, \ldots, v_k$ is a stationary random walk:
$$\Pr\Big[\Big|\tfrac{1}{k}\sum_i f(v_i)\Big| \ge \varepsilon\Big] \le 2\exp(-c(1-\lambda)k\varepsilon^2).$$
41-42 Generalization to Matrices?
Setup: $f : V \to \mathbb{C}^{d \times d}$, random walk $v_1, \ldots, v_k$.
Goal: $\mathbb{E}\,\mathrm{Tr}\exp\big(t\sum_i f(v_i)\big) \le d\exp(ckt^2)$, where $e^A$ is defined as a power series.
Main issue: $\exp(A + B) \ne \exp(A)\exp(B)$ unless $[A, B] = 0$, so we can't express the exponential of the sum as an iterated product.
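The main issue is easy to witness numerically (a toy numpy demo; the two $2 \times 2$ matrices are arbitrary non-commuting choices):

```python
import numpy as np

def expm_h(A):
    """Matrix exponential of a Hermitian matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.conj().T

# exp(A+B) != exp(A) exp(B) once A and B stop commuting.
A = np.array([[1.0, 0.0], [0.0, -1.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

assert np.linalg.norm(A @ B - B @ A) > 0.1   # [A, B] != 0
gap = np.linalg.norm(expm_h(A + B) - expm_h(A) @ expm_h(B))
print(gap > 0.1)
```

Note that $e^{A+B}$ is Hermitian while $e^A e^B$ need not be, so the two already differ in kind.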
43-45 The Golden-Thompson Inequality
Partial workaround [Golden-Thompson 65]: $\mathrm{Tr}\,\exp(A + B) \le \mathrm{Tr}\big(\exp(A)\exp(B)\big)$.
Sufficient for the independent case by induction. For the expander case we need this for $k$ matrices, but the three-matrix version is false: there are Hermitian $A, B, C$ with
$$\mathrm{Tr}\, e^{A+B+C} > 0 > \mathrm{Tr}(e^A e^B e^C).$$
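The two-matrix inequality can be spot-checked on random Hermitian matrices (a numerical illustration, not a proof; dimension and sample count are arbitrary):

```python
import numpy as np

def expm_h(A):
    """Matrix exponential of a Hermitian matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.conj().T

# Golden-Thompson: Tr e^{A+B} <= Tr(e^A e^B) for Hermitian A, B.
rng = np.random.default_rng(2)
d = 4
for _ in range(100):
    X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    Y = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    A, B = (X + X.conj().T) / 2, (Y + Y.conj().T) / 2
    lhs = np.trace(expm_h(A + B)).real
    rhs = np.trace(expm_h(A) @ expm_h(B)).real
    assert lhs <= rhs + 1e-9
print("Golden-Thompson held on all samples")
```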
46-47 Key Ingredient [Sutter-Berta-Tomamichel 16]
If $A_1, \ldots, A_k$ are Hermitian, then
$$\log \mathrm{Tr}\, e^{A_1 + \cdots + A_k} \le \int d\beta(b)\, \log \mathrm{Tr}\Big[\Big(e^{\frac{(1+ib)A_1}{2}} \cdots e^{\frac{(1+ib)A_k}{2}}\Big)\Big(e^{\frac{(1+ib)A_1}{2}} \cdots e^{\frac{(1+ib)A_k}{2}}\Big)^{*}\Big],$$
where $\beta(b)$ is an explicit probability density on $\mathbb{R}$.
1. The matrix on the RHS is always PSD.
2. Average-case inequality: the $e^{A_i/2}$ are conjugated by unitaries.
3. Implies Lieb's concavity theorem, the triple-matrix inequality, ALT, and more.
48-49 Proof of SBT: Lie-Trotter Formula
$$e^{A+B+C} = \lim_{\theta \to 0^+}\big(e^{\theta A} e^{\theta B} e^{\theta C}\big)^{1/\theta},$$
so
$$\log \mathrm{Tr}\, e^{A+B+C} = \lim_{\theta \to 0^+} \tfrac{2}{\theta}\, \log \|G(\theta)\|_{2/\theta}, \qquad \text{for } G(z) = e^{\frac{zA}{2}} e^{\frac{zB}{2}} e^{\frac{zC}{2}}.$$
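Lie-Trotter convergence is easy to observe numerically with $\theta = 1/m$ (an illustrative numpy demo; the three small symmetric matrices are arbitrary and normalized to spectral norm 1):

```python
import numpy as np

def expm_h(A):
    """Matrix exponential of a Hermitian (here real symmetric) matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.conj().T

# (e^{A/m} e^{B/m} e^{C/m})^m -> e^{A+B+C} as m grows.
rng = np.random.default_rng(3)
mats = []
for _ in range(3):
    X = rng.standard_normal((3, 3))
    S = (X + X.T) / 2
    mats.append(S / np.linalg.norm(S, 2))   # spectral norm 1
A, B, C = mats

target = expm_h(A + B + C)

def trotter(m):
    step = expm_h(A / m) @ expm_h(B / m) @ expm_h(C / m)
    return np.linalg.matrix_power(step, m)

err_coarse = np.linalg.norm(trotter(10) - target)
err_fine = np.linalg.norm(trotter(1000) - target)
assert err_fine < err_coarse / 10   # first-order convergence: error ~ 1/m
print("Lie-Trotter convergence observed")
```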
50-53 Complex Interpolation (Stein-Hirschman)
Interpolate in $\theta \mapsto \log \|G(\theta)\|_{2/\theta}$. For each $\theta$, find an analytic $F(z)$ such that
$$\|F(it)\| \le 1, \qquad \|F(1+it)\| \le \|G(1+it)\|_2, \qquad \|F(\theta)\| = \|G(\theta)\|_{2/\theta}.$$
Hirschman's interpolation theorem then gives
$$\log \|F(\theta)\| \le \int \log \|F(it)\|\, d\mu_0^\theta(t) + \int \log \|F(1+it)\|\, d\mu_1^\theta(t),$$
and taking $\theta \to 0^+$ yields the stated inequality.
54-55 Key Ingredient [Sutter-Berta-Tomamichel 16]
(SBT inequality restated from the earlier slide.)
Issue: SBT involves integration over an unbounded region, which is bad for Taylor expansion.
56 Bounded Modification of SBT
Solution: prove a bounded version of SBT by replacing the strip with a half-disk.
Thm. If $A_1, \ldots, A_k$ are Hermitian, then
$$\log \mathrm{Tr}\, e^{A_1 + \cdots + A_k} \le \int d\beta(b)\, \log \mathrm{Tr}\Big[\Big(e^{\frac{e^{ib} A_1}{2}} \cdots e^{\frac{e^{ib} A_k}{2}}\Big)\Big(e^{\frac{e^{ib} A_1}{2}} \cdots e^{\frac{e^{ib} A_k}{2}}\Big)^{*}\Big],$$
where $\beta(b)$ is an explicit probability density on $[-\pi/2, \pi/2]$.
Proof: analytic $F(z)$ + Poisson kernel + Riemann map.
57-58 Handling Two-sided Products
Issue: these are two-sided rather than one-sided products:
$$\mathrm{Tr}\Big[\Big(e^{\frac{e^{ib} t f(v_1)}{2}} \cdots e^{\frac{e^{ib} t f(v_k)}{2}}\Big)\Big(e^{\frac{e^{ib} t f(v_1)}{2}} \cdots e^{\frac{e^{ib} t f(v_k)}{2}}\Big)^{*}\Big].$$
Solution: encode them as a one-sided product using $\mathrm{vec}(AXB) = (A \otimes B^{T})\,\mathrm{vec}(X)$:
$$\Big\langle \Big(e^{\frac{e^{ib} t f(v_1)}{2}} \otimes e^{\frac{e^{ib} t f(v_1)^{T}}{2}}\Big) \cdots \Big(e^{\frac{e^{ib} t f(v_k)}{2}} \otimes e^{\frac{e^{ib} t f(v_k)^{T}}{2}}\Big)\, \mathrm{vec}(I_d),\ \mathrm{vec}(I_d)\Big\rangle.$$
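The vectorization identity behind this step can be checked directly (a numpy illustration with arbitrary $3 \times 3$ matrices; note numpy's row-major `ravel()` matches the $A \otimes B^T$ convention):

```python
import numpy as np

# With row-major vectorization, vec(A X B) = (A kron B^T) vec(X),
# and Tr(A X B) = <vec(A X B), vec(I)>.
rng = np.random.default_rng(4)
A, X, B = (rng.standard_normal((3, 3)) for _ in range(3))

lhs = (A @ X @ B).ravel()
rhs = np.kron(A, B.T) @ X.ravel()
assert np.allclose(lhs, rhs)

trace_via_vec = rhs @ np.eye(3).ravel()   # inner product with vec(I)
assert np.isclose(trace_via_vec, np.trace(A @ X @ B))
print("vec identity verified")
```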
59 Finishing the Proof
Carry out a version of Healy's argument with $P \otimes I_{d^2}$ and
$$E = \mathrm{diag}\Big(e^{\frac{e^{ib} t f(1)}{2}} \otimes e^{\frac{e^{ib} t f(1)^{T}}{2}},\ \ldots,\ e^{\frac{e^{ib} t f(n)}{2}} \otimes e^{\frac{e^{ib} t f(n)^{T}}{2}}\Big),$$
with $\mathrm{vec}(I_d) \otimes u$ instead of $u$. This leads to the additional factor of $d$.
[Figure: the walk $i_0 \to i_1 \to i_2 \to i_3$ with edge weights $P(i_j, i_{j+1})$, as before.]
60 Main Theorem
Thm. Suppose $G = (V, E)$ is a regular graph whose transition matrix $P$ has second eigenvalue $\lambda$. Let $f : V \to \mathbb{C}^{d \times d}$ be a function with $\|f(v)\| \le 1$ and $\sum_v f(v) = 0$. Then, if $v_1, \ldots, v_k$ is a stationary random walk:
$$\Pr\Big[\Big\|\tfrac{1}{k}\sum_i f(v_i)\Big\| \ge \varepsilon\Big] \le 2d\exp(-c(1-\lambda)k\varepsilon^2).$$
61 Open Questions
- Other matrix concentration inequalities (multiplicative, low-rank, moments)
- Other Banach spaces (Schatten norms)
- More applications of complex interpolation
More informationIn particular, if A is a square matrix and λ is one of its eigenvalues, then we can find a non-zero column vector X with
Appendix: Matrix Estimates and the Perron-Frobenius Theorem. This Appendix will first present some well known estimates. For any m n matrix A = [a ij ] over the real or complex numbers, it will be convenient
More informationLecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora
princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 14: SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Lecturer: Sanjeev Arora Scribe: Today we continue the
More informationGraph Partitioning Algorithms and Laplacian Eigenvalues
Graph Partitioning Algorithms and Laplacian Eigenvalues Luca Trevisan Stanford Based on work with Tsz Chiu Kwok, Lap Chi Lau, James Lee, Yin Tat Lee, and Shayan Oveis Gharan spectral graph theory Use linear
More informationarxiv: v1 [math.na] 1 Sep 2018
On the perturbation of an L -orthogonal projection Xuefeng Xu arxiv:18090000v1 [mathna] 1 Sep 018 September 5 018 Abstract The L -orthogonal projection is an important mathematical tool in scientific computing
More informationNumerical Methods in Matrix Computations
Ake Bjorck Numerical Methods in Matrix Computations Springer Contents 1 Direct Methods for Linear Systems 1 1.1 Elements of Matrix Theory 1 1.1.1 Matrix Algebra 2 1.1.2 Vector Spaces 6 1.1.3 Submatrices
More informationExpansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales
Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales Demetres Christofides Klas Markström Abstract The Alon-Roichman theorem states that for every ε > 0 there
More informationTOEPLITZ OPERATORS. Toeplitz studied infinite matrices with NW-SE diagonals constant. f e C :
TOEPLITZ OPERATORS EFTON PARK 1. Introduction to Toeplitz Operators Otto Toeplitz lived from 1881-1940 in Goettingen, and it was pretty rough there, so he eventually went to Palestine and eventually contracted
More informationFinite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product
Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a pre-hilbert space, or a unitary space) if there is a mapping (, )
More informationMarch 1, Florida State University. Concentration Inequalities: Martingale. Approach and Entropy Method. Lizhe Sun and Boning Yang.
Florida State University March 1, 2018 Framework 1. (Lizhe) Basic inequalities Chernoff bounding Review for STA 6448 2. (Lizhe) Discrete-time martingales inequalities via martingale approach 3. (Boning)
More informationDimension reduction for semidefinite programming
1 / 22 Dimension reduction for semidefinite programming Pablo A. Parrilo Laboratory for Information and Decision Systems Electrical Engineering and Computer Science Massachusetts Institute of Technology
More informationTail Inequalities. The Chernoff bound works for random variables that are a sum of indicator variables with the same distribution (Bernoulli trials).
Tail Inequalities William Hunt Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV William.Hunt@mail.wvu.edu Introduction In this chapter, we are interested
More informationMath 307 Learning Goals
Math 307 Learning Goals May 14, 2018 Chapter 1 Linear Equations 1.1 Solving Linear Equations Write a system of linear equations using matrix notation. Use Gaussian elimination to bring a system of linear
More informationL. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones. definition. spectral decomposition. quadratic representation. log-det barrier 18-1
L. Vandenberghe EE236C (Spring 2016) 18. Symmetric cones definition spectral decomposition quadratic representation log-det barrier 18-1 Introduction This lecture: theoretical properties of the following
More informationQuantum Field Theory III
Quantum Field Theory III Prof. Erick Weinberg January 19, 2011 1 Lecture 1 1.1 Structure We will start with a bit of group theory, and we will talk about spontaneous symmetry broken. Then we will talk
More informationA new method to bound rate of convergence
A new method to bound rate of convergence Arup Bose Indian Statistical Institute, Kolkata, abose@isical.ac.in Sourav Chatterjee University of California, Berkeley Empirical Spectral Distribution Distribution
More informationThe Solovay-Kitaev theorem
The Solovay-Kitaev theorem Maris Ozols December 10, 009 1 Introduction There are several accounts of the Solovay-Kitaev theorem available [K97, NC00, KSV0, DN05]. I chose to base my report on [NC00], since
More informationAnalysis Preliminary Exam Workshop: Hilbert Spaces
Analysis Preliminary Exam Workshop: Hilbert Spaces 1. Hilbert spaces A Hilbert space H is a complete real or complex inner product space. Consider complex Hilbert spaces for definiteness. If (, ) : H H
More informationChapter 7. Homogeneous equations with constant coefficients
Chapter 7. Homogeneous equations with constant coefficients It has already been remarked that we can write down a formula for the general solution of any linear second differential equation y + a(t)y +
More informationCentral Limit Theorems for linear statistics for Biorthogonal Ensembles
Central Limit Theorems for linear statistics for Biorthogonal Ensembles Maurice Duits, Stockholm University Based on joint work with Jonathan Breuer (HUJI) Princeton, April 2, 2014 M. Duits (SU) CLT s
More informationSolutions to Problem Set 4
UC Berkeley, CS 174: Combinatorics and Discrete Probability (Fall 010 Solutions to Problem Set 4 1. (MU 5.4 In a lecture hall containing 100 people, you consider whether or not there are three people in
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d
More informationChapter 7 Iterative Techniques in Matrix Algebra
Chapter 7 Iterative Techniques in Matrix Algebra Per-Olof Persson persson@berkeley.edu Department of Mathematics University of California, Berkeley Math 128B Numerical Analysis Vector Norms Definition
More informationMath 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination
Math 0, Winter 07 Final Exam Review Chapter. Matrices and Gaussian Elimination { x + x =,. Different forms of a system of linear equations. Example: The x + 4x = 4. [ ] [ ] [ ] vector form (or the column
More information