Sum-of-Squares and Spectral Algorithms


1 Sum-of-Squares and Spectral Algorithms. Tselil Schramm. June 23, 2017, STOC 2017 Workshop.

2 Spectral algorithms as a tool for analyzing SoS. [Diagram: SoS → Semidefinite Programs → Spectral Algorithms.]

3 SoS suggests a new family of spectral algorithms! [Diagram: SoS → Semidefinite Programs → Spectral Algorithms → Average-Case & Structured Instances.]

4 Average Case SoS/Spectral Algorithms. Tensor Decomposition/Dictionary Learning [Barak-Kelner-Steurer 14, Ge-Ma 15, Ma-Shi-Steurer 16]. Planted Sparse Vector [Barak-Brandão-Harrow-Kelner-Steurer-Zhou 12, Barak-Kelner-Steurer 14]. Tensor Completion [Barak-Moitra 16, Potechin-Steurer 17]. Refuting Random CSPs [Allen-O'Donnell-Witmer 15, Raghavendra-Rao-S 17]. Tensor Principal Components Analysis [Hopkins-Shi-Steurer 15, Bhattiprolu-Guruswami-Lee 16, Raghavendra-Rao-S 17].

6 Tensor Principal Components Analysis (TPCA). Given a tensor T ∈ R^{n×n×n}, want the ``max tensor singular value/vector'': σ = max_{x ∈ S^{n-1}} ⟨T, x^{⊗3}⟩ and x* = argmax_{x ∈ S^{n-1}} ⟨T, x^{⊗3}⟩. NP-hard in the worst case.

7 Notation. Kronecker/tensor product: if A is n×m and B is k×l, then A ⊗ B is the nk×ml matrix whose (i,j) block is A_{i,j}·B. Tensor power of x ∈ R^n: x^{⊗3} ∈ R^{n^3} has entries (x^{⊗3})_{ijk} = x_i x_j x_k.
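For concreteness, a minimal numpy sketch of these two operations (variable names are mine, not from the talk):

```python
import numpy as np

n, m, k, l = 3, 2, 2, 2
A = np.random.randn(n, m)
B = np.random.randn(k, l)

# Kronecker product: (nk x ml) matrix whose (i,j) block is A[i,j] * B
AkronB = np.kron(A, B)
assert AkronB.shape == (n * k, m * l)

# Tensor power of a vector: x^{⊗3} has entries x_i * x_j * x_k
x = np.random.randn(n)
x3 = np.einsum('i,j,k->ijk', x, x, x)        # shape (n, n, n)
assert np.isclose(x3[0, 1, 2], x[0] * x[1] * x[2])
```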

8 Tensor Principal Components Analysis (TPCA). Given T ∈ R^{n×n×n}, want the ``max tensor singular value/vector'': σ = max_{x ∈ S^{n-1}} ⟨T, x^{⊗3}⟩ and x* = argmax_{x ∈ S^{n-1}} ⟨T, x^{⊗3}⟩. NP-hard in the worst case. [Picture: ⟨T, x ⊗ x ⊗ x⟩.]

9 Spiked tensor model for TPCA [Montanari-Richard 14]. Planted case: T = λ·v^{⊗3} (signal) + G (noise), where G ∈ R^{n×n×n} has i.i.d. N(0,1) entries. Random case: T = G, entries i.i.d. N(0,1). Three tasks: Search: find v in the planted case. Distinguishing: planted or random case? Refutation: certify an upper bound on max_x ⟨T, x^{⊗3}⟩ in the random case.
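A small numpy sketch of the two distributions in the spiked tensor model (the helper names and the value of λ are illustrative):

```python
import numpy as np

def random_instance(n, rng):
    """Random case: an n x n x n tensor of i.i.d. N(0,1) entries."""
    return rng.standard_normal((n, n, n))

def planted_instance(n, lam, rng):
    """Planted case: lam * v⊗v⊗v plus i.i.d. N(0,1) noise."""
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    signal = lam * np.einsum('i,j,k->ijk', v, v, v)
    return signal + random_instance(n, rng), v

rng = np.random.default_rng(0)
T, v = planted_instance(n=50, lam=20.0, rng=rng)
# <T, v^{⊗3}> should be close to lam in the planted case
print(np.einsum('ijk,i,j,k->', T, v, v, v))
```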

10 The Plan. Refutation (random case, T = G with i.i.d. N(0,1) entries): certify an upper bound on max_x ⟨T, x^{⊗3}⟩. 1. SoS suggests a family of spectral algorithms. 2. A naïve spectral algorithm. 3. Improving it with SoS spectral algorithms. Search (planted case, T = λ·v^{⊗3} + G): find v. 4. Use the SoS analysis to get fast algorithms.

11 Degree-D SoS. Solve for a pseudoexpectation Ẽ acting on polynomials p(x) of degree ≤ D in n variables, satisfying: Linearity: Ẽ[a·p(x) + b·q(x)] = a·Ẽ[p(x)] + b·Ẽ[q(x)]. Fixed scalars: Ẽ[1] = 1. Non-negative squares: Ẽ[q(x)^2] ≥ 0 for deg q ≤ D/2. Plus problem-specific constraints, e.g. Ẽ[‖x‖^2] = 1.
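For intuition, here is a hedged sketch of the degree-2 version of this program in cvxpy (my own illustrative formulation, not code from the talk): the moment matrix M = Ẽ[(1, x)(1, x)^T] is PSD (non-negative squares for linear q), M[0,0] = 1 (Ẽ[1] = 1), and the trace of its x-block is 1 (Ẽ[‖x‖²] = 1).

```python
import numpy as np
import cvxpy as cp

n = 10
A = np.random.randn(n, n)
A = (A + A.T) / 2                      # maximize the quadratic x^T A x over the sphere

# Moment matrix M = Ẽ[(1, x)(1, x)^T]; PSD encodes Ẽ[q(x)^2] >= 0 for linear q
M = cp.Variable((n + 1, n + 1), PSD=True)
constraints = [
    M[0, 0] == 1,                      # Ẽ[1] = 1
    cp.trace(M[1:, 1:]) == 1,          # Ẽ[||x||^2] = 1
]
objective = cp.Maximize(cp.trace(A @ M[1:, 1:]))   # Ẽ[x^T A x]
prob = cp.Problem(objective, constraints)
prob.solve()
print(prob.value, np.max(np.linalg.eigvalsh(A)))   # degree-2 SoS matches lambda_max here
```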

12 SoS suggests spectral algorithms. If we want to bound f(x): associate some matrix with f, and then rearrange entries along monomial symmetries; apply degree-d SoS polynomial inequalities (Cauchy-Schwarz, Jensen's inequality for squares, ...); use problem-specific constraints (e.g. x_i^2 = 1).

13 SoS captures spectral algorithms. Definition: a symmetric matrix F is a matrix representation of f(x) if f(x) = ⟨F, x^{⊗2d}⟩; define λ_max(f) = min{ λ_max(F) : F symmetric, f(x) = ⟨F, x^{⊗2d}⟩ }. Theorem: Ẽ[f(x)] ≤ λ_max(f) (for pseudoexpectations with Ẽ[‖x‖^{2d}] = 1, e.g. under the sphere constraint).

14 SoS captures spectral algorithms. Theorem: Ẽ[f(x)] ≤ λ_max(f). Proof: Ẽ[f(x)] = Ẽ[⟨F, x^{⊗2d}⟩] for a symmetric matrix representation F of f(x). Let λ = λ_max(F); then λ·Id − F ⪰ 0, so λ·Id − F = Σ_i σ_i u_i u_i^T with σ_i ≥ 0, and Ẽ[⟨λ·Id − F, x^{⊗2d}⟩] = Σ_i σ_i Ẽ[⟨u_i, x^{⊗d}⟩^2] ≥ 0 — a sum of degree-d squares, valid if 2d ≤ D.

15 SoS captures spectral algorithms. Theorem: Ẽ[f(x)] ≤ λ_max(f). Proof (continued): Ẽ[f(x)] = Ẽ[⟨F, x^{⊗2d}⟩] ≤ Ẽ[⟨F, x^{⊗2d}⟩] + Ẽ[⟨λ·Id − F, x^{⊗2d}⟩] = Ẽ[⟨λ·Id, x^{⊗2d}⟩] = λ·Ẽ[‖x‖^{2d}], by linearity and since ⟨Id, x^{⊗2d}⟩ = ‖x^{⊗d}‖^2 = ‖x‖^{2d} (squares on the diagonal).
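A quick numerical sanity check of this bound for d = 2 (illustrative code; F is just a random symmetric matrix representation):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
F = rng.standard_normal((n * n, n * n))
F = (F + F.T) / 2                       # symmetric matrix representation of f(x) = <F, x^{⊗4}>
lam = np.max(np.linalg.eigvalsh(F))

for _ in range(1000):
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)              # unit vector
    x2 = np.kron(x, x)
    f = x2 @ F @ x2                     # f(x) = (x⊗x)^T F (x⊗x)
    assert f <= lam + 1e-9              # never exceeds the top eigenvalue
```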

16 What kind of spectral algorithms? Choose the best matrix representation F by: rearranging entries along symmetries of x^{⊗d}; applying degree-d SoS polynomial inequalities (Cauchy-Schwarz, Jensen's inequality for squares, ...); using problem-specific constraints (e.g. x_i^2 = 1).

18 SoS suggests several spectral algorithms. The bound Ẽ[f(x)] ≤ λ_max(F)·Ẽ[‖x‖^{2d}] holds for any matrix representation F of f(x), and the choice of F may affect λ_max(F)!

19 SoS suggests several spectral algorithms. Claim: there exist f(x) with representations F_1, F_2 such that f(x) = ⟨F_1, x^{⊗2d}⟩ = ⟨F_2, x^{⊗2d}⟩ but λ_max(F_1) ≠ λ_max(F_2). Example: for a unit vector x, f(x) = E_{g∼N(0,Id)}[⟨x, g⟩^4] = 3 (since ⟨x, g⟩ ∼ N(0,1)).

20 SoS suggests several spectral algorithms. Claim (as above). For a unit vector x, f(x) = E_{g∼N(0,Id)}[⟨x, g⟩^4] = ⟨E[g^{⊗4}], x^{⊗4}⟩. The entries of E[g^{⊗4}]: 3 when i = j = k = l; 1 when i, j, k, l form two distinct pairs; 0 when any index has odd multiplicity.

21 SoS suggests several spectral algorithms. Claim (as above). One representation of f(x) = ⟨E[g^{⊗4}], x^{⊗4}⟩ is the n²×n² matrix F_1 with entries (F_1)_{ij,kl} = E[g_i g_j g_k g_l]: it has 1's at the (ii, jj), (ij, ij), and (ij, ji) positions (and 3's at (ii, ii)), and the all-ones block on the (ii, jj) entries gives it a top eigenvalue of roughly n.

22 Rearranging entries along symmetries of x^{⊗2}. Claim (as above). f(x) = E_{g∼N(0,Id)}[⟨x, g⟩^4] = ⟨E[g^{⊗4}], x^{⊗4}⟩. Since x_i^2 x_j^2 = (x_i x_j)(x_i x_j), we may add (x_i^2 x_j^2 − x_i^2 x_j^2) = 0 to f: put −1 at entry (ii, jj) and +1 at entry (ij, ji), moving weight off the (ii, jj) block without changing the polynomial.

26 Rearranging entries along symmetries of x^{⊗2}. After moving all of the off-diagonal weight this way, the representation becomes F_2 = 3·Id, whose eigenvalues are all 3!

27 Rearranging entries along symmetries of x^{⊗2}. Conclusion: f(x) = ⟨E[g^{⊗4}], x^{⊗4}⟩ = ⟨3·Id, x^{⊗4}⟩ = 3‖x‖^4 = 3 for unit x, so the representation 3·Id certifies the true value 3, while the naïve representation only certifies ≈ n.
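A short numpy check of this example (illustrative code): both matrices represent the same degree-4 polynomial, but their top eigenvalues differ by roughly a factor of n/3.

```python
import numpy as np

n = 20
# Naive representation: F1[(i,j),(k,l)] = E[g_i g_j g_k g_l] for g ~ N(0, Id)
I = np.eye(n)
F1 = (np.einsum('ij,kl->ijkl', I, I)         # delta_ij delta_kl
      + np.einsum('ik,jl->ijkl', I, I)       # delta_ik delta_jl
      + np.einsum('il,jk->ijkl', I, I)       # delta_il delta_jk
      ).reshape(n * n, n * n)
F2 = 3 * np.eye(n * n)                        # rearranged representation

x = np.random.default_rng(2).standard_normal(n)
x /= np.linalg.norm(x)
x2 = np.kron(x, x)
print(x2 @ F1 @ x2, x2 @ F2 @ x2)             # same polynomial value: both equal 3
print(np.max(np.linalg.eigvalsh(F1)),         # ~ n + 2
      np.max(np.linalg.eigvalsh(F2)))         # = 3
```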

28 What kind of spectral algorithms? Choose the best matrix representation by: rearranging entries along symmetries of x^{⊗d}; applying degree-d SoS polynomial inequalities (Cauchy-Schwarz, Jensen's inequality for squares, ...).

29 Tensor Norm Refutation (random case, noise only). Claim: max_{x ∈ S^{n-1}} ⟨G, x^{⊗3}⟩ ≤ O(√n) with high probability over G. But the simple spectral algorithm can only certify O(n). Proof: ⟨G, x^{⊗3}⟩ = x^T G_{n×n²} (x ⊗ x) ≤ σ_max(G_{n×n²}), where G_{n×n²} is the flattening of G into an n × n² matrix, and by Gordon's theorem σ_max(G_{n×n²}) ≈ √(n²) + √n ≈ n. (Rearranging entries doesn't help here: the representations all look essentially the same because G has i.i.d. entries.)
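A quick numerical illustration (illustrative code): the spectral norm of the n × n² flattening concentrates around n, far above the true maximum of order √n.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
G = rng.standard_normal((n, n, n))

flat = G.reshape(n, n * n)                 # naive flattening of the 3-tensor
cert = np.linalg.norm(flat, 2)             # sigma_max: the certified upper bound
print(cert, n, np.sqrt(n))                 # cert ~ n, true max is only ~ sqrt(n)
```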

30 SoS Cauchy-Schwarz. Claim: 2·Ẽ[p(x)q(x)] ≤ Ẽ[p(x)^2] + Ẽ[q(x)^2] whenever deg(p^2), deg(q^2) ≤ D (and hence Ẽ[pq] ≤ Ẽ[p^2]^{1/2}·Ẽ[q^2]^{1/2}). Proof: Ẽ[(p(x) − q(x))^2] ≥ 0, since (p − q)^2 is a degree-≤D square; expand by linearity.

31 Cauchy-Schwarz for Tensor PCA Refutation. Theorem: for a unit vector x and noise G with i.i.d. N(0,1) entries, Ẽ[⟨G, x^{⊗3}⟩] ≤ Õ(n^{3/4}). Proof: write ⟨G, x^{⊗3}⟩ = Σ_i x_i·⟨G_i, x^{⊗2}⟩, where G_i (i ∈ [n]) is the i-th n×n slice of G; by Cauchy-Schwarz this is ≤ (Σ_i x_i^2)^{1/2}·(Σ_i ⟨G_i, x^{⊗2}⟩^2)^{1/2}.

32 Cauchy-Schwarz for Tensor PCA Refutation. Proof (continued): Σ_i ⟨G_i, x^{⊗2}⟩^2 = ⟨Σ_i G_i ⊗ G_i, x^{⊗4}⟩, and the n²×n² matrix Σ_i G_i ⊗ G_i has eigenvalues Õ(n^{3/2}) w.h.p., so Ẽ[⟨G, x^{⊗3}⟩] ≤ Õ(n^{3/2})^{1/2} = Õ(n^{3/4}). The SoS analysis is itself a spectral algorithm for refutation!
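A numerical illustration of this certificate (illustrative code, small n): the top eigenvalue of Σ_i G_i ⊗ G_i scales like n^{3/2}, so its square root certifies roughly n^{3/4}, compared to roughly n for the naïve flattening.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
G = rng.standard_normal((n, n, n))

# M = sum_i G_i ⊗ G_i, an n^2 x n^2 matrix (G_i = i-th slice of G)
M = sum(np.kron(G[i], G[i]) for i in range(n))
top = np.max(np.linalg.eigvalsh((M + M.T) / 2))

print(top ** 0.5, n ** 0.75)                        # certified bound ~ n^{3/4}
print(np.linalg.norm(G.reshape(n, n * n), 2), n)    # naive certificate ~ n, for comparison
```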

33 What kind of spectral algorithms? Choose the best matrix representation by: rearranging entries along symmetries of x^{⊗d}; applying degree-d SoS polynomial inequalities (Cauchy-Schwarz, Jensen's inequality for squares, ...).

34 Better Approx (in more time). Theorem: for unit x and noise G with i.i.d. N(0,1) entries, Ẽ[⟨G, x^{⊗3}⟩] ≤ Õ(n^{3/4}). Theorem: at SoS degree D (in time n^{O(D)}), Ẽ[⟨G, x^{⊗3}⟩] ≤ Õ(n^{3/4}/D^{1/4}). But actually, max_{x ∈ S^{n-1}} ⟨G, x^{⊗3}⟩ ≤ O(√n); information-theoretically one can certify O(√n) in time 2^n (epsilon net).

35 Better Approx (in more time). [Plot for Tensor PCA refutation: approximation factor (from 1 up to n^{1/4}) versus log D / log n (from 0 to 1, i.e. D = n^ε); the SoS and spectral algorithms improve from factor n^{1/4} at constant degree (poly(n) time) down to 1 at D ≈ n, matching the SoS lower bounds of [Hopkins-Kothari-Potechin-Raghavendra-S-Steurer 17].]

36 Jensen's Inequality. For d a power of 2 and D ≥ d·deg(f): Ẽ[f(x)]^d ≤ Ẽ[f(x)^d]. Proof by induction on d: Ẽ[f]^{2d} = (Ẽ[f]^d)^2 ≤ (Ẽ[f^d])^2 ≤ Ẽ[f^{2d}], applying the inductive hypothesis and then Ẽ[p]^2 ≤ Ẽ[p^2] (i.e. Ẽ[(p − Ẽ[p])^2] ≥ 0).

37 Jensen's Inequality. For d a power of 2 and D ≥ d·deg(f): Ẽ[f(x)]^d ≤ Ẽ[f(x)^d]. We can take advantage of the increased symmetry in higher-degree polynomials (more matrix representations).

38 Better Approximation. Theorem: at SoS degree D (in time n^{O(D)}), for unit x and noise G with i.i.d. N(0,1) entries, Ẽ[⟨G, x^{⊗3}⟩] ≤ Õ(n^{3/4}/D^{1/4}). Proof: apply Jensen's inequality with d some power of 2: Ẽ[⟨G, x^{⊗3}⟩] ≤ (Ẽ[⟨G, x^{⊗3}⟩^{2d}])^{1/2d}.

40 Symmetrize to improve the eigenvalue. The relevant matrix is n^{2d} × n^{2d}, with rows and columns indexed by ordered multisets S of 2d variables; its entries are degree-2d polynomials in the entries G_ijk ∼ N(0,1). Since Π_{i∈S} x_i = Π_{j∈π(S)} x_j for any permutation π, taking the average of row S and row π(S) fixes the polynomial while changing the matrix.

41 Symmetrizing to improve the eigenvalue. Taking the average of row S and row π(S) fixes the polynomial. For d = 2 the symmetrized n^{2d} × n^{2d} matrix has entries M^{(2)}_{abcd,ijkl} = (1/4!)·[ (Σ_i G_i ⊗ G_i)_{ab,ij}·(Σ_i G_i ⊗ G_i)_{cd,kl} + (Σ_i G_i ⊗ G_i)_{ac,ij}·(Σ_i G_i ⊗ G_i)_{bd,kl} + ... ], averaging over the orderings of {a, b, c, d}; the entries are degree-2d polynomials in G_ijk ∼ N(0,1).

42 Heuristic spectral norm calculation. Before averaging, ‖(Σ_i G_i ⊗ G_i)^{⊗d}‖ = ‖Σ_i G_i ⊗ G_i‖^d ≈ (n^{3/2})^d. After averaging over row/column permutations π, each entry is an average of ~d! (roughly i.i.d., randomly signed) variables of magnitude m, so a typical entry shrinks to about m/√(d!), and heuristically the spectral norm shrinks accordingly.

43 Improving Tensor PCA. Theorem: for unit x and noise G with i.i.d. N(0,1) entries, at SoS degree D ≥ 4d, Ẽ[⟨G, x^{⊗3}⟩] ≤ Õ(n^{3/4}/d^{1/4}). Proof sketch: apply Jensen's inequality with d some power of 2 (valid if D ≥ 4d); average over the symmetries of x^{⊗2d} to reduce the eigenvalues of the matrix representation: Ẽ[⟨G, x^{⊗3}⟩] ≤ (Ẽ[⟨M^{(d)}, x^{⊗4d}⟩])^{1/2d} ≤ ‖M^{(d)}‖^{1/2d} w.h.p.

44 What kind of spectral algorithms? Choose the best matrix representation by: rearranging entries along symmetries of x^{⊗d}; applying degree-d SoS polynomial inequalities (Cauchy-Schwarz, Jensen's inequality for squares, ...).

45 Other SoS (via Spectral) Algorithms. Tensor Decomposition: symmetry, Cauchy-Schwarz, constant-d Jensen's [Barak-Kelner-Steurer 14, Ge-Ma 15, Ma-Shi-Steurer 16]. Dictionary Learning: symmetry + tensor decomposition. Planted Sparse Vector: symmetry [Barak-Brandão-Harrow-Kelner-Steurer-Zhou 12, Barak-Kelner-Steurer 14]. Tensor Completion: symmetry, Cauchy-Schwarz [Barak-Moitra 16, Potechin-Steurer 17]. Refuting Random CSPs: symmetry, Cauchy-Schwarz, Jensen's, (x_i^2 = 1) constraints [Allen-O'Donnell-Witmer 15, Raghavendra-Rao-S 17]. Polynomial Maximization over S^{n-1}: symmetry, Cauchy-Schwarz, Jensen's, worst case [Bhattiprolu-Ghosh-Guruswami-E.Lee-Tulsiani 16].

46 Fast spectral algorithms from SoS Analyses [Hopkins-S-Shi-Steurer 16]

47 SoS Gives a Spectral Search Algorithm. T = G + λ·v^{⊗3}. Consider the n²×n² matrix Σ_i T_i ⊗ T_i, where T_i is the i-th n×n slice of T. Since T_i = G_i + λ·v_i·vv^T and Σ_i v_i^2 = 1: Σ_i T_i ⊗ T_i = Σ_i G_i ⊗ G_i + cross-terms + λ^2·Σ_i v_i^2·(vv^T) ⊗ (vv^T) = Σ_i G_i ⊗ G_i + cross-terms + λ^2·(v ⊗ v)(v ⊗ v)^T. The noise term has eigenvalues ≲ n^{3/2}; the signal term has eigenvalue λ^2.
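A minimal numpy sketch of this search algorithm (illustrative code; the choice λ ≈ 8·n^{3/4} is mine, picked so the signal dominates the noise):

```python
import numpy as np

rng = np.random.default_rng(5)
n, lam = 40, 8 * 40 ** 0.75
v = rng.standard_normal(n); v /= np.linalg.norm(v)
T = lam * np.einsum('i,j,k->ijk', v, v, v) + rng.standard_normal((n, n, n))

# SoS-inspired spectral algorithm: top eigenvector of sum_i T_i ⊗ T_i is ~ v ⊗ v
M = sum(np.kron(T[i], T[i]) for i in range(n))
M = (M + M.T) / 2
_, vecs = np.linalg.eigh(M)
top = vecs[:, -1].reshape(n, n)                    # ≈ ± v v^T
w, wvecs = np.linalg.eigh((top + top.T) / 2)
v_hat = wvecs[:, np.argmax(np.abs(w))]             # handles the sign ambiguity
print(abs(v_hat @ v))                              # close to 1: v recovered
```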

48 Running time. T = G + λ·v^{⊗3}. Building Σ_i T_i ⊗ T_i means summing n matrices of size n² × n², so n^5 time; computing its top eigenvalue/eigenvector takes n^4·log n time. Total ≈ n^5 + n^4·log n: not a practical spectral algorithm? Theorem: we can compress to get an O(n^3)-time algorithm.

49 Compressing the matrix. Theorem: there is an O(n^3)-time algorithm. Recall Σ_i T_i ⊗ T_i = Σ_i G_i ⊗ G_i + cross-terms + λ^2·(vv^T) ⊗ (vv^T), with noise eigenvalues ≲ n^{3/2} and signal eigenvalue λ^2. How do we reduce the dimension but preserve the signal-to-noise ratio? Partial trace: for n×n matrices A, B, Tr_par(A ⊗ B) = Tr(A)·B. Then Tr_par(λ^2·(vv^T) ⊗ (vv^T)) = λ^2·‖v‖^2·vv^T = λ^2·vv^T, and Tr_par(Σ_i G_i ⊗ G_i) = Σ_i Tr(G_i)·G_i, which still has eigenvalues ≲ n^{3/2} (since Tr(G_i) ≈ ±n^{1/2}): the signal-to-noise ratio is preserved!

50 Compressing the matrix. Theorem: there is an O(n^3)-time algorithm. Apply the partial trace directly: Tr_par(Σ_i T_i ⊗ T_i) = Σ_i Tr(T_i)·T_i. Runtime: computing all the Tr(T_i) takes n^2 time; each of the n^2 entries of Σ_i Tr(T_i)·T_i is a sum of n numbers, so n^3 time (linear in the input!); computing the top eigenvector/eigenvalue of an n×n matrix takes n^2·log n time.
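A minimal numpy sketch of the compressed algorithm (illustrative code, same caveats on λ as above): the partial-traced matrix is only n × n, and building it is linear in the input size.

```python
import numpy as np

rng = np.random.default_rng(6)
n, lam = 200, 8 * 200 ** 0.75
v = rng.standard_normal(n); v /= np.linalg.norm(v)
T = lam * np.einsum('i,j,k->ijk', v, v, v) + rng.standard_normal((n, n, n))

# Partial-trace compression: sum_i Tr(T_i) * T_i is only n x n
traces = np.einsum('ijj->i', T)            # Tr(T_i) for every slice, n^2 time
M = np.einsum('i,ijk->jk', traces, T)      # sum_i Tr(T_i) * T_i, n^3 time (linear in input)
M = (M + M.T) / 2

w, vecs = np.linalg.eigh(M)
v_hat = vecs[:, np.argmax(np.abs(w))]      # eigenvector of largest |eigenvalue| ≈ ± v
print(abs(v_hat @ v))                      # close to 1: v recovered in ~n^3 time
```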

51 Fast Spectral Algorithms via SoS. Secret sauce: apply the partial trace to the SoS matrix (in a way that enables fast power iteration). Tensor PCA [Hopkins-S-Shi-Steurer 16]. Tensor decomposition [Hopkins-S-Shi-Steurer 16, S-Steurer 17]. Planted Sparse Vector [Hopkins-S-Shi-Steurer 16]. Tensor Completion [Montanari-Sun 17].

52 The SoS perspective gives new spectral algorithms; spectral techniques let us analyze SoS. Worst-case problems? [Diagram: Spectral Algorithms ↔ Sum-of-Squares Algorithms → Average-Case & Structured Instances.]
