Local MAX-CUT in Smoothed Polynomial Time
Omer Angel¹, Sebastien Bubeck², Yuval Peres², Fan Wei³
¹University of British Columbia, ²Microsoft Research Redmond, ³Stanford University
November 9, 2016
Introduction

Let $G = (V, E)$ be a connected graph with $n$ vertices, and let $w : E \to \mathbb{R}$ be an edge weight function. We allow negative edge weights.

Definition (Cut in a weighted graph). Given a partition of the vertices $\sigma : V \to \{-1, 1\}$, the cut size is $\frac{1}{2}\sum_{uv \in E} w_{uv}\,(1 - \sigma(u)\sigma(v))$. Maximizing the cut means maximizing this same quantity, written as $\mathrm{cut}(\sigma) = \sum \{ w_{uv} : \sigma(u) = 1,\ \sigma(v) = -1,\ uv \in E \}$.
Definition (Local max-cut problem). Find $\sigma : V \to \{-1, 1\}$ whose cut size is locally maximal, that is, the cut size cannot be increased by moving a single vertex to the other side.
Why are local maxima of MAX-CUT important? Game theory: the party affiliation game is a key example of a potential game, a broader and important class of games.

Party affiliation game: $n$ individuals are to be split into two parties, and $d_{ij}$ measures the distance between individuals $i$ and $j$. Individual $i$'s utility is $-\sum_{j \ne i:\ \sigma_j = \sigma_i} d_{ij}$; everyone wants to minimize the total distance to the individuals in the same party, i.e., to minimize $\sum_{j \ne i} \sigma_i d_{ij} \sigma_j$ (this equals twice the same-party distance minus the constant $\sum_{j \ne i} d_{ij}$). Local maxima of MAX-CUT with weights $w_{ij} = d_{ij}$ are exactly the Nash equilibria of this game.
Definition (FLIP algorithm: a natural local search algorithm). Start from some initial partition $\sigma$. Until a local maximum is reached, in each step flip a vertex whose move increases the cut size.

FLIP is hard:
1. D. Johnson, C. H. Papadimitriou, and M. Yannakakis '88: in experiments, FLIP usually reaches a local maximum reasonably fast.
2. A. A. Schäffer and M. Yannakakis '91: there exist instances where FLIP takes an exponential number of steps (PLS-complete).
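The FLIP loop can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the dictionary-of-edges representation, the helper names `cut_value`, `flip_gain` and `flip`, and the rule of flipping the smallest-index improving vertex are all assumptions (the slides allow any pivot rule).

```python
import itertools
import random


def cut_value(weights, sigma):
    """Total weight of edges whose endpoints are on opposite sides."""
    return sum(w for (u, v), w in weights.items() if sigma[u] != sigma[v])


def flip_gain(weights, sigma, v):
    """Change in cut size if v switches sides: sum of sigma(u)sigma(v)w_uv over edges at v."""
    gain = 0.0
    for (a, b), w in weights.items():
        if v in (a, b):
            u = b if a == v else a
            gain += sigma[u] * sigma[v] * w
    return gain


def flip(weights, sigma, pick=min):
    """Run FLIP from sigma; return (locally maximal partition, number of flips)."""
    sigma = dict(sigma)
    steps = 0
    while True:
        improving = [v for v in sigma if flip_gain(weights, sigma, v) > 0]
        if not improving:          # no single move helps: local maximum
            return sigma, steps
        v = pick(improving)        # pivot rule: an implementation choice
        sigma[v] = -sigma[v]
        steps += 1


# Usage on a small complete graph with random weights in [-1, 1].
rng = random.Random(0)
n = 6
weights = {e: rng.uniform(-1, 1) for e in itertools.combinations(range(n), 2)}
start = {v: 1 for v in range(n)}
local_max, steps = flip(weights, start)
```

Each flip raises the cut by a positive amount, so the loop terminates; how many steps it takes is exactly what the smoothed analysis bounds.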
Smoothed Analysis

What are the options to explain these findings?
1. All edge weights are given by an adversary, but bad instances are rare?
2. All edge weights are uniformly random on $[-1, 1]$? This is too idealized.
3. Smoothed analysis: for arbitrary deterministic edge weights $w \in \mathbb{R}^E$ (possibly chosen by an adversary) and any initial configuration, add a small noise $Z_e$ independently to each edge.

Main Question. Consider the random edge weights $W_e = w_e + Z_e$ for all $e \in E$. Smoothed complexity of local max-cut: if a small noise (with bounded density) is added to each edge weight, does FLIP terminate in a polynomial number of steps w.h.p.?
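A small sketch of the smoothed model, assuming uniform noise: $Z_e \sim \mathrm{Uniform}[-1/(2\phi), 1/(2\phi)]$ has density exactly $\phi$, so it satisfies the density bound. Uniform noise is only one admissible choice, and the function name `smoothed_weights` is hypothetical. Non-edges are treated as adversarial weight $0$ and still receive noise, matching the complete-graph setting of the main result.

```python
import itertools
import random


def smoothed_weights(w, phi, rng):
    """Return W_e = w_e + Z_e with independent Z_e ~ Uniform[-1/(2 phi), 1/(2 phi)].

    A uniform law on an interval of length 1/phi has density phi, so each
    W_e has density bounded by phi, as smoothed analysis requires.
    """
    half_width = 1.0 / (2.0 * phi)
    return {e: w_e + rng.uniform(-half_width, half_width) for e, w_e in w.items()}


# Adversarial weights on the complete graph: two heavy edges, 0 elsewhere.
n, phi = 8, 4.0
rng = random.Random(0)
w = {e: 0.0 for e in itertools.combinations(range(n), 2)}
w[(0, 1)] = w[(2, 3)] = 0.5
W = smoothed_weights(w, phi, rng)
```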
M. Etscheid and H. Röglin '14: w.h.p., if the weights are smoothed, any implementation of FLIP terminates in at most $n^{C \log n}$ steps for some universal constant $C > 0$.

We prove polynomial complexity, provided that small noise is added even to the non-edges, or in other words, that $G$ is a complete graph.
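Main Theorem

Theorem (O. Angel, S. Bubeck, Y. Peres, F. Wei '16). Let $G$ be the complete graph with random edge weights $W = w + Z$, where each $Z_e$ has density bounded by $\phi$. W.h.p., any implementation of FLIP terminates in at most $\phi^5 n^{15.1}$ steps.

In other words, the probability that there exists an improving sequence of length $n^{15.1}$ is $o(1)$. This follows from the key statement: w.h.p., from any initial configuration, there is no improving sequence of length $2n$ with total cut-size improvement less than $\varepsilon = n^{-12.1}$. (Since $|W_e| \le 1$, the cut size ranges over an interval of length at most $n^2$, so at most $n^2/\varepsilon$ disjoint windows of $2n$ consecutive improving moves fit, giving about $2n \cdot n^2/\varepsilon = 2n^{15.1}$ steps.)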
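The step from the key statement to a step count is simple arithmetic, sketched below under the assumption $|W_e| \le 1$ (so $\mathrm{cut}$ takes values in an interval of length at most $n(n-1)$). The helper `flip_step_bound` is hypothetical and ignores the $\phi$-dependence of the theorem.

```python
def flip_step_bound(n, eps):
    """Upper bound on the number of FLIP steps, assuming:
    - |W_e| <= 1 on the complete graph, so cut values span at most n(n-1);
    - every 2n consecutive improving flips raise the cut by at least eps.
    Each full window of 2n flips uses up eps of the range, so there are at
    most n(n-1)/eps full windows, plus one partial window.
    """
    cut_range = n * (n - 1)
    full_windows = cut_range / eps
    return 2 * n * (full_windows + 1)


n = 100
bound = flip_step_bound(n, eps=n ** -12.1)
```

With $\varepsilon = n^{-12.1}$ the bound is just under $2n^{15.1}$.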
Notations

Random weights $W = (W_e)_{e \in E} \in [-1, 1]^E$ with independent entries. Recall $W_e = w_e + Z_e$, where $W_e$ has density bounded by $\phi$. Space of spin configurations $\{-1, 1\}^V$: $\sigma \in \{-1, 1\}^V$, $\sigma = (\sigma(v))_{v \in V}$. The Hamiltonian $\mathrm{cut} : \{-1, 1\}^V \to \mathbb{R}$ is $\mathrm{cut}(\sigma) = \sum \{ W_{uv} : \sigma(u) = 1,\ \sigma(v) = -1,\ uv \in E \}$.
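Proof Overview: Intuition for the Proof

Definition (Linear operator $\alpha$). Recall $W = (W_e)_{e \in E}$. Flipping vertex $v$ gives an operator $\alpha$ defined by $\langle \alpha, W \rangle = \sum_{u : u \ne v} \sigma(v) W_{uv} \sigma(u)$, where $\sigma$ is the configuration just before the flip; $\alpha$ depends on the time and on the vertex flipped. Thus $\mathrm{cut}(\sigma^v) = \mathrm{cut}(\sigma) + \langle \alpha, W \rangle$, where $\sigma^v$ is $\sigma$ with $v$ flipped. In words, $\langle \alpha, W \rangle$ is the total weight of the edges from $v$ to its own side minus the total weight of the edges from $v$ to the other side.

Lemma (Structure of $\alpha$). Suppose $\alpha$ corresponds to moving $v$. Then
1. $\alpha \in \{-1, 0, 1\}^E$;
2. the $(v, w)$ coordinate of $\alpha$ is $\sigma(v)\sigma(w)$;
3. the coordinates of edges not incident to $v$ are $0$.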
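A quick numerical check of the identity $\mathrm{cut}(\sigma^v) = \mathrm{cut}(\sigma) + \langle \alpha, W \rangle$ and of the structure lemma. Vertices are indexed $0, \dots, n-1$, edges are sorted pairs, and the names `alpha_vector` and `cut` are illustrative assumptions.

```python
import itertools
import random


def alpha_vector(n, sigma, v):
    """Coefficients of the move 'flip v' taken in configuration sigma (before the flip).

    Coordinate {a, b} equals sigma(a)sigma(b) if the edge touches v, and 0 otherwise.
    """
    return {(a, b): (sigma[a] * sigma[b] if v in (a, b) else 0)
            for a, b in itertools.combinations(range(n), 2)}


def cut(W, sigma):
    """Total weight of edges crossing the partition."""
    return sum(w for (a, b), w in W.items() if sigma[a] != sigma[b])


rng = random.Random(1)
n, v = 5, 2
W = {e: rng.uniform(-1, 1) for e in itertools.combinations(range(n), 2)}
sigma = [rng.choice((-1, 1)) for _ in range(n)]
alpha = alpha_vector(n, sigma, v)

sigma_v = list(sigma)
sigma_v[v] = -sigma_v[v]                      # sigma^v: sigma with v flipped
delta = cut(W, sigma_v) - cut(W, sigma)       # actual change in cut size
inner = sum(alpha[e] * W[e] for e in W)       # <alpha, W>
```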
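W.h.p., from any initial configuration, there is no improving sequence of length $2n$ with total improvement less than $\varepsilon = \mathrm{poly}(n^{-1})$.

Observation: $P(\langle \alpha, W \rangle \in (0, \varepsilon]) \le \phi\varepsilon$, since $\alpha$ has a coordinate equal to $\pm 1$ and each $W_e$ has density at most $\phi$.
1. Ideally, ignoring correlations, fix a sequence $\alpha_1, \dots, \alpha_{2n}$; then
$(\ast)\quad P(\forall i \in [2n],\ \langle \alpha_i, W \rangle \in (0, \varepsilon]) \le (\phi\varepsilon)^{2n}$.
2. Simple counting for the union bound: (a) initial configurations: $2^n$; (b) sequences of length $2n$: $n^{2n}$.
3. $\Pr(\exists$ an initial configuration and an improving sequence of length $2n$ that improves by at most $\varepsilon) \le 2^n n^{2n} (\phi\varepsilon)^{2n}$.
4. $\varepsilon = (n\phi)^{-2}$ gives $o(1)$ probability, since the bound becomes $\big(2/(n\phi)^2\big)^n$.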
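The arithmetic in steps 2 to 4 can be checked in log scale, since the raw numbers overflow quickly. This sketch keeps the idealized assumption of step 1 (independence across the $2n$ moves); `log10_naive_union_bound` is a hypothetical helper.

```python
import math


def log10_naive_union_bound(n, phi):
    """log10 of 2^n * n^(2n) * (phi * eps)^(2n) with eps = (n * phi)^(-2).

    Valid only under the idealized bound (*) that ignores correlations.
    """
    eps = (n * phi) ** -2
    return (n * math.log10(2)
            + 2 * n * math.log10(n)
            + 2 * n * math.log10(phi * eps))


for n in (10, 100, 1000):
    print(n, log10_naive_union_bound(n, phi=1.0))
```

The terms collapse to $n \log_{10}\big(2/(n\phi)^2\big)$, which tends to $-\infty$.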
Why is this the ideal case? We need
$[W_{e_1}, W_{e_2}, \dots, W_{e_{|E|}}]\,[\alpha_1\ \alpha_2\ \cdots\ \alpha_{2n}] \in (0, \varepsilon]^{2n}$.
1. Say $\alpha_1 = \alpha_3$. Then the two constraints are identical, so the second one carries no new information.
2. Say $\alpha_4 = \alpha_1 + \alpha_2$. Then $\langle \alpha_4, W \rangle$ being small is not too surprising once we know that $\langle \alpha_1, W \rangle$ and $\langle \alpha_2, W \rangle$ are small.
3. ...
So it is the rank of the matrix $[\alpha_1\ \alpha_2\ \cdots\ \alpha_{2n}]$ that matters.
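A small Monte Carlo experiment, under assumed parameters, illustrates why rank rather than length matters. With $W_1, W_2$ uniform on $[-1, 1]$ (density $\phi = 1/2$), take $\alpha_1 = e_1$, $\alpha_2 = e_2$ and $\alpha_3 = \alpha_1 + \alpha_2$: three constraints but rank 2. The exact probability that all three inner products lie in $(0, \varepsilon]$ is $\varepsilon^2/8$, which sits between $(\phi\varepsilon)^3$ and $(\phi\varepsilon)^2$. The helper name `estimate_all_small` is hypothetical.

```python
import random


def estimate_all_small(num_samples, eps, seed=0):
    """Estimate P(<alpha_i, W> in (0, eps] for i = 1, 2, 3) with
    alpha_1 = e_1, alpha_2 = e_2, alpha_3 = alpha_1 + alpha_2, W uniform on [-1, 1]^2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        w1, w2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if 0 < w1 <= eps and 0 < w2 <= eps and 0 < w1 + w2 <= eps:
            hits += 1
    return hits / num_samples


phi, eps = 0.5, 0.1
p_hat = estimate_all_small(200_000, eps)
exact = eps ** 2 / 8          # area of the small triangle / area of the square
print(p_hat, exact, (phi * eps) ** 3, (phi * eps) ** 2)
```

The estimate is of order $\varepsilon^2$, not $\varepsilon^3$: the third, dependent constraint only shrinks the probability by a constant factor.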
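Justification for $(\ast)$: $P(\forall i \in [2n],\ \langle \alpha_i, W \rangle \in (0, \varepsilon]) \le (\phi\varepsilon)^{\text{rank of the matrix}}$ (apply the lemma below to a maximal linearly independent subset of the $\alpha_i$).

Lemma (M. Etscheid and H. Röglin '14). Let $\alpha_1, \dots, \alpha_k$ be $k$ linearly independent vectors in $\mathbb{Z}^E$. Then for any collection of reals $c_i$,
$P(\forall i \in [k],\ \langle \alpha_i, W \rangle \in (c_i, c_i + \varepsilon)) \le (\phi\varepsilon)^k$.

So we need to find the rank of the matrix $[\alpha_t]_t = [\alpha_1\ \alpha_2\ \cdots\ \alpha_{2n}]$.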
W.h.p., from any initial configuration, there is no improving sequence of length $2n$ with total improvement less than $\varepsilon = \mathrm{poly}(n^{-1})$. Observation: $P(\langle \alpha, W \rangle \in (0, \varepsilon]) \le \phi\varepsilon$.
1. In fact, for a fixed sequence $\alpha_1, \dots, \alpha_{2n}$,
$(\ast)\quad P(\forall i \in [2n],\ \langle \alpha_i, W \rangle \in (0, \varepsilon]) \le (\phi\varepsilon)^{\text{rank of the sequence}}$.
2. Simple counting for the union bound: (a) initial configurations: $2^n$; (b) sequences of length $2n$: $n^{2n}$.
3. $\Pr(\exists$ an initial configuration and an improving sequence of length $2n$ that improves by at most $\varepsilon) \le 2^n n^{2n} (\phi\varepsilon)^{\mathrm{rank}}$.
4. We hope that some $\varepsilon = (n\phi)^{-\Theta(1)}$ gives $o(1)$ probability.
Main Proof

Main Question 1: what do we know about the rank of the matrix $[\alpha_t]_t = [\alpha_1\ \alpha_2\ \cdots\ \alpha_l]$?
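Rank: understanding the matrix $A$

Recall that if $\alpha \in \{-1, 0, 1\}^E$ corresponds to moving $v$, then the $(v, w)$ coordinate of $\alpha$ is $\sigma(v)\sigma(w)$, and the coordinates of edges not incident to $v$ are $0$.

Example: $V(G) = \{v_1, v_2, v_3, v_4\}$, initial configuration $\sigma_0 = (1, 1, 1, 1)$, and the sequence of improving moves $Q = (v_1, v_2, v_3, v_2, v_1)$. Then
$\sigma_1 = (-1, 1, 1, 1)$, $\sigma_2 = (-1, -1, 1, 1)$, $\sigma_3 = (-1, -1, -1, 1)$, $\sigma_4 = (-1, 1, -1, 1)$, $\sigma_5 = (1, 1, -1, 1)$,
and column $t$ of $A$ is computed from $\sigma_{t-1}$, the configuration just before the $t$-th move:

$$A = [\alpha] = \begin{array}{c|ccccc} & v_1 & v_2 & v_3 & v_2 & v_1 \\ \hline \{v_1, v_2\} & 1 & -1 & 0 & 1 & -1 \\ \{v_1, v_3\} & 1 & 0 & -1 & 0 & 1 \\ \{v_2, v_3\} & 0 & 1 & -1 & 1 & 0 \\ \{v_1, v_4\} & 1 & 0 & 0 & 0 & -1 \\ \{v_2, v_4\} & 0 & 1 & 0 & -1 & 0 \\ \{v_3, v_4\} & 0 & 0 & 1 & 0 & 0 \end{array}$$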
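The matrix $A$ can be generated mechanically from $\sigma_0$ and the move sequence. In this sketch (the helper `alpha_matrix` is hypothetical) $v_1, \dots, v_4$ are indices $0, \dots, 3$, rows are edges in lexicographic order (so $\{v_1, v_4\}$ comes before $\{v_2, v_3\}$, unlike the table), and each column is computed from the configuration just before its move.

```python
import itertools


def alpha_matrix(n, sigma0, moves):
    """Return (edges, rows): row i lists the coordinate of edges[i] in alpha_1..alpha_l."""
    edges = list(itertools.combinations(range(n), 2))
    sigma = list(sigma0)
    columns = []
    for v in moves:
        # alpha for this move: sigma(a)sigma(b) on edges at v, 0 elsewhere
        columns.append([sigma[a] * sigma[b] if v in (a, b) else 0 for a, b in edges])
        sigma[v] = -sigma[v]               # apply the move
    rows = [[col[i] for col in columns] for i in range(len(edges))]
    return edges, rows


# sigma0 = (1, 1, 1, 1) and Q = (v1, v2, v3, v2, v1)
edges, A = alpha_matrix(4, [1, 1, 1, 1], [0, 1, 2, 1, 0])
for (a, b), row in zip(edges, A):
    print(f"{{v{a + 1},v{b + 1}}}", row)
```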
Given a sequence $B$, define the parameters $l(B)$, the length of $B$, and $s(B)$, the number of distinct vertices in $B$. Example: for $B = (v_1, v_2, v_3, v_2, v_1, v_2)$, $l(B) = 6$ and $s(B) = 3$.

Lemma (Simple Rank Lemma).
1. $\mathrm{rank}(B) \ge s(B)$ if $s(B) < n$.
2. $\mathrm{rank}(B) \ge s(B) - 1$ if $s(B) = n$.
Example (continued): $V = \{v_1, v_2, v_3, v_4\}$, $\sigma_0 = (1, 1, 1, 1)$ and $Q = (v_1, v_2, v_3, v_2, v_1)$, with the matrix $A$ above. Here $v_4$ is a non-moving vertex and $s(Q) = 3 < n = 4$, so the rank is at least 3: the rows $\{v_1, v_4\}$, $\{v_2, v_4\}$, $\{v_3, v_4\}$ are nonzero only in the columns where $v_1$, $v_2$, $v_3$ respectively move, so these three rows have disjoint supports and are linearly independent.
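The Simple Rank Lemma can be probed numerically with an exact rank over $\mathbb{Q}$. This is a sketch, not a proof: the helpers `alpha_columns` and `rank` are hypothetical, the move sequences are random (not necessarily improving, which the disjoint-support argument does not need), and vertex $n-1$ is kept non-moving so that $s < n$.

```python
import itertools
import random
from fractions import Fraction


def alpha_columns(n, sigma0, moves):
    """The vectors alpha_1..alpha_l (one list per move) over edges in lexicographic order."""
    edges = list(itertools.combinations(range(n), 2))
    sigma, cols = list(sigma0), []
    for v in moves:
        cols.append([sigma[a] * sigma[b] if v in (a, b) else 0 for a, b in edges])
        sigma[v] = -sigma[v]
    return cols


def rank(vectors):
    """Exact rank over the rationals via Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in vec] for vec in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


# The running example: s(Q) = 3 moving vertices out of n = 4.
example_rank = rank(alpha_columns(4, [1, 1, 1, 1], [0, 1, 2, 1, 0]))

# Random sequences on n = 6 vertices in which vertex 5 never moves.
rng = random.Random(2)
lemma_holds = True
for _ in range(200):
    moves = [rng.randrange(5) for _ in range(rng.randint(1, 12))]
    sigma0 = [rng.choice((-1, 1)) for _ in range(6)]
    lemma_holds &= rank(alpha_columns(6, sigma0, moves)) >= len(set(moves))
```

For the running example the rank is in fact 5, the maximum possible for five columns, well above the lemma's bound of 3.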
Rank - When s(L) is large

If the number of moving vertices is $s(L) = \Theta(n)$, we are done. Goal: w.h.p., from any initial configuration, there is no improving sequence of length $2n$ with total improvement less than $\varepsilon = \mathrm{poly}(n^{-1})$.
1. For a fixed sequence $\alpha_1, \dots, \alpha_{2n}$, by the Simple Rank Lemma,
$(\ast)\quad P(\forall i \in [2n],\ \langle \alpha_i, W \rangle \in (0, \varepsilon]) \le (\phi\varepsilon)^{\text{rank of the sequence}} \le (\phi\varepsilon)^{\Theta(n)}$.
2. Simple counting for the union bound: (a) initial configurations: $2^n$; (b) sequences of length $2n$: $n^{2n}$.
3. $\Pr(\exists$ an initial configuration and an improving sequence of length $2n$ that improves by at most $\varepsilon) \le 2^n n^{2n} (\phi\varepsilon)^{\Theta(n)}$.
4. $\varepsilon = (n\phi)^{-\Theta(1)}$ indeed gives $o(1)$ probability.
Rank - When s(L) is small

But given $l(L) = l$, $s(L)$ can be tiny compared with $l$.
1. For a fixed sequence $\alpha_1, \dots, \alpha_l$,
$(\ast)\quad P(\forall i \in [l],\ \langle \alpha_i, W \rangle \in (0, \varepsilon]) \le (\phi\varepsilon)^{\text{rank of the sequence}} \le (\phi\varepsilon)^{s(L)}$.
2. Simple counting for the union bound: (a) initial configurations: $2^n$; (b) sequences of length $l$: $n^l$.
3. $\Pr(\exists$ an initial configuration and an improving sequence of length $l$ that improves by at most $\varepsilon) \le 2^n n^l (\phi\varepsilon)^{s(L)}$.
4. When $s(L)$ is much smaller than $l$, the factor $n^l$ dominates, so $\varepsilon = (n\phi)^{-\Theta(1)}$ cannot give $o(1)$ probability.
Potential solution: shorten the sequence to increase the ratio $s(L)/l(L)$. Example: $1, 3, 2, 4, 6, 2, 3, 8, 9, 10$ has $s/l = 8/10$, while its halves $1, 3, 2, 4, 6$ and $2, 3, 8, 9, 10$ both have $s/l = 5/5$. In general, halving a sequence gives at least one side whose $s/l$ ratio is at least that of the original, since $s(L) \le s(L_1) + s(L_2)$ and $l(L) = l(L_1) + l(L_2)$. Eventually we get a block $B$ with $s(B) \ge l(B)/2$.
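The halving heuristic can be sketched as follows. This illustrates the slide's idea under stated assumptions (splitting at the midpoint, ties broken toward the left half, hypothetical helper names `distinct_ratio` and `find_block`); it is not the exact block-selection procedure used in the proof.

```python
def distinct_ratio(block):
    """s(B) / l(B): distinct vertices per move."""
    return len(set(block)) / len(block)


def find_block(seq):
    """Halve repeatedly, keeping the half with the larger s/l ratio,
    until the current block B satisfies s(B) >= l(B) / 2."""
    block = list(seq)
    while 2 * len(set(block)) < len(block):
        mid = len(block) // 2
        left, right = block[:mid], block[mid:]
        # one half always has ratio >= the ratio of the whole block
        block = left if distinct_ratio(left) >= distinct_ratio(right) else right
    return block


example = [1, 3, 2, 4, 6, 2, 3, 8, 9, 10]
print(distinct_ratio(example), distinct_ratio(example[:5]), distinct_ratio(example[5:]))
print(find_block([1, 2] * 8))
```

The loop always stops, at the latest when a block has length 1.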
Eventually we get a block $B$ in $L$ with $s(B) \ge l(B)/2$. Goal: w.h.p., from any initial configuration, there is no improving sequence of length $2n$ with total improvement less than $\varepsilon = \mathrm{poly}(n^{-1})$.
1. For a fixed sequence $\alpha_1, \dots, \alpha_{2n}$, we can find a block $B$ such that
$(\ast)\quad P(\forall i \in [2n],\ \langle \alpha_i, W \rangle \in (0, \varepsilon]) \le (\phi\varepsilon)^{\mathrm{rank}(B)} \le (\phi\varepsilon)^{l(B)/2}$.
2. Simple counting for the union bound: (a) initial configurations: $2^n$; (b) sequences of length $l(B)$: $n^{l(B)}$.
3. $\Pr(\exists$ an initial configuration and an improving sequence of length $2n$ that improves by at most $\varepsilon) \le 2^n n^{l(B)} (\phi\varepsilon)^{l(B)/2}$.

But $l(B)$ can be $o(n)$, in which case nothing compensates the factor $2^n$. We need to (1) improve the rank bound, and/or (2) do a better union bound, replacing $2^n$ by something smaller.
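Rank - Previous work

M. Etscheid and H. Röglin '14: there exists a block $B$ with $\mathrm{rank}(B) \ge s_2(B) \ge l(B)/\log n$, where $s_2(B)$ is the number of vertices that appear at least twice in $B$. This comes from a linear transformation of $A$ that uses only the repeated vertices; the non-moving vertices do not matter.
1. For a fixed sequence $\alpha_1, \dots, \alpha_{2n}$, we can find a block $B$ such that
$(\ast)\quad P(\forall i \in [2n],\ \langle \alpha_i, W \rangle \in (0, \varepsilon]) \le (\phi\varepsilon)^{\mathrm{rank}(B)} \le (\phi\varepsilon)^{l(B)/\log n}$.
2. Simple counting for the union bound: (a) initial configurations: $2^{s(B)}$; (b) number of such blocks: $n^{l(B)}$.
3. $\Pr(\exists$ an initial configuration and an improving sequence of length $2n$ that improves by at most $\varepsilon) \le 2^{s(B)} n^{l(B)} (\phi\varepsilon)^{l(B)/\log n}$.
4. $\varepsilon = (n\phi)^{-\Theta(\log n)}$ gives $o(1)$ probability, which leads to the $n^{C \log n}$ bound.

Unfortunately, the bound $s_2(B) \ge l(B)/\log n$ is sharp. We have to use the singleton vertices (those that move only once), but then how do we improve the union bound?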
Smart Union Bound: $2^n \to 2^{s(B)} (2n/\varepsilon)^{s(B)}$

If $v$ moves, then
$$\langle \alpha, W \rangle = \sum_{u \text{ moving in } B,\ u \ne v} W_{vu}\,\sigma(v)\sigma(u) + \sigma(v) \sum_{w \text{ non-moving}} W_{vw}\,\sigma(w) = \langle \alpha, W_{\text{subgraph on moving vertices}} \rangle + \sigma(v) \sum_{w \text{ non-moving}} W_{vw}\,\sigma(w).$$

No need for $2^n$, i.e., no need to union bound over all possible initial spins of the non-moving vertices: for each moving vertex $v \in B$, $\sum_{w \text{ non-moving}} W_{vw}\,\sigma(w)$ does not change throughout the block $B$! We only need a quantization of $\sum_{w \text{ non-moving}} W_{vw}\,\sigma(w)$.
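A numerical sanity check of this decomposition, assuming a hypothetical block in which only vertices $0, 1, 2$ move (vertices $3, \dots, 6$ stay put): at every move, $\langle \alpha, W \rangle$ splits into the moving-subgraph term plus $\sigma(v)$ times the field from the non-moving vertices, and that field is the same before and after the block. The helper names `weight` and `nonmoving_field` are illustrative.

```python
import itertools
import random

rng = random.Random(3)
n = 7
W = {e: rng.uniform(-1, 1) for e in itertools.combinations(range(n), 2)}
sigma = [rng.choice((-1, 1)) for _ in range(n)]
block = [0, 1, 2, 1, 0, 2]            # hypothetical block: vertices 3..6 never move
moving = set(block)


def weight(a, b):
    return W[(min(a, b), max(a, b))]


def nonmoving_field(v):
    """Sum over non-moving w of W_vw * sigma(w)."""
    return sum(weight(v, w) * sigma[w] for w in range(n) if w not in moving)


field_before = {v: nonmoving_field(v) for v in moving}
max_error = 0.0
for v in block:
    full = sum(sigma[v] * weight(v, u) * sigma[u] for u in range(n) if u != v)  # <alpha, W>
    inner = sum(sigma[v] * weight(v, u) * sigma[u] for u in moving if u != v)   # moving part
    max_error = max(max_error, abs(full - (inner + sigma[v] * field_before[v])))
    sigma[v] = -sigma[v]                # perform the move
field_after = {v: nonmoving_field(v) for v in moving}
```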
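Quantization: for each moving vertex $v$, pick the integer $k_v$ such that $\sum_{w \text{ non-moving}} W_{vw}\,\sigma(w) \in (k_v\varepsilon, (k_v + 1)\varepsilon]$. Recall that if $v$ moves, $\langle \alpha, W \rangle = \langle \alpha, W_{\text{subgraph on moving vertices}} \rangle + \sigma(v) \sum_{w \text{ non-moving}} W_{vw}\,\sigma(w)$. Hence
$$\langle \alpha, W \rangle \in (0, \varepsilon] \implies \langle \alpha, W_{\text{subgraph on moving vertices}} \rangle \in \text{a fixed interval of length } 2\varepsilon,$$
determined by $k_v$ and $\sigma(v)$.

New union bound:
1. number of choices for the $k_v$'s: about $(2n/\varepsilon)^{s(B)}$, since $\big|\sum_{w} W_{vw}\,\sigma(w)\big| < n$ when all $|W_e| \le 1$;
2. number of initial spins of the moving vertices: $2^{s(B)}$.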
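The quantization step can be sketched as below; `quantize` and `moving_part_interval` are hypothetical helpers, and the random test values are arbitrary. Given $k_v$ and $\sigma(v)$, the moving-subgraph term $\langle \alpha, W_{\text{moving}} \rangle = \langle \alpha, W \rangle - \sigma(v) \sum_w W_{vw}\,\sigma(w)$ is confined to an interval of length $2\varepsilon$.

```python
import math
import random


def quantize(field, eps):
    """The integer k with field in (k * eps, (k + 1) * eps]."""
    return math.ceil(field / eps) - 1


def moving_part_interval(sigma_v, k, eps):
    """Interval of length 2 * eps containing <alpha, W_moving> whenever
    <alpha, W> is in (0, eps] and the non-moving field is in (k * eps, (k + 1) * eps]."""
    if sigma_v == 1:
        return (-(k + 1) * eps, (1 - k) * eps)   # x - field
    return (k * eps, (k + 2) * eps)              # x + field


rng = random.Random(4)
eps = 0.01
all_inside = True
for _ in range(10_000):
    field = rng.uniform(-5, 5)         # stands for sum_w W_vw sigma(w)
    x = rng.uniform(0, eps)            # stands for <alpha, W> in (0, eps]
    s = rng.choice((-1, 1))            # sigma(v) before the move
    k = quantize(field, eps)
    y = x - s * field                  # <alpha, W_moving>
    lo, hi = moving_part_interval(s, k, eps)
    all_inside &= lo - 1e-9 <= y <= hi + 1e-9
```

Since the field has absolute value below $n$ when $|W_e| \le 1$, `quantize` takes about $2n/\varepsilon$ distinct values, which is where the factor $(2n/\varepsilon)^{s(B)}$ comes from.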
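After the smart union bound
1. For a fixed sequence $\alpha_1, \dots, \alpha_{2n}$, we can find a block $B$ such that $l(B) = 2s(B)$ and
$$P(\forall i \in [2n],\ \langle \alpha_i, W \rangle \in (0, \varepsilon]) \le P(\forall \text{ moves in } B,\ \langle \alpha, W_{\text{subgraph on moving vertices}} \rangle \in \text{intervals of length } 2\varepsilon) \le (2\phi\varepsilon)^{\mathrm{rank}(B)}.$$
2. Initial configurations: $2^n \to 2^{s(B)} (2n/\varepsilon)^{s(B)}$.
3. Number of sequences of length $l(B) = 2s(B)$: $s(B)^{2s(B)}$.

Hence
$$\Pr(\exists \text{ an initial configuration and an improving sequence of length } 2n \text{ that improves by at most } \varepsilon) \le \sum_{s(B)} 2^{s(B)} \Big(\frac{2n}{\varepsilon}\Big)^{s(B)} s(B)^{2s(B)} (2\phi\varepsilon)^{\mathrm{rank}(B)} \lesssim \sum_{s(B)} n^{3s(B)} \varepsilon^{\mathrm{rank}(B) - s(B)}.$$
So we need $\mathrm{rank}(B) - s(B) = \Theta(s(B))$.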
Rank Lemma

Proposition (O. Angel, S. Bubeck, Y. Peres, F. Wei '16). Any sequence of improving moves of length $2n$ has a block $B$ such that $l(B) = 2s(B)$ and $\mathrm{rank}(B) \ge 1.25\,s(B)$.
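Proof

Main Question:
1. What is a lower bound on the rank? Answer: given any sequence of length $2n$, there is a block $B$ with $\mathrm{rank}(B) \ge 1.25\,s(B)$. Therefore, for a fixed sequence $B$,
$P(\forall \alpha_t \in B,\ \langle \alpha_t, W \rangle \in (0, \varepsilon]) \le (2\phi\varepsilon)^{\text{rank of the sequence}} \le (2\phi\varepsilon)^{1.25\,s(B)}$.
2. How do we do the union bound? $2^n \to 2^{s(B)} (2n/\varepsilon)^{s(B)}$.

Putting these together with $\varepsilon = n^{-12.1}$:
$$\Pr(\exists \text{ an initial configuration and an improving sequence of length } 2n \text{ with improvement smaller than } \varepsilon) \le \sum_{s(B)=1}^{n} 2^{s(B)} n^{3s(B)} \varepsilon^{0.25\,s(B)} = o(1),$$
since $n^3 \varepsilon^{0.25} = n^{3 - 3.025} = n^{-0.025}$.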
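The exponent bookkeeping behind the final $o(1)$ can be checked exactly with fractions. The sketch assumes, as on the slide, that constants and $\phi$ are absorbed, and bounds the sum by the geometric series $r/(1-r)$ with $r = 2n^{3}\varepsilon^{0.25}$; the names `PER_S_EXPONENT` and `log10_tail_bound` are hypothetical.

```python
import math
from fractions import Fraction

EPS_EXPONENT = Fraction(-121, 10)               # eps = n^(-12.1)
RANK_SURPLUS = Fraction(1, 4)                   # rank(B) - s(B) >= 0.25 s(B)
# power of n contributed per unit of s(B) by n^(3 s) * eps^(0.25 s)
PER_S_EXPONENT = 3 + RANK_SURPLUS * EPS_EXPONENT


def log10_tail_bound(log10_n):
    """log10 of r / (1 - r) >= sum_{s=1}^{n} r^s, where r = 2 * n^PER_S_EXPONENT.

    Returns math.inf when r >= 1, i.e. when the geometric bound is vacuous.
    """
    log10_r = math.log10(2) + float(PER_S_EXPONENT) * log10_n
    if log10_r >= 0:
        return math.inf
    r = 10 ** log10_r
    return math.log10(r / (1 - r))


print(PER_S_EXPONENT)
for log10_n in (10, 40, 80, 160):
    print(log10_n, log10_tail_bound(log10_n))
```

The per-term exponent is $-1/40$, so the terms decay geometrically and the sum is $o(1)$; the decay is slow, as the printed values show.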
Conclusion

We study local search heuristics for finding cuts of large weight in a graph, and prove that such methods find locally optimal cuts in smoothed polynomial time when all edges are smoothed.

Conjecture. The smoothed complexity of FLIP for MAX-CUT on the complete graph with $n$ vertices is $\tilde{O}(n)$.

Conjecture. The smoothed complexity of FLIP for MAX-CUT is also polynomial when only the edges with nonzero weight are smoothed.
Thank you for your attention!