Robert Krauthgamer (Weizmann Institute), Seffi Naor (Technion), Roy Schwartz (Microsoft Research), Kunal Talwar (Microsoft Research)
Problem Definition

Input: $G = (V, E)$, $w : E \to \mathbb{R}^+$. Capacities $n_1, n_2, \ldots, n_k$ s.t. $\sum_{j=1}^{k} n_j \ge n$.

Output: A partition $S_1, \ldots, S_k$ of $V$ where each $|S_j| \le n_j$, minimizing $\frac{1}{2} \sum_{j=1}^{k} \delta(S_j)$.

Note: The number of parts $k$ might depend on $n$. Capacities might be of different magnitudes.
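The objective above can be made concrete with a small sketch. This is a toy illustration, not part of the talk: `partition_cost` and the 4-cycle instance are hypothetical names chosen here to show how $\frac{1}{2} \sum_j \delta(S_j)$ is computed for a capacity-respecting partition.

```python
# Toy sketch of the objective: cost = (1/2) * sum_j delta(S_j), where
# delta(S) is the total weight of edges with exactly one endpoint in S.
# Instance and names are illustrative, not from the talk.

def partition_cost(edges, parts):
    """edges: {(u, v): weight}; parts: list of vertex sets partitioning V."""
    cost = 0.0
    for S in parts:
        # delta(S): weight of edges crossing the boundary of S
        delta = sum(w for (u, v), w in edges.items() if (u in S) != (v in S))
        cost += delta
    # in a partition every cut edge is counted by both of its sides
    return cost / 2.0

edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0}  # a 4-cycle
capacities = [2, 2]
parts = [{0, 1}, {2, 3}]
assert all(len(S) <= n_j for S, n_j in zip(parts, capacities))
print(partition_cost(edges, parts))  # cutting the 4-cycle into two paths costs 2.0
```

Splitting the cycle into two antipodal pairs instead (e.g. $\{0,2\}, \{1,3\}$) cuts all four edges, which the same function reports as cost 4.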
Example - Cloud Computing

[Figure: processes assigned to servers with capacities $n_1, \ldots, n_9$]
Motivation

Theoretical: captures well-studied problems: Min-Bisection, Min b-Balanced-Cut, Min k-Partitioning.

Practical:
- Cloud and parallel computing: parallelism.
- Hardware design: VLSI layout, circuit testing.
- Data mining: clustering.
- Social network analysis: community discovery.
- Vision: pattern recognition.
- Scientific computing: linear systems.
Related Work

Heuristics: [Barnes-82], [Barnes-Vannelli-Walker-88], [Sanchis-89], [Hadley-Mark-Vannelli-92], [Rendl-Wolkowicz-95], ... Mainly use spectral theory, local search, and quadratic programming.

Worst-case guarantee: no meaningful known bounds.
Related Work (Cont.)

Balanced Graph Partitioning ($n_j = n/k$):

Min-Bisection ($k = 2$):
- True approx.: $O(\log^{3/2} n)$ [Feige-Krauthgamer-02]; $O(\log n)$ [Räcke-08]
- Bicriteria approx. (related to Sparsest-Cut and Min b-Balanced-Cut): $O(\log n)$ [Leighton-Rao-99]; $O(\sqrt{\log n})$ [Arora-Rao-Vazirani-08]

Min k-Partitioning (general $k$):
- NP-hardness: no true approximation [Andreev-Räcke-06]
- Bicriteria approx.: $(O(\log n), 2)$ [Even-Naor-Rao-Schieber-99]; $(O(\sqrt{\log n \log k}), 2)$ [Krauthgamer-Naor-S-09]; $(O(\varepsilon^{-2} \log^{3/2} n), 1+\varepsilon)$ [Andreev-Räcke-06]; $(O(\log n), 1+\varepsilon)$ [Feldmann-Foschini-12]
Related Work (Cont.)

Capacitated Metric Labeling: the same problem with additional assignment costs.
- $(O(\log n), O(\log k))$ for $n_j = n/k$ [Naor-S-05]
- $(O(\log n), 1)$ for constant $k$ [Andrews-Hajiaghayi-Karloff-Moitra-11]
- Unless $NP \subseteq ZPTIME(n^{\mathrm{polylog}(n)})$, there is no finite approximation that violates capacities by $O(\log^{1/2-\varepsilon} k)$, for any $\varepsilon > 0$ [Andrews-Hajiaghayi-Karloff-Moitra-11]
Our Result

Theorem [Krauthgamer-Naor-S-Talwar-13]: There is a bicriteria approximation algorithm achieving a guarantee of $(O(\log n), O(1))$ for arbitrary capacities.
Known Techniques - Issues

1 Recursive partitioning: how many vertices to cut in each step?
2 Spreading metrics: only spreading with average capacity - insufficient.
3 Räcke's tree decomposition: dynamic programming does not seem to yield polynomial running time.

Our Approach: a configuration LP; randomized rounding + concentration via stopping times.
Configuration LP

$F_j \triangleq \{S : S \subseteq V, |S| \le n_j\}$

(P): $\min \frac{1}{2} \sum_{j=1}^{k} \sum_{S \in F_j} \delta(S)\, x_{S,j}$

s.t.
$\sum_{j=1}^{k} \sum_{S \in F_j : u \in S} x_{S,j} \ge 1$ for all $u \in V$
$\sum_{S \in F_j} x_{S,j} \le 1$ for $j = 1, \ldots, k$
$x_{S,j} \ge 0$ for $j = 1, \ldots, k$, $S \in F_j$
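To see the scale of (P): it has one variable $x_{S,j}$ per pair (part $j$, feasible set $S \in F_j$), so exponentially many in general. A toy sketch (illustrative names, not the talk's code) that enumerates the variable set and its cost coefficients $\delta(S)$ for a tiny instance:

```python
from itertools import combinations

# Sketch: enumerate the configurations F_j = {S : |S| <= n_j} of (P) for a
# small instance, with delta(S) as each variable's cost coefficient.
# Instance and names are illustrative.
V = [0, 1, 2, 3]
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0}  # a 4-cycle
capacities = [2, 2]  # n_1, n_2

def delta(S, edges):
    return sum(w for (u, v), w in edges.items() if (u in S) != (v in S))

F = []  # F[j] = list of (S, delta(S)) with 1 <= |S| <= n_j
for n_j in capacities:
    F_j = []
    for size in range(1, n_j + 1):
        for S in combinations(V, size):
            S = frozenset(S)
            F_j.append((S, delta(S, edges)))
    F.append(F_j)

# C(4,1) + C(4,2) = 10 nonempty sets of size <= 2 per part
print([len(F_j) for F_j in F])  # [10, 10]
```

Even on 4 vertices each part already has 10 configurations, which is why (P) is handled via its dual separation oracle rather than explicitly.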
Configuration LP (Cont.)

Theorem: (P) can be efficiently solved up to a loss of $O(\log n)$ in the objective.

Proof outline: the dual separation oracle of (P) relates to Min ρ-Unbalanced-Cut; techniques from [Räcke-08] give an $O(\log n)$ approximation.

Difficulty: applying the above within algorithms for solving mixed packing/covering LPs yields an $O(\log n)$ loss in both cost and capacity.

Solution: scaling the constraints differently.
Randomized Rounding

Assumption: $n_1 \ge n_2 \ge \ldots \ge n_k$ and each $n_j$ is a power of 2.

Notation: group equal capacities into mega-buckets, e.g., $n_1 = n_2 = n_3 > n_4 = n_5 > n_6 = n_7 = n_8 > \ldots$ yields $W_1, W_2, W_3, \ldots$

$W_i \triangleq \{j : n_j = 2^{-(i-1)} n_1\}$ for $i = 1, 2, \ldots, l$; $k_i \triangleq |W_i|$.
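The bucket definition above is easy to compute. A small sketch (the function name `mega_buckets` is illustrative), assuming the slide's normalization that the capacities are sorted powers of 2:

```python
from collections import defaultdict

# Sketch of the mega-bucket notation: W_i = {j : n_j = 2^-(i-1) * n_1},
# assuming n_1 >= ... >= n_k and each n_j is a power of 2.
def mega_buckets(n):
    """n: list of capacities; returns {i: [indices j in W_i]}."""
    W = defaultdict(list)
    for j, n_j in enumerate(n):
        # ratio n_1 / n_j is a power of 2; 2^(i-1) = ratio gives bucket i
        i = (n[0] // n_j).bit_length()
        W[i].append(j)
    return dict(W)

n = [8, 8, 8, 4, 4, 2, 2, 2]  # mirrors the slide's example grouping
W = mega_buckets(n)
print({i: len(js) for i, js in W.items()})  # k_i per bucket: {1: 3, 2: 2, 3: 3}
```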
Randomized Rounding (Cont.) - General Approach

Idea: cover the vertices by random cuts.

1 $H_i \leftarrow \emptyset$ for every $i = 1, \ldots, l$.
2 While $V \ne \emptyset$:
- Choose $j \sim \mathrm{Unif}[1, \ldots, k]$.
- Choose $S \in F_j$ w.p. $x_{S,j}$.
- Let $r$ be the mega-bucket s.t. $j \in W_r$. $H_r \leftarrow H_r \cup \{S \cap V\}$. $V \leftarrow V \setminus S$.
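The loop above can be simulated directly. This is a toy sketch under stated assumptions, not the paper's tuned procedure: `cover`, `bucket_of`, and the tiny fractional solution `x` are all illustrative names chosen here.

```python
import random

# Toy simulation of the covering loop: pick a part j uniformly, sample a
# cut S from the fractional solution x[j], and assign the newly covered
# vertices S ∩ V to the mega-bucket of j.
def cover(V, x, bucket_of, rng):
    """x[j]: list of (S, prob) with probs summing to <= 1;
    bucket_of[j]: mega-bucket index r such that j is in W_r."""
    V = set(V)
    H = {r: [] for r in set(bucket_of)}
    while V:
        j = rng.randrange(len(x))
        r_val = rng.random()
        acc, chosen = 0.0, None
        for S, p in x[j]:          # sample S ~ x[j] (possibly no cut at all)
            acc += p
            if r_val < acc:
                chosen = S
                break
        if chosen is None:
            continue
        new = set(chosen) & V      # S ∩ V: only still-uncovered vertices
        if new:
            H[bucket_of[j]].append(new)
        V -= new
    return H

rng = random.Random(0)
x = [[({0, 1}, 1.0)], [({2, 3}, 1.0)]]   # a feasible fractional solution
H = cover([0, 1, 2, 3], x, bucket_of=[1, 1], rng=rng)
covered = set().union(*(S for cuts in H.values() for S in cuts))
print(covered)  # {0, 1, 2, 3}
```

Because each iteration keeps only $S \cap V$, the sets collected across all buckets are disjoint and together cover $V$, exactly the invariant the worked example on the next slides illustrates.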
Randomized Rounding - Example

Start with $H_1 = \emptyset, H_2 = \emptyset, H_3 = \emptyset, \ldots, H_l = \emptyset$.

- Sample $(S_1, j_1)$ with $j_1 \in W_3$: $H_3 = \{S_1\}$.
- Sample $(S_2, j_2)$ with $j_2 \in W_1$: $H_1 = \{S_2 \setminus S_1\}$.
- Sample $(S_3, j_3)$ with $j_3 \in W_3$: $H_3 = \{S_1, S_3 \setminus (S_1 \cup S_2)\}$.
- Sample $(S_4, j_4)$ with $j_4 \in W_2$: $H_2 = \{S_4 \setminus (S_2 \cup S_3)\}$.
- Sample $(S_5, j_5)$ with $j_5 \in W_l$: $H_l = \{S_5 \setminus (S_1 \cup S_2 \cup S_3)\}$.
Randomized Rounding (Cont.)

Question: what to do once all vertices are covered?

Merge: while $|H_i| > k_i$, merge the smallest two cuts in $H_i$.

Theorem (cost): $\mathbb{E}[\mathrm{cost}(ALG)] \le 2\, \mathrm{cost}(P)$.

Proof outline: for every edge $(u, v)$,
$\Pr[u \text{ and } v \text{ are covered in different iterations}] \le \sum_{j=1}^{k} \sum_{S \in F_j : (u,v) \in \delta(S)} x_{S,j}.$
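The merge step is a classic smallest-two-first loop, which a heap implements directly. A sketch (the name `merge_bucket` is illustrative, and the tie-breaking counter is an implementation detail, not from the talk):

```python
import heapq

# Sketch of the merge step: while a bucket H_i holds more cuts than k_i,
# merge the two smallest cuts into one.
def merge_bucket(cuts, k_i):
    """cuts: list of disjoint vertex sets; reduce to at most k_i sets."""
    # (size, tiebreak, set) tuples: the counter keeps sets from being compared
    heap = [(len(S), t, S) for t, S in enumerate(cuts)]
    heapq.heapify(heap)
    t = len(cuts)
    while len(heap) > k_i:
        _, _, a = heapq.heappop(heap)   # smallest cut
        _, _, b = heapq.heappop(heap)   # second smallest cut
        heapq.heappush(heap, (len(a | b), t, a | b))
        t += 1
    return [S for _, _, S in heap]

cuts = [{0}, {1}, {2, 3}, {4, 5, 6}]
merged = merge_bucket(cuts, 2)
print(sorted(len(S) for S in merged))  # [3, 4]: {0}+{1} merge, then join {2, 3}
```

Merging the smallest cuts first keeps the resulting parts as balanced as possible, which is what makes bounding the covered mass $N_i$ per bucket (next slides) sufficient for the capacity analysis.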
Capacity - Attempt I

Observation (interchanging cuts within $W_i$): it suffices to upper bound $N_i \triangleq$ the number of vertices covered by $H_i$ at the end.

Note: $\mathbb{E}[N_i] \le k_i 2^{-(i-1)} n_1$.

$l = O(\log k)$ by merging every $W_i$ with $k_i \le 2^{(i-1)/2}$ into $W_1$ ($l$ is the number of mega-buckets).

Conclusion: Markov + union bound $\Rightarrow$ $O(\log k)$ capacity violation.
Capacity - Attempt II

Martingale: $M_{i,t} \triangleq \mathbb{E}[N_i \mid (S_1, j_1), \ldots, (S_t, j_t)]$; $\{M_{i,t}\}_{t \ge 0}$ is a martingale with respect to $\{(S_t, j_t)\}_{t \ge 1}$.

Note:
- $M_{i,0} = \mathbb{E}[N_i]$.
- $|M_{i,t} - M_{i,t-1}| \le k_i 2^{-(i-1)} n_1$ (Lipschitz).
- Number of iterations to cover $V$: $T = \Theta(k \log n)$.

Conclusion: Azuma + union bound $\Rightarrow$ $O(\sqrt{k \log n \cdot \log\log k})$ capacity violation.
Capacity - Attempt III

Worst conditional variance: let $(T_1, r_1), \ldots, (T_{t-1}, r_{t-1})$ be the realization that maximizes
$\mathrm{Var}\left[M_{i,t} - M_{i,t-1} \mid (S_1, j_1) = (T_1, r_1), \ldots, (S_{t-1}, j_{t-1}) = (T_{t-1}, r_{t-1})\right]$;
$v_{i,t}$ is this worst variance value.

Note: for every $t$ a different conditioning might be chosen. Martingale concentration via bounded variances (Bernstein) is not sufficient: $\sum_{t \ge 1} v_{i,t}$ might be too big!
Capacity - Attempt III (Cont.)

Question: is $\sum_{t \ge 1} \mathrm{Var}\left[M_{i,t} - M_{i,t-1} \mid (S_1, j_1), \ldots, (S_{t-1}, j_{t-1})\right]$ small with high probability?

Theorem (sum of conditional variances):
$\Pr\left[\sum_{t \ge 1} \mathrm{Var}\left[M_{i,t} - M_{i,t-1} \mid (S_1, j_1), \ldots, (S_{t-1}, j_{t-1})\right] \le 2\alpha\, k_i 2^{-(i-1)} n_1^2\right] \ge 1 - 1/\alpha.$

Intuition: in later iterations the changes are smaller in expectation.
Capacity - Attempt III (Cont.)

Martingale $\{M_{i,t}\}_{t \ge 0}$ + good event (the variances are small) $\Rightarrow$ Freedman's inequality (stopping-time based concentration).
Capacity - Attempt III (Cont.)

Immediate conclusion: Freedman's inequality yields an $O(\log\log k)$ capacity violation.

Question: how do we get an $O(1)$ capacity violation?
Instance Transformation

[Figure: a sequence of transformations on mega-buckets $W_1, \ldots, W_9$]
Instance Transformation (Cont.)

What is an instance transformation?
- The total capacity of each non-empty $W_i$ grows by a constant factor $c$.
- The inverse transformation moves cuts from $W_i$ to some $W_j$, $j \ne i$.
- The inverse transformation incurs a factor-$c$ capacity violation.

What does an instance transformation provide?
- Large total capacity of $W_i$: better concentration as the total capacity increases.
- The sum of conditional variances might be higher.
- The probability that the sum of conditional variances is small increases.

Conclusion: $O(1)$ capacity violation.
Thank You! Questions?