Sublinear Algorithms Lecture 3. Sofya Raskhodnikova Penn State University



Graph Properties

Testing if a Graph is Connected [Goldreich Ron]

Input: a graph $G = (V, E)$ on $n$ vertices
  in adjacency lists representation (a list of neighbors for each vertex),
  with maximum degree $d$, i.e., adjacency lists of length $d$ with some empty entries.
Query $(v, i)$, where $v \in V$ and $i \in [d]$: entry $i$ of the adjacency list of vertex $v$.
Exact answer: $\Omega(dn)$ time.
Approximate version: Is the graph connected or $\varepsilon$-far from connected?
  $\mathrm{dist}(G_1, G_2) = \frac{\#\text{ of entries in adjacency lists on which } G_1 \text{ and } G_2 \text{ differ}}{dn}$
Time: $O\left(\frac{1}{\varepsilon^2 d}\right)$ today, plus an improvement on the homework.
No dependence on $n$!

Testing Connectedness: Algorithm

Connectedness Tester($G$, $d$, $\varepsilon$)
1. Repeat $s = 16/(\varepsilon d)$ times:
2.   pick a random vertex $u$
3.   determine if the connected component of $u$ is small: perform BFS from $u$, stopping after at most $8/(\varepsilon d)$ new nodes
4. Reject if a small connected component was found, otherwise accept.

Run time: $O\left(\frac{d}{\varepsilon^2 d^2}\right) = O\left(\frac{1}{\varepsilon^2 d}\right)$

Analysis: Connected graphs are always accepted.
Remains to show: If a graph is $\varepsilon$-far from connected, it is rejected with probability $\ge 2/3$.
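The tester above can be sketched in Python as follows. This is a minimal illustration, not the lecture's code: the graph is assumed to be a list of neighbor lists (empty entries simply omitted), and the names `capped_component_size` and `connectedness_tester` are hypothetical. One detail is an added assumption: the BFS cutoff is capped at $n - 1$, so that a connected graph on very few vertices is never mistaken for a "small component".

```python
import math
import random
from collections import deque


def capped_component_size(adj, u, cap):
    """BFS from u, stopping as soon as more than `cap` nodes have been seen.

    Returns the number of nodes seen, which is at most cap + 1.
    """
    seen = {u}
    queue = deque([u])
    while queue and len(seen) <= cap:
        v = queue.popleft()
        for nb in adj[v]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
                if len(seen) > cap:
                    break
    return len(seen)


def connectedness_tester(adj, d, eps, rng=random):
    """Return True (accept) or False (reject), following steps 1-4 of the slide."""
    n = len(adj)
    s = math.ceil(16 / (eps * d))  # number of sampled start vertices
    # Assumption: a component with all n nodes is the whole graph, so the
    # "small component" cutoff never exceeds n - 1.
    cap = min(math.ceil(8 / (eps * d)), n - 1)
    for _ in range(s):
        u = rng.randrange(n)
        if capped_component_size(adj, u, cap) <= cap:
            return False  # BFS exhausted a small component: reject
    return True  # no small component found: accept
```

Each BFS looks at no more than about $8/(\varepsilon d)$ adjacency lists of length at most $d$, which is where the $O(1/(\varepsilon^2 d))$ total comes from.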

Testing Connectedness: Analysis

Claim 1. If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{4}$ connected components.
Claim 2. If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{8}$ connected components of size at most $8/(\varepsilon d)$.

If Claim 2 holds, at least $\frac{\varepsilon d n}{8}$ nodes are in small connected components.
By the Witness Lemma, it suffices to sample $\frac{2 \cdot 8}{\varepsilon d n / n} = \frac{16}{\varepsilon d}$ nodes to detect one from a small connected component.

Testing Connectedness: Proof of Claim 1

Claim 1. If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{4}$ connected components.

We prove the contrapositive: If $G$ has $< \frac{\varepsilon d n}{4}$ connected components, one can make $G$ connected by modifying $< \varepsilon$ fraction of its representation, i.e., $< \varepsilon d n$ entries.
If there are no degree restrictions, $k$ components can be connected by adding $k - 1$ edges, each affecting 2 nodes. Here, $k < \frac{\varepsilon d n}{4}$, so $2k - 2 < \varepsilon d n$.
What if adjacency lists of all vertices in a component are full, i.e., all vertex degrees are $d$?

Freeing up an Adjacency List Entry

Claim 1. If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{4}$ connected components.

What if adjacency lists of all vertices in a component are full, i.e., all vertex degrees are $d$?
Consider a spanning tree (the slide's "MST") of this component. Let $v$ be a leaf of the tree.
Disconnect $v$ from a node other than its parent in the tree.
Two entries are changed while keeping the same number of components.
Thus, $k$ components can be connected by changing at most $2k - 1$ edges (at most $k$ removals to free up entries and $k - 1$ additions), each affecting 2 nodes.
Here, $k < \frac{\varepsilon d n}{4}$, so $4k - 2 < \varepsilon d n$.

Testing Connectedness: Proof of Claim 2

Claim 1. If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{4}$ connected components.
Claim 2. If $G$ is $\varepsilon$-far from connected, it has $\ge \frac{\varepsilon d n}{8}$ connected components of size at most $8/(\varepsilon d)$.

If Claim 1 holds, there are at least $\frac{\varepsilon d n}{4}$ connected components.
Their average size is $\le \frac{n}{\varepsilon d n / 4} = \frac{4}{\varepsilon d}$.
By an averaging argument (or Markov inequality), at least half of the components are of size at most twice the average.

Testing if a Graph is Connected [Goldreich Ron]

Input: a graph $G = (V, E)$ on $n$ vertices
  in adjacency lists representation (a list of neighbors for each vertex),
  with maximum degree $d$.
Connected or $\varepsilon$-far from connected?
$O\left(\frac{1}{\varepsilon^2 d}\right)$ time (no dependence on $n$)

Randomized Approximation in Sublinear Time: Simple Examples

Randomized Approximation: a Toy Example

Input: a string $w \in \{0,1\}^n$ (for example, 0 0 0 1 0 1 0 0)
Goal: Estimate the fraction of 1's in $w$ (like in polls).
It suffices to sample $s = 1/\varepsilon^2$ positions and output the average to get the fraction of 1's $\pm \varepsilon$ (i.e., additive error $\varepsilon$) with probability $\ge 2/3$.

Hoeffding Bound. Let $Y_1, \dots, Y_s$ be independently distributed random variables in $[0,1]$ and let $Y = \sum_{i=1}^{s} Y_i$ (sample sum). Then $\Pr[|Y - E[Y]| \ge \delta] \le 2e^{-2\delta^2/s}$.

$Y_i$ = value of sample $i$. Then $E[Y] = \sum_{i=1}^{s} E[Y_i] = s \cdot (\text{fraction of 1's in } w)$.
$\Pr[|(\text{sample average}) - (\text{fraction of 1's in } w)| \ge \varepsilon] = \Pr[|Y - E[Y]| \ge \varepsilon s] \le 2e^{-2\delta^2/s} = 2e^{-2} < 1/3$
  (apply the Hoeffding Bound with $\delta = \varepsilon s$, then substitute $s = 1/\varepsilon^2$).
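A minimal sketch of the sampling estimator from the toy example, assuming the string is given as a Python sequence of 0/1 integers; the function name `estimate_fraction_of_ones` is hypothetical. Positions are drawn independently with replacement so that the Hoeffding Bound applies as stated.

```python
import math
import random


def estimate_fraction_of_ones(w, eps, rng=random):
    """Estimate the fraction of 1's in w to within +/- eps with probability >= 2/3."""
    s = math.ceil(1 / eps**2)  # sample size from the slide: s = 1 / eps^2
    # Y = sum of s independent samples, each the value at a uniformly random position
    sample_sum = sum(w[rng.randrange(len(w))] for _ in range(s))
    return sample_sum / s  # sample average
```

For instance, `estimate_fraction_of_ones([0, 0, 0, 1, 0, 1, 0, 0], 0.1)` reads 100 random positions, regardless of how long the string is.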

Approximating # of Connected Components [Chazelle Rubinfeld Trevisan]

Input: a graph $G = (V, E)$ on $n$ vertices
  in adjacency lists representation (a list of neighbors for each vertex),
  with maximum degree $d$.
Exact answer: $\Omega(dn)$ time.
Additive approximation: # of CC $\pm \varepsilon n$ with probability $\ge 2/3$.
Time:
  Known: $O\left(\frac{d}{\varepsilon^2} \log \frac{1}{\varepsilon}\right)$, $\Omega\left(\frac{d}{\varepsilon^2}\right)$
  Today: $O\left(\frac{d}{\varepsilon^3}\right)$
No dependence on $n$!

Partially based on slides by Ronitt Rubinfeld: http://stellar.mit.edu/s/course/6/fa10/6.896/coursematerial/topics/topic3/lecturenotes/lecst11/lecst11.pdf

Approximating # of CCs: Main Idea

Let $C$ = number of components.
For every vertex $u$, define $n_u$ = number of nodes in $u$'s component.
  For each component $A$: $\sum_{u \in A} \frac{1}{n_u} = 1$.
  Hence $\sum_{u \in V} \frac{1}{n_u} = C$.
This breaks $C$ up into contributions of different nodes.
Estimate this sum by estimating $n_u$'s for a few random nodes.
  If $u$'s component is small, its size can be computed by BFS.
  If $u$'s component is big, then $1/n_u$ is small, so it does not contribute much to the sum.
  So we can stop BFS after a few steps.
Similar to the property tester for connectedness [Goldreich Ron].

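Approximating # of CCs: Algorithm

Estimating $n_u$ = the number of nodes in $u$'s component:
Let the estimate be $\hat{n}_u = \min\{n_u, 2/\varepsilon\}$.
  When $u$'s component has $\le 2/\varepsilon$ nodes, $\hat{n}_u = n_u$.
  Else $\hat{n}_u = 2/\varepsilon$, and so $0 < \frac{1}{\hat{n}_u} - \frac{1}{n_u} < \frac{1}{\hat{n}_u} = \frac{\varepsilon}{2}$.
The corresponding estimate for $C$ is $\hat{C} = \sum_{u \in V} \frac{1}{\hat{n}_u}$. It is a good estimate:
  $|\hat{C} - C| = \left|\sum_{u \in V} \frac{1}{\hat{n}_u} - \sum_{u \in V} \frac{1}{n_u}\right| \le \sum_{u \in V} \left|\frac{1}{\hat{n}_u} - \frac{1}{n_u}\right| \le \frac{\varepsilon n}{2}$.

APPROX_#_CCs($G$, $d$, $\varepsilon$)
1. Repeat $s = \Theta(1/\varepsilon^2)$ times:
2.   pick a random vertex $u$
3.   compute $\hat{n}_u$ via BFS from $u$, stopping after at most $2/\varepsilon$ new nodes
4. Return $\tilde{C}$ = (average of the values $1/\hat{n}_u$) $\cdot\, n$

Run time: $O(d/\varepsilon^3)$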
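The algorithm above might be sketched in Python as follows. The adjacency-list layout (a list of neighbor lists) and the names `capped_component_size` and `approx_num_ccs` are assumptions made for illustration, and the default sample size $\lceil 4/\varepsilon^2 \rceil$ is just one concrete choice of the $\Theta(1/\varepsilon^2)$ on the slide.

```python
import math
import random
from collections import deque


def capped_component_size(adj, u, cap):
    """BFS from u, stopping once more than `cap` nodes are seen (returns <= cap + 1)."""
    seen = {u}
    queue = deque([u])
    while queue and len(seen) <= cap:
        v = queue.popleft()
        for nb in adj[v]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
                if len(seen) > cap:
                    break
    return len(seen)


def approx_num_ccs(adj, eps, s=None, rng=random):
    """Estimate the number of connected components within +/- eps * n."""
    n = len(adj)
    cap = math.ceil(2 / eps)  # BFS cutoff, rounded up to an integer
    if s is None:
        s = math.ceil(4 / eps**2)  # assumed constant inside Theta(1 / eps^2)
    total = 0.0
    for _ in range(s):
        u = rng.randrange(n)
        # n_hat = min(n_u, cap): exact for small components, truncated otherwise
        n_hat = min(capped_component_size(adj, u, cap), cap)
        total += 1.0 / n_hat
    return n * total / s  # (average of 1 / n_hat) * n
```

When every component has the same size below the cutoff, all samples contribute the same value, so the estimate is exact; the randomness only matters for graphs with mixed component sizes.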

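Approximating # of CCs: Analysis

Want to show: $\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] \le \frac{1}{3}$.

Hoeffding Bound. Let $Y_1, \dots, Y_s$ be independently distributed random variables in $[0,1]$ and let $Y = \sum_{i=1}^{s} Y_i$ (sample sum). Then $\Pr[|Y - E[Y]| \ge \delta] \le 2e^{-2\delta^2/s}$.

Let $Y_i = 1/\hat{n}_u$ for the $i$-th vertex $u$ in the sample. Then
  $Y = \sum_{i=1}^{s} Y_i = \frac{s\tilde{C}}{n}$ and $E[Y] = \sum_{i=1}^{s} E[Y_i] = s \cdot E[Y_1] = s \cdot \frac{1}{n} \sum_{v \in V} \frac{1}{\hat{n}_v} = \frac{s\hat{C}}{n}$.
  $\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] = \Pr\left[\left|\frac{n}{s} Y - \frac{n}{s} E[Y]\right| > \frac{\varepsilon n}{2}\right] = \Pr\left[|Y - E[Y]| > \frac{\varepsilon s}{2}\right] \le 2e^{-\varepsilon^2 s/2}$.
Need $s = \Theta(1/\varepsilon^2)$ samples to get probability $\le \frac{1}{3}$.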
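As a worked step, the constant hidden in $s = \Theta(1/\varepsilon^2)$ can be read off by solving the final Hoeffding inequality for $s$. The derivation below is only an illustration of that arithmetic; the slide itself does not fix a constant.

```latex
2e^{-\varepsilon^2 s/2} \le \frac{1}{3}
\iff e^{\varepsilon^2 s/2} \ge 6
\iff s \ge \frac{2\ln 6}{\varepsilon^2} \approx \frac{3.58}{\varepsilon^2}.
```

So, for example, $s = \lceil 4/\varepsilon^2 \rceil$ samples are enough.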

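Approximating # of CCs: Analysis

So far:
  $|\hat{C} - C| \le \frac{\varepsilon n}{2}$
  $\Pr\left[|\tilde{C} - \hat{C}| > \frac{\varepsilon n}{2}\right] \le \frac{1}{3}$
With probability $\ge \frac{2}{3}$,
  $|\tilde{C} - C| \le |\tilde{C} - \hat{C}| + |\hat{C} - C| \le \frac{\varepsilon n}{2} + \frac{\varepsilon n}{2} \le \varepsilon n$.

Summary: The number of connected components in $n$-vertex graphs of degree at most $d$ can be estimated within $\pm \varepsilon n$ in time $O(d/\varepsilon^3)$.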

Minimum Spanning Tree (MST)

What is the cheapest way to connect all the dots?
Input: a weighted graph with $n$ vertices and $m$ edges.
[Figure: an example weighted graph with edge weights 3, 4, 1, 2, 5, 7.]
Exact computation:
  Deterministic: $O(m \cdot \text{inverse-Ackermann}(m))$ time [Chazelle]
  Randomized: $O(m)$ time [Karger Klein Tarjan]

Partially based on slides by Ronitt Rubinfeld: http://stellar.mit.edu/s/course/6/fa10/6.896/coursematerial/topics/topic3/lecturenotes/lecst11/lecst11.pdf

Approximating MST Weight in Sublinear Time [Chazelle Rubinfeld Trevisan]

Input: a graph $G = (V, E)$ on $n$ vertices
  in adjacency lists representation,
  with maximum degree $d$ and maximum allowed weight $w$,
  with weights in $\{1, 2, \dots, w\}$.
Output: $(1 + \varepsilon)$-approximation to MST weight, $w_{MST}$.
Time:
  Known: $O\left(\frac{dw}{\varepsilon^3} \log \frac{dw}{\varepsilon}\right)$, $\Omega\left(\frac{dw}{\varepsilon^2}\right)$
  Today: $O\left(\frac{dw^3 \log w}{\varepsilon^3}\right)$
No dependence on $n$!

Idea Behind Algorithm

Characterize MST weight in terms of the number of connected components in certain subgraphs of $G$.
We already know that the number of connected components can be estimated quickly.

MST and Connected Components: Warm-up

Recall Kruskal's algorithm for computing MST exactly.
Suppose all weights are 1 or 2. Then
  MST weight = (# weight-1 edges in MST) + 2 $\cdot$ (# weight-2 edges in MST)
             = $n - 1$ + (# of weight-2 edges in MST)   [MST has $n - 1$ edges]
             = $n - 1$ + (# of CCs induced by weight-1 edges) $- 1$   [by Kruskal]
[Figure: a graph with weight-1 and weight-2 edges, showing the connected components induced by weight-1 edges and the MST.]

MST and Connected Components

In general: Let $G_i$ = subgraph of $G$ containing all edges of weight $\le i$, and
$C_i$ = number of connected components in $G_i$.
Then MST has $C_i - 1$ edges of weight $> i$.

Claim. $w_{MST}(G) = n - w + \sum_{i=1}^{w-1} C_i$

Let $\beta_i$ be the number of edges of weight $> i$ in MST.
Each MST edge contributes 1 to $w_{MST}$, each MST edge of weight $> 1$ contributes 1 more, each MST edge of weight $> 2$ contributes one more, and so on. Hence
  $w_{MST}(G) = \sum_{i=0}^{w-1} \beta_i = \sum_{i=0}^{w-1} (C_i - 1) = -w + \sum_{i=0}^{w-1} C_i = n - w + \sum_{i=1}^{w-1} C_i$,
where the last step uses $C_0 = n$ ($G_0$ has no edges).
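The claim can be checked exactly on a small example. The sketch below is an illustration under stated assumptions: the graph is connected and given as a list of `(u, v, weight)` triples, and the helper names (`find`, `count_components`, `kruskal_weight`, `mst_weight_via_components`) are hypothetical. It compares Kruskal's algorithm with the formula $n - w + \sum_{i=1}^{w-1} C_i$, computing each $C_i$ exactly with union-find rather than by sampling.

```python
def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def count_components(n, edges, max_weight):
    """C_i: number of components of the subgraph with edges of weight <= max_weight."""
    parent = list(range(n))
    comps = n
    for u, v, wt in edges:
        if wt <= max_weight:
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:
                parent[ru] = rv
                comps -= 1
    return comps


def kruskal_weight(n, edges):
    """Exact MST weight of a connected graph via Kruskal's algorithm."""
    parent = list(range(n))
    total = 0
    for u, v, wt in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
            total += wt
    return total


def mst_weight_via_components(n, edges, w):
    """Evaluate n - w + sum_{i=1}^{w-1} C_i with exact component counts."""
    return n - w + sum(count_components(n, edges, i) for i in range(1, w))


# Example: 5 vertices, weights in {1, 2, 3}
edges = [(0, 1, 1), (1, 2, 3), (2, 3, 2), (3, 4, 1), (0, 4, 3), (1, 3, 2)]
print(kruskal_weight(5, edges), mst_weight_via_components(5, edges, 3))  # 6 6
```

Here $C_1 = 3$ and $C_2 = 1$, so the formula gives $5 - 3 + 3 + 1 = 6$, matching the MST $\{(0,1), (3,4), (2,3), (1,3)\}$ of weight $1 + 1 + 2 + 2$.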

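Algorithm for Approximating $w_{MST}$

APPROX_MSTweight($G$, $w$, $d$, $\varepsilon$)
1. For $i = 1$ to $w - 1$ do:
2.   $\tilde{C}_i \leftarrow$ APPROX_#CCs($G_i$, $d$, $\varepsilon/w$).
3. Return $\tilde{w}_{MST} = n - w + \sum_{i=1}^{w-1} \tilde{C}_i$.

Claim. $w_{MST}(G) = n - w + \sum_{i=1}^{w-1} C_i$

Analysis:
Suppose all estimates of $C_i$'s are good: $|\tilde{C}_i - C_i| \le \frac{\varepsilon}{w} n$.
Then $|\tilde{w}_{MST} - w_{MST}| = \left|\sum_{i=1}^{w-1} (\tilde{C}_i - C_i)\right| \le \sum_{i=1}^{w-1} |\tilde{C}_i - C_i| \le w \cdot \frac{\varepsilon}{w} n = \varepsilon n$.
$\Pr[\text{all } w - 1 \text{ estimates are good}] \ge (2/3)^{w-1}$. Not good enough!
Need error probability $\le \frac{1}{3w}$ for each iteration.
Then, by Union Bound, $\Pr[\text{error}] \le w \cdot \frac{1}{3w} = \frac{1}{3}$.
Can amplify success probability of any algorithm by repeating it and taking the median answer.
Can take more samples in APPROX_#CCs. What's the resulting run time?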
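A minimal sketch of the algorithm above with median amplification, under several assumptions not in the slides: the graph is connected and stored as lists of `(neighbor, weight)` pairs, $G_i$ is simulated by ignoring edges of weight $> i$ during BFS, each estimate uses $\lceil 4/\varepsilon'^2 \rceil$ samples, and the number of repetitions is an odd integer of at least $18 \ln(3w)$ (from a Hoeffding bound on the number of bad runs). All function names are hypothetical.

```python
import math
import random
import statistics
from collections import deque


def capped_component_size(adj, u, cap, max_weight):
    """BFS from u over edges of weight <= max_weight; stops after cap + 1 nodes."""
    seen = {u}
    queue = deque([u])
    while queue and len(seen) <= cap:
        v = queue.popleft()
        for nb, wt in adj[v]:
            if wt <= max_weight and nb not in seen:
                seen.add(nb)
                queue.append(nb)
                if len(seen) > cap:
                    break
    return len(seen)


def approx_num_ccs(adj, eps, max_weight, rng):
    """Estimate the number of components of G_i (i = max_weight) within +/- eps * n."""
    n = len(adj)
    cap = math.ceil(2 / eps)
    s = math.ceil(4 / eps**2)  # assumed constant inside Theta(1 / eps^2)
    total = 0.0
    for _ in range(s):
        u = rng.randrange(n)
        total += 1.0 / min(capped_component_size(adj, u, cap, max_weight), cap)
    return n * total / s


def approx_mst_weight(adj, w, eps, rng=random):
    """Estimate the MST weight within +/- eps * n, with error probability <= 1/3."""
    n = len(adj)
    # Odd number of repetitions, at least 18 ln(3w), so that the median of the
    # runs is bad with probability at most 1 / (3w) (assumed constant).
    reps = 2 * math.ceil(9 * math.log(3 * w)) + 1
    estimate = n - w
    for i in range(1, w):
        runs = [approx_num_ccs(adj, eps / w, i, rng) for _ in range(reps)]
        estimate += statistics.median(runs)
    return estimate
```

With this choice, each of the $w - 1$ iterations costs $O(d (w/\varepsilon)^3 \log w)$ time, which gives one possible answer to the slide's closing question about the resulting run time.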

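Multiplicative Approximation for $w_{MST}$

For MST cost, additive approximation $\Rightarrow$ multiplicative approximation.
  $w_{MST} \ge n - 1$ $\Rightarrow$ $w_{MST} \ge n/2$ for $n \ge 2$.
$\varepsilon n$-additive approximation:
  $w_{MST} - \varepsilon n \le \tilde{w}_{MST} \le w_{MST} + \varepsilon n$
$(1 \pm 2\varepsilon)$-multiplicative approximation (using $\varepsilon n \le 2\varepsilon \cdot w_{MST}$):
  $w_{MST}(1 - 2\varepsilon) \le w_{MST} - \varepsilon n \le \tilde{w}_{MST} \le w_{MST} + \varepsilon n \le w_{MST}(1 + 2\varepsilon)$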