COMS 6998-3: Sub-Linear Algorithms in Learning and Testing
Lecturer: Rocco Servedio
Lecture 13: 04/23/2014, Spring 2014
Scribe: Psallidas Fotios

Administrative: Submit HW problem solutions by Wednesday, 04/26/2014 11:59:59.

1 Overview

1.1 Last Time

Graph property testing for (dense) graphs with $N$ nodes and access to the adjacency matrix ($N$ sufficiently large);
any $q$-query tester can be simulated by an $O(q^2)$-query non-adaptive tester;
$\tilde{O}(1/\varepsilon^3)$-query tester for bipartiteness ($\star$).

1.2 Today

Broad generalization of ($\star$) to general graph partition properties (GGPT).
Testing $\Delta$-freeness using an $O_\varepsilon(1)$-query algorithm:
    $\Delta$-removal lemma
    Szemerédi Regularity Lemma

Relevant readings

Goldreich, Goldwasser, and Ron. Property testing and its connection to learning and approximation. [GGR98]
Szemerédi. Regular partitions of graphs. [Sze78]
2 poly($1/\varepsilon$)-query testable graph properties

Besides bipartiteness, some of the graph properties that were shown in [GGR98] to be testable with poly($1/\varepsilon$) queries (and running time exponential in the number of queries) include the following:

k-colorability

Definition 1. Fix any integer $k$. A graph $G = (V, E)$ is said to be $k$-colorable if there exists a (proper) $k$-coloring of $G$, that is, a mapping $\varphi\colon V \to [k]$ such that $(i, j) \in E \Rightarrow \varphi(i) \neq \varphi(j)$. In other words, after assigning a color to each node, any two adjacent nodes $i, j$ have different colors. Note that for $k = 2$, this is exactly bipartiteness.

Theorem 2. For any fixed $k \geq 2$, there is a poly($k/\varepsilon$)-query tester for $k$-colorability.

[Figure: a graph properly colored with three colors B, R, G.]

ρ-clique

Definition 3. Fix any $\rho \in (0, 1)$. A $\rho$-clique of an $N$-vertex graph is a set of $\rho N$ vertices such that every pair of them is joined by an edge.

Theorem 4. Let $P \stackrel{\text{def}}{=} \{\, G \text{ $N$-vertex graph} : G \text{ has a } \rho\text{-clique} \,\}$. Then there is a poly($1/\varepsilon$)-query tester for $P$.

[Figure: an $N$-vertex graph containing a $\rho$-clique on $\rho N$ vertices.]
ρ-bisection

Definition 5. Fix any $\rho \in (0, 1/4)$. A $\rho$-bisection of an $N$-vertex graph $G = (V, E)$ is a partition of $V$ into two subsets $V_1, V_2$ of size $N/2$ each such that the number of edges crossing from $V_1$ to $V_2$ is at most $\rho N^2$ (that is, $V_1, V_2$ define a balanced, $\rho$-sparse cut of $G$).

[Figure: the two sides $V_1$ and $V_2$, with at most $\rho N^2$ edges crossing between them.]

Theorem 6. Let $P \stackrel{\text{def}}{=} \{\, G \text{ $N$-vertex graph} : G \text{ has a } \rho\text{-bisection} \,\}$. Then there is a poly($1/\varepsilon$)-query tester for $P$.

The similarity between these results is not fortuitous: it turns out they all fall into the same general setting, General Graph Partition Testing.

3 General Graph Partition Testing (GGPT)

Bipartiteness, $k$-colorability, having a $\rho$-clique and having a $\rho$-bisection are special cases of a more general family of properties, the class of General Graph Partition Testing (GGPT) properties, which all admit constant-query testers: a single meta-algorithm actually allows one to test any of these properties.

More specifically, a GGPT property is specified by an integer $k$ (the number of pieces, i.e. the size of the desired partition of the graph), as well as (a) size bounds for each of the $k$ pieces and (b) edge density bounds for each pair of pieces.

Definition 7 (GGPT property). A General Graph Partition Testing property is specified by an integer $k$ and a collection $\Phi$ of $k + k^2$ intervals in $[0, 1]$:
$$\Phi = \big\{ ([\ell_i, u_i])_{i \in [k]},\ ([\ell_{i,j}, u_{i,j}])_{i,j \in [k]} \big\}.$$
Given $k, \Phi$, the property $P_{k,\Phi}$ is the set of all $N$-vertex graphs $G = (V, E)$ for which there exists a $k$-way partition of $V$ into $V_1 \cup V_2 \cup \dots \cup V_k$ satisfying
(i) $\forall i \in [k]$, $N \ell_i \leq |V_i| \leq N u_i$ (right density of vertices);
(ii) $\forall (i, j) \in [k]^2$, $N^2 \ell_{i,j} \leq |E(V_i, V_j)| \leq N^2 u_{i,j}$ (right density of edges),
where $|E(V_i, V_j)|$ is the number of edges between $V_i$ and $V_j$.
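To make Definition 7 concrete, here is a minimal Python sketch that checks conditions (i) and (ii) for one given candidate partition. It only verifies a partition; it is not the tester, which never sees the whole graph. The names `satisfies_ggpt` and `coloring_bounds` are hypothetical, and the sketch assumes undirected edges with each unordered pair of parts $(i, j)$, $i \leq j$, carrying a single interval.

```python
def satisfies_ggpt(n, edges, parts, size_bounds, density_bounds):
    """Check conditions (i) and (ii) of Definition 7 for one candidate partition.

    n              -- number of vertices, labelled 0..n-1
    edges          -- iterable of undirected edges (u, v)
    parts          -- list of k vertex sets V_1, ..., V_k
    size_bounds    -- size_bounds[i] = (l_i, u_i)
    density_bounds -- density_bounds[(i, j)] = (l_ij, u_ij) for i <= j (assumed convention)
    """
    k = len(parts)
    part_of = {v: i for i, p in enumerate(parts) for v in p}
    # The parts must be disjoint and cover all n vertices.
    if sum(len(p) for p in parts) != n or set(part_of) != set(range(n)):
        return False
    # Condition (i): N * l_i <= |V_i| <= N * u_i.
    for i, p in enumerate(parts):
        lo, hi = size_bounds[i]
        if not lo * n <= len(p) <= hi * n:
            return False
    # Count the edges between every (unordered) pair of parts.
    counts = {(i, j): 0 for i in range(k) for j in range(i, k)}
    for u, v in edges:
        i, j = sorted((part_of[u], part_of[v]))
        counts[(i, j)] += 1
    # Condition (ii): N^2 * l_ij <= |E(V_i, V_j)| <= N^2 * u_ij.
    for pair, c in counts.items():
        lo, hi = density_bounds[pair]
        if not lo * n * n <= c <= hi * n * n:
            return False
    return True


def coloring_bounds(k):
    """Intervals encoding k-colorability: no edges inside a part, no other constraint."""
    size_bounds = [(0, 1)] * k
    density_bounds = {(i, j): ((0, 0) if i == j else (0, 1))
                      for i in range(k) for j in range(i, k)}
    return size_bounds, density_bounds
```

For instance, with `coloring_bounds(2)` the 4-cycle on vertices 0, 1, 2, 3 passes the check for the partition `[{0, 2}, {1, 3}]` and fails it for `[{0, 1}, {2, 3}]`, since the latter leaves edges inside a part.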
As an example, consider the $k$-colorability property, which can be rephrased as a GGPT property as follows:
$[\ell_1, u_1] = [\ell_2, u_2] = \dots = [\ell_k, u_k] = [0, 1]$;
$[\ell_{1,1}, u_{1,1}] = [\ell_{2,2}, u_{2,2}] = \dots = [\ell_{k,k}, u_{k,k}] = \{0\}$;
$[\ell_{i,j}, u_{i,j}] = [0, 1]$ for all $i, j \in [k]$ with $i \neq j$.

Theorem 8 (Testing GGPT properties). Given any $k$ and $\Phi$, the property $P_{k,\Phi}$ is testable with $(k/\varepsilon)^{O(k)}$ queries (and running time exponential in the query complexity).

Proof sketch. Using the same high-level idea as in the bipartiteness tester:
(1) First, draw a small set of vertices and query all pairs of these; call the resulting graph $G'$ (note that the number of possible partitions of $G'$ is exponential in $|G'|$).
(2) If $G \in P_{k,\Phi}$, then some partition of $G'$ is good; this good partition induces a partial partition of $G$ that will (approximately) satisfy the constraints.
(3) As for bipartiteness, draw a second sample and see if it complies with any partition of $G'$ ($\Rightarrow$ one only needs to worry about $\exp(|G'|)$ many partitions).

4 Triangle-freeness

Definition 9 ($\Delta$-freeness). Fix any graph $G = (V, E)$. A triangle is a triple $(i, j, k) \in V^3$ such that $(i, j), (j, k), (k, i) \in E$. $G$ is said to be triangle-free (or $\Delta$-free) if it does not contain any triangle.

[Figure: a triangle on the vertices $v_1, v_2, v_3$.]

Let $P \stackrel{\text{def}}{=} \{\, G \text{ $N$-vertex graph} : G \text{ is } \Delta\text{-free} \,\}$. Then the distance of a graph $G$ from $P$ is
$$\mathrm{dist}(G, P) = \frac{\#\text{edges one needs to erase to kill all triangles}}{N^2}.$$

Analysis. First, it is easy to see that if a graph $G$ is $\Delta$-free, then Test $\Delta$-Freeness (Algorithm 1) accepts with probability 1, and that it always makes at most $3s$ queries. Furthermore, if $s$ is chosen big enough (e.g., $N^2, N^3, \dots$), Test $\Delta$-Freeness will work.
Algorithm 1 Test $\Delta$-Freeness
    for $s$ iterations do
        Randomly pick nodes $v_1, v_2, v_3 \in [N]$
        Query $(v_1, v_2)$, $(v_1, v_3)$ and $(v_2, v_3)$
        return REJECT if the three edges exist (i.e., there is a $\Delta$)
    end for
    return ACCEPT (none of the $s$ iterations found a $\Delta$)

Question. How small can $s$ be? More specifically, is $s = O_\varepsilon(1)$ enough?

Note that for $O_\varepsilon(1)$ many queries to suffice, it must be the case that
$$G \text{ is } \varepsilon\text{-far from } \Delta\text{-free} \ \Rightarrow\ G \text{ has } \Omega_\varepsilon(1) \cdot N^3 \text{ triangles}.$$
Fortunately, this is true; it is a corollary of the following lemma, which we will prove in the remaining part of this lecture:

Theorem 10 ($\Delta$-Removal Lemma). For all $\varepsilon > 0$, there exists $\delta_\varepsilon > 0$ such that any $N$-vertex graph $G$ which is $\varepsilon$-far from $\Delta$-free contains at least $\delta_\varepsilon N^3$ triangles.

(Given this, Test $\Delta$-Freeness works with $s$ set to $10/\delta_\varepsilon$, as then $(1 - \delta_\varepsilon)^s \leq 1/3$.)

Proof. Consider first the very special case where $G$ is an $N$-vertex $\alpha$-dense random graph (i.e. each possible edge is in $E$ independently with probability $\alpha$). In such a graph, the probability that three distinct nodes $v_1, v_2, v_3$ form a triangle is
$$\Pr[\, v_1, v_2, v_3 \text{ form a } \Delta \,] = \Pr[(v_1, v_2) \in E] \cdot \Pr[(v_1, v_3) \in E] \cdot \Pr[(v_2, v_3) \in E] = \alpha^3,$$
and by linearity the expected number of triangles is
$$\mathbb{E}[\#\Delta] = \alpha^3 \binom{N}{3}$$
(and one can show that the graph is $\Theta(\alpha)$-far from $\Delta$-free). Thus, in this case, $\delta = \Theta(\alpha^3)$ works for the statement of the $\Delta$-Removal Lemma.

However, we do not deal with random graphs here, but with arbitrary graphs. The key will be to argue that these graphs still present some structure, namely that they have enough regularity, and that this regularity is roughly equivalent, for our purposes, to behaving like random graphs.
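Definition 11 (Density). Given disjoint sets $X, Y \subseteq [N]$ of vertices of a graph $G$, the density $d(X, Y)$ is defined as
$$d(X, Y) \stackrel{\text{def}}{=} \frac{e(X, Y)}{|X|\,|Y|},$$
where $e(X, Y) \stackrel{\text{def}}{=} |E(X, Y)|$.

Recall that a partition of $[N]$ is a collection of disjoint subsets $V_1, V_2, \dots, V_k$ such that $\bigcup_{i=1}^{k} V_i = [N]$.

Definition 12 (Regularity). Let $A, B \subseteq [N]$ be disjoint. The pair $(A, B)$ is said to be $\varepsilon$-regular if for all $X \subseteq A$, $Y \subseteq B$ with $|X| \geq \varepsilon |A|$ and $|Y| \geq \varepsilon |B|$ one has
$$|d(A, B) - d(X, Y)| \leq \varepsilon.$$

The idea is to show that regularity is sufficient to ensure lots of triangles, just like in the case of random graphs.

Lemma 13. Fix $0 < \alpha < \frac{1}{2}$ and $0 < \varepsilon < \frac{\alpha}{2}$. Suppose $A, B, C \subseteq [N]$ are disjoint subsets such that each pair $(A, B)$, $(A, C)$, and $(B, C)$ is both (i) $\varepsilon$-regular and (ii) $\alpha$-dense (that is, of density at least $\alpha$). Then the number of $A$-$B$-$C$ triangles is at least $\frac{\alpha^3}{16} |A|\,|B|\,|C|$.

[Figure: the sets $A$, $B$, $C$, with a vertex $u \in A^*$ and its neighborhoods $B_u \subseteq B$ and $C_u \subseteq C$.]

Proof. First we show that $A$ has many well-connected vertices (adjacent to many elements of both $B$ and $C$). Define
$$A^* \stackrel{\text{def}}{=} \{\, a \in A : a \text{ has at least } (\alpha - \varepsilon)|B| \text{ neighbors in } B \text{ and at least } (\alpha - \varepsilon)|C| \text{ neighbors in } C \,\}.$$

Claim 14. $|A^*| \geq (1 - 2\varepsilon)|A|$.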
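Proof. Let $A_{\mathrm{bad}(B)} \subseteq A$ be $\{\, a \in A : a \text{ has} < (\alpha - \varepsilon)|B| \text{ neighbors in } B \,\}$. We have
$$d(A_{\mathrm{bad}(B)}, B) = \frac{e(A_{\mathrm{bad}(B)}, B)}{|A_{\mathrm{bad}(B)}|\,|B|} < \frac{|A_{\mathrm{bad}(B)}| \cdot (\alpha - \varepsilon)|B|}{|A_{\mathrm{bad}(B)}|\,|B|} = \alpha - \varepsilon.$$
Since $d(A, B) \geq \alpha$ by assumption, we get $|d(A, B) - d(A_{\mathrm{bad}(B)}, B)| > \varepsilon$; and so, $(A, B)$ being $\varepsilon$-regular, we must have by contrapositive $|A_{\mathrm{bad}(B)}| < \varepsilon |A|$. Analogously, define $A_{\mathrm{bad}(C)}$: we obtain $|A_{\mathrm{bad}(C)}| < \varepsilon |A|$, and therefore
$$|A^*| \geq |A| - |A_{\mathrm{bad}(B)}| - |A_{\mathrm{bad}(C)}| \geq (1 - 2\varepsilon)|A|.$$

Now we will use $A^*$ to get a lot of triangles as follows: for a vertex $u \in A^*$ (that is, a well-connected vertex, see Figure 4), let
$$B_u \stackrel{\text{def}}{=} \{\, b \in B : (u, b) \in E \,\}, \qquad C_u \stackrel{\text{def}}{=} \{\, c \in C : (u, c) \in E \,\}.$$
Every edge between $B_u$ and $C_u$ gives a triangle with $u$; to get many of them, we want to lower bound
$$e(B_u, C_u) = d(B_u, C_u)\,|B_u|\,|C_u|. \qquad (1)$$
But since $u \in A^*$, $B_u$ and $C_u$ are both large:
$$|B_u| \geq (\alpha - \varepsilon)|B| \geq \frac{\alpha}{2}|B| \geq \varepsilon |B|, \qquad |C_u| \geq (\alpha - \varepsilon)|C| \geq \frac{\alpha}{2}|C| \geq \varepsilon |C|,$$
and this also implies, as $(B, C)$ is $\varepsilon$-regular, that
$$d(B_u, C_u) \geq d(B, C) - \varepsilon \geq \alpha - \varepsilon > \frac{\alpha}{2},$$
and thus, plugging these back in (1), $e(B_u, C_u) \geq \frac{\alpha^3}{8}|B|\,|C|$. So finally, the total number of $A$-$B$-$C$ triangles is at least
$$\sum_{u \in A^*} e(B_u, C_u) \geq (1 - 2\varepsilon)|A| \cdot \frac{\alpha^3}{8}|B|\,|C| \geq \frac{\alpha^3}{16}|A|\,|B|\,|C|,$$
where the last inequality uses $\varepsilon < \frac{\alpha}{2} < \frac{1}{4}$, so that $1 - 2\varepsilon \geq \frac{1}{2}$.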
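The quantities used in this proof are easy to compute on small examples. The following Python sketch (with hypothetical helper names, and assuming the graph is stored as a dictionary `adj` mapping each vertex to the set of its neighbors) computes the density $d(X, Y)$, the set $A^*$ of well-connected vertices, and the number of $A$-$B$-$C$ triangles counted exactly as in the proof, by summing $e(B_u, C_u)$ over $u \in A$. It does not check $\varepsilon$-regularity, which would require examining exponentially many subset pairs.

```python
def density(adj, X, Y):
    """d(X, Y) = e(X, Y) / (|X| |Y|) for disjoint vertex sets X and Y."""
    e_xy = sum(1 for x in X for y in Y if y in adj[x])
    return e_xy / (len(X) * len(Y))


def well_connected(adj, A, B, C, alpha, eps):
    """The set A* of Claim 14: vertices of A with at least (alpha - eps)|B|
    neighbors in B and at least (alpha - eps)|C| neighbors in C."""
    return {a for a in A
            if len(adj[a] & B) >= (alpha - eps) * len(B)
            and len(adj[a] & C) >= (alpha - eps) * len(C)}


def count_abc_triangles(adj, A, B, C):
    """Number of triangles with one vertex in each of A, B, C,
    computed as the sum over u in A of e(B_u, C_u)."""
    total = 0
    for u in A:
        B_u = adj[u] & B  # neighbors of u in B
        C_u = adj[u] & C  # neighbors of u in C
        total += sum(1 for b in B_u for c in C_u if c in adj[b])
    return total
```

On a complete tripartite graph every pair of parts has density 1 and every sub-pair does too, so the hypotheses of Lemma 13 hold for any admissible $\alpha$ and $\varepsilon$, and the count $|A|\,|B|\,|C|$ comfortably exceeds the bound $\frac{\alpha^3}{16}|A|\,|B|\,|C|$.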
To conclude the proof of the $\Delta$-Removal Lemma, we conjure an amazing fact (or miracle): the Szemerédi Regularity Lemma. This structural result, from [Sze78], is a cornerstone of graph property testing; it states that every sufficiently large graph can be divided into subsets of about the same size so that the edges between different subsets behave almost as in a random graph. More formally:

Theorem 15 (Szemerédi Regularity Lemma). Given $\varepsilon > 0$ and $m_0 \geq 1$, there exist $M = M(\varepsilon, m_0)$ (an upper bound on the number of pieces of the partition) and $K = K(\varepsilon, m_0)$ such that for any graph $G = (V, E)$ with at least $K$ vertices there exist an integer $m$ and a partition of $V$ into $V_0, V_1, \dots, V_m$ satisfying:
(i) $|V_1| = |V_2| = \dots = |V_m|$ (all same size);
(ii) $|V_0| \leq \varepsilon |V|$ (slop bin);
(iii) $m_0 \leq m \leq M$; and
(iv) at most $\varepsilon m^2$ pairs $(V_i, V_j)$ are not $\varepsilon$-regular.

As a small catch, however, one feels compelled to point out that $M$ can be as large as a tower $2^{2^{\cdot^{\cdot^{\cdot^{2}}}}}$ where the tower of 2s has height $1/\varepsilon^5$ (and, sadly, one cannot hope to get much improvement, as lower bounds on this height have been proven). However, the amazing fact is that this is still completely independent of $N$: no matter how big $N \geq K$ is, $M$ will not change by one iota.

Now for the kill: recall the statement of the $\Delta$-Removal Lemma we want to prove:
$$\forall \varepsilon > 0\ \exists \delta_\varepsilon \text{ such that any $N$-vertex $G$ that is $\varepsilon$-far from $\Delta$-free has at least } \delta_\varepsilon N^3\ \Delta\text{s}.$$
We will use the Szemerédi Regularity Lemma (SRL) above to finish the proof. We are given $\varepsilon > 0$; set $m_0 = \frac{10}{\varepsilon}$, and apply SRL on $G$ with parameter chosen to be $\frac{\varepsilon}{10}$. This guarantees the existence of $M = M(\varepsilon)$, $K = K(\varepsilon)$ such that if $N \geq K$ there exists a partition $V_0 \cup V_1 \cup \dots \cup V_m$ of $V$ with
    $\frac{10}{\varepsilon} \leq m \leq M$;
    at most $\frac{\varepsilon}{10} m^2$ pairs $V_i, V_j$ are not $\frac{\varepsilon}{10}$-regular;
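    $|V_0| \leq \frac{\varepsilon}{10} N$; and
    $|V_1| = |V_2| = \dots = |V_m| \in \left[ \frac{(1 - \frac{\varepsilon}{10}) N}{M},\ \frac{\varepsilon}{10} N \right]$.

Now, if $N < K$ then set $\delta_\varepsilon \stackrel{\text{def}}{=} \frac{1}{2} \cdot \frac{1}{K^3}$, so that $0 < \delta_\varepsilon N^3 < \frac{1}{2}$. It then suffices for $G$ to contain at least one triangle for the statement of the theorem to hold, which is true as $G$ is $\varepsilon$-far from $\Delta$-free (so it must contain at least one triangle).

Goal: assuming now $N \geq K$, we want to modify $G$ so that we can use Lemma 13. Let $G'$ be obtained from $G$ by:
(1) removing all edges incident to $V_0$
    (at most $\frac{\varepsilon}{10} N \cdot |V| = \frac{\varepsilon}{10} N^2$ edges);
(2) removing all edges within $V_i$, for $i \in [m]$
    (at most $m \binom{N/m}{2} \leq m \cdot \frac{N^2}{2m^2} = \frac{N^2}{2m} \leq \frac{\varepsilon}{20} N^2$ edges);
(3) removing all edges between $V_i$ and $V_j$ if $(V_i, V_j)$ is not $\frac{\varepsilon}{10}$-regular
    (at most $\frac{\varepsilon}{10} m^2 \left( \frac{N}{m} \right)^2 = \frac{\varepsilon}{10} N^2$ edges);
(4) removing all edges between $V_i$ and $V_j$ if $d(V_i, V_j) \leq \frac{\varepsilon}{5}$
    (at most $\binom{m}{2} \cdot \frac{\varepsilon}{5} \left( \frac{N}{m} \right)^2 \leq \frac{\varepsilon}{5} N^2$ edges).

In total, this removes at most $\frac{9\varepsilon}{20} N^2 < \varepsilon N^2$ edges from $G$: since $G$ is $\varepsilon$-far from $\Delta$-free, it follows that $G'$ has at least one remaining triangle. This triangle, by construction of $G'$, has to be of the form $V_i$-$V_j$-$V_k$, i.e. between some $V_i, V_j, V_k$ for distinct $i, j, k$, with each pair among $V_i, V_j, V_k$ simultaneously $\frac{\varepsilon}{10}$-regular and $\frac{\varepsilon}{5}$-dense. (Lemma 13 is applied with $\alpha = \frac{\varepsilon}{5}$ and regularity parameter $\frac{\varepsilon}{10} = \frac{\alpha}{2}$; its proof goes through unchanged with this non-strict bound.) Therefore, from Lemma 13 there are at least
$$\frac{(\varepsilon/5)^3}{16} |V_i|\,|V_j|\,|V_k| \geq \frac{(\varepsilon/5)^3}{16} \left( \frac{(1 - \frac{\varepsilon}{10}) N}{M} \right)^3$$
many triangles in $G$. Choosing
$$\delta_\varepsilon \stackrel{\text{def}}{=} \frac{\varepsilon^3}{2000} \left( \frac{1 - \frac{\varepsilon}{10}}{M} \right)^3$$
then concludes the proof.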
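The bookkeeping at the end of this proof can be double-checked with exact rational arithmetic. The Python sketch below uses hypothetical helper names: it adds up the four edge-removal bounds (as fractions of $N^2$) and evaluates the resulting $\delta_\varepsilon$ for a given $M$. The value of $M$ passed in is purely illustrative, since the true $M(\varepsilon)$ from the Regularity Lemma is astronomically large.

```python
from fractions import Fraction


def removal_budget(eps):
    """Upper bounds, as fractions of N^2, on the edges removed in steps (1)-(4).

    Pass eps as a Fraction, an int, or a string such as "1/2" to keep the
    arithmetic exact.
    """
    eps = Fraction(eps)
    steps = {
        "(1) edges incident to V_0": eps / 10,
        "(2) edges inside some V_i": eps / 20,
        "(3) edges in irregular pairs": eps / 10,
        "(4) edges in sparse pairs": eps / 5,
    }
    return steps, sum(steps.values())


def delta_eps(eps, M):
    """delta_eps = (eps/5)^3 / 16 * ((1 - eps/10) / M)^3, for an illustrative M."""
    eps = Fraction(eps)
    return (eps / 5) ** 3 / 16 * ((1 - eps / 10) / M) ** 3
```

For any $\varepsilon$ the total returned by `removal_budget` equals $\frac{9\varepsilon}{20}$, strictly below the $\varepsilon N^2$ edges that would be needed to make $G$ triangle-free, and `delta_eps` agrees with the closed form $\frac{\varepsilon^3}{2000}\left(\frac{1 - \varepsilon/10}{M}\right)^3$.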
References

[GGR98] Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. J. ACM, 45(4):653-750, July 1998.

[Sze78] E. Szemerédi. Regular partitions of graphs. Problèmes combinatoires et théorie des graphes, pages 399-401, 1978.