Fndng Dense Subgraphs n Gn, 1/ Atsh Das Sarma 1, Amt Deshpande, and Rav Kannan 1 Georga Insttute of Technology,atsh@cc.gatech.edu Mcrosoft Research-Bangalore,amtdesh,annan@mcrosoft.com Abstract. Fndng the largest clque n random graphs s a well nown hard problem. It s nown that a random graph Gn,1/ almost surely has a clque of sze about log n. A smple greedy algorthm fnds a clque of sze log n, and t s a long-standng open problem to fnd a clque of sze 1 + ǫlog n n randomzed polynomal tme. In ths paper, we study the generalzaton of fndng the largest subgraph of any gven edge densty. We show that a smple modfcaton of the greedy algorthm fnds a subset of log n vertces wth nduced edge densty at least 0.951. We also show that almost surely there s no subset of.784 log n vertces whose nduced edge densty s at least 0.951. 1 Introducton Fndng the largest clque s a notorously hard problem, even on random graphs. It s nown that the clque number of a random graph Gn, 1/ s almost surely ether or +1, where log n loglog n 1 Secton 4.5 n [1], also []. However, a smple greedy algorthm fnds a clque of sze only log n 1 + o1, wth hgh probablty, and fndng larger clques that of sze even 1+ǫlogn n randomzed polynomal tme has been a long-standng open problem [3]. In ths paper, we study the followng generalzaton: gven a random graph Gn, 1/ fnd the largest subgraph wth edge densty at least 1 δ. We show that a smple modfcaton of the greedy algorthm fnds a subset of log n vertces whose nduced subgraph has edge densty at least 0.951, wth hgh probablty. To complement ths, we show that almost surely there s no subset of.784 logn vertces whose nduced subgraph has edge densty 0.951 or more. We use Gn, p to denote a random graph on n vertces where each par of vertces appears as an edge ndependently wth probablty p. We use V to denote ts set of vertces and E to denote ts set of edges. Moreover, gven two subsets S V and T V, we use ES, T to denote the set of edges wth one endpont n S and another endpont n T. The densty of the subgraph nduced by vertces n S s gven by densty S ES, S. S Wor done whle at Mcrosoft Research
Therefore, the expected densty of Gn, 1/ s 1/ and the densty of any clque s 1. In Secton we descrbe our algorthm for fndng subgraphs of densty 1 δ. We gve a bound on the largest subgraph of densty 1 δ n the followng Secton 3. Fnally, n Secton 4, we present some open problems. Algorthm for fndng large subgraph of densty 1 δ In ths secton, we descrbe our algorthm and gve a relatonshp between the sze of the subgraph obtaned by the algorthm, and ts densty. In partcular, we show that the algorthm can be used to obtan a subset of logn vertces of densty 0.951, wth hgh probablty. Greedy Algorthm to pc a dense subgraph: Input: a random graph Gn, 1/ and δ > 0. Output: a subset S V of sze log n. 1. Partton the vertces nto dsjont sets V V 1 V V, each of sze n/.. Intalze S 0. 3. For 0 to 1 do: a Pc v +1 V +1 that has the maxmum number of edges to S,.e., b S +1 S {v +1 }. 4. Return S S. v +1 argmax v V +1 Ev +1, S. Notce that the algorthm frst parttons all nodes nto random subsets of the same sze, and then pcs one vertex from each partton. Ths parttonng s necessary to argue about ndependence n our analyss of choosng vertces greedly. In the analyss below, Hδ s the standard notaton of the Shannon entropy functon, whch s δ log δ + 1 δlog1 δ. The followng lemma gves a lower bound on the number of edges we can expect to add to our subgraph, for the -th vertex added by the algorthm. Lemma 1. For any 0 and δ that satsfes Hδ 1 1 log n, lnlog n we have Pr Ev +1, S 1 δ 1 1 log n.
Proof. We now by the prevous results, that as long as < log n, the vertex added has all edges to S. Consder log n. The algorthm has n l vertces to choose from. The expected number of vertces among these, wth at least 1 δ vertces s gven by, Fx v V +1. The probablty that v has at least 1 δ edges to S s Pr Ev, S 1 δ t1 δ Hδ+o1 1, t where Hδ δ log δ 1 δlog1 δ s the Shannon entropy here log s taen wth base. Usng ndependence of these events for dfferent v V +1, we get Pr Ev, S < 1 δ, v V +1 1 Hδ 1 n/ Therefore, 1 1 log n. Pr Ev +1, S 1 δ 1 1 log n. n/ lnlog n n We now gve a unon bound over all addtons of vertces, usng the prevous lemma. Lemma. Pr ES, S 1 δ 1 as n. 0 Proof. Snce V 1, V,...,V are dsjont, usng ndependence and Lemma 1 we get Pr ES, S 1 δ Pr Ev +1, S 1 δ 0 0 1 1 log n e 1/log n usng log n The pont s that we are pcng exactly one vertex from each vertex set/partton, and hence do not lose any randomness or ndependence of the edges. Ths now gves us a bound on the mnmum number of edges one can expect, w.h.p., n the chosen set of vertces. We are not able to express, n a closed form, the sze of a subgraph obtanable usng ths algorthm for a specfc densty. Therefore, we state the best densty one can guarantee w.h.p. for log n. Ths s stated as a theorem below, whch we prove subsequently.
Theorem 1. Our algorthm produces a subset S V of sze logn such that densty S 0.951, almost surely. Proof. From Lemma we have that, almost surely, ES, S 1 δ 0 1 H 1 1 1 log 0 0 log m log m n lnlog n H 1 1 log m H 1 1 log m, 1 where m n/ lnlog n. Here we use the fact that we can choose δ 0 for the frst log m steps. Now let 1 1 + αlog m. Then log m H 1 1 log m 1+αlog m log m αlog m t0 H 1 1 log m log m + th 1 1 log m log m + t α log m H 1 1 1 x0 α log m H 1 1 1 Now usng Equatons 1 and we have densty S 0 ES, S 1 log m α 1 1 1 + o1 α 0.951. 0 H 1 1 1 dx 0 H 1 1 1 dx, dx
usng α log m 1 logn log n log 4 log n lnlog n 1 1 + o1. and computng an upper bound on the ntegral numercally. 3 Upper bound on largest subgraph of densty 1 δ In ths secton, we upper bound the sze of the largest subgraph of densty 1 δ n Gn, 1/. Theorem. A random graph Gn, 1/ has no subgraph of sze log n + loge 1 Hδ o1 + 1 and densty at least 1 δ, almost surely. In partcular, there s no subgraph of sze.784 logn and densty at least 0.951, almost surely. Proof. For every S V of sze, defne an ndcator random varable X S as follows. { 1 f S nduces a subgraph of densty 1 δ X S 0 otherwse. Thus E [X S ] 1 δ Hδ+o1 1. By lnearty of expectaton, expected number of subgraphs of sze and densty 1 δ follows. E E [X S ] S : S X S S : S n en en Hδ+o1 1 Hδ+o1 1 Hδ+o1 1 1 Hδ o1 Hδ+o1 1 1 Hδ+o1/ 0, as n,
usng logn + loge 1 Hδ o1 + 1. Therefore, by Marov nequalty we have Pr X S 1 E S : S S : S X S 0, as n. Or n other words, almost surely there s no subset of vertces that nduce a subgraph of densty at least 1 δ. Notce that for densty 0.951, the gap/rato between the largest subgraph that exsts and the largest subgraph that we can fnd s smaller than n the case of clques. Ths s nterestng, although not entrely unexpected as for densty 0.5, the whole graph can be output. Ths rato for densty 0.951 s however sgnfcantly smaller than ; t s.784/ 1.39. 4 Conclusons For a concrete open problem, s there a polynomal tme algorthm that outputs a subgraph of densty 1 ǫ and sze logn for any choce of ǫ > 0? Are there smple algorthms that beat the densty bound of 0.95 for subgraphs of sze logn. If not, what s the maxmum densty obtanable for a subgraph of sze logn? Spectral technques could be tred. Is there an On log n tme algorthm that fnds the largest clque n Gn, 1/? We than an anonymous revewer for pontng out a soluton to ths: Smply enumerate all clques of all szes, generatng them from smaller ones to bgger ones; once all clques of sze have been found, try to add to each of them every vertex and see f t can be expanded. The expected number of clques of sze s n < n /, and wth hgh probablty the number of clques of each sze does not exceed ths expectaton by much. The functon n / attans ts maxmum around log n, and ts value for ths s roughly n 0.5log n, showng that the above algorthm wll run n tme n 0.5+o1log n and fnd the largest clque wth hgh probablty over the nput graphs. Can one do better than ths? References 1. Noga Alon and Joel Spencer, The probablstc method nd edton, second ed., Wley Interscence, 000.. Béla Bollobás, Random graphs, Academc Press, New Yor, 1985. 3. Rchard Karp, The probablstc analyss of some combnatoral search algorthms, 1976, 1 19.