arxiv: v1 [q-bio.qm] 6 Jun 2008


On the Approximability of Comparing Genomes with Duplicates

Sébastien Angibaud 1, Guillaume Fertin 1, Irena Rusu 1, Annelyse Thévenin 2, and Stéphane Vialette 3

1 Laboratoire d'Informatique de Nantes-Atlantique (LINA), UMR CNRS 6241, Université de Nantes, 2 rue de la Houssinière, Nantes Cedex 3, France. {sebastien.angibaud,guillaume.fertin,irena.rusu}@univ-nantes.fr
2 Laboratoire de Recherche en Informatique (LRI), UMR CNRS 8623, Université Paris-Sud, Orsay, France. thevenin@lri.fr
3 IGM-LabInfo, UMR CNRS 8049, Université Paris-Est, 5 Bd Descartes, Marne-la-Vallée, France. vialette@univ-mlv.fr

Abstract. A central problem in comparative genomics consists in computing a (dis-)similarity measure between two genomes, e.g. in order to construct a phylogenetic tree. A large number of such measures has been proposed in the recent past: number of reversals, number of breakpoints, number of common or conserved intervals, SAD, etc. In their initial definitions, all these measures suppose that genomes contain no duplicates. However, we now know that genes can be duplicated within the same genome. One possible approach to overcome this difficulty is to establish a one-to-one correspondence (i.e. a matching) between genes of both genomes, where the correspondence is chosen in order to optimize the studied measure. Then, after a gene relabeling according to this matching and a deletion of the unmatched signed genes, two genomes without duplicates are obtained and the measure can be computed. In this paper, we are interested in three measures (number of breakpoints, number of common intervals and number of conserved intervals) and three models of matching (exemplar, intermediate and maximum matching models). We prove that, for each model and each measure M, computing a matching between two genomes that optimizes M is APX-hard. We show that this result remains true even for two genomes $G_1$ and $G_2$ such that $G_1$ contains no duplicates and no gene of $G_2$ appears more than twice. Therefore, our results extend those of [7, 10, 13].
Besides, in order to evaluate the possible existence of approximation algorithms concerning the number of breakpoints, we also study the complexity of the following decision problem: is there an exemplarization (resp. an intermediate matching, a maximum matching) that induces no breakpoint? In particular, we extend a result of [13] by proving the problem to be NP-complete in the exemplar model for a new class of instances, we note that the problems are equivalent in the intermediate and the exemplar models, and we show that the problem is in P in the maximum matching model. Finally, we focus on a fourth measure, closely related to the number of breakpoints: the number of adjacencies, for which we give several constant-ratio approximation algorithms in the maximum matching model, in the case where genomes contain the same number of duplications of each gene.

Keywords: genome rearrangements, APX-hardness, duplicate genes, breakpoints, adjacencies, common intervals, conserved intervals, approximation algorithms.

1 Introduction and Preliminaries

In comparative genomics, computing a measure of (dis-)similarity between two genomes is a central problem: such a measure can be used, for instance, to construct phylogenetic trees. The measures defined so far essentially fall into two categories. The first one consists in counting the minimum number of operations needed to transform a genome into another (e.g. the edit distance [21] or the number of reversals [4]). The second one contains (dis-)similarity measures based on the genome structure, such as the number of breakpoints [7], the conserved intervals distance [6], the number of common intervals [10], SAD and MAD [24], etc.

When genomes contain no duplicates, most measures can be computed in polynomial time. However, assuming that genomes contain no duplicates is too limited. Indeed, it has been recently shown that a great number of duplicates exists in some genomes. For example, in [20], the authors estimate that 15% of genes are duplicated in the human genome. A possible approach to overcome this difficulty is to specify a one-to-one correspondence (i.e. a matching) between genes of both genomes and to remove the unmatched genes, thus obtaining two genomes with identical gene content and no duplicates. Usually, the above-mentioned matching is chosen in order to optimize the studied measure, following the parsimony principle. Three models achieving this correspondence have been proposed: the exemplar model [23], the intermediate model [3] and the maximum matching model [25]. Before defining precisely the measures and models studied in this paper, we need to introduce some notations.

Notations used in the paper. A genome $G$ is represented by a sequence of signed integers (called signed genes). For any genome $G$, we denote by $F_G$ the set of unsigned integers (called genes) that are present in $G$. For any signed gene $g$, let $-g$ be the signed gene having the opposite sign and let $|g| \in F_G$ be the corresponding (unsigned) gene. Given a genome $G$ without duplicates and two signed genes $a$, $b$ such that $a$ is located before $b$, let $G[a,b]$ be the set $S \subseteq F_G$ of genes located between genes $a$ and $b$ in $G$, $a$ and $b$ included. We also note $[a,b]_G$ the substring (i.e. the sequence of consecutive elements) of $G$ starting at $a$ and finishing at $b$ in $G$. Let $\mathrm{occ}(g,G)$ be the number of occurrences of a given gene $g$ in a genome $G$ and let $\mathrm{occ}(G) = \max\{\mathrm{occ}(g,G) \mid g \in F_G\}$. A pair of genomes $(G_1,G_2)$ is said to be of type $(x,y)$ if $\mathrm{occ}(G_1) = x$ and $\mathrm{occ}(G_2) = y$. A pair of genomes $(G_1,G_2)$ is said to be balanced if, for each gene $g \in F_{G_1} \cup F_{G_2}$, we have $\mathrm{occ}(g,G_1) = \mathrm{occ}(g,G_2)$ (otherwise, $(G_1,G_2)$ is said to be unbalanced). Note that a pair $(G_1,G_2)$ of type $(x,x)$ is not necessarily balanced.
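The notations above can be sketched in a few lines of Python. This is only an illustration: genomes are assumed to be stored as lists of non-zero signed integers, the function names (`occurrences`, `occ`, `pair_type`, `is_balanced`) are hypothetical, and the two example genomes are made up rather than taken from the paper.

```python
from collections import Counter

def occurrences(genome):
    """Map each (unsigned) gene of F_G to occ(g, G)."""
    return Counter(abs(g) for g in genome)

def occ(genome):
    """occ(G): the largest number of occurrences of any gene in G."""
    return max(occurrences(genome).values())

def pair_type(g1, g2):
    """Type (x, y) of the pair (G1, G2)."""
    return occ(g1), occ(g2)

def is_balanced(g1, g2):
    """True if every gene occurs equally often in both genomes."""
    return occurrences(g1) == occurrences(g2)

# Illustrative genomes (an assumption, not the paper's example).
G1 = [1, -2, 1, 3]    # gene 1 occurs twice
G2 = [2, 3, -3, 1]    # gene 3 occurs twice

kind = pair_type(G1, G2)
balanced = is_balanced(G1, G2)
```

Here the pair has type (2,2) but is unbalanced, which matches the closing remark that a pair of type $(x,x)$ need not be balanced.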
Denote by $n_G$ the size of genome $G$, that is the number of signed genes it contains. Let $G[p]$, $1 \le p \le n_G$, be the signed gene that occurs at position $p$ on genome $G$, and let $|G[p]| \in F_G$ be the corresponding (unsigned) gene. Let $N_G[p]$, $1 \le p \le n_G$, be the number of occurrences of $|G[p]|$ in the first $(p-1)$ positions of $G$. We define a duo in a genome $G$ as a pair of successive signed genes. Given a duo $d = (G[i],G[i+1])$ in a genome $G$, we note $-d$ the duo equal to $(-G[i+1],-G[i])$. Let $(d_1,d_2)$ be a pair of duos; $(d_1,d_2)$ is called a duo match if $d_1$ is a duo of $G_1$, $d_2$ is a duo of $G_2$, and if either $d_1 = d_2$ or $d_1 = -d_2$.

For example, consider the genome $G_1 =$ […]. Then $F_{G_1} = \{1,2,3,4,5,6\}$, $n_{G_1} = 9$, $\mathrm{occ}(1,G_1) = 2$, $\mathrm{occ}(G_1) = 3$, $G_1[7] = -2$, $-G_1[7] = +2$, $|G_1[7]| = 2$ and $N_{G_1}[7] = 1$. Let $G_2$ be the genome $G_2 =$ […]. Then the pair $(G_1,G_2)$ is balanced and is of type $(3,3)$. Let $d_1 = (G_1[4],G_1[5])$ be the duo $(+4,+5)$ and $d_2$ be the duo $(G_2[5],G_2[6])$. The pair $(d_1,d_2)$ is a duo match. Now, consider the genome $G_3 =$ […] without duplicates. We have $G_3[+6,-1] = \{1,4,6\}$ and $[+6,-1]_{G_3} = (+6,+4,-1)$.

Breakpoints, adjacencies, common and conserved intervals. Let us now define the four measures we will study in this paper. Let $G_1$ and $G_2$ be two genomes without duplicates and with the same gene content, that is $F_{G_1} = F_{G_2}$.

Breakpoint and Adjacency. Let $(a,b)$ be a duo in $G_1$. We say that the duo $(a,b)$ induces a breakpoint of $(G_1,G_2)$ if neither $(a,b)$ nor $(-b,-a)$ is a duo in $G_2$. Otherwise, we say that $(a,b)$ induces an adjacency of $(G_1,G_2)$. For example, when $G_1 =$ […] and $G_2 = +5$ […], the duo $(2,3)$ in $G_1$ induces a breakpoint of $(G_1,G_2)$ while $(3,4)$ in $G_1$ induces an adjacency of $(G_1,G_2)$. We note $B(G_1,G_2)$ (resp. $A(G_1,G_2)$) the number of breakpoints (resp. the number of adjacencies) that exist between $G_1$ and $G_2$.

Common interval. A common interval of $(G_1,G_2)$ is a substring of $G_1$ such that $G_2$ contains a permutation of this substring (not taking signs into account). For example, consider $G_1 =$ […] and $G_2 =$ […]. The substring $[+3,+5]_{G_1}$ is a common interval of $(G_1,G_2)$.

Conserved interval. Consider two signed genes $a$ and $b$ of $G_1$ such that $a$ precedes $b$, where the precedence relation is large in the sense that, possibly, $a = b$. The substring $[a,b]_{G_1}$ is a conserved interval of $(G_1,G_2)$ if either (i) $a$ precedes $b$ and $G_2[a,b] = G_1[a,b]$, or (ii) $-b$ precedes $-a$ and $G_2[-b,-a] = G_1[a,b]$. For example, if $G_1 =$ […] and $G_2 =$ […], the substring $[+2,+5]_{G_1}$ is a conserved interval of $(G_1,G_2)$. We note that the notion of common interval does not consider the sign of genes. Note also that a conserved interval is actually a common interval, but with additional restrictions on its extremities.

Dealing with duplicates in genomes. When genomes contain duplicates, we cannot directly compute the measures defined in the previous paragraph. A solution consists in finding a one-to-one correspondence (i.e. a matching) between duplicated genes of $G_1$ and $G_2$; we then use this correspondence to rename genes of $G_1$ and $G_2$, and we delete the unmatched signed genes in order to obtain two genomes $G'_1$ and $G'_2$ such that $G'_2$ is a permutation of $G'_1$; thus, the measure computation becomes possible. In this paper, we will focus on three models of matching: the exemplar, intermediate and maximum matching models.

The exemplar model [23]: for each gene $g$, we keep in the matching $M$ only one occurrence of $g$ in $G_1$ and in $G_2$, and we remove all the other occurrences. Hence, we obtain two genomes $G_1^E$ and $G_2^E$ without duplicates. The triplet $(G_1^E,G_2^E,M)$ is called an exemplarization of $(G_1,G_2)$.
Note that in this model, $M$ can be inferred from the exemplarized genomes $G_1^E$ and $G_2^E$. Thus, in the rest of the paper, any exemplarization $(G_1^E,G_2^E,M)$ of $(G_1,G_2)$ will be described only by the pair $(G_1^E,G_2^E)$.

The intermediate model [3]: in this model, for each gene $g$, we keep in the matching $M$ an arbitrary number $k_g$ of occurrences of $g$, $1 \le k_g \le \min(\mathrm{occ}(g,G_1),\mathrm{occ}(g,G_2))$, in order to obtain genomes $G_1^I$ and $G_2^I$. We call the triplet $(G_1^I,G_2^I,M)$ an intermediate matching of $(G_1,G_2)$.

The maximum matching model [25]: in this case, we keep in the matching $M$ the maximum number of signed genes in both genomes. More precisely, we look for a one-to-one correspondence between signed genes of $G_1$ and $G_2$ that matches, for each gene $g$, exactly $\min(\mathrm{occ}(g,G_1),\mathrm{occ}(g,G_2))$ occurrences. After this operation, we delete each unmatched signed gene. The triplet $(G_1^M,G_2^M,M)$ obtained by this operation is called a maximum matching of $(G_1,G_2)$.

Problems studied in this paper. Consider two genomes $G_1$ and $G_2$ with duplicates. Let EComI (resp. IComI, MComI) be the problem which consists in finding an exemplarization (resp. intermediate matching, maximum matching) $(G'_1,G'_2,M)$ of $(G_1,G_2)$ such that the number of common intervals of $(G'_1,G'_2)$ is maximized. Moreover, let EConsI (resp. IConsI, MConsI) be the problem which consists in finding an exemplarization (resp. intermediate matching, maximum matching) $(G'_1,G'_2,M)$ of $(G_1,G_2)$ such that the number of conserved intervals of $(G'_1,G'_2)$ is maximized. In Section 2, we prove the APX-hardness of EComI and EConsI, even for genomes $G_1$ and $G_2$ such that $\mathrm{occ}(G_1) = 1$ and $\mathrm{occ}(G_2) = 2$. These results induce the APX-hardness under the other models (i.e., IComI, MComI, IConsI and MConsI are APX-hard). These results extend in particular those of [7, 10].
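As a rough illustration of the breakpoint measure and of the exemplar model, the following Python sketch counts breakpoints between duplicate-free signed genomes and then minimizes that count by exhaustive search over exemplarizations. It is exponential-time and only meant for tiny inputs. The function names and the example genomes are assumptions made for this sketch (the examples are not the ones printed in the paper), and both genomes are assumed to contain the same gene families.

```python
from itertools import product

def duos(genome):
    """All pairs of successive signed genes."""
    return list(zip(genome, genome[1:]))

def breakpoints(g1, g2):
    """B(G1, G2) for duplicate-free genomes with the same gene content."""
    d2 = set(duos(g2))
    return sum(1 for a, b in duos(g1)
               if (a, b) not in d2 and (-b, -a) not in d2)

def adjacencies(g1, g2):
    """A(G1, G2): every duo of G1 is either a breakpoint or an adjacency."""
    return len(g1) - 1 - breakpoints(g1, g2)

def exemplarizations(genome):
    """Yield every genome that keeps exactly one occurrence of each gene."""
    positions = {}
    for p, g in enumerate(genome):
        positions.setdefault(abs(g), []).append(p)
    for choice in product(*positions.values()):
        keep = set(choice)
        yield [g for p, g in enumerate(genome) if p in keep]

def exemplar_breakpoint_distance(g1, g2):
    """Brute-force minimum number of breakpoints over all exemplarizations."""
    return min(breakpoints(e1, e2)
               for e1 in exemplarizations(g1)
               for e2 in exemplarizations(g2))

# Illustrative duplicate-free genomes.
H1 = [1, 2, 3, 4, 5]
H2 = [5, -4, -3, 2, 1]
b, a = breakpoints(H1, H2), adjacencies(H1, H2)

# Illustrative pair of type (1, 2): gene 3 is duplicated in the second genome.
best = exemplar_breakpoint_distance([1, 2, 3, 4], [1, 3, 2, 3, 4])
```

In the first example the duo (3,4) is an adjacency because (-4,-3) is a duo of the second genome, while the other three duos are breakpoints. In the second example, keeping the second copy of gene 3 yields identical genomes, so the minimum is zero.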

Let EBD (resp. IBD, MBD) be the problem which consists in finding an exemplarization (resp. intermediate matching, maximum matching) $(G'_1,G'_2,M)$ of $(G_1,G_2)$ that minimizes the number of breakpoints between $G'_1$ and $G'_2$. In Section 3, we prove the APX-hardness of EBD, even for genomes $G_1$ and $G_2$ such that $\mathrm{occ}(G_1) = 1$ and $\mathrm{occ}(G_2) = 2$. This result implies that IBD and MBD are also APX-hard, and extends those of [13].

Let ZEBD (resp. ZIBD, ZMBD) be the problem which consists in determining, for two genomes $G_1$ and $G_2$, whether there exists an exemplarization (resp. intermediate matching, maximum matching) which induces zero breakpoint. In Section 4, we study the complexity of ZEBD, ZMBD and ZIBD: in particular, we extend a result of [13] by proving ZEBD to be NP-complete for a new class of instances. We also note that the problems ZEBD and ZIBD are equivalent, and we show that ZMBD is in P. Finally, in Section 5, we focus on a fourth measure, closely related to the number of breakpoints: the number of adjacencies, for which we give several constant-ratio approximation algorithms in the maximum matching model, in the case where genomes are balanced.

2 EComI and EConsI are APX-hard

Consider two genomes $G_1$ and $G_2$ with duplicates, and let EComI (resp. IComI, MComI) be the problem which consists in finding an exemplarization (resp. intermediate matching, maximum matching) $(G'_1,G'_2,M)$ of $(G_1,G_2)$ such that the number of common intervals of $(G'_1,G'_2)$ is maximized. Moreover, let EConsI (resp. IConsI, MConsI) be the problem which consists in finding an exemplarization (resp. intermediate matching, maximum matching) $(G'_1,G'_2,M)$ of $(G_1,G_2)$ such that the number of conserved intervals of $(G'_1,G'_2)$ is maximized.

EComI and MComI have been proved to be NP-complete even if $\mathrm{occ}(G_1) = 1$ and $\mathrm{occ}(G_2) = 2$ in [10]. Besides, in [6], Blin and Rizzi have studied the problem of computing a distance built on the number of conserved intervals.
This distance differs from the number of conserved intervals we study in this paper, mainly in the sense that (i) it can be applied to two sets of genomes (as opposed to two genomes in our case), and (ii) the distance between two identical genomes of length $n$ is equal to 0 (as opposed to $\frac{n(n+1)}{2}$ in our case). Blin and Rizzi [6] proved that finding the minimum distance is NP-complete, under both the exemplar and maximum matching models. A closer analysis of their proof shows that it can be easily adapted to prove that EConsI and MConsI are NP-complete, even in the case $\mathrm{occ}(G_1) = 1$. We can conclude from the above results that IComI and IConsI are also NP-complete, since when one genome contains no duplicates, exemplar, intermediate and maximum matching models are equivalent.

In this section, we improve the above results by showing that the six problems EComI, IComI, MComI, EConsI, IConsI and MConsI are APX-hard, even when genomes $G_1$ and $G_2$ are such that $\mathrm{occ}(G_1) = 1$ and $\mathrm{occ}(G_2) = 2$. The main result is Theorem 1, which will be completed by Corollary 1 at the end of the section.

Theorem 1. EComI and EConsI are APX-hard even when genomes $G_1$ and $G_2$ are such that $\mathrm{occ}(G_1) = 1$ and $\mathrm{occ}(G_2) = 2$.

We prove Theorem 1 by using an L-reduction [22] from the Min-Vertex-Cover problem on cubic graphs, denoted here Min-Vertex-Cover-3. Let $G = (V,E)$ be a cubic graph, i.e. for all $v \in V$, $\mathrm{degree}(v) = 3$. A set of vertices $V' \subseteq V$ is called a vertex cover of $G$ if for each edge $e \in E$, there exists a vertex $v \in V'$ such that $e$ is incident to $v$. The problem Min-Vertex-Cover-3 is defined as follows:

Problem: Min-Vertex-Cover-3
Input: A cubic graph $G = (V,E)$.
Solution: A vertex cover $V'$ of $G$.
Measure: The cardinality of $V'$.

Min-Vertex-Cover-3 was proved to be APX-complete in [1].

2.1 Reduction

Let $G = (V,E)$ be an instance of Min-Vertex-Cover-3, where $G$ is a cubic graph with $V = \{v_1 \ldots v_n\}$ and $E = \{e_1 \ldots e_m\}$. Consider the transformation $R$ which associates to the graph $G$ two genomes $G_1$ and $G_2$ in the following way, where each gene has a positive sign:

$G_1 = b_1\, b_2 \ldots b_m\; x\; a_1 C_1 f_1\; a_2 C_2 f_2 \ldots a_n C_n f_n\; y\; b_{m+n}\, b_{m+n-1} \ldots b_{m+1}$   (1)

$G_2 = y\; a_1 D_1 f_1\, b_{m+1}\; a_2 D_2 f_2\, b_{m+2} \ldots b_{m+n-1}\; a_n D_n f_n\, b_{m+n}\; x$   (2)

with:
- for each $i$, $1 \le i \le n$, $a_i = 6i-5$ and $f_i = 6i$;
- for each $i$, $1 \le i \le n$, $C_i = (a_i+1),(a_i+2),(a_i+3),(a_i+4)$;
- for each $i$, $1 \le i \le n+m$, $b_i = 6n+i$;
- $x = 7n+m+1$ and $y = 7n+m+2$;
- for each $i$, $1 \le i \le n$, $D_i = (a_i+3),(b_j),(a_i+1),(b_k),(a_i+4),(b_l),(a_i+2)$, where $e_j$, $e_k$ and $e_l$ are the edges which are incident to $v_i$ in $G$, with $j < k < l$.

In the following, genes $b_i$, $1 \le i \le m$, are called markers. There is no duplicated gene in $G_1$ and the markers are the only duplicated genes in $G_2$; these genes occur twice in $G_2$. Hence, we have $\mathrm{occ}(G_1) = 1$ and $\mathrm{occ}(G_2) = 2$.

[Fig. 1. The cubic graph $G$, with vertices $v_1, \ldots, v_4$ and edges $e_1, \ldots, e_6$.]

To illustrate the reduction, consider the cubic graph $G$ of Figure 1, for which $n = 4$ and $m = 6$. The markers are $b_1, \ldots, b_6 = 25, \ldots, 30$, the genes $b_7, \ldots, b_{10}$ are $31, \ldots, 34$, and $x = 35$, $y = 36$. From $G$, we construct the following genomes $G_1$ and $G_2$:

$G_1 = 25\ 26\ 27\ 28\ 29\ 30\ \ 35\ \ 1\ 2\ 3\ 4\ 5\ 6\ \ 7\ 8\ 9\ 10\ 11\ 12\ \ 13\ 14\ 15\ 16\ 17\ 18\ \ 19\ 20\ 21\ 22\ 23\ 24\ \ 36\ \ 34\ 33\ 32\ 31$

$G_2 = 36\ \ 1\ D_1\ 6\ \ 31\ \ 7\ D_2\ 12\ \ 32\ \ 13\ D_3\ 18\ \ 33\ \ 19\ D_4\ 24\ \ 34\ \ 35$

where each $D_i$ interleaves $(a_i+3),(a_i+1),(a_i+4),(a_i+2)$ with the markers of the three edges incident to $v_i$ in Figure 1, as defined above.
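The transformation $R$ can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the graph is given as a vertex count `n` and a list `edges` of 1-based vertex pairs, the function name `reduction_R` is hypothetical, and the edge labeling chosen below for the complete graph on four vertices is an assumption that need not coincide with the labeling of Figure 1 (so the resulting $D_i$ may differ from the paper's, while $G_1$ does not depend on the labeling).

```python
from collections import Counter

def reduction_R(n, edges):
    """Build (G1, G2) from a cubic graph; edge e_j is edges[j - 1]."""
    m = len(edges)
    a = lambda i: 6 * i - 5
    f = lambda i: 6 * i
    b = lambda i: 6 * n + i
    x, y = 7 * n + m + 1, 7 * n + m + 2

    g1 = [b(j) for j in range(1, m + 1)] + [x]
    for i in range(1, n + 1):
        g1 += list(range(a(i), f(i) + 1))   # a_i C_i f_i: six consecutive integers
    g1 += [y] + [b(m + i) for i in range(n, 0, -1)]

    g2 = [y]
    for i in range(1, n + 1):
        # indices j < k < l of the three edges incident to v_i
        j, k, l = [idx for idx, e in enumerate(edges, start=1) if i in e]
        d_i = [a(i) + 3, b(j), a(i) + 1, b(k), a(i) + 4, b(l), a(i) + 2]
        g2 += [a(i)] + d_i + [f(i), b(m + i)]
    g2.append(x)
    return g1, g2

# Complete graph on 4 vertices, with an assumed edge labeling e_1..e_6.
K4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
G1, G2 = reduction_R(4, K4)
duplicated = sorted(g for g, c in Counter(G2).items() if c == 2)
```

On this instance `G1` has $7n+m+2 = 36$ genes without duplicates, `G2` has 42 genes, and the duplicated genes of `G2` are exactly the six markers 25 to 30, each occurring twice.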

2.2 Preliminary results

In order to prove Theorem 1, we first give four intermediate lemmas. In the following, a common interval for the EComI problem or a conserved interval for EConsI is called a robust interval. Besides, a trivial interval will denote either an interval of length one (i.e. a singleton), or the whole genome.

Lemma 1. For any exemplarization $(G_1,G_2^E)$ of $(G_1,G_2)$, the non trivial robust intervals of $(G_1,G_2^E)$ are necessarily contained in some sequence $a_i C_i f_i$ of $G_1$ ($1 \le i \le n$).

Proof. We start by proving the lemma for common intervals, and we will then extend it to conserved intervals. First, we prove that, for any exemplarization $(G_1,G_2^E)$ of $(G_1,G_2)$, each common interval $I$ such that $|I| \ge 2$ contains either both of $x$, $y$ or none of them. When $I$ contains both, this further implies that $I$ covers the whole genome. Suppose there exists a common interval $I_x$ (recall that by definition $I_x$ is on $G_1$) such that $|I_x| \ge 2$ and $I_x$ contains $x$. Let $PI_x$ be the permutation of $I_x$ in $G_2^E$. The interval $I_x$ must contain either $b_m$ or $a_1$. Let us detail each of the two cases:

(a) If $I_x$ contains $b_m$, then $PI_x$ contains $b_m$ too. Notice that there is some $i$, $1 \le i \le n$, such that $b_m$ belongs to $D_i$ in $G_2^E$. Then $PI_x$ contains all genes between $D_i$ and $x$ in $G_2^E$. Thus $PI_x$ contains $b_{m+n}$. Consequently, $I_x$ contains $b_{m+n}$ and it also contains $y$.
(b) If $I_x$ contains $a_1$, then $PI_x$ contains $a_1$ too. Then $PI_x$ contains all genes between $a_1$ and $x$. Thus $PI_x$ contains $b_{m+n}$. Hence, $I_x$ contains $b_{m+n}$ and then it also contains $y$.

Now, suppose that $I_y$ is a common interval such that $|I_y| \ge 2$ and $I_y$ contains $y$. Let $PI_y$ be the permutation of $I_y$ on $G_2^E$. The interval $I_y$ must contain either $b_{m+n}$ or $f_n$. Let us detail each of the two cases:

(a) If $I_y$ contains $b_{m+n}$, then $PI_y$ contains $b_{m+n}$ too. Thus $PI_y$ contains all genes between $b_{m+n}$ and $y$. Hence $PI_y$ contains all the sequences $D_i$, $1 \le i \le n$. In particular, $PI_y$ contains all the markers and consequently $I_y$ must contain $x$.
(b) If $I_y$ contains $f_n$, then $PI_y$ contains $f_n$ too. Then $PI_y$ contains all genes between $f_n$ and $y$.
In particular, $PI_y$ contains $b_{m+n-1}$ and then $I_y$ contains $b_{m+n-1}$ too. Hence, $I_y$ also contains $b_{m+n}$, similarly to the previous case. Thus $I_y$ contains $x$.

We conclude that each non singleton common interval containing either $x$ or $y$ necessarily contains both $x$ and $y$. Therefore, and by construction of $G_2$, there is only one such interval, that is $G_1$ itself. Hence, any non trivial common interval is necessarily, in $G_1$, either strictly on the left of $x$, or between $x$ and $y$, or strictly on the right of $y$. Let us analyze these different cases:

- Let $I$ be a non trivial common interval situated strictly on the left of $x$ in $G_1$. Thus $I$ is a sequence of at least two consecutive markers. Since in any exemplarization $(G_1,G_2^E)$ of $(G_1,G_2)$, every marker in $G_2^E$ has neighboring genes which are not markers, this contradicts the fact that $I$ is a common interval.
- Let $I$ be a non trivial common interval situated strictly on the right of $y$ in $G_1$. Then $I$ is a substring of $b_{m+n}, \ldots, b_{m+1}$ containing at least two genes. In any exemplarization $(G_1,G_2^E)$ of $(G_1,G_2)$, for each pair $(b_{m+i},b_{m+i+1})$ of $G_2^E$, with $1 \le i < n$, we have $a_{i+1} \in G_2^E[b_{m+i},b_{m+i+1}]$. This contradicts the fact that $I$ is strictly on the right of $y$ in $G_1$.

- Let $I$ be a non trivial common interval lying between $x$ and $y$ in $G_1$. For any exemplarization $(G_1,G_2^E)$ of $(G_1,G_2)$, a common interval cannot contain, in $G_1$, both $f_i$ and $a_{i+1}$ for some $i$, $1 \le i \le n-1$ (since $b_{m+i}$ is situated between $f_i$ and $a_{i+1}$ in $G_2^E$ and on the right of $y$ in $G_1$). Hence, a non trivial common interval of $(G_1,G_2^E)$ is included in some sequence $a_i C_i f_i$ in $G_1$, $1 \le i \le n$.

This proves the lemma for common intervals. By definition, any conserved interval is necessarily a common interval. So, a non trivial conserved interval of $(G_1,G_2^E)$ is included in some sequence $a_i C_i f_i$ in $G_1$, $1 \le i \le n$. The lemma is proved.

Lemma 2. Let $(G_1,G_2^E)$ be an exemplarization of $(G_1,G_2)$ and $i \in [1 \ldots n]$. Let $\Delta_i$ be a substring of $[a_i+3,a_i+2]_{G_2^E}$ that does not contain any marker. If $|\Delta_i| \in \{2,3\}$, then there is no robust interval $I$ of $(G_1,G_2^E)$ such that $\Delta_i$ is a permutation of $I$.

Proof. First, we prove that there is no permutation $I$ of $\Delta_i$ such that $I$ is a common interval of $(G_1,G_2^E)$. Next, we show that there is no permutation $I$ of $\Delta_i$ such that $I$ is a conserved interval. By Lemma 1, we know that a non trivial common interval of $(G_1,G_2^E)$ is a substring of some sequence $a_i C_i f_i$, $1 \le i \le n$. This substring contains only consecutive integers. Therefore, if there exists a permutation $I$ of $\Delta_i$ such that $I$ is a common interval of $(G_1,G_2^E)$, then $\Delta_i$ must be a permutation of consecutive integers. If $|\Delta_i| = 2$, we have $\Delta_i = (p,q)$ where $p$ and $q$ are not consecutive integers, and if $|\Delta_i| = 3$, then we have $\Delta_i = (a_i+3,a_i+1,a_i+4)$ or $\Delta_i = (a_i+1,a_i+4,a_i+2)$. In these three cases, $\Delta_i$ is not a permutation of consecutive integers. Hence, there is no permutation $I$ of $\Delta_i$ such that $I$ is a common interval of $(G_1,G_2^E)$. Moreover, any conserved interval is also a common interval. Thus, there is no permutation $I$ of $\Delta_i$ such that $I$ is a conserved interval of $(G_1,G_2^E)$.

For more clarity, let us now introduce some notations. Given a graph $G = (V,E)$, let $VC = \{v_{i_1},v_{i_2} \ldots v_{i_k}\}$ be a vertex cover of $G$. Let $R(G) = (G_1,G_2)$ be the pair of genomes defined by the construction described in (1) and (2).
Now, let $F$ be the function which associates to $VC$, $G_1$ and $G_2$ an exemplarization $F(VC)$ of $(G_1,G_2)$ as follows. In $G_2$, all the markers are removed from the sequences $D_i$ for all $i \ne i_1, i_2 \ldots i_k$. Next, for each marker which is still present twice, one of its occurrences is arbitrarily removed. Since in $G_2$ only markers are duplicated, we conclude that $F(VC)$ is an exemplarization of $(G_1,G_2)$.

Given a cubic graph $G$ and genomes $G_1$ and $G_2$ obtained by the transformation $R(G)$, let us define the function $S$ which associates to an exemplarization $(G_1,G_2^E)$ of $(G_1,G_2)$ the vertex cover $VC$ of $G$ defined as follows: $VC = \{v_i \mid 1 \le i \le n \wedge \exists j \in \{1 \ldots m\},\ b_j \in G_2^E[a_i,f_i]\}$. In other words, we keep in $VC$ the vertices $v_i$ of $G$ for which there exists some gene $b_j$ such that $b_j$ is in $G_2^E[a_i,f_i]$. We now prove that $VC$ is a vertex cover. Consider an edge $e_p$ of $G$. By construction of $G_1$ and $G_2$, there exists some $i$, $1 \le i \le n$, such that gene $b_p$ is located between $a_i$ and $f_i$ in $G_2^E$. The presence of gene $b_p$ between $a_i$ and $f_i$ implies that vertex $v_i$ belongs to $VC$. We conclude that each edge is incident to at least one vertex of $VC$.

Let $W$ be the function defined on $\{\text{EConsI},\text{EComI}\}$ by $W(pb) = 1$ if $pb = \text{EConsI}$ and $W(pb) = 4$ if $pb = \text{EComI}$. Let $opt_{pb}(A)$ be the optimum result of an instance $A$ for an optimization problem $pb$, $pb \in \{\text{EComI}, \text{EConsI}, \text{Min-Vertex-Cover-3}\}$.

We now define the function $T$ whose arguments are a problem $pb \in \{\text{EConsI},\text{EComI}\}$ and a cubic graph $G$. Let $R(G) = (G_1,G_2)$ as usual, and let $(G_1,G_2^E)$ be an exemplarization of $(G_1,G_2)$. Then $T(pb,G)$ is defined as the number of robust trivial intervals of $(G_1,G_2^E)$ with respect to $pb$. Let $n$ and $m$ be respectively the number of vertices and the number of edges of $G$. We have $T(\text{EConsI},G) = 7n+m+2$ and $T(\text{EComI},G) = 7n+m+3$. Indeed, for EComI, there are $7n+m+2$ singletons and we also need to consider the whole genome.

Lemma 3. Let $pb \in \{\text{EComI},\text{EConsI}\}$. Let $G$ be a cubic graph and $R(G) = (G_1,G_2)$. Let $(G_1,G_2^E)$ be an exemplarization of $(G_1,G_2)$ and let $i$, $1 \le i \le n$. Then only two cases can occur with respect to $D_i$.
1. Either in $G_2^E$, all the markers from $D_i$ were removed, and in this case, there are exactly $W(pb)$ non trivial robust intervals involving $D_i$.
2. Or in $G_2^E$, at least one marker was kept in $D_i$, and in this case, there is no non trivial robust interval involving $D_i$.

Proof. We first prove the lemma for the EComI problem and then we extend it to EConsI. Lemma 1 implies that each non trivial common interval $I$ of $(G_1,G_2^E)$ is contained in some substring of $a_i C_i f_i$, $1 \le i \le n$. So, the permutation of $I$ on $G_2^E$ is contained in a substring of $a_i D_i f_i$, $1 \le i \le n$.

Consider $i$, $1 \le i \le n$, and suppose that all the markers from $D_i$ are removed on $G_2^E$. Thus, $a_i C_i f_i$, $C_i$, $a_i C_i$ and $C_i f_i$ are common intervals of $(G_1,G_2^E)$. Let us now show that there is no other non trivial common interval involving $D_i$. Let $\Delta_i$ be a substring of $[a_i+3,a_i+2]_{G_2^E}$ such that $|\Delta_i| \in \{2,3\}$. By Lemma 2, we know that $\Delta_i$ is not a common interval. The remaining intervals are $(a_i,a_i+3)$, $(a_i,a_i+3,a_i+1)$, $(a_i,a_i+3,a_i+1,a_i+4)$, $(a_i+1,a_i+4,a_i+2,f_i)$, $(a_i+4,a_i+2,f_i)$ and $(a_i+2,f_i)$. By construction, none of them can be a common interval, because none of them is a permutation of consecutive integers. Hence, there are only four non trivial common intervals involving $D_i$ in $G_2^E$. Among these four common intervals, only $a_i C_i f_i$ is a conserved interval too. In the end, if all the markers are removed from $D_i$, there are exactly four non trivial common intervals and one non trivial conserved interval involving $D_i$. So, given a problem $pb \in \{\text{EComI},\text{EConsI}\}$, there are exactly $W(pb)$ non trivial robust intervals involving $D_i$.

Now, suppose that at least one marker of $D_i$ is kept in $G_2^E$. Lemma 1 shows that each non trivial common interval $I$ of $(G_1,G_2^E)$ is contained in some substring of $a_i C_i f_i$, $1 \le i \le n$.
Since no marker is present in a sequence $a_i C_i f_i$, we deduce that there does not exist any non trivial common interval containing a marker. So, a non trivial common interval involving $D_i$ only must contain a substring $\Delta_i$ of $[a_i+3,a_i+2]_{G_2^E}$ such that $\Delta_i$ contains no marker. Since no marker is an extremity of $[a_i+3,a_i+2]_{G_2^E}$, we have $2 \le |\Delta_i| \le 3$. By Lemma 2, we know that $\Delta_i$ is not a common interval. The remaining intervals to be considered are the intervals $a_i \Delta_i$ and $\Delta_i f_i$. By construction of $a_i C_i f_i$, these intervals are not common intervals (the absence of gene $a_i+2$ for $a_i \Delta_i$ and of gene $a_i+3$ for $\Delta_i f_i$ implies that these intervals are not a permutation of consecutive integers). Hence, these intervals cannot be conserved intervals either.

Lemma 4. Let $pb \in \{\text{EComI},\text{EConsI}\}$. Let $G = (V,E)$ be a cubic graph with $V = \{v_1 \ldots v_n\}$ and $E = \{e_1 \ldots e_m\}$ and let $G_1$, $G_2$ be the two genomes obtained by $R(G)$.
1. Let $VC$ be a vertex cover of $G$ and denote $k = |VC|$. Then the exemplarization $F(VC)$ of $(G_1,G_2)$ has at least $N = n \cdot W(pb) + T(pb,G) - W(pb) \cdot k$ robust intervals.
2. Let $(G_1,G_2^E)$ be an exemplarization of $(G_1,G_2)$ and let $VC'$ be the vertex cover of $G$ obtained by $S(G_1,G_2^E)$. Then $|VC'| = \frac{W(pb) \cdot n + T(pb,G) - N}{W(pb)}$, where $N$ is the number of robust intervals of $(G_1,G_2^E)$.

Proof. 1. Let $pb \in \{\text{EComI},\text{EConsI}\}$. Let $G$ be a cubic graph and let $G_1$ and $G_2$ be the two genomes obtained by $R(G)$. Let $VC$ be a vertex cover of $G$ and denote $k = |VC|$. Let $(G_1,G_2^E)$ be the exemplarization of $(G_1,G_2)$ obtained by $F(VC)$. By construction, we have at least $(n-k)$ substrings $D_i$ in $G_2^E$ for which all the markers are removed. By Lemma 3, we know that each of these substrings implies the existence of $W(pb)$ non trivial robust intervals. So, we have at least $W(pb) \cdot (n-k)$ non trivial robust intervals. Moreover, by definition of $T(pb,G)$, the number of trivial robust intervals of $(G_1,G_2^E)$ is exactly $T(pb,G)$. Thus, we have at least $N = W(pb) \cdot n + T(pb,G) - W(pb) \cdot k$ robust intervals of $(G_1,G_2^E)$.

2. Let $(G_1,G_2^E)$ be an exemplarization of $(G_1,G_2)$ and let $n - j$ be the number of sequences $D_i$, $1 \le i \le n$, for which all markers have been deleted in $G_2^E$. Then, by Lemmas 1 and 3, the number of robust intervals of $(G_1,G_2^E)$ is equal to $N = W(pb) \cdot n + T(pb,G) - W(pb) \cdot j$. Let $VC'$ be the vertex cover obtained by $S(G_1,G_2^E)$. Each marker has one occurrence in $G_2^E$ and these occurrences lie in $j$ sequences $D_i$. So, by definition of $S$, we conclude that $|VC'| = j = \frac{W(pb) \cdot n + T(pb,G) - N}{W(pb)}$.

2.3 Main result

Let us first define the notion of L-reduction [22]: let $A$ and $B$ be two optimization problems and $c_A$, $c_B$ be respectively their cost functions. An L-reduction from problem $A$ to problem $B$ is a pair of polynomial-time computable functions $R$ and $S$ with the following properties:
(a) If $x$ is an instance of $A$, then $R(x)$ is an instance of $B$;
(b) If $x$ is an instance of $A$ and $y$ is a solution of $R(x)$, then $S(y)$ is a solution of $x$;
(c) If $x$ is an instance of $A$ and $R(x)$ is its corresponding instance of $B$, then there is some positive constant $\alpha$ such that $opt_B(R(x)) \le \alpha \cdot opt_A(x)$;
(d) If $s$ is a solution of $R(x)$, then there is some positive constant $\beta$ such that $|opt_A(x) - c_A(S(s))| \le \beta\, |opt_B(R(x)) - c_B(s)|$.

We prove Theorem 1 by showing that the pair $(R,S)$ defined previously is an L-reduction from Min-Vertex-Cover-3 to EConsI and from Min-Vertex-Cover-3 to EComI. First note that properties (a) and (b) are obviously satisfied by $R$ and $S$. Consider $pb \in \{\text{EComI},\text{EConsI}\}$. Let $G = (V,E)$ be a cubic graph with $n$ vertices and $m$ edges. We now prove properties (c) and (d).
Consider the genomes $G_1$ and $G_2$ obtained by $R(G)$. For the sake of clarity, we abbreviate here and in the following $opt_{\text{Min-Vertex-Cover-3}}$ to $opt_{\text{Min-VC}}$. First, we need to prove that there exists $\alpha \ge 0$ such that $opt_{pb}(G_1,G_2) \le \alpha \cdot opt_{\text{Min-VC}}(G)$. Since $G$ is cubic, we have the following properties:

$n \ge 4$   (3)

$m = \frac{1}{2} \sum_{i=1}^{n} \mathrm{degree}(v_i) = \frac{3n}{2}$   (4)

$opt_{\text{Min-VC}}(G) \ge \frac{m}{3} = \frac{n}{2}$   (5)

To explain property (5), remark that, in a cubic graph $G$ with $n$ vertices and $m$ edges, each vertex covers three edges. Thus, a set of $k$ vertices covers at most $3k$ edges. Hence, any vertex cover of $G$ must contain at least $\frac{m}{3}$ vertices. By Lemma 3, we know that sequences of the form $a_i C_i f_i$, $1 \le i \le n$, contain either zero or $W(pb)$ non trivial robust intervals. By Lemma 1, there are no other non trivial robust intervals. So, we have the following inequality, where $T(pb,G)$ counts the trivial robust intervals:

$opt_{pb}(G_1,G_2) \le T(pb,G) + W(pb) \cdot n$

If $pb = \text{EComI}$, we have:

$opt_{\text{EComI}}(G_1,G_2) \le 7n+m+3+4n$
$opt_{\text{EComI}}(G_1,G_2) \le \frac{27n}{2}$ by (3) and (4)   (6)

And if $pb = \text{EConsI}$, we have:

$opt_{\text{EConsI}}(G_1,G_2) \le 7n+m+2+n$
$opt_{\text{EConsI}}(G_1,G_2) \le \frac{21n}{2}$ by (3) and (4)   (7)

Altogether, by (5), (6) and (7), we prove property (c) with $\alpha = 27$.

Now, let us prove property (d). Let $VC = \{v_{i_1},v_{i_2} \ldots v_{i_P}\}$ be a minimum vertex cover of $G$. Then $P = opt_{\text{Min-VC}}(G)$. Let $G_1$ and $G_2$ be the genomes obtained by $R(G)$. Let $(G_1,G_2^E)$ be an exemplarization of $(G_1,G_2)$ and let $k$ be the number of robust intervals of $(G_1,G_2^E)$. Finally, let $VC'$ be the vertex cover of $G$ such that $VC' = S(G_1,G_2^E)$. We need to find a positive constant $\beta$ such that $|P - |VC'|| \le \beta\, |opt_{pb}(G_1,G_2) - k|$.

For $pb \in \{\text{EComI},\text{EConsI}\}$, let $N_{pb}$ be the number of robust intervals between the two genomes obtained by $F(VC)$. By the first property of Lemma 4, we have

$opt_{pb}(G_1,G_2) \ge N_{pb} \ge W(pb) \cdot n + T(pb,G) - W(pb) \cdot P$

So, it is sufficient to prove that there exists some $\beta \ge 0$ such that $|P - |VC'|| \le \beta\, |W(pb) \cdot n + T(pb,G) - W(pb) \cdot P - k|$. By the second property of Lemma 4, we have $|VC'| = \frac{W(pb) \cdot n + T(pb,G) - k}{W(pb)}$. Since $P \le |VC'|$, we have

$|P - |VC'|| = |VC'| - P = \frac{W(pb) \cdot n + T(pb,G) - k}{W(pb)} - P = \frac{1}{W(pb)} \left( W(pb) \cdot n + T(pb,G) - W(pb) \cdot P - k \right)$

So $\beta = 1$ is sufficient in both cases, since $W(\text{EComI}) = 4$ and $W(\text{EConsI}) = 1$, which implies $\frac{1}{W(pb)} \le 1$. Altogether, we then have $|opt_{\text{Min-VC}}(G) - |VC'|| \le 1 \cdot |opt_{pb}(G_1,G_2) - k|$.

We proved that the reduction $(R,S)$ is an L-reduction. This implies that for two genomes $G_1$ and $G_2$, both problems EConsI and EComI are APX-hard even if $\mathrm{occ}(G_1) = 1$ and $\mathrm{occ}(G_2) = 2$. Theorem 1 is proved.

We extend in Corollary 1 our results to the intermediate and maximum matching models.

Corollary 1. IComI, MComI, IConsI and MConsI are APX-hard even when genomes $G_1$ and $G_2$ are such that $\mathrm{occ}(G_1) = 1$ and $\mathrm{occ}(G_2) = 2$.

Proof. The intermediate and maximum matching models are identical to the exemplar model when one of the two genomes contains no duplicates. Hence, the APX-hardness result for EComI (resp. EConsI) also holds for IComI and MComI (resp. IConsI and MConsI).
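The counting argument of Lemmas 3 and 4 can be checked numerically on a small instance. The Python sketch below is an illustration only, under several assumptions: the graph is the complete graph on four vertices with an edge labeling chosen here (not necessarily that of Figure 1), the "arbitrary" removal in $F(VC)$ is fixed by keeping each marker in the block of the first endpoint (in tuple order) that lies in the cover, and the names `exemplarized_pair` and `count_intervals` are hypothetical. Since all genes are positive, only case (i) of the conserved interval definition can apply.

```python
def exemplarized_pair(n, edges, cover):
    """Return (G1, G2^E) for the reduction R followed by F(cover)."""
    m = len(edges)
    b = lambda i: 6 * n + i
    x, y = 7 * n + m + 1, 7 * n + m + 2
    g1 = ([b(j) for j in range(1, m + 1)] + [x] + list(range(1, 6 * n + 1))
          + [y] + [b(m + i) for i in range(n, 0, -1)])
    # each marker b_j survives only next to one endpoint lying in the cover
    owner = {j: next(v for v in e if v in cover)
             for j, e in enumerate(edges, start=1)}
    g2e = [y]
    for i in range(1, n + 1):
        a = 6 * i - 5
        incident = [j for j, e in enumerate(edges, start=1) if i in e]
        block = [a]
        for core, j in zip([a + 3, a + 1, a + 4], incident):
            block.append(core)
            if owner[j] == i:
                block.append(b(j))
        g2e += block + [a + 2, a + 5, b(m + i)]
    g2e.append(x)
    return g1, g2e

def count_intervals(g1, g2):
    """Count (common, conserved) intervals of two positive duplicate-free genomes."""
    pos = {g: p for p, g in enumerate(g2)}
    common = conserved = 0
    for s in range(len(g1)):
        lo = hi = pos[g1[s]]
        for t in range(s, len(g1)):
            p = pos[g1[t]]
            lo, hi = min(lo, p), max(hi, p)
            if hi - lo == t - s:              # genes g1[s..t] are contiguous in g2
                common += 1
                if pos[g1[s]] == lo and p == hi:
                    conserved += 1            # extremities are preserved too
    return common, conserved

K4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
n, m, cover = 4, 6, {1, 2, 3}
G1, G2E = exemplarized_pair(n, K4, cover)
common, conserved = count_intervals(G1, G2E)
j = len(set(next(v for v in e if v in cover) for e in K4))  # blocks still holding a marker
```

With this cover, three blocks $D_i$ keep at least one marker ($j = 3$), so the formulas of Lemma 4 give $4(n-j) + (7n+m+3) = 41$ common intervals and $(n-j) + (7n+m+2) = 37$ conserved intervals, which is what the counting routine returns.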

3 EBD is APX-hard

Consider two genomes $G_1$ and $G_2$ with duplicates, and let EBD (resp. IBD, MBD) be the problem which consists in finding an exemplarization (resp. intermediate matching, maximum matching) $(G'_1,G'_2,M)$ of $(G_1,G_2)$ that minimizes the number of breakpoints between $G'_1$ and $G'_2$. EBD has been proved to be NP-complete even if $\mathrm{occ}(G_1) = 1$ and $\mathrm{occ}(G_2) = 2$ [7]. Some inapproximability results also exist: in particular, it has been proved in [13] that, in the general case, EBD cannot be approximated within a factor $c \log n$, where $c > 0$ is a constant, and cannot be approximated within a factor 1.36 when $\mathrm{occ}(G_1) = \mathrm{occ}(G_2) = 2$. Moreover, for two balanced genomes $G_1$ and $G_2$ such that $k = \mathrm{occ}(G_1) = \mathrm{occ}(G_2)$, several approximation algorithms for MBD are given. These approximation algorithms admit respectively a ratio of […] when $k = 2$ [17], 4 when $k = 3$ [17] and $4k$ in the general case [19]. We can conclude from the above results that the IBD and MBD problems are also NP-complete, since when one genome contains no duplicates, exemplar, intermediate and maximum matching models are equivalent.

In this section, we improve the above results by showing that the three problems EBD, IBD and MBD are APX-hard, even when genomes $G_1$ and $G_2$ are such that $\mathrm{occ}(G_1) = 1$ and $\mathrm{occ}(G_2) = 2$. The main result is Theorem 2 below, which will be completed by Corollary 2 at the end of the section.

Theorem 2. EBD is APX-hard even when genomes $G_1$ and $G_2$ are such that $\mathrm{occ}(G_1) = 1$ and $\mathrm{occ}(G_2) = 2$.

To prove Theorem 2, we use an L-reduction from Min-Vertex-Cover-3 to EBD. Let $G = (V,E)$ be a cubic graph with $V = \{v_1 \ldots v_n\}$ and $E = \{e_1 \ldots e_m\}$. For each $i$, $1 \le i \le n$, let $e_{f_i}$, $e_{g_i}$ and $e_{h_i}$ be the three edges which are incident to $v_i$ in $G$ with $f_i < g_i < h_i$.
Let $R'$ be the polynomial transformation which associates to $G$ the following genomes $G_1$ and $G_2$, where each gene has a positive sign:

$$G_1 = a_0\ a_1\ b_1\ a_2\ b_2 \ldots a_n\ b_n\ c_1\ d_1\ c_2\ d_2 \ldots c_m\ d_m\ c_{m+1}$$
$$G_2 = a_0\ a_n\ d_{f_n}\ d_{g_n}\ d_{h_n}\ b_n \ldots a_2\ d_{f_2}\ d_{g_2}\ d_{h_2}\ b_2\ a_1\ d_{f_1}\ d_{g_1}\ d_{h_1}\ b_1\ c_1\ c_2 \ldots c_m\ c_{m+1}$$

with:
- $a_0 = 0$, and for each $i$, $1 \le i \le n$, $a_i = i$ and $b_i = n+i$;
- $c_{m+1} = 2n+m+1$, and for each $i$, $1 \le i \le m$, $c_i = 2n+i$ and $d_i = 2n+m+1+i$.

We remark that there is no duplication in $G_1$, so $occ(G_1) = 1$. In $G_2$, only the genes $d_i$, $1 \le i \le m$, are duplicated and occur twice. Thus $occ(G_2) = 2$.

Let $G$ be a cubic graph and $VC$ be a vertex cover of $G$. Let $G_1$ and $G_2$ be the genomes obtained by $R'(G)$. We define $F'$ to be the polynomial transformation which associates to $VC$, $G_1$ and $G_2$ the exemplarization $F'(VC) = (G_1,G_2^E)$ of $(G_1,G_2)$ as follows. For each $i$ such that $v_i \notin VC$, we remove from $G_2$ the genes $d_{f_i}$, $d_{g_i}$ and $d_{h_i}$ lying between $a_i$ and $b_i$. Then, for each $j$, $1 \le j \le m$, such that $d_j$ still has two occurrences in $G_2$, we arbitrarily remove one of these occurrences in order to obtain the genome $G_2^E$. Hence, $(G_1,G_2^E)$ is an exemplarization of $(G_1,G_2)$.

Given a cubic graph $G$, we construct $G_1$ and $G_2$ by the transformation $R'(G)$. Given an exemplarization $(G_1,G_2^E)$ of $(G_1,G_2)$, let $S'$ be the polynomial transformation which associates to $(G_1,G_2^E)$ the set $VC' = \{v_i \mid 1 \le i \le n,\ a_i \text{ and } b_i \text{ are not consecutive in } G_2^E\}$. We claim that $VC'$ is a vertex cover of $G$. Indeed, let $e_p$, $1 \le p \le m$, be an edge of $G$. Genome $G_2^E$ contains one occurrence of gene $d_p$ since $G_2^E$ is an exemplarization of $G_2$. By construction, there exists $i$, $1 \le i \le n$, such that $d_p$ is in $G_2^E[a_i,b_i]$ and such that $e_p$ is incident to $v_i$. The presence of $d_p$ in $G_2^E[a_i,b_i]$ implies that vertex $v_i$ belongs to $VC'$. We can conclude that each edge of $G$ is incident to at least one vertex of $VC'$.

Lemmas 5 and 6 below are used to prove that $(R',S')$ is an L-reduction from the Min-Vertex-Cover-3 problem to the EBD problem. Let $G = (V,E)$ be a cubic graph with $V = \{v_1, v_2 \ldots v_n\}$ and $E = \{e_1, e_2 \ldots e_m\}$ and let us construct $(G_1,G_2)$ by the transformation $R'(G)$.

Lemma 5. Let $VC$ be a vertex cover of $G$ and $(G_1,G_2^E)$ the exemplarization given by $F'(VC)$. Then $|VC| = k \Rightarrow B(G_1,G_2^E) \le n + 2m + k + 1$, where $B(G_1,G_2^E)$ is the number of breakpoints between $G_1$ and $G_2^E$.

Proof. Suppose $|VC| = k$. Let us list the breakpoints between genomes $G_1$ and $G_2^E$ obtained by $F'(VC)$. The pairs $(b_i,a_{i+1})$, $1 \le i \le n-1$, and $(b_n,c_1)$ induce one breakpoint each. For all $i$, $1 \le i \le m$, each pair of the form $(c_i,d_i)$ (resp. $(d_i,c_{i+1})$) induces one breakpoint. For all $i$, $1 \le i \le n$, such that $v_i \in VC$, $(a_i,b_i)$ induces at most one breakpoint. Finally, the pair $(a_0,a_1)$ induces one breakpoint. Thus there are at most $n+2m+k+1$ breakpoints between $G_1$ and $G_2^E$.

Lemma 6. Let $(G_1,G_2^E)$ be an exemplarization of $(G_1,G_2)$ and $VC'$ be the vertex cover of $G$ obtained by $S'(G_1,G_2^E)$. We have $B(G_1,G_2^E) = k \Rightarrow |VC'| = k - n - 2m - 1$.

Proof. Let $(G_1,G_2^E)$ be an exemplarization of $(G_1,G_2)$ and $VC'$ be the vertex cover obtained by $S'(G_1,G_2^E)$. Suppose $B(G_1,G_2^E) = k$. For any exemplarization $(G_1,G_2^E)$ of $(G_1,G_2)$, the following breakpoints always occur:
- the pair $(a_0,a_1)$;
- for each $i$, $1 \le i \le m$, each pair $(c_i,d_i)$ and $(d_i,c_{i+1})$;
- for each $i$, $1 \le i \le n-1$, the pair $(b_i,a_{i+1})$;
- the pair $(b_n,c_1)$.

Thus, we have at least $n+2m+1$ breakpoints. The other possible breakpoints are induced by pairs of the form $(a_i,b_i)$. Since we have $B(G_1,G_2^E) = k$, there are exactly $k-n-2m-1$ such breakpoints. By construction of $VC'$, the cardinality of $VC'$ is equal to the number of breakpoints induced by pairs of the form $(a_i,b_i)$. So, we have: $|VC'| = k-n-2m-1$.
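The transformations $R'$, $F'$ and $S'$ and the counts in Lemmas 5 and 6 can be checked on small graphs. The following is a minimal sketch, not the authors' code: the function names and the edge-list encoding (`edges[j-1]` is edge $e_j$, vertices are numbered $1..n$) are assumptions made for illustration, and `exemplarize` fixes one particular "arbitrary" choice by keeping the leftmost surviving copy of each $d_j$.

```python
def build_genomes(n, edges):
    """R'(G) for a cubic graph; edges[j-1] = (u, v) is edge e_j, vertices are 1..n."""
    m = len(edges)
    d = lambda j: 2 * n + m + 1 + j             # gene d_j
    g1 = [0]                                    # a_0 = 0
    for i in range(1, n + 1):
        g1 += [i, n + i]                        # a_i = i, b_i = n + i
    for j in range(1, m + 1):
        g1 += [2 * n + j, d(j)]                 # c_j = 2n + j, then d_j
    g1.append(2 * n + m + 1)                    # c_{m+1}
    g2 = [0]
    for i in range(n, 0, -1):                   # blocks a_i d d d b_i for i = n..1
        incident = [j for j in range(1, m + 1) if i in edges[j - 1]]
        g2 += [i] + [d(j) for j in incident] + [n + i]
    g2 += [2 * n + j for j in range(1, m + 2)]  # c_1 ... c_{m+1}
    return g1, g2


def exemplarize(n, edges, cover):
    """F'(VC): builds G2^E, keeping the leftmost surviving copy of each d_j."""
    m = len(edges)
    kept = set()
    g2e = [0]
    for i in range(n, 0, -1):
        g2e.append(i)
        if i in cover:                          # blocks of uncovered vertices lose their d genes
            for j in range(1, m + 1):
                if i in edges[j - 1] and j not in kept:
                    kept.add(j)
                    g2e.append(2 * n + m + 1 + j)
        g2e.append(n + i)
    g2e += [2 * n + j for j in range(1, m + 2)]
    return g2e


def breakpoints(g1, g2):
    """Breakpoints between duplicate-free genomes whose genes are all positive."""
    adjacent = set(zip(g2, g2[1:]))
    return sum((x, y) not in adjacent for x, y in zip(g1, g1[1:]))


def cover_from_exemplar(n, g2e):
    """S': vertices v_i whose genes a_i and b_i are not consecutive in G2^E."""
    pos = {g: p for p, g in enumerate(g2e)}
    return {i for i in range(1, n + 1) if pos[n + i] - pos[i] != 1}


if __name__ == "__main__":
    k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
    g1, _ = build_genomes(4, k4)
    g2e = exemplarize(4, k4, {1, 2, 3})
    print(breakpoints(g1, g2e), cover_from_exemplar(4, g2e))
```

On $K_4$ ($n = 4$, $m = 6$) with the cover $\{v_1,v_2,v_3\}$, the sketch produces $20 = n + 2m + k + 1$ breakpoints and recovers the same cover through $S'$.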
To prove that $(R',S')$ is an L-reduction, we first notice that properties (a) and (b) of an L-reduction are trivially verified. The next lemma proves property (c).

Lemma 7. The inequality $opt_{EBD}(G_1,G_2) \le 12 \cdot opt_{\text{Min-VC}}(G)$ holds.

Proof. For a cubic graph $G$ with $n$ vertices and $m$ edges, we have $2m = 3n$ (see (4)) and $opt_{\text{Min-VC}}(G) \ge \frac{n}{2}$ (see (5)). By construction of the genomes $G_1$ and $G_2$, any exemplarization of $(G_1,G_2)$ contains $2n+2m+2$ genes in each genome. Thus, we have $opt_{EBD}(G_1,G_2) \le 2n+2m+2 \le 6n$ ($n \ge 4$ in a cubic graph). Hence, we conclude that $opt_{EBD}(G_1,G_2) \le 12 \cdot opt_{\text{Min-VC}}(G)$.

Now, we prove property (d) of our L-reduction.

Lemma 8. Let $(G_1,G_2^E)$ be an exemplarization of $(G_1,G_2)$ and let $VC'$ be the vertex cover of $G$ obtained by $S'(G_1,G_2^E)$. Then, we have

$$|opt_{\text{Min-VC}}(G) - |VC'|| \le |opt_{EBD}(G_1,G_2) - B(G_1,G_2^E)|.$$

Proof. Let $(G_1,G_2^E)$ be an exemplarization of $(G_1,G_2)$ and $VC'$ be the vertex cover of $G$ obtained by $S'(G_1,G_2^E)$. Let $VC$ be a vertex cover of $G$ such that $|VC| = opt_{\text{Min-VC}}(G)$. We know that $opt_{\text{Min-VC}}(G) \le |VC'|$ and $opt_{EBD}(G_1,G_2) \le B(G_1,G_2^E)$. So, it is sufficient to prove $|VC'| - opt_{\text{Min-VC}}(G) \le B(G_1,G_2^E) - opt_{EBD}(G_1,G_2)$.

By Lemma 5, we have $B(F'(VC)) \le n+2m+1+opt_{\text{Min-VC}}(G)$, which implies $opt_{EBD}(G_1,G_2) \le B(F'(VC)) \le n+2m+1+opt_{\text{Min-VC}}(G)$. Then

$$B(G_1,G_2^E) - opt_{EBD}(G_1,G_2) \ge B(G_1,G_2^E) - n - 2m - 1 - opt_{\text{Min-VC}}(G) \qquad (8)$$

By Lemma 6, we have $|VC'| = B(G_1,G_2^E) - n - 2m - 1$, which implies

$$|VC'| - opt_{\text{Min-VC}}(G) = B(G_1,G_2^E) - n - 2m - 1 - opt_{\text{Min-VC}}(G) \qquad (9)$$

Finally, by (8) and (9), we get $|VC'| - opt_{\text{Min-VC}}(G) \le B(G_1,G_2^E) - opt_{EBD}(G_1,G_2)$.

Lemmas 7 and 8 prove that the pair $(R',S')$ is an L-reduction from Min-Vertex-Cover-3 to EBD. Hence, EBD is APX-hard even if $occ(G_1) = 1$ and $occ(G_2) = 2$, and Theorem 2 is proved.

We extend in Corollary 2 our results to the intermediate and maximum matching models.

Corollary 2. The IBD and MBD problems are APX-hard even when genomes $G_1$ and $G_2$ are such that $occ(G_1) = 1$ and $occ(G_2) = 2$.

Proof. The intermediate and maximum matching models are identical to the exemplar model when one of the two genomes contains no duplicates. Hence, the APX-hardness result for EBD also holds for IBD and MBD.

4 Zero breakpoint distance

This section is devoted to zero breakpoint distance recognition issues. Indeed, in [13], the authors showed that deciding whether the exemplar breakpoint distance between any two genomes is zero or not is NP-complete even when no gene occurs more than three times in both genomes, i.e., instances of type (3,3). This important result implies that the exemplar breakpoint distance problem does not admit any approximation in polynomial time, unless P = NP. Following this line of research, we first complement the result of [13] by proving that deciding whether the exemplar breakpoint distance between any two genomes is zero or not is NP-complete, even when no gene is duplicated more than twice in one of the genomes (the maximum number of duplications is however unbounded in the other genome).
This result is next extended to the intermediate matching model, and we give a practical (but exponential) algorithm for deciding whether the exemplar breakpoint distance between any two genomes is zero or not in case no gene occurs more than twice in both genomes (a problem whose complexity, P versus NP-complete, remains open). Finally, we show that deciding whether the maximum matching breakpoint distance between any two genomes is zero or not is polynomial-time solvable, and hence that such negative approximation results (the ones we obtained for the exemplar and intermediate models) do not propagate to the maximum matching model.

The following easy observation will prove extremely useful in the sequel of the present section.

Observation 3. Let $G_1$ and $G_2$ be two genomes. If the exemplar breakpoint distance between $G_1$ and $G_2$ is zero, then there exists an exemplarization $(G_1^E,G_2^E)$ of $(G_1,G_2)$ such that (1) $G_1^E = G_2^E$, or (2) $(G_1^E)^r = G_2^E$, where $(G_1^E)^r$ is the signed reversal of genome $G_1^E$.

The same observation can be made for the intermediate and maximum matching models.
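Observation 3 suggests a direct, exponential-time test for zero exemplar breakpoint distance on tiny instances: enumerate the exemplarizations of both genomes and look for a pair that is equal, or equal up to signed reversal. The sketch below is only a brute-force baseline under stated assumptions, not an algorithm from the paper: genes are encoded as nonzero integers whose sign is the gene sign, and both genomes are assumed to contain the same gene families.

```python
from itertools import product


def exemplarizations(genome):
    """Yield every way of keeping exactly one occurrence of each gene family."""
    families = sorted({abs(g) for g in genome})
    positions = [[p for p, g in enumerate(genome) if abs(g) == f] for f in families]
    for choice in product(*positions):
        yield tuple(genome[p] for p in sorted(choice))


def signed_reversal(genome):
    """Reverse the gene order and flip every sign."""
    return tuple(-g for g in reversed(genome))


def zero_exemplar_distance(g1, g2):
    """True iff some exemplarization pair is identical or identical up to signed reversal."""
    if {abs(g) for g in g1} != {abs(g) for g in g2}:
        return False  # assumption of this sketch: same gene families on both sides
    targets = set(exemplarizations(g2))
    return any(e in targets or signed_reversal(e) in targets
               for e in exemplarizations(g1))
```

For example, `zero_exemplar_distance([1, 2, 1, 3], [2, 1, 3, 2])` holds because both genomes can be reduced to `2 1 3`. The running time grows with the product of the occurrence counts, so this is only useful as a cross-check.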

4.1 Zero exemplar breakpoint distance

The zero exemplar breakpoint distance (ZEBD) problem is formally defined as follows.

Problem: ZEBD
Input: Two genomes $G_1$ and $G_2$.
Question: Is the exemplar breakpoint distance between $G_1$ and $G_2$ equal to zero?

Aiming at precisely defining the inapproximability landscape of computing the exemplar breakpoint distance between two genomes, we complement the result of [13], who showed ZEBD to be NP-complete even for instances of type (3,3), by the following theorem.

Theorem 4. ZEBD is NP-complete even if no gene occurs more than twice in $G_1$.

Proof. Membership of ZEBD in NP is immediate. The reduction we use to prove hardness is from Min-Vertex-Cover [16]. Let an arbitrary instance of Min-Vertex-Cover be given by a graph $G = (V,E)$ and a positive integer $k$. Write $V = \{v_1, v_2 \ldots v_n\}$ and $E = \{e_1, e_2 \ldots e_m\}$. In the rest of the proof, elements of $V$ (resp. $E$) will be seen either as vertices (resp. edges) or genes, depending on the context. The corresponding instance $(G_1,G_2)$ of ZEBD is defined as follows:

$$G_1 = v_1\ X_1\ v_2\ X_2 \ldots v_n\ X_n$$
$$G_2 = Y[1]\ Y[2] \ldots Y[k]\ Y_V.$$

For each $i = 1,2,\ldots,n$, $X_i$ is defined to be $X_i = e_{i_1} e_{i_2} \ldots e_{i_j}$, where $e_{i_1}, e_{i_2}, \ldots, e_{i_j}$, $i_1 < i_2 < \ldots < i_j$, are the edges incident to vertex $v_i$. The strings $Y[i]$, $1 \le i \le k$, are all equal and are defined by $Y[i] = Y_V\ Y_E$, where $Y_V = v_1 v_2 \ldots v_n$ and $Y_E = e_1 e_2 \ldots e_m$. Notice that no gene occurs more than twice in $G_1$ (actually genes $v_i$ occur once and genes $e_i$ occur twice). However, the number of occurrences of each gene in $G_2$ is upper bounded by $k+1$. Furthermore, all genes have positive sign, and hence according to Observation 3 we only need to consider exemplarizations $(G_1^E,G_2^E)$ of $(G_1,G_2)$ such that $G_1^E = G_2^E$. It is immediate to check that our construction can be carried out in polynomial time. We now claim that there exists a vertex cover of size $k$ in $G$ iff the exemplar breakpoint distance between $G_1$ and $G_2$ is zero.

Suppose first that there exists a vertex cover $V' \subseteq V$ of size $k$ in $G$. Write $V' = \{v_{i_1}, v_{i_2}, \ldots, v_{i_k}\}$, $i_1 < i_2 < \ldots < i_k$.
For convenience, we also define $i_0$ to be 0. From $V'$ we construct an exemplarization $(G_1^E,G_2^E)$ as follows. We obtain $G_1^E$ from $G_1$ by a two-step procedure. First, we delete in $G_1$ all strings $X_i$ such that $v_i \notin V'$. Second, for each $1 \le j \le m$, if gene $e_j$ still occurs twice, we delete its second occurrence (this second step is concerned with edges connecting two vertices in $V'$). We now turn to $G_2^E$. For $1 \le j \le k$, we consider the string $Y[j] = Y_V\ Y_E$ that we process as follows: (1) we delete in $Y_V$ all genes but $v_{i_j}$ and those genes $v_l \notin V'$ such that $i_{j-1} < l < i_j$, and (2) we delete in $Y_E$ all genes $e_l$ that are not incident to $v_{i_j}$, or that are incident to both $v_{i_j}$ and some smaller vertex in $V'$ (i.e., $e_l = \{v_{i_{j'}}, v_{i_j}\}$ for some $j' < j$). Finally, we delete in the trailing string $Y_V = v_1 v_2 \ldots v_n$ all genes but those $v_l$ ($\notin V'$) such that $i_k < l$. Since $V'$ is a vertex cover in $G$, it follows that each gene occurs once in the obtained genomes, i.e., $(G_1^E,G_2^E)$ is indeed an exemplarization of $(G_1,G_2)$. It is now easily seen that $G_1^E = G_2^E$, and hence that the exemplar breakpoint distance between $G_1$ and $G_2$ is zero.
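The forward direction of the reduction can be replayed on a small graph. The sketch below is illustrative only: the function names, the string encoding of genes (`"v3"`, `"e2"`) and the edge-list input are assumptions, not part of the paper. It builds $(G_1,G_2)$, derives $(G_1^E,G_2^E)$ from a vertex cover following the two procedures above, and lets one confirm that both are subsequences of the original genomes and that they coincide.

```python
def zebd_instance(n, edges, k):
    """Build (G1, G2) from a graph on vertices 1..n; edges[j-1] is edge e_j."""
    m = len(edges)
    g1 = []
    for i in range(1, n + 1):
        g1.append(f"v{i}")
        g1 += [f"e{j}" for j in range(1, m + 1) if i in edges[j - 1]]  # string X_i
    y_v = [f"v{i}" for i in range(1, n + 1)]
    y_e = [f"e{j}" for j in range(1, m + 1)]
    return g1, (y_v + y_e) * k + y_v          # Y[1] ... Y[k] followed by Y_V


def exemplarize_from_cover(n, edges, cover):
    """Return (G1^E, G2^E) built from a vertex cover given as a set of vertex indices."""
    m = len(edges)
    # G1^E: drop X_i for uncovered vertices, then drop second occurrences of edges.
    g1e, seen = [], set()
    for i in range(1, n + 1):
        g1e.append(f"v{i}")
        if i in cover:
            for j in range(1, m + 1):
                if i in edges[j - 1] and j not in seen:
                    seen.add(j)
                    g1e.append(f"e{j}")
    # G2^E: block j keeps the uncovered vertices since the previous cover vertex,
    # then v_{i_j}, then the edges at v_{i_j} not already taken by a smaller cover vertex.
    g2e, taken, prev = [], set(), 0
    for i_j in sorted(cover):
        g2e += [f"v{l}" for l in range(prev + 1, i_j + 1)]
        for l in range(1, m + 1):
            if i_j in edges[l - 1] and l not in taken:
                taken.add(l)
                g2e.append(f"e{l}")
        prev = i_j
    g2e += [f"v{l}" for l in range(prev + 1, n + 1)]   # trailing Y_V
    return g1e, g2e


def is_subsequence(small, big):
    """True iff small can be obtained from big by deleting elements."""
    it = iter(big)
    return all(x in it for x in small)
```

On the path $v_1 - v_2 - v_3 - v_4$ with cover $\{v_2, v_3\}$ and $k = 2$, both exemplarizations equal `v1 v2 e1 e2 v3 e3 v4`.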

Conversely, suppose that the exemplar breakpoint distance between $G_1$ and $G_2$ is zero. Since all genes have a positive sign, it follows that there exists an exemplarization $(G_1^E,G_2^E)$ of $(G_1,G_2)$ such that $G_1^E = G_2^E$. Exemplarization $G_2^E$ can be written as

$$G_2^E = Y_V[1]\ Y_E[1]\ Y_V[2]\ Y_E[2] \ldots Y_V[k]\ Y_E[k]\ Y_V[k+1]$$

where $Y_V[i]$, $1 \le i \le k+1$, is a string on $V$ and $Y_E[i]$, $1 \le i \le k$, is a string on $E$, $V$ and $E$ being viewed as alphabets. Now, define $V' \subseteq V$ as follows: $v_i \in V'$ iff gene $v_i$ occurs in some $Y_V[j]$, $1 \le j \le k$, as the last gene. By construction, $|V'| \le k$ (we may indeed have $|V'| < k$ if some $Y_V[j]$, $1 \le j \le k$, denotes the empty string). We now observe that, since no gene $v_i$ is duplicated in $G_1$, all genes $e_l$ that occur between some gene $v_i \in V'$ and some gene $v_j \in V$ in $G_2^E$ should match genes in string $X_i$ in $G_1$. Then it follows that $V'$ is a vertex cover of size at most $k$ in $G$.

The complexity of ZEBD remains open in case no gene occurs more than twice in $G_1$ and more than a constant number of times in $G_2$, i.e., instances of type $(2,c)$ for some $c = O(1)$; recall here that ZEBD is NP-complete if no gene occurs more than three times in $G_1$ or in $G_2$ (instances of type (3,3), [13]). In particular, the complexity of ZEBD for instances of type (2,2) is open. However, we propose here a practical (but exponential) algorithm for ZEBD for instances of type (2,2), which is well-suited in case the number of genes that occur twice both in $G_1$ and in $G_2$ is relatively small.

Proposition 1. ZEBD for instances of type (2,2) (no gene occurs more than twice in $G_1$ and in $G_2$) is solvable in $O([\ldots])$ time, where $k$ is upper-bounded by the number of genes that occur exactly twice in $G_1$ and in $G_2$.

Proof. According to Observation 3, for any instance $(G_1,G_2)$, we only need to focus on exemplarizations $(G_1^E,G_2^E)$ such that $G_1^E = G_2^E$ or $(G_1^E)^r = G_2^E$, where $(G_1^E)^r$ is the signed reversal of $G_1^E$. Let us first consider the case $G_1^E = G_2^E$ (the case $(G_1^E)^r = G_2^E$ is identical up to a signed reversal and will thereby be briefly discussed at the end of the proof).
Let $(G_1,G_2)$ be an instance of type (2,2) of ZEBD. Our algorithm transforms instance $(G_1,G_2)$ into a CNF boolean formula $\varphi$ with only few large clauses such that $\varphi$ is satisfiable iff the exemplar breakpoint distance between $G_1$ and $G_2$ is zero. By hypothesis, each signed gene occurs at most twice in $G_1$ and in $G_2$. Therefore, for any signed gene $g$, we have one out of four possible distinct configurations depicted in Figure 2, where $p_1$, $p_2$, $q_1$ and $q_2$ are positions of occurrence of $g$ in $G_1$ and $G_2$. Furthermore, since we are looking for an exemplarization $(G_1^E,G_2^E)$ of $(G_1,G_2)$ such that $G_1^E = G_2^E$, we may assume, in case $g$ occurs only once in $G_1$ or in $G_2$, that all occurrences of $g$ have the same sign (otherwise a trivial self-reduction would indeed apply). In other words, referring to Figure 2, we assume $G_1[p_1] = G_2[q_1] = G_2[q_2]$ in case (2), $G_1[p_1] = G_1[p_2] = G_2[q_1]$ in case (3), and $G_1[p_1] = G_2[q_1]$ in case (4). Finally, as for case (1), we may assume that either all occurrences have the same sign, or $G_1[p_1] = -G_1[p_2]$ and $G_2[q_1] = -G_2[q_2]$ (otherwise a trivial self-reduction would again apply).

Fig. 2. The 4 gene-configurations for instances of type (2,2): $p_1$ and $p_2$ are the occurrence positions of gene $g$ in $G_1$, and $q_1$ and $q_2$ are the occurrence positions of gene $g$ in $G_2$.

We now describe the construction of the CNF boolean formula $\varphi$. First, the set of boolean variables $X$ is defined as follows: for each gene $g$ occurring at position $p$ in $G_1$ and at position $q$ in $G_2$ (i.e., $G_1[p] = G_2[q]$) we add to $X$ the boolean variable $x^p_q$. We now turn to defining the clauses of $\varphi$. Let $g$ be any gene, and let the occurrence positions of $g$ in $G_1$ and in $G_2$ be noted as in Figure 2.

- If $occ(g,G_1) = occ(g,G_2) = 2$ (case (1)):
  - if $G_1[p_1] = G_1[p_2] = G_2[q_1] = G_2[q_2]$, we add to $\varphi$ the clauses $(x^{p_1}_{q_1} \vee x^{p_1}_{q_2} \vee x^{p_2}_{q_1} \vee x^{p_2}_{q_2})$, $(\neg x^{p_1}_{q_1} \vee \neg x^{p_1}_{q_2})$, $(\neg x^{p_1}_{q_1} \vee \neg x^{p_2}_{q_1})$, $(\neg x^{p_1}_{q_1} \vee \neg x^{p_2}_{q_2})$, $(\neg x^{p_1}_{q_2} \vee \neg x^{p_2}_{q_1})$, $(\neg x^{p_1}_{q_2} \vee \neg x^{p_2}_{q_2})$ and $(\neg x^{p_2}_{q_1} \vee \neg x^{p_2}_{q_2})$;
  - otherwise, we have $G_1[p_1] = -G_1[p_2]$ and $G_2[q_1] = -G_2[q_2]$ (see above discussion):
    - if $G_1[p_1] = G_2[q_1]$ and $G_1[p_2] = G_2[q_2]$, we add to $\varphi$ the clauses $(x^{p_1}_{q_1} \vee x^{p_2}_{q_2})$ and $(\neg x^{p_1}_{q_1} \vee \neg x^{p_2}_{q_2})$;
    - if $G_1[p_1] = G_2[q_2]$ and $G_1[p_2] = G_2[q_1]$, we add to $\varphi$ the clauses $(x^{p_1}_{q_2} \vee x^{p_2}_{q_1})$ and $(\neg x^{p_1}_{q_2} \vee \neg x^{p_2}_{q_1})$.
- If $occ(g,G_1) = 1$ and $occ(g,G_2) = 2$ (case (2)), we add to $\varphi$ the clauses $(x^{p_1}_{q_1} \vee x^{p_1}_{q_2})$ and $(\neg x^{p_1}_{q_1} \vee \neg x^{p_1}_{q_2})$.
- If $occ(g,G_1) = 2$ and $occ(g,G_2) = 1$ (case (3)), we add to $\varphi$ the clauses $(x^{p_1}_{q_1} \vee x^{p_2}_{q_1})$ and $(\neg x^{p_1}_{q_1} \vee \neg x^{p_2}_{q_1})$.
- If $occ(g,G_1) = occ(g,G_2) = 1$ (case (4)), we add to $\varphi$ the clause $(x^{p_1}_{q_1})$.

The rationale of this construction is that if formula $\varphi$ evaluates to true for some assignment $f$ and $f(x^p_q)$ is true for some gene $g$ occurring at position $p$ in $G_1$ and $q$ in $G_2$, then all occurrences of $g$ but the one at position $p$ should be deleted in $G_1$ and all occurrences of $g$ but the one at position $q$ should be deleted in $G_2$, in order to obtain the exemplar solution. What is left is to enforce that $\varphi$ is satisfiable iff the exemplar breakpoint distance between $G_1$ and $G_2$ is zero. To this aim, we add to $\varphi$ the following clauses. For each pair of variables $(x^{i_1}_{j_1}, x^{i_2}_{j_2})$ such that $G_1[i_1] \ne G_1[i_2]$, $i_1 < i_2$ and $j_1 > j_2$, we add to $\varphi$ the clause $(\neg x^{i_1}_{j_1} \vee \neg x^{i_2}_{j_2})$, which forbids selecting two matched pairs that cross.

The construction of $\varphi$ is now complete. Clearly, $\varphi$ is satisfiable iff the exemplar breakpoint distance between $G_1$ and $G_2$ is zero. Let $k$ be the number of genes $g$ that occur twice in $G_1$ and in $G_2$ with the same sign, i.e., $G_1[p_1] = G_1[p_2] = G_2[q_1] = G_2[q_2]$.
We now make the important observation that all clauses in $\varphi$ have size less than or equal to 2 except those $k$ clauses of size 4 introduced in case gene $g$ occurs twice in $G_1$ and in $G_2$ with the same sign. By introducing a new boolean variable, we can easily replace in $\varphi$ each clause of size 4 by two clauses of size 3, and hence we may now assume that $\varphi$ is a 3-CNF formula (i.e., each clause has size at most 3) with exactly $2k$ clauses of size 3.

As for the case $(G_1^E)^r = G_2^E$, we replace $G_1$ by $(G_1)^r$ and construct another 3-CNF formula $\varphi'$ as described above. The two 3-CNF formulas need, however, to be examined separately.

Fernau proposed in [15] an algorithm for solving 3-CNF boolean formulas that runs in $O([\ldots])$ time, where $l$ is the number of clauses of size 3. Therefore, ZEBD for instances of type (2,2) is solvable in $O([\ldots])$ time, where $k$ is the number of genes $g$ that occur twice in $G_1$ and in $G_2$.
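The clause-splitting step used above is the standard one: a clause $(a \vee b \vee c \vee d)$ is replaced by $(a \vee b \vee z) \wedge (\neg z \vee c \vee d)$ for a fresh variable $z$. The sketch below illustrates it with DIMACS-style integer literals (a positive integer is a variable, a negative integer its negation); the function names and this encoding are assumptions for illustration.

```python
def split_clause(clause, fresh):
    """Replace a 4-literal clause by two 3-literal clauses using the unused variable `fresh`."""
    a, b, c, d = clause
    return [[a, b, fresh], [-fresh, c, d]]


def satisfied(clause, assignment):
    """assignment maps a variable index to True/False; a clause needs one true literal."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def equisatisfiable_under(clause, fresh, assignment):
    """Check, for one assignment of the original variables, that the split clauses
    can be satisfied by some value of `fresh` exactly when the original clause holds."""
    parts = split_clause(clause, fresh)
    extended = any(
        all(satisfied(part, {**assignment, fresh: value}) for part in parts)
        for value in (False, True)
    )
    return extended == satisfied(clause, assignment)
```

Each size-4 clause therefore contributes exactly two size-3 clauses, which is where the count of $2k$ clauses of size 3 comes from.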

4.2 Zero intermediate matching breakpoint distance

We now turn to the zero intermediate breakpoint distance (ZIBD) problem. It is defined as follows.

Problem: ZIBD
Input: Two genomes $G_1$ and $G_2$.
Question: Is the intermediate breakpoint distance between $G_1$ and $G_2$ equal to zero?

We show here that ZEBD and ZIBD are equivalent problems. We need the following lemma.

Lemma 9 ([2]). Let $G_1$ and $G_2$ be two genomes without duplicates and with the same gene content, and $G'_1$ and $G'_2$ be the two genomes obtained from $G_1$ and $G_2$ by deleting any gene $g$. Then $B(G'_1,G'_2) \le B(G_1,G_2)$.

Theorem 5. ZEBD and ZIBD are equivalent problems.

Proof. One direction is trivial (any exemplarization is indeed an intermediate matching). The other direction follows from Lemma 9.

It follows from Theorem 5 that the problem IBD is not approximable even for instances of type (3,3) (see [13]) and if no gene occurs more than twice in $G_1$ (see Theorem 4).

4.3 Zero maximum matching breakpoint distance

We show here that, contrary to the exemplar and the intermediate matching models, deciding whether the maximum matching breakpoint distance between two genomes is equal to zero is polynomial-time solvable, and hence we cannot rule out the existence of accurate approximation algorithms for the maximum matching model. We refer to this problem as ZMBD.

Problem: ZMBD
Input: Two genomes $G_1$ and $G_2$.
Question: Is the maximum matching breakpoint distance between $G_1$ and $G_2$ equal to zero?

The main idea of our approach is to transform any instance of ZMBD into a matching diagram and next use an efficient algorithm for finding a large set of non-intersecting line segments. Note that this latter problem is equivalent to finding a large increasing subsequence in permutations. A matching diagram [18] consists of, say $n$, points on each of two parallel lines, and $n$ straight line segments matching distinct pairs of points.
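The equivalence mentioned above can be made concrete. If the top points are numbered $1..n$ from left to right and segment $t$ ends at bottom point `perm[t-1]`, two segments do not cross exactly when their bottom endpoints appear in the same order as their top endpoints, so a largest set of pairwise non-intersecting segments corresponds to a longest increasing subsequence of `perm`. The sketch below uses the standard patience-sorting technique; the function name and input encoding are assumptions, and it is not taken from the paper.

```python
from bisect import bisect_left


def max_noncrossing_segments(perm):
    """Length of a longest strictly increasing subsequence of perm, in O(n log n).

    perm[t] is the bottom endpoint of the segment whose top endpoint is t + 1.
    """
    tails = []  # tails[r] = smallest possible last value of an increasing run of length r + 1
    for value in perm:
        r = bisect_left(tails, value)
        if r == len(tails):
            tails.append(value)   # value extends the longest run found so far
        else:
            tails[r] = value      # value gives a smaller tail for runs of length r + 1
    return len(tails)
```

For the diagram `[3, 1, 2, 5, 4]`, the segments ending at 1, 2 and 5 (or at 1, 2 and 4) are pairwise non-crossing, and no four segments are.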
The intersection graph of the line segments is called a permutation graph (the reason for the name is that if the points on the top line are numbered $1,2,\ldots,n$, then the points on the other line are numbered by a permutation on $1,2,\ldots,n$).

We describe how to turn the pair of genomes $(G_1,G_2)$ into a matching diagram $D(G_1,G_2)$. For sake of presentation we introduce the following notations. For each gene family $g$, we write $occ_{pos}(G,g)$ (resp. $occ_{neg}(G,g)$) for the number of positive (resp. negative) occurrences of gene $g$ in genome $G$. According to Observation 3, it is enough to consider two cases: $G_1^M = G_2^M$ or $(G_1^M)^r = G_2^M$, where $(G_1^M,G_2^M,\mathcal{M})$ is a maximum matching of $(G_1,G_2)$. Let us first focus on testing $G_1^M = G_2^M$ (the case $(G_1^M)^r = G_2^M$ is identical up to a signed reversal).

We describe the construction of the top labeled points. Reading genome $G_1$ from left to right, we replace gene $g$ by the sequence of labeled points

$$+g_1(i, occ_{pos}(G_2,g))\quad +g_1(i, occ_{pos}(G_2,g)-1)\quad \ldots\quad +g_1(i,1)$$


More information

Finding Dense Subgraphs in G(n, 1/2)

Finding Dense Subgraphs in G(n, 1/2) Fndng Dense Subgraphs n Gn, 1/ Atsh Das Sarma 1, Amt Deshpande, and Rav Kannan 1 Georga Insttute of Technology,atsh@cc.gatech.edu Mcrosoft Research-Bangalore,amtdesh,annan@mcrosoft.com Abstract. Fndng

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

Société de Calcul Mathématique SA

Société de Calcul Mathématique SA Socété de Calcul Mathématque SA Outls d'ade à la décson Tools for decson help Probablstc Studes: Normalzng the Hstograms Bernard Beauzamy December, 202 I. General constructon of the hstogram Any probablstc

More information

The Minimum Universal Cost Flow in an Infeasible Flow Network

The Minimum Universal Cost Flow in an Infeasible Flow Network Journal of Scences, Islamc Republc of Iran 17(2): 175-180 (2006) Unversty of Tehran, ISSN 1016-1104 http://jscencesutacr The Mnmum Unversal Cost Flow n an Infeasble Flow Network H Saleh Fathabad * M Bagheran

More information

Math 261 Exercise sheet 2

Math 261 Exercise sheet 2 Math 261 Exercse sheet 2 http://staff.aub.edu.lb/~nm116/teachng/2017/math261/ndex.html Verson: September 25, 2017 Answers are due for Monday 25 September, 11AM. The use of calculators s allowed. Exercse

More information

Linear, affine, and convex sets and hulls In the sequel, unless otherwise specified, X will denote a real vector space.

Linear, affine, and convex sets and hulls In the sequel, unless otherwise specified, X will denote a real vector space. Lnear, affne, and convex sets and hulls In the sequel, unless otherwse specfed, X wll denote a real vector space. Lnes and segments. Gven two ponts x, y X, we defne xy = {x + t(y x) : t R} = {(1 t)x +

More information

Lecture 10 Support Vector Machines II

Lecture 10 Support Vector Machines II Lecture 10 Support Vector Machnes II 22 February 2016 Taylor B. Arnold Yale Statstcs STAT 365/665 1/28 Notes: Problem 3 s posted and due ths upcomng Frday There was an early bug n the fake-test data; fxed

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

COMPLEX NUMBERS AND QUADRATIC EQUATIONS

COMPLEX NUMBERS AND QUADRATIC EQUATIONS COMPLEX NUMBERS AND QUADRATIC EQUATIONS INTRODUCTION We know that x 0 for all x R e the square of a real number (whether postve, negatve or ero) s non-negatve Hence the equatons x, x, x + 7 0 etc are not

More information

REAL ANALYSIS I HOMEWORK 1

REAL ANALYSIS I HOMEWORK 1 REAL ANALYSIS I HOMEWORK CİHAN BAHRAN The questons are from Tao s text. Exercse 0.0.. If (x α ) α A s a collecton of numbers x α [0, + ] such that x α

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

Every planar graph is 4-colourable a proof without computer

Every planar graph is 4-colourable a proof without computer Peter Dörre Department of Informatcs and Natural Scences Fachhochschule Südwestfalen (Unversty of Appled Scences) Frauenstuhlweg 31, D-58644 Iserlohn, Germany Emal: doerre(at)fh-swf.de Mathematcs Subject

More information

arxiv: v1 [cs.gt] 14 Mar 2019

arxiv: v1 [cs.gt] 14 Mar 2019 Stable Roommates wth Narcssstc, Sngle-Peaked, and Sngle-Crossng Preferences Robert Bredereck 1, Jehua Chen 2, Ugo Paavo Fnnendahl 1, and Rolf Nedermeer 1 arxv:1903.05975v1 [cs.gt] 14 Mar 2019 1 TU Berln,

More information

Problem Solving in Math (Math 43900) Fall 2013

Problem Solving in Math (Math 43900) Fall 2013 Problem Solvng n Math (Math 43900) Fall 2013 Week four (September 17) solutons Instructor: Davd Galvn 1. Let a and b be two nteger for whch a b s dvsble by 3. Prove that a 3 b 3 s dvsble by 9. Soluton:

More information

Lecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem.

Lecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem. prnceton u. sp 02 cos 598B: algorthms and complexty Lecture 20: Lft and Project, SDP Dualty Lecturer: Sanjeev Arora Scrbe:Yury Makarychev Today we wll study the Lft and Project method. Then we wll prove

More information

Mixed-integer vertex covers on bipartite graphs

Mixed-integer vertex covers on bipartite graphs Mxed-nteger vertex covers on bpartte graphs Mchele Confort, Bert Gerards, Gacomo Zambell November, 2006 Abstract Let A be the edge-node ncdence matrx of a bpartte graph G = (U, V ; E), I be a subset the

More information

Example: (13320, 22140) =? Solution #1: The divisors of are 1, 2, 3, 4, 5, 6, 9, 10, 12, 15, 18, 20, 27, 30, 36, 41,

Example: (13320, 22140) =? Solution #1: The divisors of are 1, 2, 3, 4, 5, 6, 9, 10, 12, 15, 18, 20, 27, 30, 36, 41, The greatest common dvsor of two ntegers a and b (not both zero) s the largest nteger whch s a common factor of both a and b. We denote ths number by gcd(a, b), or smply (a, b) when there s no confuson

More information

Finding Primitive Roots Pseudo-Deterministically

Finding Primitive Roots Pseudo-Deterministically Electronc Colloquum on Computatonal Complexty, Report No 207 (205) Fndng Prmtve Roots Pseudo-Determnstcally Ofer Grossman December 22, 205 Abstract Pseudo-determnstc algorthms are randomzed search algorthms

More information

Anti-van der Waerden numbers of 3-term arithmetic progressions.

Anti-van der Waerden numbers of 3-term arithmetic progressions. Ant-van der Waerden numbers of 3-term arthmetc progressons. Zhanar Berkkyzy, Alex Schulte, and Mchael Young Aprl 24, 2016 Abstract The ant-van der Waerden number, denoted by aw([n], k), s the smallest

More information

Subset Topological Spaces and Kakutani s Theorem

Subset Topological Spaces and Kakutani s Theorem MOD Natural Neutrosophc Subset Topologcal Spaces and Kakutan s Theorem W. B. Vasantha Kandasamy lanthenral K Florentn Smarandache 1 Copyrght 1 by EuropaNova ASBL and the Authors Ths book can be ordered

More information

n ). This is tight for all admissible values of t, k and n. k t + + n t

n ). This is tight for all admissible values of t, k and n. k t + + n t MAXIMIZING THE NUMBER OF NONNEGATIVE SUBSETS NOGA ALON, HAROUT AYDINIAN, AND HAO HUANG Abstract. Gven a set of n real numbers, f the sum of elements of every subset of sze larger than k s negatve, what

More information

Case A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k.

Case A. P k = Ni ( 2L i k 1 ) + (# big cells) 10d 2 P k. THE CELLULAR METHOD In ths lecture, we ntroduce the cellular method as an approach to ncdence geometry theorems lke the Szemeréd-Trotter theorem. The method was ntroduced n the paper Combnatoral complexty

More information

Common loop optimizations. Example to improve locality. Why Dependence Analysis. Data Dependence in Loops. Goal is to find best schedule:

Common loop optimizations. Example to improve locality. Why Dependence Analysis. Data Dependence in Loops. Goal is to find best schedule: 15-745 Lecture 6 Data Dependence n Loops Copyrght Seth Goldsten, 2008 Based on sldes from Allen&Kennedy Lecture 6 15-745 2005-8 1 Common loop optmzatons Hostng of loop-nvarant computatons pre-compute before

More information

A CLASS OF RECURSIVE SETS. Florentin Smarandache University of New Mexico 200 College Road Gallup, NM 87301, USA

A CLASS OF RECURSIVE SETS. Florentin Smarandache University of New Mexico 200 College Road Gallup, NM 87301, USA A CLASS OF RECURSIVE SETS Florentn Smarandache Unversty of New Mexco 200 College Road Gallup, NM 87301, USA E-mal: smarand@unmedu In ths artcle one bulds a class of recursve sets, one establshes propertes

More information

Polynomials. 1 More properties of polynomials

Polynomials. 1 More properties of polynomials Polynomals 1 More propertes of polynomals Recall that, for R a commutatve rng wth unty (as wth all rngs n ths course unless otherwse noted), we defne R[x] to be the set of expressons n =0 a x, where a

More information

MAT 578 Functional Analysis

MAT 578 Functional Analysis MAT 578 Functonal Analyss John Qugg Fall 2008 Locally convex spaces revsed September 6, 2008 Ths secton establshes the fundamental propertes of locally convex spaces. Acknowledgment: although I wrote these

More information

( 1) i [ d i ]. The claim is that this defines a chain complex. The signs have been inserted into the definition to make this work out.

( 1) i [ d i ]. The claim is that this defines a chain complex. The signs have been inserted into the definition to make this work out. Mon, Apr. 2 We wsh to specfy a homomorphsm @ n : C n ()! C n (). Snce C n () s a free abelan group, the homomorphsm @ n s completely specfed by ts value on each generator, namely each n-smplex. There are

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

MATH 5707 HOMEWORK 4 SOLUTIONS 2. 2 i 2p i E(X i ) + E(Xi 2 ) ä i=1. i=1

MATH 5707 HOMEWORK 4 SOLUTIONS 2. 2 i 2p i E(X i ) + E(Xi 2 ) ä i=1. i=1 MATH 5707 HOMEWORK 4 SOLUTIONS CİHAN BAHRAN 1. Let v 1,..., v n R m, all lengths v are not larger than 1. Let p 1,..., p n [0, 1] be arbtrary and set w = p 1 v 1 + + p n v n. Then there exst ε 1,..., ε

More information

Generalized Linear Methods

Generalized Linear Methods Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set

More information

U.C. Berkeley CS278: Computational Complexity Professor Luca Trevisan 2/21/2008. Notes for Lecture 8

U.C. Berkeley CS278: Computational Complexity Professor Luca Trevisan 2/21/2008. Notes for Lecture 8 U.C. Berkeley CS278: Computatonal Complexty Handout N8 Professor Luca Trevsan 2/21/2008 Notes for Lecture 8 1 Undrected Connectvty In the undrected s t connectvty problem (abbrevated ST-UCONN) we are gven

More information

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso Supplement: Proofs and Techncal Detals for The Soluton Path of the Generalzed Lasso Ryan J. Tbshran Jonathan Taylor In ths document we gve supplementary detals to the paper The Soluton Path of the Generalzed

More information

find (x): given element x, return the canonical element of the set containing x;

find (x): given element x, return the canonical element of the set containing x; COS 43 Sprng, 009 Dsjont Set Unon Problem: Mantan a collecton of dsjont sets. Two operatons: fnd the set contanng a gven element; unte two sets nto one (destructvely). Approach: Canoncal element method:

More information

MATH 241B FUNCTIONAL ANALYSIS - NOTES EXAMPLES OF C ALGEBRAS

MATH 241B FUNCTIONAL ANALYSIS - NOTES EXAMPLES OF C ALGEBRAS MATH 241B FUNCTIONAL ANALYSIS - NOTES EXAMPLES OF C ALGEBRAS These are nformal notes whch cover some of the materal whch s not n the course book. The man purpose s to gve a number of nontrval examples

More information

FACTORIZATION IN KRULL MONOIDS WITH INFINITE CLASS GROUP

FACTORIZATION IN KRULL MONOIDS WITH INFINITE CLASS GROUP C O L L O Q U I U M M A T H E M A T I C U M VOL. 80 1999 NO. 1 FACTORIZATION IN KRULL MONOIDS WITH INFINITE CLASS GROUP BY FLORIAN K A I N R A T H (GRAZ) Abstract. Let H be a Krull monod wth nfnte class

More information

Week 2. This week, we covered operations on sets and cardinality.

Week 2. This week, we covered operations on sets and cardinality. Week 2 Ths week, we covered operatons on sets and cardnalty. Defnton 0.1 (Correspondence). A correspondence between two sets A and B s a set S contaned n A B = {(a, b) a A, b B}. A correspondence from

More information

Computing Correlated Equilibria in Multi-Player Games

Computing Correlated Equilibria in Multi-Player Games Computng Correlated Equlbra n Mult-Player Games Chrstos H. Papadmtrou Presented by Zhanxang Huang December 7th, 2005 1 The Author Dr. Chrstos H. Papadmtrou CS professor at UC Berkley (taught at Harvard,

More information

arxiv: v1 [math.ho] 18 May 2008

arxiv: v1 [math.ho] 18 May 2008 Recurrence Formulas for Fbonacc Sums Adlson J. V. Brandão, João L. Martns 2 arxv:0805.2707v [math.ho] 8 May 2008 Abstract. In ths artcle we present a new recurrence formula for a fnte sum nvolvng the Fbonacc

More information

Spectral Graph Theory and its Applications September 16, Lecture 5

Spectral Graph Theory and its Applications September 16, Lecture 5 Spectral Graph Theory and ts Applcatons September 16, 2004 Lecturer: Danel A. Spelman Lecture 5 5.1 Introducton In ths lecture, we wll prove the followng theorem: Theorem 5.1.1. Let G be a planar graph

More information

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals Smultaneous Optmzaton of Berth Allocaton, Quay Crane Assgnment and Quay Crane Schedulng Problems n Contaner Termnals Necat Aras, Yavuz Türkoğulları, Z. Caner Taşkın, Kuban Altınel Abstract In ths work,

More information

The internal structure of natural numbers and one method for the definition of large prime numbers

The internal structure of natural numbers and one method for the definition of large prime numbers The nternal structure of natural numbers and one method for the defnton of large prme numbers Emmanul Manousos APM Insttute for the Advancement of Physcs and Mathematcs 3 Poulou str. 53 Athens Greece Abstract

More information

Math1110 (Spring 2009) Prelim 3 - Solutions

Math1110 (Spring 2009) Prelim 3 - Solutions Math 1110 (Sprng 2009) Solutons to Prelm 3 (04/21/2009) 1 Queston 1. (16 ponts) Short answer. Math1110 (Sprng 2009) Prelm 3 - Solutons x a 1 (a) (4 ponts) Please evaluate lm, where a and b are postve numbers.

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.65/15.070J Fall 013 Lecture 1 10/1/013 Martngale Concentraton Inequaltes and Applcatons Content. 1. Exponental concentraton for martngales wth bounded ncrements.

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

General theory of fuzzy connectedness segmentations: reconciliation of two tracks of FC theory

General theory of fuzzy connectedness segmentations: reconciliation of two tracks of FC theory General theory of fuzzy connectedness segmentatons: reconclaton of two tracks of FC theory Krzysztof Chrs Ceselsk Department of Mathematcs, West Vrgna Unversty and MIPG, Department of Radology, Unversty

More information

Randomness and Computation

Randomness and Computation Randomness and Computaton or, Randomzed Algorthms Mary Cryan School of Informatcs Unversty of Ednburgh RC 208/9) Lecture 0 slde Balls n Bns m balls, n bns, and balls thrown unformly at random nto bns usually

More information

Perfect Competition and the Nash Bargaining Solution

Perfect Competition and the Nash Bargaining Solution Perfect Competton and the Nash Barganng Soluton Renhard John Department of Economcs Unversty of Bonn Adenauerallee 24-42 53113 Bonn, Germany emal: rohn@un-bonn.de May 2005 Abstract For a lnear exchange

More information

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0 MODULE 2 Topcs: Lnear ndependence, bass and dmenson We have seen that f n a set of vectors one vector s a lnear combnaton of the remanng vectors n the set then the span of the set s unchanged f that vector

More information

On the set of natural numbers

On the set of natural numbers On the set of natural numbers by Jalton C. Ferrera Copyrght 2001 Jalton da Costa Ferrera Introducton The natural numbers have been understood as fnte numbers, ths wor tres to show that the natural numbers

More information

Inductance Calculation for Conductors of Arbitrary Shape

Inductance Calculation for Conductors of Arbitrary Shape CRYO/02/028 Aprl 5, 2002 Inductance Calculaton for Conductors of Arbtrary Shape L. Bottura Dstrbuton: Internal Summary In ths note we descrbe a method for the numercal calculaton of nductances among conductors

More information

Genericity of Critical Types

Genericity of Critical Types Genercty of Crtcal Types Y-Chun Chen Alfredo D Tllo Eduardo Fangold Syang Xong September 2008 Abstract Ely and Pesk 2008 offers an nsghtful characterzaton of crtcal types: a type s crtcal f and only f

More information

Perron Vectors of an Irreducible Nonnegative Interval Matrix

Perron Vectors of an Irreducible Nonnegative Interval Matrix Perron Vectors of an Irreducble Nonnegatve Interval Matrx Jr Rohn August 4 2005 Abstract As s well known an rreducble nonnegatve matrx possesses a unquely determned Perron vector. As the man result of

More information

10. Canonical Transformations Michael Fowler

10. Canonical Transformations Michael Fowler 10. Canoncal Transformatons Mchael Fowler Pont Transformatons It s clear that Lagrange s equatons are correct for any reasonable choce of parameters labelng the system confguraton. Let s call our frst

More information

Appendix B. Criterion of Riemann-Stieltjes Integrability

Appendix B. Criterion of Riemann-Stieltjes Integrability Appendx B. Crteron of Remann-Steltes Integrablty Ths note s complementary to [R, Ch. 6] and [T, Sec. 3.5]. The man result of ths note s Theorem B.3, whch provdes the necessary and suffcent condtons for

More information