Finding low error clusterings


Maria-Florina Balcan
Microsoft Research, New England
One Memorial Drive, Cambridge, MA

Mark Braverman
Microsoft Research, New England
One Memorial Drive, Cambridge, MA

Abstract

A common approach for solving clustering problems is to design algorithms to approximately optimize various objective functions (e.g., k-means or min-sum) defined in terms of some given pairwise distance or similarity information. However, in many learning-motivated clustering applications (such as clustering proteins by function) there is some unknown target clustering; in such cases the pairwise information is merely based on heuristics and the real goal is to achieve low error on the data. In these settings, an arbitrary c-approximation algorithm for some objective would work well only if any c-approximation to that objective is close to the target clustering. In recent work, Balcan et al. [7] have shown how, both for the k-means and k-median objectives, this property allows one to produce clusterings of low error, even for values c such that getting a c-approximation to these objective functions is provably NP-hard. In this paper we analyze the min-sum objective from this perspective. While [7] also considered the min-sum problem, the results they derived for this objective were substantially weaker. In this work we derive new and more subtle structural properties for min-sum in this context and use these to design efficient algorithms for producing accurate clusterings, both in the transductive and in the inductive case. We also analyze the correlation clustering problem from this perspective, and point out interesting differences between this objective and the k-median, k-means, or min-sum objectives.

1 Introduction

Problems of clustering data from pairwise distance or similarity information are ubiquitous in science. A common approach for solving such problems is to view the data points as nodes in a weighted graph (with the weights based on the given pairwise information), and then to design algorithms to optimize various objective functions such as k-means or min-sum. For example, in the min-sum clustering approach the goal is to produce a partition into a given number of clusters k that minimizes the sum of the intra-cluster distances. Many of the optimization problems corresponding to commonly analyzed objectives (including k-means, min-sum, k-median, and correlation clustering) are NP-hard, and so the focus in the theory community has been on designing approximation algorithms for these objectives.¹ For example, the best known approximation algorithm for the k-median problem is a (3 + ɛ)-approximation [6], while the best approximation for the min-sum problem in general metric spaces is an O(log^{1+δ} n)-approximation. For many of these problems the approximation guarantees do not match the known hardness results, and significant effort is spent on obtaining tighter approximation guarantees and hardness results [3, 6, 9, 11, 13, 15, 12, 17, 21, 25, 28].

Standard clustering settings used to motivate much of this effort include problems such as clustering proteins by function, images by subject, or documents by topic. In many of these settings there is some unknown correct target clustering, and the implicit hope is that approximately optimizing objective functions such as those mentioned above will in fact produce a clustering of low error, i.e., a clustering which agrees with the truth on most of the points. In other words, implicit in taking the approximation-algorithms approach is the hope that any c-approximation to our given objective will be pointwise close to the true answer, and our motivation for improving a c_2-approximation to a c_1-approximation (for c_1 < c_2) is that perhaps this closeness property holds for c_1 but not c_2.
In recent work, Balcan et al. [7] have shown that if we make this implicit assumption explicit, then one can get accurate clusterings even in cases where getting a good approximation to these objective functions is provably NP-hard. In particular, say that a data set satisfies the (c, ɛ) property for some objective function Φ if any c-approximation to Φ on this data must be ɛ-close to the target clustering. [7] show that for any c = 1 + α > 1, if the data satisfies the (c, ɛ) property for the k-median (or k-means) objectives, then one can produce clusterings that are O(ɛ)-close to the target, even for values c for which obtaining a c-approximation is NP-hard. [7] also consider the min-sum objective; however, the results they present work only for values of c > 2 and under the assumption that all the target clusters are large.

¹ A β-approximation algorithm for objective Φ is an algorithm that runs in polynomial time and returns a solution whose value is within a multiplicative β factor of the optimal solution for the given objective Φ.

1.1 Our Results

In this work we solve the problem of getting accurate clusterings for the min-sum objective under the (c, ɛ)-assumption, improving on the results of Balcan et al. [7] for this objective in multiple respects. In particular, we show it is possible to deal with any constant c = 1 + α > 1 (and not only c > 2). More importantly, we are also able to deal with the presence of small target clusters. To achieve this we derive new and much more subtle structural properties implied by the (c, ɛ)-assumption. In the case where k is small compared to log n / log log n we output a single clustering which is O(ɛ/α)-close to the target, while in the general case our algorithm outputs a small list of clusterings with the property that the target clustering is close to one of those in the list.

We show that the algorithm we develop for the min-sum objective is robust, which allows us to extend it to the inductive model. In the inductive model, S is merely a small random subset of points from a much larger abstract instance space X, and our goal is to produce a hypothesis h : X → Y which implicitly represents a clustering of the whole space X and which has low error on the whole of X. An appealing characteristic of the algorithm we obtain for the inductive case is that the insertion of new points (which arrive online) is extremely efficient: we only need O(k) comparisons for assigning a new point x to one of the clusters. We further show that if we do require the clusters to be large, we can reduce the approximation error from O(ɛ/α) down to O(ɛ), the best one can hope for. We thus affirmatively answer several open questions in [7].

We also analyze the correlation clustering problem in this framework. In correlation clustering, the input is a graph with edges labeled +1 or −1 and the goal is to find a partition of the nodes that best matches the signs of the edges. This clustering formulation was introduced by Bansal et al. in [11] and it has been extensively studied in a series of follow-up papers both in the theoretical computer science and in the machine learning communities [3, 13, 22]. In the original paper, Bansal et al. [11] considered two versions of the correlation clustering problem, minimizing disagreements and maximizing agreements.² In this paper we focus on the minimizing disagreements objective function. (The maximizing agreements version of correlation clustering is less interesting in our framework since it admits a PTAS.³) We show that this objective behaves much better than objectives such as k-median, k-means, and min-sum in terms of error rate. More specifically, we show that for this objective the (1 + α, ɛ) property implies a (2.5, O(ɛ/α)) property, so one can use a state-of-the-art 2.5-approximation algorithm for minimizing disagreements in order to get an accurate clustering. This contrasts sharply with the previous results proven in this context for objectives such as min-sum, k-median, or k-means. Our work shows how, for a clustering objective such as min-sum, we can obtain results comparable to what one could obtain by being able to approximate the objective to an arbitrarily small constant.

² In the former case, the goal is to minimize the number of −1 edges inside clusters plus the number of +1 edges between clusters, while in the latter case the goal is to maximize the number of +1 edges inside clusters plus the number of −1 edges between clusters. These are equivalent at optimality but differ in their difficulty of approximation.
³ A PTAS (polynomial-time approximation scheme) is an algorithm that for any given fixed ɛ runs in polynomial time and returns an approximation within a 1 + ɛ factor. The running time may depend exponentially (or worse) on 1/ɛ, however.
In other words, if what we really want is to obtain a clustering of low error, then by making implicit assumptions explicit we can obtain low error clusterings even in cases where getting a c-approximation to the min-sum objective is NP-hard. This points out how one can get much better results than those obtained so far in the approximation algorithms literature by wisely using all the available information for the problem at hand.

1.2 Related Work

Work on approximation algorithms: We review in the following the state-of-the-art results on approximation algorithms for the two clustering objectives we discuss in this paper. Min-sum k-clustering on general metric spaces admits a PTAS for the case of constant k by Fernandez de la Vega et al. [17] (see also [20]). For the case of arbitrary k there is an O(δ⁻¹ log^{1+δ} n)-approximation algorithm that runs in time n^{O(1/δ)} due to Bartal et al. [9]. The problem has also been studied in geometric spaces for constant k by Schulman [28], who gave an algorithm for (R^d, ℓ₂²) that either outputs a (1 + ɛ)-approximation, or a solution that agrees with the optimum clustering on a (1 − ɛ)-fraction of the points (but could have much larger cost than optimum); the runtime is O(n^{log log n}) in the worst case and linear for sublogarithmic dimension d. More recently, Czumaj and Sohler have developed a (4 + ɛ)-approximation algorithm for the case when k is small compared to log n / log log n [15].

Correlation clustering was introduced by Bansal et al. in [11]. In the original paper, Bansal et al. [11] considered two versions of the correlation clustering problem, minimizing disagreements and maximizing agreements, focusing mainly on the case when the graph G is complete. They gave a polynomial-time approximation scheme (PTAS) for the maximizing agreements version on complete graphs, while for the minimizing disagreements version they gave an approximation algorithm with a constant performance ratio. The constant was a rather large one, and it has subsequently been improved to 4 in [13] and then to 2.5 in [3]. In the case when the graph is not complete, the best known approximation is O(log n) [13].

Other work on clustering: Our work is most relevant for settings where there is a target clustering, and it is motivated by results in [8] which investigated the goal of approximating a desired target clustering without making any probabilistic assumptions. In addition to this, there has been significant work in machine learning and theoretical computer science on clustering or learning with mixture models [1, 5, 19, 18, 23, 29, 16]. That work, like ours, has an explicit notion of a correct ground-truth clustering of the data points; however, it makes very specific probabilistic assumptions about the data. There is a large body of other work which does not assume the existence of a target clustering. For example, there

has been work on axiomatizing clustering (in the sense of postulating what natural axioms a good clustering algorithm or quality measure should satisfy), both with possibility [2] and impossibility [24] results, on comparing clusterings [26, 27], and on efficiently testing if a given data set has a clustering satisfying certain properties [4]. The main difference between this type of work and our work is that we have an explicit notion of a correct ground-truth clustering of the data points, and indeed the results we are trying to prove are quite different.

Inductive setting: In the inductive setting, where we imagine our given data is only a small random sample of the entire data set, our framework is close in spirit to recent work on sample-based clustering (e.g., [10, 14]) in the context of clustering algorithms designed to optimize a certain objective. Based on such a sample, these algorithms have to output a clustering of the full domain set, which is evaluated with respect to the underlying distribution.

2 Definitions and Preliminaries

The clustering problems in this paper fall into the following general framework. We are given a set S of n points which we want to cluster. We are also given pairwise similarity and/or dissimilarity information expressed through a weighted graph (G, d) on S. A k-clustering C is a partition of S into k sets C_1, C_2, ..., C_k. In this paper we always assume that there is a true or target k-clustering C_T for the point set S.

A natural notion of distance between two k-clusterings C = {C_1, C_2, ..., C_k} and C' = {C'_1, C'_2, ..., C'_k}, which we use throughout the paper, is the fraction of points on which they disagree under the optimal matching of clusters in C to clusters in C'; i.e., we define

dist(C, C') = min_{σ ∈ S_k} (1/n) Σ_{i=1}^{k} |C_i \ C'_{σ(i)}|,

where S_k is the set of bijections σ : [k] → [k]. We say that two clusterings C and C' are ɛ-close if dist(C, C') < ɛ, and we say that a clustering has error ɛ if it is ɛ-close to the target. We can also define the distance dist(C, C') between two clusterings C = {C_1, ..., C_{k_1}} and C' = {C'_1, ..., C'_{k_2}} with different numbers of clusters k_1 and k_2, where k_1 > k_2, by simply extending the clustering C' with a few empty clusters and then using the notion of distance defined above.

We now state a useful fact about the distance between two clusterings which we use throughout the paper and which is a simple consequence of the definition:

Fact 1 Given two clusterings C and C', if we produce a list of disjoint subsets S_1, S_2, ..., such that for each i, all points in S_i are in the same cluster in one of C or C' and they are all in different clusters in the other, then C and C' must have distance at least (1/n) Σ_i (|S_i| − 1).

In many cases we will use Fact 1 on sets {S_i} of size 2. We consider two commonly used clustering formulations, each of which seeks to minimize some objective function or score.

Min-sum clustering: The first one is the min-sum clustering problem [17, 9]. Here d is a nonnegative distance function on pairs of points of S, and the goal is to find a clustering that minimizes Φ_Σ := Σ_{i=1}^{k} Σ_{x,y ∈ C_i} d(x, y). In this paper we focus on the case where d satisfies the triangle inequality, and we also discuss a few extensions of this condition.

Correlation clustering: The second clustering setup we analyze is correlation clustering, introduced in [11]. In this setting the graph G is fully connected, with edges (x, y) labeled d(x, y) = +1 (similar) or d(x, y) = −1 (different). The goal is to find a partition of the vertices into clusters that agrees as much as possible with the edge labels.
In particular, the Min-Disagreement correlation clustering objective (Min-Disagreement CC) asks to find a clustering C = {C_1, C_2, ..., C_k} minimizing the number of disagreements (the number of −1 edges inside clusters plus the number of +1 edges between clusters):

Φ_CC := #{x, y ∈ C_i : d(x, y) = −1} + #{x ∈ C_i, y ∈ C_j, i ≠ j : d(x, y) = +1}.

Note that in the correlation clustering setting, the target number of clusters is not specified as part of the input.

Given a function Φ (such as k-median or min-sum) and an instance (S, d), let OPT_Φ = min_C Φ(C), where the minimum is over all k-clusterings of (S, d).

The (c, ɛ)-property: The following notion, originally introduced in [7], is central to our discussion:

Definition 2 Given an objective function Φ (such as k-median or min-sum), and c = 1 + α > 1, ɛ > 0, we say that instance (S, d) satisfies the (c, ɛ)-property for Φ if all clusterings C with Φ(C) ≤ c · OPT_Φ are ɛ-close to the target clustering C_T for (S, d).

Note that for any c > 1, the (c, ɛ)-property does not require that the target clustering C_T exactly coincide with the optimal clustering C* under objective Φ. However, it does imply the following simple facts:

Fact 3 If (S, d) satisfies the (c, ɛ)-property for Φ, then:
(a) The target clustering C_T and the optimal clustering C* are ɛ-close.
(b) The distance between k-clusterings is a metric, and hence a (c, ɛ) property with respect to the target clustering C_T implies a (c, 2ɛ) property with respect to the optimal clustering C*.

Thus, we can act as if the optimal clustering is indeed the target, up to a constant factor loss in the error rate. For simplicity, we will assume throughout the paper (except in Section 4) that C_T is indeed the optimal clustering C*.
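As a concrete illustration of the distance notion dist(C, C') defined in this section, the following is a minimal sketch (our own, not from the paper): it searches over bijections between clusters with a standard assignment solver. The list-of-sets representation and the use of scipy are our own choices for the example.

```python
# Sketch: dist(C, C') as the fraction of points misclassified under the best
# bijection between clusters (Section 2 definition). Illustrative only.
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_distance(C, Cprime, n):
    """C, Cprime: lists of sets of point ids; n: total number of points."""
    k = max(len(C), len(Cprime))
    # Pad the clustering with fewer clusters with empty clusters, as in the text.
    A = list(C) + [set()] * (k - len(C))
    B = list(Cprime) + [set()] * (k - len(Cprime))
    # cost[i][j] = |A_i \ B_j|; an optimal bijection sigma minimizes the total.
    cost = np.array([[len(Ai - Bj) for Bj in B] for Ai in A])
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].sum() / n

# Example: two 3-clusterings of 6 points that disagree on a single point.
C1 = [{0, 1}, {2, 3}, {4, 5}]
C2 = [{0, 1, 2}, {3}, {4, 5}]
print(clustering_distance(C1, C2, n=6))  # 1/6 ~ 0.1667
```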

3 The Min-Sum Clustering Problem

Recall that the min-sum k-clustering problem asks to find a k-clustering C = {C_1, C_2, ..., C_k} minimizing the objective function

Φ(C) = 2 Σ_{i=1}^{k} Σ_{{x,y} ⊆ C_i} d(x, y).

We focus here on the case where d is a nonnegative distance function satisfying the triangle inequality. As shown in [7] we have the following:

Theorem 4 [7] For any 1 ≤ c_1 < c_2 and any ɛ, δ > 0, there exists a family of metric spaces G and target clusterings that satisfy the (c_1, ɛ) property for the min-sum objective and yet do not satisfy even the (c_2, 1/2 − δ) property for that objective.

So, in the min-sum case it is not true that if the data satisfies the (c_1, ɛ) property, then we can use a c_2-approximation algorithm, for some c_2 > c_1, in order to get a clustering of small error rate. [7] also shows the following:

Theorem 5 For the min-sum objective, the problem of finding a c-approximation can be reduced to the problem of finding a c-approximation under the (c, ɛ) assumption. Therefore, the problem of finding a c-approximation under the (c, ɛ) assumption is as hard as the problem of finding a c-approximation in general.

Theorem 5 means that, generally speaking, the (c, ɛ) assumption does not make optimizing min-sum easier.

General overview of our construction. The general idea for our construction is to obtain various structural properties for instances that satisfy the (c, ɛ) assumption, and then to use these properties to give an efficient algorithm for achieving low error clusterings. These structural properties are essential since, as mentioned above, the general min-sum clustering problem is APX-hard. The structural properties stem from the fact that under the (c, ɛ) assumption the optimal solution is fairly "stable": changing it a little increases the cost substantially. The first key property (Lemma 6) we prove using this stability is that most pairs of clusters are quite expensive to merge. Using a vertex-cover argument on a specially designed graph on clusters, we show that we can remove few (O(ɛn)) points such that no two clusters among the remaining points are cheap to merge. Next we show that for most of the remaining points x we can draw a ball B_x of an appropriate radius that essentially covers the cluster C*(x). Such balls B_x and B_y usually do not overlap when C*(x) ≠ C*(y) (Lemma 9), since such an overlap would mean that x and y are sufficiently close to merge C*(x) with C*(y) cheaply, leading to a contradiction. We designate a class of good points for which the above is true. All but O(ɛn/α) points are good. We introduce subsets B̃_x ⊆ B_x to make sure that the ball around each x (whether good or bad) contains only good points from at most one cluster. At the same time, B̃_x centered around a good point x still covers the bulk of the cluster C*(x). The algorithm then performs the actual clustering by a greedy covering with the sets {B̃_x}_{x ∈ S}. The analysis of the clustering produced is done using a careful charging argument, and shows that the final clustering is O(ɛ/α)-close to the optimal one. While the analysis is quite involved, the clustering algorithm itself is simple, robust, and efficient. This simplicity and robustness allows us to extend it to the inductive setting.

3.1 Properties of Min-Sum Clustering

We start by deriving a few structural properties implied by the (1 + α, ɛ)-property for min-sum. We emphasize that all the constructions in this subsection serve the analysis only; our algorithmic results are described in Section 3.2. Recall that C* denotes the optimal clustering. For x ∈ C*_i, define w(x) = Σ_{y ∈ C*_i} d(x, y) and let w̄ = avg_x w(x) = OPT/n. We start by creating a "badness graph" G = (V, E) on the set of clusters, connecting pairs of clusters that are not too expensive to merge.
Formally, V is the set {C*_1, ..., C*_k}, and for any two clusters C*_i and C*_j we add an edge between them if the additional cost incurred for merging them is at most (|C*_i| + |C*_j|) · w̄α/(2ɛ).

Lemma 6 Assume that the min-sum instance (S, d) satisfies the (1 + α, ɛ)-property with respect to the target clustering. Then we can remove fewer than 3ɛn points such that the remaining set of clusters forms an independent set in G.

Proof: We start by showing that if the min-sum instance (S, d) satisfies the (1 + α, ɛ)-property with respect to the target clustering, then there cannot be a collection of disjoint cluster pairs (C*_{i_1}, C*_{j_1}), (C*_{i_2}, C*_{j_2}), ..., (C*_{i_r}, C*_{j_r}) such that each C*_{i_l} and C*_{j_l} are connected in G and Σ_l (|C*_{i_l}| + |C*_{j_l}|) ≥ 3ɛn.

Let C*_i and C*_j be two clusters such that the additional cost incurred for merging them is at most (|C*_i| + |C*_j|) · w̄α/(2ɛ). Assume w.l.o.g. that |C*_i| ≤ |C*_j|. We show now that for any set size s there exists a subset A_s of C*_j of size s which we can move from C*_j to C*_i at an additional cost in the min-sum objective of at most |A_s| · w̄α/ɛ. First note that

Σ_{y ∈ C*_i} Σ_{x ∈ C*_j} d(x, y) ≤ |C*_j| · w̄α/ɛ.

Let α_i(x) = Σ_{y ∈ C*_i} d(x, y), so Σ_{x ∈ C*_j} α_i(x) ≤ |C*_j| · w̄α/ɛ. Hence, for any set size s we can select a subset A_s of C*_j (namely the s elements with the smallest values of α_i(x)) such that Σ_{x ∈ A_s} α_i(x) ≤ |A_s| · w̄α/ɛ.

This implies that for any set size s there exists a subset A_s of C*_j which we can move from C*_j to C*_i at an additional cost in the min-sum objective of at most |A_s| · w̄α/ɛ, as desired.

Assume now that there exist r and a collection of disjoint pairs (C*_{i_1}, C*_{j_1}), (C*_{i_2}, C*_{j_2}), ..., (C*_{i_r}, C*_{j_r}) such that |C*_{i_l}| ≤ |C*_{j_l}|, C*_{i_l} and C*_{j_l} are connected in G, and Σ_l (|C*_{i_l}| + |C*_{j_l}|) ≥ 3ɛn. Let A_l ⊆ C*_{j_l} be a set of size

ɛ_l n = min(ɛn − Σ_{l' < l} ɛ_{l'} n, max(|C*_{i_l}|, |C*_{j_l}|/2)).

Since max(|C*_{i_l}|, |C*_{j_l}|/2) ≥ (1/3)(|C*_{i_l}| + |C*_{j_l}|) and Σ_l (|C*_{i_l}| + |C*_{j_l}|) ≥ 3ɛn, we have that Σ_l |A_l| = Σ_l ɛ_l n = ɛn.

Let C' be the clustering obtained by moving A_l from C*_{j_l} to C*_{i_l} for l = 1, ..., r. Each movement of a set A_l from C*_{j_l} to C*_{i_l} increases the distance between C* and C' by ɛ_l. To prove this we use Fact 1 and do a case analysis. If ɛ_l n = |C*_{i_l}|, then we match up each point x in C*_{i_l} with a point y in A_l and define the set S_{x,y} = {x, y}. If ɛ_l n = |C*_{j_l}|/2, then we split C*_{j_l} into two sets C¹_{j_l} and C²_{j_l} of equal size, match up each point x in C¹_{j_l} with a point y in C²_{j_l}, and define the set S_{x,y} = {x, y}. If ɛ_l n < max(|C*_{i_l}|, |C*_{j_l}|/2), then we apply either of the constructions above to ɛ_l n of the pairs. In all cases we produce a list of disjoint subsets S_1, S_2, ..., such that for each i, all points in S_i are in the same cluster in one of C* or C' and they are all in different clusters in the other. Using Fact 1 we obtain that by moving the set A_l from C*_{j_l} to C*_{i_l} we increase the distance between C* and C' by (1/n) Σ (|S_i| − 1) = ɛ_l. Overall we get dist(C*, C') ≥ Σ_l ɛ_l = ɛ. We also have

Φ(C') ≤ Φ(C*) + Σ_l (w̄α/ɛ)(ɛ_l n) ≤ Φ(C*) + w̄αn ≤ Φ(C*) + α · OPT.

We thus obtain a clustering C' which is ɛ-far from the target and whose min-sum cost is within a (1 + α) factor of OPT, contradicting the (1 + α, ɛ)-property.

To finish the argument, let L be the output of the greedy vertex cover algorithm on the graph G. Specifically, let L be the list of clusters constructed as follows: pick an arbitrary edge e in G, add both vertices incident to e to the list L, delete every edge sharing a vertex with e, and repeat until the graph has no edges left. Note that L is a vertex cover in G: we take both vertices incident to each edge added to the list L, we only delete edges incident to one of these two vertices, and eventually all the edges are deleted. Since L is a vertex cover in G, for any pair (C*_i, C*_j) which forms an edge in G, either C*_i ∈ L or C*_j ∈ L, so the remaining set of clusters C* \ L is an independent set in G. Since L is a collection of disjoint edges, we also have (according to what we proved above) Σ_{C*_i ∈ L} |C*_i| < 3ɛn. This concludes the proof.

If C*_i ∈ L in the above proof, let H¹_i = C*_i; else let H¹_i = ∅. By Lemma 6 we have that Σ_{i=1}^{k} |H¹_i| < 3ɛn and that the cost of merging two clusters that have not been removed is high. This last condition implies the following:

Lemma 7 For any two points x ∈ C*_i \ H¹_i and y ∈ C*_j \ H¹_j with i ≠ j, w(x) ≤ w̄α/(15ɛ), and w(y) ≤ w̄α/(15ɛ), we have d(x, y) ≥ (w̄α/(3ɛ)) · 1/min(|C*_i|, |C*_j|).

Proof: Assume there exist x ∈ C*_i and y ∈ C*_j with w(x) ≤ w̄α/(15ɛ) and w(y) ≤ w̄α/(15ɛ) such that d(x, y) < (w̄α/(3ɛ)) · 1/min(|C*_i|, |C*_j|). Note that the additional cost incurred in the min-sum objective by merging C*_i and C*_j is at most

Σ_{x' ∈ C*_i} Σ_{y' ∈ C*_j} d(x', y') ≤ Σ_{x' ∈ C*_i} Σ_{y' ∈ C*_j} (d(x', x) + d(x, y) + d(y, y')).

Therefore the additional cost incurred in the min-sum objective by merging C*_i and C*_j is at most

|C*_j| w(x) + |C*_i| w(y) + (w̄α/(3ɛ)) · |C*_i||C*_j| / min(|C*_i|, |C*_j|)
  = |C*_j| w(x) + |C*_i| w(y) + (w̄α/(3ɛ)) · max(|C*_i|, |C*_j|)
  ≤ (|C*_i| + |C*_j|)(w̄α/(15ɛ) + w̄α/(3ɛ)) < (|C*_i| + |C*_j|) · w̄α/(2ɛ),

which contradicts Lemma 6 and the definition of H¹.

For all x, let us now define τ_x and B_x, which will be used in Algorithm 1. To obtain τ_x, we start with τ = 0 and gradually increase it until |B(x, τ)| ≥ w̄α/(20ɛτ); once this happens we set τ_x = τ and B_x = B(x, τ_x). We can now show the following.

Lemma 8 For any point x ∈ C*_i such that w(x) ≤ w̄α/(15ɛ) we have τ_x ≤ w̄α/(6ɛ|C*_i|).

Proof: Since w(x) = Σ_{y ∈ C*_i} d(x, y) ≤ w̄α/(15ɛ), at least |C*_i|/2 points lie within a τ = w̄α/(6ɛ|C*_i|) neighborhood of x. This implies |B(x, τ)| · τ ≥ (|C*_i|/2) · τ = w̄α/(12ɛ) > w̄α/(20ɛ), so τ_x ≤ τ as desired.
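For concreteness, the ball construction just defined (growing τ until |B(x, τ)| crosses the size threshold) can be implemented by sorting the points by their distance from x, as the note after Algorithm 1 in Section 3.2 observes. The following is a small illustrative sketch under the threshold as reconstructed above; the function name, the dense distance-matrix representation, and the fallback behaviour are our own choices, not the paper's.

```python
# Sketch of the ball construction: grow tau over the critical thresholds
# (the distinct distances from x) until |B(x, tau)| >= w_bar*alpha / (20*eps*tau).
# The threshold follows the reconstruction in the text above; illustrative only.
import numpy as np

def ball_of(x, dist_row, w_bar, alpha, eps):
    """dist_row: distances from x to every point (including itself, distance 0)."""
    order = np.argsort(dist_row)            # points sorted by distance from x
    for m, y in enumerate(order, start=1):  # after m points, tau = d(x, y)
        tau = dist_row[y]
        if tau > 0 and m >= w_bar * alpha / (20.0 * eps * tau):
            return tau, set(order[:m])      # tau_x and B_x = B(x, tau_x)
    # The right-hand side tends to 0 as tau grows, so the condition eventually
    # holds; if it never triggers within the data range, return the full set.
    return dist_row[order[-1]], set(order)

# Usage on a toy instance: D is a symmetric distance matrix over four points.
D = np.array([[0.0, 1.0, 1.2, 9.0],
              [1.0, 0.0, 0.9, 9.3],
              [1.2, 0.9, 0.0, 8.7],
              [9.0, 9.3, 8.7, 0.0]])
tau_x, B_x = ball_of(0, D[0], w_bar=2.0, alpha=0.5, eps=0.1)
```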

Lemma 9 For any two points x ∈ C*_i \ H¹_i and y ∈ C*_j \ H¹_j with i ≠ j such that w(x) ≤ w̄α/(15ɛ) and w(y) ≤ w̄α/(15ɛ), we have B_x ∩ B_y = ∅.

Proof: By Lemmas 7 and 8 we have

τ_x + τ_y ≤ (w̄α/(6ɛ)) (1/|C*_i| + 1/|C*_j|) ≤ (w̄α/(3ɛ)) · 1/min(|C*_i|, |C*_j|) ≤ d(x, y),

which implies the desired result.

Let us denote by H²_i = {x ∈ C*_i \ H¹_i : w(x) > w̄α/(15ɛ)} and H_2 = ∪_i H²_i. Since E[w(x)] = w̄, by Markov's inequality we have |H_2| ≤ (15ɛ/α) n.

3.2 Algorithm for Min-Sum Clustering

In this section we show that if our data satisfies the (1 + α, ɛ)-property for the min-sum objective, then we can find a clustering that is O(ɛ/α)-close to the target C_T. We start by considering the case where we know the value of OPT or w̄ = OPT/n, and we then show how to get rid of this assumption in Theorem 11. For the case of known w̄ we show in the following that Algorithm 1 can be used to produce a clustering that is O(ɛ/α)-close to the target. In this algorithm we define critical thresholds τ_0, τ_1, τ_2, ... as follows: τ_0 = 0 and τ_i is the i-th smallest distinct distance d(x, y) for x, y ∈ S. We can show the following.

Algorithm 1 Min-Sum Algorithm
Input: (S, d), w̄, ɛ ≤ 1, α > 0, k.
For all x do:
  Let the initial threshold be τ = τ_0.
  Construct the ball B(x, τ) by including all points within distance τ of x.
  If |B(x, τ)| ≥ w̄α/(20ɛτ), then let τ_x = τ and B_x = B(x, τ_x); else increase τ to the next critical threshold.
For all x, let B̃_x := {y : x ∈ B_y, y ∈ B_x}; set L = ∅.
For i = 1, ..., k do:
  Let C^o_i be the largest B̃_x. Add C^o_i to L.
  For all x' ≠ x, set B̃_{x'} = B̃_{x'} \ C^o_i.
Output: Clustering L.

Note that in the B_x construction phase one can alternatively sort the points by their distance from x and add them to B(x, τ) one by one instead of using critical thresholds.

Theorem 10 If the min-sum instance (S, d) satisfies the (1 + α, ɛ)-property and we are given the value of w̄, then Algorithm 1 produces a clustering that is O(ɛ/α)-close to the target.

Proof: We first note that the sets B_x in Algorithm 1 are well defined, since for small τ the condition |B(x, τ)| ≥ w̄α/(20ɛτ) is obviously false and for very large τ the condition is obviously true because |B(x, τ)| ≥ 1. For all i, let c_i be a point in C*_i that minimizes Σ_{x ∈ C*_i} d(x, c_i). By the triangle inequality, all x ∈ C*_i satisfy w(x) ≥ |C*_i| d(x, c_i) − w(c_i). Moreover, if x ∈ C*_i and d(x, c_i) ≥ w̄α/(60ɛ|C*_i|), then w(x) ≥ w̄α/(60ɛ) − w(c_i) ≥ w̄α/(60ɛ) − w(x), which implies w(x) ≥ w̄α/(120ɛ). Let

G_i = {x ∈ C*_i \ (H¹_i ∪ H²_i) : d(x, c_i) < w̄α/(60ɛ|C*_i|)},

and let G = ∪_i G_i. Let H³_i = {x ∈ C*_i : d(x, c_i) ≥ w̄α/(60ɛ|C*_i|)} and H_3 = ∪_i H³_i. Thus G_i = C*_i \ (H¹_i ∪ H²_i ∪ H³_i) and G = S \ (H_1 ∪ H_2 ∪ H_3). By Markov's inequality we have |H_3| ≤ 120(ɛ/α)n. We say that the points of G are good and the points of H := H_1 ∪ H_2 ∪ H_3 = S \ G are bad. As we have seen, there are not too many bad points: |H| = O((ɛ/α)n), a fact that we will use later.

Let B_i = (∪_{x ∈ G_i} B_x) \ C*_i. Clearly, for all x ∈ G_i we have

B_x ⊆ C*_i ∪ B_i.   (1)

From Lemma 9 we know that if x ∈ G_i and y ∈ G_j for i ≠ j, then B_x ∩ B_y = ∅. This implies that B_i ∩ B_j = ∅ for i ≠ j, as well as that if x ∈ G_i then B_x intersects only G_i and no other G_j. Let B̃_x = {y : x ∈ B_y, y ∈ B_x}. We now show that for all points x, B̃_x intersects at most one set G_i and no other G_j for j ≠ i. For x ∈ G_i, since B̃_x ⊆ B_x, we get the desired claim. For z ∈ S \ G we might have B_z intersect two different G_i and G_j. However, from Lemma 9 we have that for any two points x ∈ G_i and y ∈ G_j there is no z such that z ∈ B_x and z ∈ B_y. This implies that there is no z such that we have both x ∈ B̃_z and y ∈ B̃_z, so for z ∈ S \ G, B̃_z can intersect only one G_i. From the above we also have

Σ_i |B_i| ≤ |H_1| + |H_2| + |H_3| = O((ɛ/α) n).

We now claim that for every i there exists an x_i such that

|B̃_{x_i} ∩ G_i| ≥ |G_i| − 2(|B_i| + |C*_i \ G_i|).   (2)

We first prove that for all x ∈ G_i we have |B_x| ≥ |G_i|. If τ_x ≥ w̄α/(30ɛ|C*_i|), then B_x ⊇ G_i. Else, if τ_x < w̄α/(30ɛ|C*_i|), then |B_x| ≥ w̄α/(20ɛτ_x) > (w̄α/(20ɛ)) · (30ɛ|C*_i|)/(w̄α) = 1.5 |C*_i| > |G_i|.

So for every x ∈ G_i we have, by (1), |B_x ∩ G_i| ≥ |G_i| − |B_i| − |C*_i \ G_i|. This implies that there exists an x_i such that

|{x ∈ G_i : x_i ∈ B_x}| ≥ |G_i| − |B_i| − |C*_i \ G_i|.

So,

|{x ∈ G_i : x_i ∈ B_x} ∩ B_{x_i} ∩ G_i| ≥ |G_i| − 2|B_i| − 2|C*_i \ G_i|.

Since {x ∈ G_i : x_i ∈ B_x} ∩ B_{x_i} ⊆ B̃_{x_i}, we get relation (2), as desired.

To finish the argument we need to argue that the greedy covering with the sets B̃_x works well. Let us think of each cluster G_i as initially unmarked, and mark it the first time we choose a group that intersects it. We now consider a few cases. If the j-th group C^o_j intersects an unmarked G_i, we assign σ(j) = i. Note that if this group misses m_i points from G_i, then since we were greedy, according to relation (2) we must have picked at least m_i − 2(|B_i| + |C*_i \ G_i|) elements from H in this group. Overall, we must have Σ_i (m_i − 2(|B_i| + |C*_i \ G_i|)) ≤ |H|, which together with Σ_i |B_i| ≤ |H| and Σ_i |C*_i \ G_i| ≤ |H| implies Σ_i m_i ≤ 5|H|. Thus the total error incurred in this way with respect to the good set G is given by the number of points missed from the G_i, so it is at most Σ_i m_i ≤ 5|H|. The other case is when the j-th group C^o_j intersects only marked G_i (or none at all). In this case we assign σ(j) to an arbitrary cluster C*_i not marked by the end of the process. The error incurred from these cases is at most |H| + Σ_i m_i ≤ 6|H|, since this is an upper bound on the number of points left that are not in unmarked clusters. Finally, we also need to consider the error with respect to the bad set H. Adding all these up, we obtain that the total error is bounded by 5|H| + 6|H| + |H| = 12|H| = O((ɛ/α)n) points.

In the case of unknown w̄, we show the following:

Theorem 11 If k ≤ log n / log log n and the min-sum instance (S, d) satisfies the (1 + α, ɛ)-property, then even if we are not given w̄ we can use Algorithm 1 as a subroutine to produce a clustering that is O(ɛ/α)-close to the target. For the case of general k, we can use Algorithm 1 as a subroutine to produce a list of O(log log n) clusterings such that one of them is O(ɛ/α)-close to the target.

Proof: It is not difficult to verify that the argument in Theorem 10 holds (with only a constant factor loss in the final guarantee on the error rate) even if we use a constant factor approximation for w̄ instead of the exact value of w̄ in Algorithm 1. If k ≤ log n / log log n, then we can use the results in [15] for finding a constant factor approximation for w̄, and thus we are able to produce a clustering that is O(ɛ/α)-close to the target. For the case of general k, we use the fact that there exists an O(δ⁻¹ log^{1+δ} n)-approximation algorithm running in time n^{O(1/δ)} for the case of arbitrary k [9]. The main idea is to use the algorithm in [9] with δ = 1 to find a lower bound ℓ and an upper bound L for w̄ that are within a multiplicative O(log² n) factor of each other. We then try all the values ℓ, 2ℓ, 4ℓ, ..., 2^i ℓ, ..., L and run Algorithm 1 for each of them. One of the values 2^i ℓ will be a 2-approximation for w̄, and an argument similar to the one in Theorem 10 shows that in that case we get a clustering which is O(ɛ/α)-close to the target.

Note: All our arguments above can be extended (with an appropriate loss in the final accuracy guarantees) to the case where the given dissimilarity function d satisfies only the relaxed triangle inequality d(x, y) ≤ γ(d(x, z) + d(z, y)) for some γ > 1.

Theorem 12 If the min-sum instance (S, d) satisfies the (1 + α, ɛ)-property, then so long as the smallest correct cluster has size greater than 100ɛn/α², we can efficiently find a clustering that is O(ɛ)-close to the target.

Proof Sketch: Assume that we are given the value of w̄. We first use the construction in Theorem 10 to produce a clustering C'_1, ..., C'_k that is O(ɛ/α)-close to the target. For each cluster C'_i we compute its center c'_i, a point in C'_i that minimizes Σ_{x ∈ C'_i} d(x, c'_i). We then define the set

C^s_i = {x ∈ C'_i : d(x, c'_i) < w̄α/(60ɛ|C'_i|)}.
The fact that the clusters have size at least 100ɛn/α² means that each C^s_i captures at least a (1 − O(α))-fraction of the corresponding C*_i. We now construct a new clustering C''_1, ..., C''_k as follows: for each point x and each cluster C^s_j, we compute the weight w^s(x, j) = Σ_{y ∈ C^s_j} d(x, y). We finally insert x into the cluster C''_i with i = argmin_j w^s(x, j). The main steps in the correctness proof are the following. We first show that (up to re-indexing of the clusters) C^s_i ⊆ G_i and that |∪_i C^s_i| ≥ n − O((ɛ/α)n). We then use these facts, together with the fact that each C^s_i is a (1 − O(α))-approximation to C*_i, in order to show that all but O(ɛn) points will make the right choice. In the case where we do not know w̄, we use the technique in [7] of trying increasing values of w̄: we stop the first time we output k clusters that cover at least n − O((ɛ/α)n) of the points in S.

3.3 Inductive Setting

In this section we consider an inductive model in which the set S is merely a small random subset of points of size n from a much larger abstract instance space X, |X| = N, N ≫ n, and the clustering we output is represented implicitly through a hypothesis h : X → Y. In the case where k ≤ log n / log log n we produce a clustering of error at most O(ɛ/α). In the case where k > log n / log log n we produce a small list of hypotheses such that at least one of them has error at most O(ɛ/α). We can adapt the algorithm in Theorem 10 to the inductive setting as shown in Algorithm 2. The main idea is to show that our algorithm from the transductive setting is

pretty robust: it can survive eliminating small clusters and making the B̃ sets and the set-size estimates fuzzy. Specifically, in the case of known w̄ we can show the following:

Theorem 13 Assume that the min-sum instance (X, d) satisfies the (1 + α, ɛ)-property and that we are given the value of w̄. If we draw a sample S of size n = O((k²/ɛ) ln(kN/δ)), then we can use Algorithm 2 to produce a clustering which is O(ɛ/α)-close to the target with probability at least 1 − δ. Moreover, inserting a new element only takes O(k) time.

Proof Sketch: The proof works in two phases. In the first phase we redo the analysis of Theorem 10 to show that Algorithm 2 works as well as Algorithm 1 (up to a loss of multiplicative constants) in producing the approximate clustering. The difference is that Algorithm 2 is "fuzzier" than Algorithm 1 in several respects: the comparisons need not be exact, and set-size estimates are only needed within a constant precision. In the second phase we observe that Algorithm 2 can be executed in the inductive setting with high probability. In particular, set sizes can be estimated within the required precision from few samples, and for each sufficiently large cluster there is a suitable center x in the cluster such that |B^S_x| ≥ (1 − γ) max_{x'} |B^S_{x'}|. This implies that the result of the execution of Algorithm 2 on the sample is actually the projection onto the sample of a valid execution of the algorithm on the entire input. Thus, by the correctness of the algorithm in the transductive setting, we obtain its correctness in the inductive model. Finally, the correctness of the testing phase follows from the structural properties of the clustering we proved in Theorem 10.

Theorem 13 also holds if we are given a constant factor approximation rather than the exact value of w̄. We now state our main result for the case of unknown w̄. In the following we denote by D the diameter of the metric space, i.e., D = max_{x,y} d(x, y). Using results from [14, 15] on estimating the value of the optimal min-sum solution from a sample, we obtain the following theorem.

Theorem 14 Assume that the min-sum instance (X, d) satisfies the (1 + α, ɛ)-property and that we are not given the value of w̄. If we draw a sample S whose size satisfies both n = O((k²/ɛ) ln(kN/δ)) and n = Õ(D(k + ln(1/δ))(log n + Dk²)), and if k ≤ log n / log log n, then we can use Algorithm 2 as a subroutine to produce a clustering that is O(ɛ/α)-close to the target. For the case of general k, we can use Algorithm 2 as a subroutine to produce a list of O(log log n) clusterings such that one of them is O(ɛ/α)-close to the target.

Algorithm 2 Fuzzy Min-Sum Algorithm
Input: (S, d), w̄, ɛ ≤ 1, α > 0, k, n, N.
Training phase:
Set w̄' = w̄ · n/N, I = ∅, L = ∅, γ = ɛ/k.
For all x do:
  Let the initial threshold be τ = τ_0.
  Construct the ball B^S(x, τ) by including all points within distance τ of x.
  If w̄'α/(18ɛτ) ≤ |B^S(x, τ)| ≤ w̄'α/(17ɛτ) and |B^S(x, τ)| ≥ ɛn/(2k), then let τ^S_x = τ and B^S_x = B^S(x, τ^S_x), and add x to I; else increase τ to the next critical threshold.
For all x, let B̃^S_x be any set satisfying
  {y : x ∈ B^S_y, |B^S_y| ≥ ɛn/k} ⊆ B̃^S_x ⊆ {y : x ∈ B^S_y, |B^S_y| ≥ ɛn/(8k)}.
Set L = ∅.
For i = 1, ..., k do:
  Let C^o_i be a set B̃^S_{x_i} with x_i ∈ I whose size is at least (1 − γ) times that of the largest B̃^S_x with x ∈ I. Add C^o_i to L.
  For all x ≠ x_i, set B̃^S_x = B̃^S_x \ C^o_i.
Testing phase: When a new point z arrives, assign it to the cluster C^o_i which minimizes d(z, x_i).
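To make the greedy covering and the O(k)-time insertion concrete, here is a small sketch (our own illustration, not the paper's pseudocode) of the last two stages: greedily picking k large groups from the B̃ sets and then assigning a newly arriving point to the cluster whose chosen center is closest, which uses only k distance comparisons per insertion. The (1 − γ) slack that Algorithm 2 allows when picking groups, and the exact construction of the B̃ sets, are omitted here; all names are hypothetical.

```python
# Sketch of the greedy covering over the B-tilde sets, and of the O(k) testing
# phase. B_tilde maps candidate centers to sets of point ids; it is assumed to
# come from a ball construction like the one sketched in Section 3.1.
def greedy_cover(B_tilde, k):
    """Greedily pick k groups by size; assumes at least k candidate centers."""
    remaining = {x: set(pts) for x, pts in B_tilde.items()}
    clusters, centers = [], []
    for _ in range(k):
        x_star = max(remaining, key=lambda x: len(remaining[x]))
        chosen = remaining.pop(x_star)
        clusters.append(chosen)
        centers.append(x_star)
        for x in remaining:              # remove the covered points everywhere else
            remaining[x] -= chosen
    return clusters, centers

def insert_point(z, d, centers, clusters):
    """Testing phase: assign a new point z using k distance comparisons."""
    i = min(range(len(centers)), key=lambda j: d(z, centers[j]))
    clusters[i].add(z)
    return i
```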
4 The Correlation Clustering Problem

The correlation clustering setup introduced in [11] is as follows. We are given a fully connected graph G with edges labeled +1 (similar) or −1 (different), and the goal is to find a partition of the vertices into clusters that agrees as much as possible with the edge labels.⁴

⁴ Note that the problem is not trivial since we might have inconsistencies. In particular, it is possible to have x, y, z such that the edge (x, y) is labeled +1, the edge (y, z) is labeled +1, and the edge (x, z) is labeled −1.

In particular, the Min-Disagreement correlation clustering objective (Min-Disagreement CC) asks to find a clustering C = {C_1, C_2, ..., C_k} that minimizes the number of disagreements: the number of −1 edges inside clusters plus the number of +1 edges between clusters. In this clustering formulation one does not need to specify the number of clusters k as a separate parameter, as in measures such as k-median or min-sum clustering. Instead, in correlation clustering, the optimal number of clusters can take any value between 1 and n, depending on the edge labels. The currently best approximation algorithm for minimizing disagreements is a 2.5-approximation [3], and the problem is known to be APX-hard [13]. We can show that the (c, ɛ) assumption does not make optimizing the Min-Disagreement CC objective easier.

Theorem 15 For the Min-Disagreement CC objective, the problem of finding a c-approximation can be reduced to the problem of finding a c-approximation under the (c, ɛ) assumption. Therefore, the problem of finding a c-approximation under the (c, ɛ) assumption is as hard as the problem of finding a c-approximation in general.

We show now that if our input satisfies the (1 + α, ɛ)-property for the Min-Disagreement CC objective, then the data satisfies the (2.5, (49/α + 1)ɛ) property as well. Specifically:

Theorem 16 For the Min-Disagreement CC objective, if the instance (S, d) satisfies the (1 + α, ɛ)-property with respect to the target clustering C_T, then the instance (S, d) also satisfies the (2.5, (49/α + 1)ɛ) property with respect to the target clustering C_T.

Interpretation: This means that under the (1 + α, ɛ)-property we can use a state-of-the-art 2.5-approximation algorithm for minimizing disagreements in order to get a (49/α + 1)ɛ-accurate clustering.

Proof: We prove the contrapositive: we show that if the instance does not satisfy the (2.5, (49/α + 1)ɛ) property with respect to the target clustering, then the instance does not satisfy the (1 + α, ɛ) property with respect to the target clustering. Recall that C* is the optimal Min-Disagreement CC clustering. Assume that the instance (S, d) does not satisfy the (2.5, (49/α + 1)ɛ) property with respect to the target clustering. This means that there exists a clustering C' = {C'_1, C'_2, ..., C'_{k'}} such that cost(C') ≤ 2.5 OPT and dist(C_T, C') ≥ (49/α + 1)ɛ; since dist(C*, C_T) ≤ ɛ we have dist(C*, C') ≥ 49ɛ/α.

For x ∈ S we denote by C(x) its cluster in C* and by C'(x) its cluster in C'. We will call a point x uninteresting if it does not change too many neighbors between the two clusterings C* and C'; formally, x is uninteresting if |C(x) △ C'(x)| < |C(x) ∩ C'(x)|, and interesting otherwise. We show in the following that there are at least 49ɛn/α interesting points. In order to do this we exhibit a partial matching of the clusters in C* and C'; specifically, we connect two clusters C_i and C'_j if |C_i △ C'_j| < |C_i ∩ C'_j|, and we let π(i) = j. We prove now that this is a partial matching of the clusters in C* and C'. Assume by contradiction that this is not the case, i.e., that there exist i, j, l such that C_i ∩ C_l = ∅, |C_i △ C'_j| < |C_i ∩ C'_j|, and |C_l △ C'_j| < |C_l ∩ C'_j|, which implies

|C_i △ C'_j| + |C_l △ C'_j| < |C_i ∩ C'_j| + |C_l ∩ C'_j|.   (3)

However, since C_i ∩ C_l = ∅, we have both |C_i △ C'_j| ≥ |C'_j \ C_i| ≥ |C_l ∩ C'_j| and |C_l △ C'_j| ≥ |C'_j \ C_l| ≥ |C_i ∩ C'_j|, which implies

|C_i △ C'_j| + |C_l △ C'_j| ≥ |C_i ∩ C'_j| + |C_l ∩ C'_j|,

thus contradicting (3). This proves that π is a partial matching of the clusters in C* and C'.

Let σ be an arbitrary permutation of the cluster indices that matches all uninteresting points according to π; i.e., σ(i) = π(i) = j whenever some uninteresting point x has C(x) = C_i and C'(x) = C'_j. By definition we have dist_σ(C*, C') ≥ dist(C*, C') ≥ 49ɛ/α, and every point on which C* and C' disagree under σ is interesting; this implies that there exists a set I of at least 49ɛn/α interesting points.

We now compute the cost of isolating an interesting point x. Let us denote by w(x) the contribution of x to the Min-Disagreement CC objective in C* and by w'(x) the contribution of x to the Min-Disagreement CC objective in C'. We clearly have

w(x) + w'(x) ≥ |C(x) △ C'(x)| ≥ |C(x) ∩ C'(x)|,

which implies 2(w(x) + w'(x)) ≥ max(|C(x)|, |C'(x)|). So, for an interesting point x, the number of +1 edges incident to x satisfies

#{y : d(x, y) = +1} ≤ |C(x)| + w(x) ≤ 3(w(x) + w'(x)).

So the cost of isolating an interesting point x is at most 3(w(x) + w'(x)). Since cost(C') ≤ 2.5 OPT, cost(C*) = OPT, and |I| ≥ 49ɛn/α, we have

(1/|I|) Σ_{x ∈ I} (w(x) + w'(x)) ≤ 3.5 OPT / |I| ≤ 3.5 OPT α / (49ɛn).

This implies that for any set size s ≤ |I| there exists a set A ⊆ I of size s such that

(1/|A|) Σ_{x ∈ A} (w(x) + w'(x)) ≤ 3.5 OPT α / (49ɛn).

Note also that for any interesting point x we have w(x) + w'(x) ≥ 1; therefore

49ɛn/α ≤ |I| ≤ Σ_{x ∈ I} (w(x) + w'(x)) ≤ 3.5 OPT,

which implies

OPT ≥ 14ɛn/α.   (4)

Let A ⊆ I be a set of size s = 4ɛn such that (1/|A|) Σ_{x ∈ A} (w(x) + w'(x)) ≤ 3.5 OPT α / (49ɛn). Let A_s ⊆ A be the set of singleton points in the target clustering, i.e., x ∈ A_s if C(x) = {x}, and let A_ns = A \ A_s. We produce a new clustering C'' from C* by isolating the points in A_ns and by pairing up the points in A_s and merging any two points in the same pair. By Fact 1 we get dist(C*, C'') ≥ 2ɛ, so dist(C_T, C'') ≥ ɛ.
Also, as shown above, the cost of isolating all the points in A_ns is at most (10.5 α OPT / (49ɛn)) · |A| ≤ (42/49) α OPT; moreover, the total cost of merging the singleton interesting pairs is at most |A_s|/2 ≤ 2ɛn, which by (4) is at most (7/49) α OPT. This implies that the cost of isolating all the points in A_ns plus the cost of merging the singleton interesting pairs is at most α OPT. So the Min-Disagreement CC cost of C'' is within a 1 + α factor of OPT, and yet C'' is ɛ-far from the target. Thus our clustering instance does not satisfy the (1 + α, ɛ) property with respect to the target clustering, which completes the proof.

Note: The other correlation clustering objective is maximizing agreements: the number of +1 edges inside clusters plus the number of −1 edges between clusters. For maximizing agreements there exists a PTAS [11], so this objective is not interesting in our framework.
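The objective that Theorem 16 concerns is cheap to evaluate. As a small illustration (our own helper, not code from the paper), the following counts the disagreements of a candidate clustering against ±1 edge labels on a complete graph; such a routine is what one would use to compare the output of a 2.5-approximation algorithm like the one in [3] against other candidate clusterings.

```python
# Sketch: Min-Disagreement CC cost of a clustering, given labels[frozenset({x, y})]
# in {+1, -1} for unordered pairs of a complete graph. Illustrative helper only.
from itertools import combinations

def cc_disagreements(clusters, labels):
    """clusters: list of sets of point ids; labels: dict keyed by frozenset pairs."""
    cluster_of = {x: i for i, cl in enumerate(clusters) for x in cl}
    points = sorted(cluster_of)
    cost = 0
    for x, y in combinations(points, 2):
        same = cluster_of[x] == cluster_of[y]
        sign = labels[frozenset((x, y))]
        # disagreement: a -1 edge inside a cluster, or a +1 edge across clusters
        if (same and sign == -1) or (not same and sign == +1):
            cost += 1
    return cost

# Example: points 0-3, where only the pairs {0,1} and {2,3} are labeled similar.
labels = {frozenset(p): -1 for p in combinations(range(4), 2)}
labels[frozenset((0, 1))] = +1
labels[frozenset((2, 3))] = +1
print(cc_disagreements([{0, 1}, {2, 3}], labels))  # 0 disagreements
print(cc_disagreements([{0, 1, 2, 3}], labels))    # 4 disagreements (-1 edges inside)
```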

4.1 The Non-Complete Graph Case

In the case where the graph G is not fully connected, we do not get a result as strong as Theorem 16. On the contrary, we can show the following:

Theorem 17 For any α, β < 1/6, there exists a family of graphs G and target clusterings that satisfy the (1 + α, 0) property for the Min-Disagreement CC objective and yet do not satisfy even the (1 + α + β, 1/2) property for that objective.

Proof: Consider a set of n points such that the target clustering consists of one cluster C_1 with n/2 points and one cluster C_2 with n/2 points. We set both C_1 and C_2 to be fully connected, with all the edges inside C_1 and C_2 labeled +. Also, we designate a single vertex in C_1 which is connected to n/3 vertices in C_2 with edges all labeled +, and a vertex in C_2 connected to (n/3)(1 + α + β) vertices in C_1 with edges all labeled −. It is easy to verify that the instance satisfies the (1 + α, 0) property: we have OPT = n/3, achieved by the target clustering, and any other solution has cost at least (n/3)(1 + α + β). However, the instance does not satisfy even the (1 + α + β, 1/2) property: the clustering with all the points in one big cluster has cost (1 + α + β) · OPT, and yet its distance from the target is 1/2.

5 Conclusions and Open Questions

In this work we get around inherent inapproximability results for the min-sum objective in the case where a good approximation to the min-sum objective indeed implies an accurate clustering. We derive strong structural properties from this assumption, and use them to give an efficient algorithm that produces accurate clusterings. In the minimizing disagreements setting for correlation clustering, we show that the same assumption allows us to find an accurate clustering using existing approximation algorithms.

One concrete open question remaining is dealing with a non-complete graph in the context of correlation clustering for the minimizing disagreements objective. More generally, it would be interesting to further explore and analyze in this framework other natural classes of commonly used clustering objective functions. It would also be interesting to consider an agnostic version of the model, where the (c, ɛ) property is satisfied only after some small number of outliers or ill-behaved data points have been removed.

Acknowledgments: We thank Avrim Blum for numerous useful discussions.

References

[1] D. Achlioptas and F. McSherry. On spectral learning of mixtures of distributions. In Proceedings of the Eighteenth Annual Conference on Learning Theory.
[2] M. Ackerman and S. Ben-David. Which data sets are clusterable? A theoretical study of clusterability. In NIPS.
[3] N. Ailon, M. Charikar, and A. Newman. Aggregating inconsistent information: ranking and clustering. In Proceedings of the 37th ACM Symposium on Theory of Computing.
[4] N. Alon, S. Dar, M. Parnas, and D. Ron. Testing of clustering. In Proceedings of STOC.
[5] S. Arora and R. Kannan. Learning mixtures of arbitrary Gaussians. In STOC.
[6] V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for k-median and facility location problems. SIAM J. Comput., 33(3).
[7] M.-F. Balcan, A. Blum, and A. Gupta. Approximate clustering without the approximation. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms.
[8] M.-F. Balcan, A. Blum, and S. Vempala. A discriminative framework for clustering via similarity functions. In STOC.
[9] Y. Bartal, M. Charikar, and D. Raz. Approximating min-sum k-clustering in metric spaces. In Proceedings of the 33rd Annual ACM Symposium on Theory of Computing.
[10] S. Ben-David. A framework for statistical clustering with constant time approximation for k-median and k-means clustering. Machine Learning, 66(2-3).
[11] A. Blum, N. Bansal, and S. Chawla. Correlation clustering. Machine Learning, 56:89-113.
[12] M. Charikar, S. Guha, E. Tardos, and D. B. Shmoys.
A constant-factor approximation algorithm for the k-median problem. In STOC.
[13] M. Charikar, V. Guruswami, and A. Wirth. Clustering with qualitative information. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS).
[14] A. Czumaj and C. Sohler. Sublinear-time approximation for clustering via random samples. In Proceedings of the 31st International Colloquium on Automata, Languages and Programming (ICALP).
[15] A. Czumaj and C. Sohler. Small space representations for metric min-sum k-clustering and their applications. In Proceedings of the 24th International Symposium on Theoretical Aspects of Computer Science.
[16] S. Dasgupta. Learning mixtures of Gaussians. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science.
[17] W. Fernandez de la Vega, Marek Karpinski, Claire Kenyon, and Yuval Rabani. Approximation schemes for clustering problems. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing.
[18] L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer-Verlag.
[19] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley.
[20] P. Indyk. Sublinear time algorithms for metric space problems. In STOC.
[21] K. Jain, M. Mahdian, and A. Saberi. A new greedy approach for facility location problems. In 34th STOC.
[22] T. Joachims and J. Hopcroft. Error bounds for correlation clustering. In Proceedings of the International Conference on Machine Learning.
[23] R. Kannan, H. Salmasian, and S. Vempala. The spectral method for general mixture models. In 18th COLT.
[24] J. Kleinberg. An impossibility theorem for clustering. In NIPS.
[25] A. Kumar, Y. Sabharwal, and S. Sen. A simple linear time (1 + ɛ)-approximation algorithm for k-means clustering in any dimensions. In 45th FOCS.
[26] M. Meila. Comparing clusterings by the variation of information. In COLT.
[27] M. Meila. Comparing clusterings: an axiomatic view. In International Conference on Machine Learning.
[28] L. J. Schulman. Clustering for edge-cost minimization. In Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing.
[29] S. Vempala and G. Wang. A spectral algorithm for learning mixture models. JCSS, 68(2), 2004.


More information

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017 U.C. Berkeley CS94: Beyond Worst-Case Analyss Handout 4s Luca Trevsan September 5, 07 Summary of Lecture 4 In whch we ntroduce semdefnte programmng and apply t to Max Cut. Semdefnte Programmng Recall that

More information

Notes on Frequency Estimation in Data Streams

Notes on Frequency Estimation in Data Streams Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to

More information

REAL ANALYSIS I HOMEWORK 1

REAL ANALYSIS I HOMEWORK 1 REAL ANALYSIS I HOMEWORK CİHAN BAHRAN The questons are from Tao s text. Exercse 0.0.. If (x α ) α A s a collecton of numbers x α [0, + ] such that x α

More information

More metrics on cartesian products

More metrics on cartesian products More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of

More information

Lecture 4. Instructor: Haipeng Luo

Lecture 4. Instructor: Haipeng Luo Lecture 4 Instructor: Hapeng Luo In the followng lectures, we focus on the expert problem and study more adaptve algorthms. Although Hedge s proven to be worst-case optmal, one may wonder how well t would

More information

Semi-supervised Classification with Active Query Selection

Semi-supervised Classification with Active Query Selection Sem-supervsed Classfcaton wth Actve Query Selecton Jao Wang and Swe Luo School of Computer and Informaton Technology, Beng Jaotong Unversty, Beng 00044, Chna Wangjao088@63.com Abstract. Labeled samples

More information

Matrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD

Matrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD Matrx Approxmaton va Samplng, Subspace Embeddng Lecturer: Anup Rao Scrbe: Rashth Sharma, Peng Zhang 0/01/016 1 Solvng Lnear Systems Usng SVD Two applcatons of SVD have been covered so far. Today we loo

More information

arxiv: v1 [math.co] 1 Mar 2014

arxiv: v1 [math.co] 1 Mar 2014 Unon-ntersectng set systems Gyula O.H. Katona and Dánel T. Nagy March 4, 014 arxv:1403.0088v1 [math.co] 1 Mar 014 Abstract Three ntersecton theorems are proved. Frst, we determne the sze of the largest

More information

Xin Li Department of Information Systems, College of Business, City University of Hong Kong, Hong Kong, CHINA

Xin Li Department of Information Systems, College of Business, City University of Hong Kong, Hong Kong, CHINA RESEARCH ARTICLE MOELING FIXE OS BETTING FOR FUTURE EVENT PREICTION Weyun Chen eartment of Educatona Informaton Technoogy, Facuty of Educaton, East Chna Norma Unversty, Shangha, CHINA {weyun.chen@qq.com}

More information

Stanford University CS254: Computational Complexity Notes 7 Luca Trevisan January 29, Notes for Lecture 7

Stanford University CS254: Computational Complexity Notes 7 Luca Trevisan January 29, Notes for Lecture 7 Stanford Unversty CS54: Computatonal Complexty Notes 7 Luca Trevsan January 9, 014 Notes for Lecture 7 1 Approxmate Countng wt an N oracle We complete te proof of te followng result: Teorem 1 For every

More information

Cyclic Codes BCH Codes

Cyclic Codes BCH Codes Cycc Codes BCH Codes Gaos Feds GF m A Gaos fed of m eements can be obtaned usng the symbos 0,, á, and the eements beng 0,, á, á, á 3 m,... so that fed F* s cosed under mutpcaton wth m eements. The operator

More information

IV. Performance Optimization

IV. Performance Optimization IV. Performance Optmzaton A. Steepest descent algorthm defnton how to set up bounds on learnng rate mnmzaton n a lne (varyng learnng rate) momentum learnng examples B. Newton s method defnton Gauss-Newton

More information

THE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens

THE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens THE CHINESE REMAINDER THEOREM KEITH CONRAD We should thank the Chnese for ther wonderful remander theorem. Glenn Stevens 1. Introducton The Chnese remander theorem says we can unquely solve any par of

More information

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016 U.C. Berkeley CS94: Spectral Methods and Expanders Handout 8 Luca Trevsan February 7, 06 Lecture 8: Spectral Algorthms Wrap-up In whch we talk about even more generalzatons of Cheeger s nequaltes, and

More information

Excess Error, Approximation Error, and Estimation Error

Excess Error, Approximation Error, and Estimation Error E0 370 Statstcal Learnng Theory Lecture 10 Sep 15, 011 Excess Error, Approxaton Error, and Estaton Error Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton So far, we have consdered the fnte saple

More information

Clustering gene expression data & the EM algorithm

Clustering gene expression data & the EM algorithm CG, Fall 2011-12 Clusterng gene expresson data & the EM algorthm CG 08 Ron Shamr 1 How Gene Expresson Data Looks Entres of the Raw Data matrx: Rato values Absolute values Row = gene s expresson pattern

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

Hardness of Learning Halfspaces with Noise

Hardness of Learning Halfspaces with Noise Hardness of Learnng Halfspaces wth Nose Venkatesan Guruswam Prasad Raghavendra Department of Computer Scence and Engneerng Unversty of Washngton Seattle, WA 98195 Abstract Learnng an unknown halfspace

More information

Joint Statistical Meetings - Biopharmaceutical Section

Joint Statistical Meetings - Biopharmaceutical Section Iteratve Ch-Square Test for Equvalence of Multple Treatment Groups Te-Hua Ng*, U.S. Food and Drug Admnstraton 1401 Rockvlle Pke, #200S, HFM-217, Rockvlle, MD 20852-1448 Key Words: Equvalence Testng; Actve

More information

Journal of Multivariate Analysis

Journal of Multivariate Analysis Journa of Mutvarate Anayss 3 (04) 74 96 Contents sts avaabe at ScenceDrect Journa of Mutvarate Anayss journa homepage: www.esever.com/ocate/jmva Hgh-dmensona sparse MANOVA T. Tony Ca a, Yn Xa b, a Department

More information

Lecture 4: Universal Hash Functions/Streaming Cont d

Lecture 4: Universal Hash Functions/Streaming Cont d CSE 5: Desgn and Analyss of Algorthms I Sprng 06 Lecture 4: Unversal Hash Functons/Streamng Cont d Lecturer: Shayan Oves Gharan Aprl 6th Scrbe: Jacob Schreber Dsclamer: These notes have not been subjected

More information

Deriving the Dual. Prof. Bennett Math of Data Science 1/13/06

Deriving the Dual. Prof. Bennett Math of Data Science 1/13/06 Dervng the Dua Prof. Bennett Math of Data Scence /3/06 Outne Ntty Grtty for SVM Revew Rdge Regresson LS-SVM=KRR Dua Dervaton Bas Issue Summary Ntty Grtty Need Dua of w, b, z w 2 2 mn st. ( x w ) = C z

More information

Achieving Optimal Throughput Utility and Low Delay with CSMA-like Algorithms: A Virtual Multi-Channel Approach

Achieving Optimal Throughput Utility and Low Delay with CSMA-like Algorithms: A Virtual Multi-Channel Approach IEEE/AM TRANSATIONS ON NETWORKING, VOL. X, NO. XX, XXXXXXX 20X Achevng Optma Throughput Utty and Low Deay wth SMA-ke Agorthms: A Vrtua Mut-hanne Approach Po-Ka Huang, Student Member, IEEE, and Xaojun Ln,

More information

n-step cycle inequalities: facets for continuous n-mixing set and strong cuts for multi-module capacitated lot-sizing problem

n-step cycle inequalities: facets for continuous n-mixing set and strong cuts for multi-module capacitated lot-sizing problem n-step cyce nequates: facets for contnuous n-mxng set and strong cuts for mut-modue capactated ot-szng probem Mansh Bansa and Kavash Kanfar Department of Industra and Systems Engneerng, Texas A&M Unversty,

More information

Ensemble Methods: Boosting

Ensemble Methods: Boosting Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement

More information

Resource Allocation with a Budget Constraint for Computing Independent Tasks in the Cloud

Resource Allocation with a Budget Constraint for Computing Independent Tasks in the Cloud Resource Allocaton wth a Budget Constrant for Computng Independent Tasks n the Cloud Wemng Sh and Bo Hong School of Electrcal and Computer Engneerng Georga Insttute of Technology, USA 2nd IEEE Internatonal

More information

20. Mon, Oct. 13 What we have done so far corresponds roughly to Chapters 2 & 3 of Lee. Now we turn to Chapter 4. The first idea is connectedness.

20. Mon, Oct. 13 What we have done so far corresponds roughly to Chapters 2 & 3 of Lee. Now we turn to Chapter 4. The first idea is connectedness. 20. Mon, Oct. 13 What we have done so far corresponds roughly to Chapters 2 & 3 of Lee. Now we turn to Chapter 4. The frst dea s connectedness. Essentally, we want to say that a space cannot be decomposed

More information

Discriminating Fuzzy Preference Relations Based on Heuristic Possibilistic Clustering

Discriminating Fuzzy Preference Relations Based on Heuristic Possibilistic Clustering Mutcrtera Orderng and ankng: Parta Orders, Ambgutes and Apped Issues Jan W. Owsńsk and aner Brüggemann, Edtors Dscrmnatng Fuzzy Preerence eatons Based on Heurstc Possbstc Custerng Dmtr A. Vattchenn Unted

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

Computing Correlated Equilibria in Multi-Player Games

Computing Correlated Equilibria in Multi-Player Games Computng Correlated Equlbra n Mult-Player Games Chrstos H. Papadmtrou Presented by Zhanxang Huang December 7th, 2005 1 The Author Dr. Chrstos H. Papadmtrou CS professor at UC Berkley (taught at Harvard,

More information

A 2D Bounded Linear Program (H,c) 2D Linear Programming

A 2D Bounded Linear Program (H,c) 2D Linear Programming A 2D Bounded Lnear Program (H,c) h 3 v h 8 h 5 c h 4 h h 6 h 7 h 2 2D Lnear Programmng C s a polygonal regon, the ntersecton of n halfplanes. (H, c) s nfeasble, as C s empty. Feasble regon C s unbounded

More information

Calculation of time complexity (3%)

Calculation of time complexity (3%) Problem 1. (30%) Calculaton of tme complexty (3%) Gven n ctes, usng exhaust search to see every result takes O(n!). Calculaton of tme needed to solve the problem (2%) 40 ctes:40! dfferent tours 40 add

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

THE METRIC DIMENSION OF AMALGAMATION OF CYCLES

THE METRIC DIMENSION OF AMALGAMATION OF CYCLES Far East Journa of Mathematca Scences (FJMS) Voume 4 Number 00 Pages 9- Ths paper s avaabe onne at http://pphm.com/ournas/fms.htm 00 Pushpa Pubshng House THE METRIC DIMENSION OF AMALGAMATION OF CYCLES

More information

Quantum Runge-Lenz Vector and the Hydrogen Atom, the hidden SO(4) symmetry

Quantum Runge-Lenz Vector and the Hydrogen Atom, the hidden SO(4) symmetry Quantum Runge-Lenz ector and the Hydrogen Atom, the hdden SO(4) symmetry Pasca Szrftgser and Edgardo S. Cheb-Terrab () Laboratore PhLAM, UMR CNRS 85, Unversté Le, F-59655, France () Mapesoft Let's consder

More information

Graph Reconstruction by Permutations

Graph Reconstruction by Permutations Graph Reconstructon by Permutatons Perre Ille and Wllam Kocay* Insttut de Mathémathques de Lumny CNRS UMR 6206 163 avenue de Lumny, Case 907 13288 Marselle Cedex 9, France e-mal: lle@ml.unv-mrs.fr Computer

More information

Lecture 4: November 17, Part 1 Single Buffer Management

Lecture 4: November 17, Part 1 Single Buffer Management Lecturer: Ad Rosén Algorthms for the anagement of Networs Fall 2003-2004 Lecture 4: November 7, 2003 Scrbe: Guy Grebla Part Sngle Buffer anagement In the prevous lecture we taled about the Combned Input

More information

Affine transformations and convexity

Affine transformations and convexity Affne transformatons and convexty The purpose of ths document s to prove some basc propertes of affne transformatons nvolvng convex sets. Here are a few onlne references for background nformaton: http://math.ucr.edu/

More information

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora prnceton unv. F 13 cos 521: Advanced Algorthm Desgn Lecture 3: Large devatons bounds and applcatons Lecturer: Sanjeev Arora Scrbe: Today s topc s devaton bounds: what s the probablty that a random varable

More information

VQ widely used in coding speech, image, and video

VQ widely used in coding speech, image, and video at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng

More information

The Second Anti-Mathima on Game Theory

The Second Anti-Mathima on Game Theory The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player

More information

A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization

A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization Journa of Machne Learnng Research 18 17 1-5 Submtted 9/16; Revsed 1/17; Pubshed 1/17 A Genera Dstrbuted Dua Coordnate Optmzaton Framework for Reguarzed Loss Mnmzaton Shun Zheng Insttute for Interdscpnary

More information

Communication Complexity 16:198: February Lecture 4. x ij y ij

Communication Complexity 16:198: February Lecture 4. x ij y ij Communcaton Complexty 16:198:671 09 February 2010 Lecture 4 Lecturer: Troy Lee Scrbe: Rajat Mttal 1 Homework problem : Trbes We wll solve the thrd queston n the homework. The goal s to show that the nondetermnstc

More information

Sampling Self Avoiding Walks

Sampling Self Avoiding Walks Samplng Self Avodng Walks James Farbanks and Langhao Chen December 3, 204 Abstract These notes present the self testng algorthm for samplng self avodng walks by Randall and Snclar[3] [4]. They are ntended

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

Which Separator? Spring 1

Which Separator? Spring 1 Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal

More information

HMMT February 2016 February 20, 2016

HMMT February 2016 February 20, 2016 HMMT February 016 February 0, 016 Combnatorcs 1. For postve ntegers n, let S n be the set of ntegers x such that n dstnct lnes, no three concurrent, can dvde a plane nto x regons (for example, S = {3,

More information

Clustering Affine Subspaces: Algorithms and Hardness

Clustering Affine Subspaces: Algorithms and Hardness Clusterng Affne Subspaces: Algorthms and Hardness Thess by Euwoong Lee In Partal Fulfllment of the Requrements for the Degree of Master of Scence Calforna Insttute of Technology Pasadena, Calforna 01 (Submtted

More information

Single-Source/Sink Network Error Correction Is as Hard as Multiple-Unicast

Single-Source/Sink Network Error Correction Is as Hard as Multiple-Unicast Snge-Source/Snk Network Error Correcton Is as Hard as Mutpe-Uncast Wentao Huang and Tracey Ho Department of Eectrca Engneerng Caforna Insttute of Technoogy Pasadena, CA {whuang,tho}@catech.edu Mchae Langberg

More information

Formulas for the Determinant

Formulas for the Determinant page 224 224 CHAPTER 3 Determnants e t te t e 2t 38 A = e t 2te t e 2t e t te t 2e 2t 39 If 123 A = 345, 456 compute the matrx product A adj(a) What can you conclude about det(a)? For Problems 40 43, use

More information

Interference Alignment and Degrees of Freedom Region of Cellular Sigma Channel

Interference Alignment and Degrees of Freedom Region of Cellular Sigma Channel 2011 IEEE Internatona Symposum on Informaton Theory Proceedngs Interference Agnment and Degrees of Freedom Regon of Ceuar Sgma Channe Huaru Yn 1 Le Ke 2 Zhengdao Wang 2 1 WINLAB Dept of EEIS Unv. of Sc.

More information

A new construction of 3-separable matrices via an improved decoding of Macula s construction

A new construction of 3-separable matrices via an improved decoding of Macula s construction Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula

More information

Assortment Optimization under MNL

Assortment Optimization under MNL Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.

More information

Learning Theory: Lecture Notes

Learning Theory: Lecture Notes Learnng Theory: Lecture Notes Lecturer: Kamalka Chaudhur Scrbe: Qush Wang October 27, 2012 1 The Agnostc PAC Model Recall that one of the constrants of the PAC model s that the data dstrbuton has to be

More information

Appendix for An Efficient Ascending-Bid Auction for Multiple Objects: Comment For Online Publication

Appendix for An Efficient Ascending-Bid Auction for Multiple Objects: Comment For Online Publication Appendx for An Effcent Ascendng-Bd Aucton for Mutpe Objects: Comment For Onne Pubcaton Norak Okamoto The foowng counterexampe shows that sncere bddng by a bdders s not aways an ex post perfect equbrum

More information

8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS

8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 493 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces you have studed thus far n the text are real vector spaces because the scalars

More information

Lecture 3. Ax x i a i. i i

Lecture 3. Ax x i a i. i i 18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest

More information

Spectral Graph Theory and its Applications September 16, Lecture 5

Spectral Graph Theory and its Applications September 16, Lecture 5 Spectral Graph Theory and ts Applcatons September 16, 2004 Lecturer: Danel A. Spelman Lecture 5 5.1 Introducton In ths lecture, we wll prove the followng theorem: Theorem 5.1.1. Let G be a planar graph

More information