Finding low error clusterings
Maria-Florina Balcan, Microsoft Research, New England, One Memorial Drive, Cambridge, MA

Mark Braverman, Microsoft Research, New England, One Memorial Drive, Cambridge, MA

Abstract

A common approach for solving clustering problems is to design algorithms to approximately optimize various objective functions (e.g., k-means or min-sum) defined in terms of some given pairwise distance or similarity information. However, in many learning-motivated clustering applications (such as clustering proteins by function) there is some unknown target clustering; in such cases the pairwise information is merely based on heuristics, and the real goal is to achieve low error on the data. In these settings, an arbitrary c-approximation algorithm for some objective would work well only if any c-approximation to that objective is close to the target clustering. In recent work, Balcan et al. [7] have shown how, both for the k-means and k-median objectives, this property allows one to produce clusterings of low error, even for values c such that getting a c-approximation to these objective functions is provably NP-hard. In this paper we analyze the min-sum objective from this perspective. While [7] also considered the min-sum problem, the results they derived for this objective were substantially weaker. In this work we derive new and more subtle structural properties for min-sum in this context and use these to design efficient algorithms for producing accurate clusterings, both in the transductive and in the inductive case. We also analyze the correlation clustering problem from this perspective, and point out interesting differences between this objective and the k-median, k-means, or min-sum objectives.

1 Introduction

Problems of clustering data from pairwise distance or similarity information are ubiquitous in science. A common approach for solving such problems is to view the data points as nodes in a weighted graph (with the weights based on the given pairwise information), and then to design algorithms to optimize various objective functions such as k-means or min-sum.
For example, in the min-sum clustering approach the goal is to produce a partition into a given number of clusters k that minimizes the sum of the intracluster distances. Many of the optimization problems corresponding to commonly analyzed objectives (including k-means, min-sum, k-median, or correlation clustering) are NP-hard, and so the focus in the theory community has been on designing approximation algorithms for these objectives.¹ For example, the best known approximation algorithm for the k-median problem is a (3 + ε)-approximation [6], while the best approximation for the min-sum problem in general metric spaces is an O(log^(1+δ) n)-approximation. For many of these problems the approximation guarantees do not match the known hardness results, and significant effort is spent on obtaining tighter approximation guarantees and hardness results [3, 6, 9, 11, 13, 15, 12, 17, 21, 25, 28]. Standard clustering settings used to motivate much of this effort include problems such as clustering proteins by function, images by subject, or documents by topic. In many of these settings there is some unknown correct target clustering, and the implicit hope is that approximately optimizing objective functions such as those mentioned above will in fact produce a clustering of low error, i.e., a clustering which agrees with the truth on most of the points. In other words, implicit in taking the approximation-algorithms approach is the hope that any c-approximation to our given objective will be pointwise close to the true answer, and our motivation for improving a c2-approximation to a c1-approximation (for c1 < c2) is that perhaps this closeness property holds for c1 but not c2. In recent work, Balcan et al. [7] have shown that if we make this implicit assumption explicit, then one can get accurate clusterings even in cases where getting a good approximation to these objective functions is provably NP-hard. In particular, say that a data set satisfies the (c, ε) property for some objective function Φ if any c-approximation to Φ on this data must be ε-close to the target clustering.
[7] show that for any c = 1 + α > 1, if the data satisfies the (c, ε) property for the k-median (or k-means) objectives, then one can produce clusterings that are O(ε)-close to the target, even for values c for which obtaining a c-approximation is NP-hard.

¹ A β-approximation algorithm for objective Φ is an algorithm that runs in polynomial time and returns a solution whose value is within a multiplicative β factor of the optimal solution for the given objective Φ.

[7] also consider the min-sum objective; however, the results they present work only for values of c > 2 and under the
assumption that all the target clusters are large.

1.1 Our Results

In this work we solve the problem of getting accurate clusterings for the min-sum objective under the (c, ε)-assumption, improving on the results of Balcan et al. [7] for this objective in multiple respects. In particular, we show it is possible to deal with any constant c = 1 + α > 1 (and not only c > 2). More importantly, we are also able to deal with the presence of small target clusters. To achieve this we derive new and much more subtle structural properties implied by the (c, ε)-assumption. In the case where k is small compared to log n / log log n we output a single clustering which is O(ε/α)-close to the target, while in the general case our algorithm outputs a small list of clusterings with the property that the target clustering is close to one of those in the list. We show that the algorithm we develop for the min-sum objective is robust, which allows us to extend it to the inductive model. In the inductive model, S is merely a small random subset of points from a much larger abstract instance space X, and our goal is to produce a hypothesis h : X → Y which implicitly represents a clustering of the whole space X and which has low error on all of X. An appealing characteristic of the algorithm we obtain for the inductive case is that the insertion of new points (which arrive online) is extremely efficient: we only need O(k) comparisons for assigning a new point x to one of the clusters. We further show that if we do require the clusters to be large, we can reduce the approximation error from O(ε/α) down to O(ε), the best one can hope for. We thus affirmatively answer several open questions in [7]. We also analyze the correlation clustering problem in this framework. In correlation clustering, the input is a graph with edges labeled +1 or −1, and the goal is to find a partition of the nodes that best matches the signs of the edges. This clustering formulation was introduced by Bansal et al. in [11] and it has been extensively studied in a series of follow-up papers, both in the theoretical computer science and in the machine learning community [3, 13, 22]. In the original paper, Bansal et al.
[11] considered two versions of the correlation clustering problem: minimizing disagreements and maximizing agreements.² In this paper we focus on the minimizing-disagreements objective function. (The maximizing-agreements version of correlation clustering is less interesting in our framework, since it admits a PTAS³.) We show that this objective behaves much better than objectives such as k-median, k-means, and min-sum in terms of error rate. More specifically, we show that for this objective the (1 + α, ε) property implies a (2.5, O(ε/α)) property, so one can use a state-of-the-art 2.5-approximation algorithm for minimizing disagreements in order to get an accurate clustering. This contrasts sharply with the previous results proven in this context for objectives such as min-sum, k-median, or k-means.

² In the former case, the goal is to minimize the number of −1 edges inside clusters plus the number of +1 edges between clusters, while in the latter case the goal is to maximize the number of +1 edges inside clusters plus the number of −1 edges between clusters. These are equivalent at optimality but differ in their difficulty of approximation.

³ A PTAS (polynomial-time approximation scheme) is an algorithm that for any given fixed ε runs in polynomial time and returns an approximation within a 1 + ε factor. The running time may depend exponentially (or worse) on 1/ε.

Our work shows how, for a clustering objective such as min-sum, we can obtain results comparable to what one could obtain by being able to approximate the objective to an arbitrarily small constant. In other words, if what we really want is to obtain a clustering of low error, then by making implicit assumptions explicit we can obtain low-error clusterings even in cases where getting a c-approximation to the min-sum objective is NP-hard. This points out how one can get much better results than those obtained so far in the approximation-algorithms literature by wisely using all the available information for the problem at hand.

1.2 Related Work

Work on approximation algorithms: We review in the following the state-of-the-art results on approximation algorithms for the two clustering objectives we discuss in this paper.
Min-sum k-clustering on general metric spaces admits a PTAS for the case of constant k by Fernandez de la Vega et al. [17] (see also [20]). For the case of arbitrary k there is an O(δ^(-1) log^(1+δ) n)-approximation algorithm that runs in time n^(O(1/δ)) due to Bartal et al. [9]. The problem has also been studied in geometric spaces for constant k by Schulman [28], who gave an algorithm for (R^d, ℓ₂²) that either outputs a (1 + ε)-approximation, or a solution that agrees with the optimum clustering on a (1 − ε)-fraction of the points (but could have much larger cost than the optimum); the runtime is O(n^(log log n)) in the worst case and linear for sublogarithmic dimension d. More recently, Czumaj and Sohler have developed a (4 + ε)-approximation algorithm for the case when k is small compared to log n / log log n [15].

Correlation clustering was introduced by Bansal et al. in [11]. In the original paper, Bansal et al. [11] considered two versions of the correlation clustering problem, minimizing disagreements and maximizing agreements, focusing mainly on the case when the graph G is complete. They gave a polynomial-time approximation scheme (PTAS) for the maximizing-agreements version on complete graphs, while for the minimizing-disagreements version they gave an approximation algorithm with a constant performance ratio. The constant was a rather large one, and it has subsequently been improved to 4 in [13] and then to 2.5 in [3]. In the case when the graph is not complete, the best known approximation is O(log n) [13].

Other work on clustering: Our work is most relevant for settings where there is a target clustering, and it is motivated by results in [8] which investigated the goal of approximating a desired target clustering without making any probabilistic assumptions. In addition to this, there has been significant work in machine learning and theoretical computer science on clustering or learning with mixture models [1, 5, 19, 18, 23, 29, 16]. That work, like ours, has an explicit notion of a correct ground-truth clustering of the data points; however, it makes very specific probabilistic assumptions about the data.
There is a large body of other work which does not assume the existence of a target clustering. For example, there has been work on axiomatizing clustering (in the sense of postulating what natural axioms a good clustering algorithm or quality measure should satisfy), both with possibility [2] and impossibility [24] results, on comparing clusterings [26, 27], and on efficiently testing if a given data set has a clustering satisfying certain properties [4]. The main difference between this type of work and ours is that we have an explicit notion of a correct ground-truth clustering of the data points, and indeed the results we are trying to prove are quite different.

Inductive setting: In the inductive setting, where we imagine our given data is only a small random sample of the entire data set, our framework is close in spirit to recent work done on sample-based clustering (e.g., [10, 14]) in the context of clustering algorithms designed to optimize a certain objective. Based on such a sample, these algorithms have to output a clustering of the full domain set, which is evaluated with respect to the underlying distribution.

2 Definitions and Preliminaries

The clustering problems in this paper fall into the following general framework. We are given a set S of n points which we want to cluster. We are also given pairwise similarity and/or dissimilarity information expressed through a weighted graph (G, d) on S. A k-clustering C is a partition of S into k sets C_1, C_2, ..., C_k. In this paper, we always assume that there is a true or target k-clustering C_T for the point set S. A natural notion of distance between two k-clusterings C = {C_1, C_2, ..., C_k} and C' = {C'_1, C'_2, ..., C'_k}, which we use throughout the paper, is the fraction of points on which they disagree under the optimal matching of clusters in C to clusters in C'; i.e., we define

dist(C, C') = min_{σ ∈ S_k} (1/n) Σ_{i=1}^{k} |C_i \ C'_{σ(i)}|,

where S_k is the set of bijections σ : [k] → [k]. We say that two clusterings C and C' are ε-close if dist(C, C') < ε, and we say that a clustering has error ε if it is ε-close to the target.
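The distance just defined can be computed exactly for small k by minimizing over all k! bijections. The following is a minimal sketch; the function name and the brute-force strategy are ours, not from the paper (for larger k the same minimization is an assignment problem and can be solved with the Hungarian method):

```python
from itertools import permutations

def clustering_distance(C, Cp, n):
    """dist(C, C') = min over bijections sigma of (1/n) * sum_i |C_i \\ C'_sigma(i)|.

    C and Cp are lists of sets partitioning the same n points. Brute force
    over all k! bijections, so suitable only for small k."""
    k = max(len(C), len(Cp))
    C = list(C) + [set()] * (k - len(C))     # pad with empty clusters if the
    Cp = list(Cp) + [set()] * (k - len(Cp))  # two clusterings have different k
    return min(sum(len(C[i] - Cp[s[i]]) for i in range(k))
               for s in permutations(range(k))) / n

# Two 2-clusterings of 6 points: point 2 switches sides, everything else agrees.
C = [{0, 1, 2}, {3, 4, 5}]
Cp = [{2, 3, 4, 5}, {0, 1}]
dist = clustering_distance(C, Cp, 6)  # one of six points disagrees: dist = 1/6
```

The identity permutation would charge 5 points here; the optimal matching pairs {0,1,2} with {0,1} and {3,4,5} with {2,3,4,5}, charging only the single moved point.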
We can also define the distance dist(C, C') between two clusterings C = {C_1, C_2, ..., C_{k1}} and C' = {C'_1, C'_2, ..., C'_{k2}} with different numbers of clusters k1 and k2, where k1 > k2, by simply extending the clustering C' with a few empty clusters and then using the notion of distance defined above. We will now state a useful fact about the distance between two clusterings, which we use throughout the paper and which is a simple consequence of the definition:

Fact 1 Given two clusterings C and C', if we produce a list L of disjoint subsets S_1, S_2, ... such that, for each i, all points in S_i are in the same cluster in one of C or C' and they are all in different clusters in the other, then C and C' must have distance at least (1/n) Σ_i (|S_i| − 1).

In many cases we will use Fact 1 on sets {S_i} of size 2. We consider two commonly used clustering objectives which we seek to minimize.

Min-sum clustering: The first is the min-sum clustering problem [17, 9]. Here d is a distance function on pairs of points, and the goal is to find a clustering that minimizes

Φ_Σ := Σ_{i=1}^{k} Σ_{x,y ∈ C_i} d(x, y).

In this paper we focus on the case where d satisfies the triangle inequality, and we also discuss a few extensions of this condition.

Correlation clustering: The second clustering setup we analyze is correlation clustering, introduced in [11]. In this setting the graph G is fully connected, with edges (x, y) labeled d(x, y) = +1 (similar) or d(x, y) = −1 (different). The goal is to find a partition of the vertices into clusters that agrees as much as possible with the edge labels. In particular, the Min-Disagreement correlation clustering objective (Min-Disagreement CC) asks to find a clustering C = {C_1, C_2, ..., C_k} minimizing the number of disagreements (the number of −1 edges inside clusters plus the number of +1 edges between clusters):

Φ_CC := #{(x, y) : x, y ∈ C_i for some i, d(x, y) = −1} + #{(x, y) : x ∈ C_i, y ∈ C_j, i ≠ j, d(x, y) = +1}.

Note that in the correlation clustering setting, the target number of clusters is not specified as part of the input.
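Both objectives are straightforward to evaluate on a concrete instance. Here is a small illustrative sketch; the names are ours, and note that min_sum_cost counts each unordered pair once (a convention summing over ordered pairs would give twice this value):

```python
from itertools import combinations

def min_sum_cost(clusters, d):
    # Phi_Sigma: intra-cluster pairwise distances, each unordered pair once.
    return sum(d[x][y] for Ci in clusters for x, y in combinations(Ci, 2))

def cc_disagreements(clusters, label):
    # Phi_CC: -1 edges inside clusters plus +1 edges between clusters.
    cluster_of = {x: i for i, Ci in enumerate(clusters) for x in Ci}
    bad = 0
    for x, y in combinations(sorted(cluster_of), 2):
        same = cluster_of[x] == cluster_of[y]
        if (same and label[(x, y)] == -1) or (not same and label[(x, y)] == +1):
            bad += 1
    return bad

# Four points, two clusters; exactly one +1 edge crosses the clusters.
clusters = [(0, 1), (2, 3)]
d = {i: {j: (1 if {i, j} in ({0, 1}, {2, 3}) else 5) for j in range(4)}
     for i in range(4)}
label = {(0, 1): +1, (0, 2): -1, (0, 3): -1, (1, 2): +1, (1, 3): -1, (2, 3): +1}
cost = min_sum_cost(clusters, d)        # d(0,1) + d(2,3) = 2
bad = cc_disagreements(clusters, label)  # only the +1 edge (1,2) disagrees
```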
Given a function Φ (such as k-median or min-sum) and an instance (S, d), let OPT_Φ = min_C Φ(C), where the minimum is over all k-clusterings of (S, d).

The (c, ε)-property: The following notion, originally introduced in [7], is central to our discussion:

Definition 2 Given an objective function Φ (such as k-median or min-sum), and c = 1 + α > 1, ε > 0, we say that the instance (S, d) satisfies the (c, ε)-property for Φ if all clusterings C with Φ(C) ≤ c · OPT_Φ are ε-close to the target clustering C_T for (S, d).

Note that for any c > 1, the (c, ε)-property does not require that the target clustering C_T exactly coincide with the optimal clustering C* under objective Φ. However, it does imply the following simple facts:

Fact 3 If (S, d) satisfies the (c, ε)-property for Φ, then:
(a) The target clustering C_T and the optimal clustering C* are ε-close.
(b) The distance between k-clusterings is a metric, and hence a (c, ε) property with respect to the target clustering C_T implies a (c, 2ε) property with respect to the optimal clustering C*.

Thus, we can act as if the optimal clustering is indeed the target, up to a constant-factor loss in the error rate. For simplicity, we will assume throughout the paper (except in Section 4) that C_T is indeed the optimal clustering C*.
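On a tiny instance, the (c, ε)-property can be checked directly from Definition 2: enumerate every k-clustering, compute OPT, and test whether each c-approximation is ε-close to the target. A brute-force sketch follows (exponential in n, for illustration only; all names are ours):

```python
from itertools import combinations, permutations, product

def min_sum(clusters, d):
    # Min-sum objective, counting each unordered intra-cluster pair once.
    return sum(d[x][y] for Ci in clusters for x, y in combinations(Ci, 2))

def dist(C, Cp, n):
    # Fraction of points misclustered under the optimal matching of clusters.
    k = len(C)
    return min(sum(len(set(C[i]) - set(Cp[s[i]])) for i in range(k))
               for s in permutations(range(k))) / n

def satisfies_c_eps(points, d, k, target, c, eps):
    """Exhaustively check Definition 2 for min-sum: every k-clustering C
    with Phi(C) <= c * OPT must be eps-close to the target."""
    n = len(points)
    clusterings = [[tuple(p for p, a in zip(points, assign) if a == i)
                    for i in range(k)]
                   for assign in product(range(k), repeat=n)]
    opt = min(min_sum(C, d) for C in clusterings)
    return all(dist(C, target, n) < eps
               for C in clusterings if min_sum(C, d) <= c * opt)

# Two well-separated pairs: {0,1} and {2,3}, with cross-distance 10.
pts = [0, 1, 2, 3]
d = {i: {j: (1 if {i, j} in ({0, 1}, {2, 3}) else 10) for j in pts} for i in pts}
target = [(0, 1), (2, 3)]
```

Here OPT = 2 and any clustering of cost at most 2·OPT must equal the target, so the (2, 0.25)-property holds; taking c large enough to admit the cost-21 clustering {0,1,2}, {3} (which misplaces 1 of 4 points) breaks it.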
3 The Min-Sum Clustering Problem

Recall that the min-sum k-clustering problem asks to find a k-clustering C = {C_1, C_2, ..., C_k} minimizing the objective function

Φ(C) = Σ_{i=1}^{k} Σ_{x,y ∈ C_i} d(x, y),

where the inner sum ranges over ordered pairs x, y ∈ C_i. We focus here on the case where d is a distance function satisfying the triangle inequality. As shown in [7], we have the following:

Theorem 4 [7] For any 1 ≤ c1 < c2 and any ε, δ > 0, there exists a family of metric spaces G and target clusterings that satisfy the (c1, ε) property for the min-sum objective and yet do not satisfy even the (c2, 1/2 − δ) property for that objective.

So, for the min-sum objective it is not the case that if the data satisfies the (c1, ε) property, then we can use a c2-approximation algorithm, for some c2 > c1, in order to get a clustering of small error rate. [7] also shows the following:

Theorem 5 For the min-sum objective, the problem of finding a c-approximation can be reduced to the problem of finding a c-approximation under the (c, ε) assumption. Therefore, the problem of finding a c-approximation under the (c, ε) assumption is as hard as the problem of finding a c-approximation in general.

Theorem 5 means that, generally speaking, the (c, ε) assumption does not make optimizing min-sum easier.

General overview of our construction: The general idea for our construction is to obtain various structural properties for instances that satisfy the (c, ε) assumption, and then to use these properties to give an efficient algorithm for achieving low-error clusterings. These structural properties are essential since, as mentioned above, the general min-sum clustering problem is APX-hard. The structural properties stem from the fact that under the (c, ε) assumption the optimal solution is fairly "stable": changing it a little increases the cost substantially. The first key property (Lemma 6) we prove using this stability is that most pairs of clusters are quite expensive to merge. Using a vertex-cover argument on a specially designed graph on clusters, we show that we can remove few (O(εn)) points such that no two clusters among the remaining points are cheap to merge.
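Two ingredients of this outline are easy to make concrete: the additional min-sum cost of merging two clusters, and the greedy (maximal-matching) vertex cover used to discard clusters that are cheap to merge. A sketch under our own naming, with the merge cost counting each new inter-cluster pair once:

```python
from itertools import product

def merge_cost(Ci, Cj, d):
    # Additional min-sum cost of merging Ci and Cj: the newly created
    # inter-cluster pairs (each unordered pair counted once here).
    return sum(d[x][y] for x, y in product(Ci, Cj))

def greedy_vertex_cover(edges):
    """Maximal-matching 2-approximation for vertex cover: repeatedly pick
    an arbitrary remaining edge, put BOTH endpoints in the cover, and
    delete every edge touching either endpoint. The uncovered vertices
    then form an independent set, mirroring the argument sketched above."""
    cover, remaining = set(), list(edges)
    while remaining:
        u, v = remaining.pop()
        cover.update((u, v))
        remaining = [e for e in remaining if u not in e and v not in e]
    return cover

# Toy check: two tight pairs {0,1} and {2,3} with cross-distance 10.
d = {i: {j: (1 if {i, j} in ({0, 1}, {2, 3}) else 10)
         for j in range(4) if j != i} for i in range(4)}
cost = merge_cost((0, 1), (2, 3), d)  # 4 cross pairs of distance 10
cover = greedy_vertex_cover([("A", "B"), ("B", "C"), ("C", "D")])
```

In the analysis below, the "badness" graph plays the role of `edges`, with one vertex per cluster and an edge between clusters whose merge cost is small.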
Next we show that for most of the remaining points x we can draw a ball B_x of an appropriate radius that essentially covers the cluster C*(x). Such balls B_x and B_y usually do not overlap when C*(x) ≠ C*(y) (Lemma 9), since such an overlap would mean that x and y are sufficiently close to merge C*(x) with C*(y) cheaply, leading to a contradiction. We designate a class of good points for which the above is true. All but O(εn/α) points are good. We introduce subsets B̃_x ⊆ B_x to make sure that the ball around each x (whether good or bad) contains only good points from at most one cluster. At the same time, B̃_x centered around a good point x still covers the bulk of the cluster C*(x). The algorithm then uses a greedy covering with the sets {B̃_x}_{x ∈ S} to perform the actual clustering. The analysis of the clustering produced is done using a careful charging argument. It shows that the final clustering is O(εn/α)-close to the optimal. While the analysis is quite involved, the clustering algorithm itself is simple, robust, and efficient. This simplicity and robustness allows us to extend it to the inductive setting.

3.1 Properties of Min-Sum Clustering

We start by deriving a few structural properties implied by the (1 + α, ε)-property for min-sum. We emphasize that all the constructions in this subsection are for analysis purposes; our algorithmic results are described in Section 3.2. Recall that C* denotes the optimal clustering. For x ∈ C*_i, define w(x) = Σ_{y ∈ C*_i} d(x, y), and let w = avg_x w(x) = OPT/n. We start by creating a "badness" graph G = (V, E) on the set of clusters, connecting pairs of clusters that are not too expensive to merge. Formally, V is the set {C*_1, ..., C*_k}, and for any two clusters C*_i and C*_j we add an edge between them if the additional cost incurred by merging them is at most (|C*_i| + |C*_j|) · wα/(2ε).

Lemma 6 Assume that the min-sum instance (S, d) satisfies the (1 + α, ε)-property with respect to the target clustering. Then we can remove fewer than 3εn points such that the remaining set of clusters forms an independent set in G.
Proof: We start by showing that if the min-sum instance (S, d) satisfies the (1 + α, ε)-property with respect to the target clustering, then there cannot be a collection of disjoint cluster pairs (C*_{i_1}, C*_{j_1}), (C*_{i_2}, C*_{j_2}), ..., (C*_{i_r}, C*_{j_r}) such that each C*_{i_l} and C*_{j_l} are connected in G and Σ_l (|C*_{i_l}| + |C*_{j_l}|) ≥ 3εn.

Let C*_i and C*_j be two clusters such that the additional cost incurred by merging them is at most (|C*_i| + |C*_j|) · wα/(2ε). Assume w.l.o.g. that |C*_i| ≤ |C*_j|. We show now that for any set size s there exists a subset A_s of C*_j of size s which we can move from C*_j to C*_i at an additional cost in the min-sum objective of at most |A_s| · wα/ε. First note that Σ_{x ∈ C*_j} Σ_{y ∈ C*_i} d(x, y) ≤ |C*_j| · wα/ε. Let α(x) = Σ_{y ∈ C*_i} d(x, y), so Σ_{x ∈ C*_j} α(x) ≤ |C*_j| · wα/ε. Hence, for any set size s we can select a subset A_s of C*_j (namely, the s elements with the smallest values of α(x)) such that Σ_{x ∈ A_s} α(x) ≤ |A_s| · wα/ε.
This implies that for any set size s there exists a subset A_s of C*_{j_l} which we can move from C*_{j_l} to C*_{i_l} at an additional cost in the min-sum objective of at most |A_s| · wα/ε, as desired.

Assume now that there exists a collection of r disjoint pairs (C*_{i_1}, C*_{j_1}), ..., (C*_{i_r}, C*_{j_r}) such that |C*_{i_l}| ≤ |C*_{j_l}|, each C*_{i_l} and C*_{j_l} are connected in G, and Σ_l (|C*_{i_l}| + |C*_{j_l}|) ≥ 3εn. Let A_l ⊆ C*_{j_l} be of size

ε_l n = min(εn − Σ_{s<l} ε_s n, max(|C*_{i_l}|, |C*_{j_l}|/2)).

Since max(|C*_{i_l}|, |C*_{j_l}|/2) ≥ (1/3)(|C*_{i_l}| + |C*_{j_l}|) and Σ_l (|C*_{i_l}| + |C*_{j_l}|) ≥ 3εn, we have Σ_l |A_l| = Σ_l ε_l n = εn.

Let C' be the clustering obtained by moving A_l from C*_{j_l} to C*_{i_l} for l = 1, ..., r. Each movement of a set A_l from C*_{j_l} to C*_{i_l} increases the distance between C* and C' by ε_l. To prove this we use Fact 1 and a case analysis. If ε_l n = |C*_{i_l}|, then we match each point x_i in C*_{i_l} with a point y_i in A_l and define the set S_i as {x_i, y_i}. If ε_l n = |C*_{j_l}|/2, then we split C*_{j_l} into two sets C¹_{j_l} and C²_{j_l} of equal size, match each point x_i in C¹_{j_l} with a point y_i in C²_{j_l}, and define the set S_i as {x_i, y_i}. If ε_l n < max(|C*_{i_l}|, |C*_{j_l}|/2), then we apply either of the constructions above to a subset of the appropriate size. In all cases we produce a list of disjoint subsets S_1, S_2, ... such that, for each i, all points in S_i are in the same cluster in one of C* or C' and they are all in different clusters in the other. Using Fact 1 we obtain that by moving the set A_l from C*_{j_l} to C*_{i_l} we increase the distance between C* and C' by (1/n) Σ_i (|S_i| − 1) ≥ ε_l. Overall we get dist(C*, C') ≥ Σ_l ε_l = ε. We also have

Φ(C') ≤ Φ(C*) + Σ_l (wα/ε)(ε_l n) ≤ Φ(C*) + wαn = Φ(C*) + α · OPT.

We thus obtain a clustering C' which is ε-far from the target and whose min-sum cost is within a 1 + α factor of OPT, contradicting the (1 + α, ε)-property.

To finish the argument, let L be the output of the greedy vertex cover algorithm on the graph G. Specifically, let L be the list of clusters constructed as follows: pick an arbitrary edge e in G, add both vertices incident to e to the list L, delete every edge sharing a vertex with e, and repeat until the graph has no edges left.
Note that L is a vertex cover of G: we take both vertices incident to each edge added to the list L, we only delete edges incident to one of these two vertices, and eventually all the edges are deleted. Since L is a vertex cover of G, for any pair (C*_i, C*_j) which forms an edge in G, either C*_i ∈ L or C*_j ∈ L, so the remaining set of clusters C* \ L is an independent set of G. Since L is a collection of disjoint edges, we also have (by what we proved above) Σ_{C*_i ∈ L} |C*_i| < 3εn. This concludes the proof.

If C*_i ∈ L in the above proof, let H1_i = C*_i; else let H1_i = ∅. By Lemma 6 we have Σ_{i=1}^{k} |H1_i| < 3εn, and the cost of merging two clusters that have not been removed is high. This last condition implies the following:

Lemma 7 For any two points x ∈ C*_i \ H1_i and y ∈ C*_j \ H1_j, i ≠ j, with w(x) ≤ wα/(15ε) and w(y) ≤ wα/(15ε), we have d(x, y) ≥ (wα/(3ε)) · 1/min(|C*_i|, |C*_j|).

Proof: Assume there exist x ∈ C*_i and y ∈ C*_j with w(x) ≤ wα/(15ε) and w(y) ≤ wα/(15ε) such that d(x, y) < (wα/(3ε)) · 1/min(|C*_i|, |C*_j|). By the triangle inequality, the additional cost incurred in the min-sum objective by merging C*_i and C*_j is at most

Σ_{x' ∈ C*_i} Σ_{y' ∈ C*_j} d(x', y') ≤ Σ_{x' ∈ C*_i} Σ_{y' ∈ C*_j} (d(x', x) + d(x, y) + d(y, y')).

Therefore the additional cost incurred by merging C*_i and C*_j is at most

|C*_j| w(x) + |C*_i| w(y) + (wα/(3ε)) |C*_i||C*_j| / min(|C*_i|, |C*_j|)
= |C*_j| w(x) + |C*_i| w(y) + (wα/(3ε)) max(|C*_i|, |C*_j|)
≤ (|C*_i| + |C*_j|)(wα/(15ε) + wα/(3ε))
< (|C*_i| + |C*_j|) · wα/(2ε),

which contradicts Lemma 6 and the definition of H1.

For all x, let us now define the quantities τ_x and B_x that will be used in Algorithm 1. To obtain τ_x, we start with τ = 0 and gradually increase it until |B(x, τ)| ≥ wα/(20ετ), where B(x, τ) denotes the set of points within distance τ of x; once this happens we set τ_x = τ and B_x = B(x, τ_x). We can now show the following.

Lemma 8 For any point x ∈ C*_i such that w(x) ≤ wα/(15ε), we have τ_x ≤ wα/(6ε|C*_i|).

Proof: Since w(x) = Σ_{y ∈ C*_i} d(x, y) ≤ wα/(15ε), at least |C*_i|/2 points of C*_i lie within a τ = wα/(6ε|C*_i|) neighborhood of x. This implies |B(x, τ)| · τ ≥ (|C*_i|/2) · wα/(6ε|C*_i|) = wα/(12ε) > wα/(20ε), so τ_x ≤ τ, as desired.
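The construction of τ_x and B_x just described can be sketched as follows; the stopping condition |B(x, τ)| ≥ wα/(20ετ) is as reconstructed above, and the names are ours:

```python
def ball(x, tau, points, d):
    # All points within distance tau of x (x itself included, d[x][x] = 0).
    return {y for y in points if d[x][y] <= tau}

def tau_and_ball(x, points, d, w, alpha, eps):
    """Gradually increase tau through the critical thresholds (the distinct
    pairwise distances from x) until |B(x, tau)| >= w*alpha/(20*eps*tau);
    return tau_x and B_x = B(x, tau_x). Near tau = 0 the right-hand side is
    huge, so the condition fails; it holds once tau is large enough."""
    for tau in sorted({d[x][y] for y in points if y != x}):
        B = ball(x, tau, points, d)
        if len(B) >= w * alpha / (20 * eps * tau):
            return tau, B
    return None  # condition never met at any critical threshold

# 5 points at mutual distance 1; with w = 10, alpha = 1, eps = 0.1 the bound
# at tau = 1 is 10/(20 * 0.1) = 5, and the ball already holds all 5 points.
pts = range(5)
d = {i: {j: (0 if i == j else 1) for j in pts} for i in pts}
tau_x, B_x = tau_and_ball(0, pts, d, w=10, alpha=1.0, eps=0.1)
```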
Lemma 9 For any two points x ∈ C*_i \ H1_i and y ∈ C*_j \ H1_j, i ≠ j, such that w(x) ≤ wα/(15ε) and w(y) ≤ wα/(15ε), we have B_x ∩ B_y = ∅.

Proof: By Lemmas 7 and 8 we have

τ_x + τ_y ≤ (wα/(6ε))(1/|C*_i| + 1/|C*_j|) ≤ (wα/(3ε)) · 1/min(|C*_i|, |C*_j|) ≤ d(x, y),

which implies the desired result.

Let H2_i = {x ∈ C*_i \ H1_i : w(x) > wα/(15ε)} and H2 = ∪_i H2_i. Since E[w(x)] = w, by Markov's inequality we have |H2| ≤ (15ε/α) n.

3.2 Algorithm for Min-Sum Clustering

In this section we show that if our data satisfies the (1 + α, ε)-property for the min-sum objective, then we can find a clustering that is O(ε/α)-close to the target C_T. We start by considering the case where we know the value of OPT, or of w = OPT/n, and we then show how to get rid of this assumption in Theorem 11. For the case of known w, we show in the following that Algorithm 1 can be used to produce a clustering that is O(ε/α)-close to the target. In this algorithm we define critical thresholds τ_0, τ_1, τ_2, ... as follows: τ_0 = 0 and τ_i is the i-th smallest distinct distance d(x, y) for x, y ∈ S. We can show the following.

Algorithm 1 Min-Sum Algorithm
Input: (S, d), w, ε ≤ 1, α > 0, k.
For all x do:
    Let the initial threshold be τ = τ_0.
    Construct the ball B(x, τ) by including all points within distance τ of x.
    If |B(x, τ)| ≥ wα/(20ετ), then let τ_x = τ and B_x = B(x, τ_x);
    else increase τ to the next critical threshold.
For all x, let B̃_x := {y : x ∈ B_y and y ∈ B_x}; set L = ∅.
For i = 1, ..., k do:
    Let C^o_i be the largest B̃_x. Add C^o_i to L.
    For all x' ≠ x, set B̃_{x'} = B̃_{x'} \ C^o_i.
Output: Clustering L.

Note that in the B_x construction phase one can alternatively sort the points by their distance from x and add them to B(x, τ) one by one, instead of using critical thresholds.

Theorem 10 If the min-sum instance (S, d) satisfies the (1 + α, ε)-property and we are given the value of w, then Algorithm 1 produces a clustering that is O(ε/α)-close to the target.

Proof: We first note that the thresholds τ_x in Algorithm 1 are well defined, since for small τ the condition |B(x, τ)| ≥ wα/(20ετ) is clearly false, and for very large τ the condition is clearly true because |B(x, τ)| ≥ 1. For all i, let c_i be a point in C*_i that minimizes Σ_{x ∈ C*_i} d(x, c_i).
By the triangle inequality, all x ∈ C*_i satisfy w(x) ≥ |C*_i| d(x, c_i) − w(c_i). Moreover, if x ∈ C*_i and d(x, c_i) ≥ wα/(60ε|C*_i|), then w(x) ≥ wα/(60ε) − w(c_i) ≥ wα/(60ε) − w(x), which implies w(x) ≥ wα/(120ε). Let

G_i = {x ∈ C*_i \ (H1_i ∪ H2_i) : d(x, c_i) < wα/(60ε|C*_i|)},

and let G = ∪_i G_i. Let H3_i = {x ∈ C*_i : d(x, c_i) ≥ wα/(60ε|C*_i|)} and H3 = ∪_i H3_i. Thus G_i = C*_i \ (H1_i ∪ H2_i ∪ H3_i) and G = S \ (H1 ∪ H2 ∪ H3). By Markov's inequality we have |H3| ≤ 120(ε/α)n. We say that the points of G are good and the points of H := H1 ∪ H2 ∪ H3 = S \ G are bad. As we have seen so far, there are not too many bad points: |H| = O((ε/α)n), a fact that we will use later.

Let B = ∪_i ∪_{x ∈ G_i} (B_x \ C*_i). Clearly, for all x ∈ G_i we have

B_x ⊆ C*_i ∪ B. (1)

From Lemma 9 we know that if x ∈ G_i and y ∈ G_j for i ≠ j, then B_x ∩ B_y = ∅. This implies that if x ∈ G_i then B_x intersects only G_i and no other G_j. Let B̃_x = {y : x ∈ B_y and y ∈ B_x}. We now show that for all points x, B̃_x intersects at most one set G_i and no other G_j for j ≠ i. For x ∈ G_i, since B̃_x ⊆ B_x, we get the desired claim. For z ∈ S \ G we might have B_z intersect two different sets G_i and G_j. However, from Lemma 9 we have that for any two x ∈ G_i and y ∈ G_j there is no z such that z ∈ B_x and z ∈ B_y. This implies that there is no z such that we have both x ∈ B̃_z and y ∈ B̃_z, so for z ∈ S \ G, B̃_z can intersect at most one G_i. From the above we also have

|B| ≤ |H1| + |H2| + |H3| = O((ε/α)n).

We now claim that for each i there exists an x such that

|B̃_x ∩ G_i| ≥ |G_i| − 2(|B| + |C*_i \ G_i|). (2)

We first prove that for all x ∈ G_i we have |B_x| ≥ |G_i|. If τ_x ≥ wα/(30ε|C*_i|), then B_x ⊇ G_i (any two points of G_i are within distance 2 · wα/(60ε|C*_i|) = wα/(30ε|C*_i|) of each other). Else, if τ_x < wα/(30ε|C*_i|), then |B_x| ≥ wα/(20ετ_x) > 1.5 |C*_i| ≥ |G_i|.
So for every x ∈ G_i we have, by (1),

|B_x ∩ G_i| ≥ |G_i| − |B| − |C*_i \ G_i|.

This implies that there exists an x such that

|{x' ∈ G_i : x ∈ B_{x'}}| ≥ |G_i| − |B| − |C*_i \ G_i|.

So,

|{x' ∈ G_i : x ∈ B_{x'}} ∩ B_x ∩ G_i| ≥ |G_i| − 2|B| − 2|C*_i \ G_i|.

Since {x' ∈ G_i : x ∈ B_{x'}} ∩ B_x ⊆ B̃_x, we get relation (2), as desired.

To finish the argument we need to argue that the greedy covering with the sets B̃_x works well. Let us think of each cluster G_i as initially unmarked, and mark it the first time we ever choose a group that intersects it. We now consider a few cases. If the j-th group C^o_j intersects an unmarked G_i, we assign σ(j) = i. Note that if this group misses α_j points from G_i, then, since we were greedy, according to relation (2) we must have picked at least α_j − 2(|B| + |C*_i \ G_i|) elements from H in this group. Overall we must have Σ_j (α_j − 2(|B| + |C*_{σ(j)} \ G_{σ(j)}|)) ≤ |H|, which together with |B| ≤ |H| and Σ_i |C*_i \ G_i| ≤ |H| implies Σ_j α_j ≤ 5|H|. Thus the total error incurred in this way with respect to the good set G is given by the number of points missed from the sets G_i, so it is at most Σ_j α_j ≤ 5|H|. The other case is when the j-th group C^o_j intersects a marked G_i. In this case we assign σ(j) to an arbitrary cluster C*_{i'} not marked by the end of the process. The error incurred from these cases is at most |H| + Σ_j α_j ≤ 6|H|, since this is an upper bound on the number of points left that are not in unmarked clusters. Finally, we also need to consider the error with respect to the bad set H. Adding all these up, we obtain that the total error is bounded by 5|H| + 6|H| + |H| = 12|H| = O((ε/α)n).

In the case of unknown w, we show the following:

Theorem 11 If k ≤ log n / log log n and the min-sum instance (S, d) satisfies the (1 + α, ε)-property, then even if we are not given w, we can use Algorithm 1 as a subroutine to produce a clustering that is O(ε/α)-close to the target. For the case of general k, we can use Algorithm 1 as a subroutine to produce a list of O(log log n) clusterings such that one of them is O(ε/α)-close to the target.

Proof: It is not difficult to verify that the argument in Theorem 10 holds (with only a constant-factor loss in the final guarantee on the error rate) even if we use a constant-factor approximation for w instead of the exact value of w in Algorithm 1.
If k ≤ log n / log log n, then we can use the results in [15] to find a constant-factor approximation for w, and thus we are able to produce a clustering that is O(ε/α)-close to the target. For the case of general k, we use the fact that there exists an O(δ^(-1) log^(1+δ) n)-approximation algorithm running in time n^(O(1/δ)) for the case of arbitrary k [9]. The main idea is to use the algorithm in [9] with δ = 1 to find a lower bound ℓ and an upper bound L for w that are within a multiplicative O(log² n) factor of each other. We then try all the values ℓ, 2ℓ, 4ℓ, ..., 2^i ℓ, ..., L and run Algorithm 1 for each of them. One of the values 2^i ℓ will be a 2-approximation for w, and an argument similar to the one in Theorem 10 shows that in that case we get a clustering which is O(ε/α)-close to the target.

Note: All our arguments above can be extended (with an appropriate loss in the final accuracy guarantees) to the case where the given dissimilarity function d satisfies only the weaker condition d(x, y) ≤ γ(d(x, z) + d(z, y)) for some γ > 1.

Theorem 12 If the min-sum instance (S, d) satisfies the (1 + α, ε)-property, then so long as the smallest correct cluster has size greater than 100εn/α², we can efficiently find a clustering that is O(ε)-close to the target.

Proof Sketch: Assume that we are given the value of w. We first use the construction in Theorem 10 to produce a clustering C'_1, ..., C'_k that is O(ε/α)-close to the target. For each cluster C'_i we compute its center c_i, a point in C'_i that minimizes Σ_{x ∈ C'_i} d(x, c_i). We then define the set

Cs_i = {x ∈ C'_i : d(x, c_i) < wα/(60ε|C'_i|)}.

The fact that the clusters have size greater than 100εn/α² means that each Cs_i captures at least a (1 − O(α))-fraction of the corresponding C*_i. We now construct a new clustering C''_1, ..., C''_k as follows: for each point x and each cluster index j, we compute the weight ws(x, j) = Σ_{y ∈ Cs_j} d(x, y). We finally insert x into the cluster C''_i with i = argmin_j ws(x, j). The main steps in the correctness proof are the following. We first show that (up to re-indexing of the clusters) Cs_i ⊆ G_i and that Σ_i |Cs_i| ≥ n − O((ε/α)n).
We then use these facts, together with the fact that each Cs_i is a (1 − O(α))-approximation to C*_i, in order to show that all but O(εn) points will make the right choice. In the case where we do not know w, we use the technique in [7] of trying increasing values of w: we then stop the first time we output k clusters that cover at least n − O((ε/α)n) of the points in S.

3.3 Inductive Setting

In this section we consider an inductive model in which the set S is merely a small random subset, of size n, of points from a much larger abstract instance space X, |X| = N, N ≫ n, and the clustering we output is represented implicitly through a hypothesis h : X → Y. In the case where k ≤ log n / log log n we produce a clustering of error at most O(ε/α). In the case where k > log n / log log n we produce a list of hypotheses {h_1, ..., h_t} such that at least one of them has error at most O(ε/α). We can adapt the algorithm in Theorem 10 to the inductive setting as shown in Algorithm 2. The main idea is to show that our algorithm from the transductive setting is
pretty robust: it can survive eliminating small clusters and making the sets B̃_x and the set-size estimates fuzzy. Specifically, in the case of known w we can show the following:

Theorem 13 Assume that the min-sum instance (X, d) satisfies the (1 + α, ε)-property and that we are given the value of w. If we draw a sample S of size n = O((k²/ε) ln(kN/δ)), then we can use Algorithm 2 to produce a clustering which is O(ε/α)-close to the target with probability at least 1 − δ. Moreover, inserting a new element takes only O(k) time.

Proof Sketch: The proof works in two phases. In the first phase we redo the analysis of Theorem 10 to show that Algorithm 2 works as well as Algorithm 1 (up to a loss of multiplicative constants) in producing the approximate clustering. The difference is that Algorithm 2 is "fuzzier" than Algorithm 1 in several respects: the comparisons need not be exact, and set-size estimates are only needed within a constant precision. In the second phase we observe that Algorithm 2 can be executed in the inductive setting with high probability. In particular, set sizes can be estimated within the required precision from few samples, and for each sufficiently large cluster there is a suitable center x in the cluster such that |B̃^S_x| > (1 − γ) max_{x'} |B̃^S_{x'}|. This implies that the result of executing Algorithm 2 on the sample is actually the projection onto the sample of a valid execution of the algorithm on the entire input. Thus, by the correctness of the algorithm in the transductive setting, we obtain its correctness in the inductive model. Finally, the correctness of the testing phase follows from the structural properties of the clustering proved in Theorem 10.

Theorem 13 also holds if we are given a constant-factor approximation rather than the exact value of w. We now state our main result for the case of unknown w. In the following, we denote by D the diameter of the metric space, i.e., D = max_{x,y} d(x, y). Using results from [14, 15] on estimating the value of the optimal min-sum solution based on the sample, we obtain the following theorem.
Theorem 14 Assume that the min-sum instance (X, d) satisfies the (1 + α, ε)-property and that we are not given the value of w. If we draw a sample S of size satisfying both n = O((k²/ε) ln(kn/δ)) and n = Õ(D(k + ln(1/δ))(log n + Dk²)), and if k ≤ log n / log log n, then we can use Algorithm 2 as a subroutine to produce a clustering that is O(ε/α)-close to the target. For the case of general k, we can use Algorithm 2 as a subroutine to produce a list of log log n clusterings such that one of them is O(ε/α)-close to the target.

Algorithm 2 Fuzzy Min-Sum Algorithm
Input: (S, d), w, ε ≤ 1, α > 0, k, n, N.
Training phase:
  Set ŵ = wn/N, I = ∅, γ = ε/k.
  For all x do:
    Let the initial threshold τ = τ_0.
    Construct the ball B^S(x, τ) by including all points within distance τ of x.
    If 1/(17ετ) ≥ |B^S(x, τ)| ≥ 1/(18ετ) and |B^S(x, τ)| ≥ εn/(2k), then let τ^S_x = τ and B^S_x = B^S(x, τ^S_x), and add x to I;
    else increase τ to the next critical threshold.
  For all x, let {y : x ∈ B^S_y, |B^S_y| ≥ εn/k} ⊆ B^S_x ⊆ {y : x ∈ B^S_y, |B^S_y| ≥ εn/(8k)}.
  Set L = ∅.
  For i = 1, ..., k do:
    Let C^o_i be a cluster B^S_x, x ∈ I, of size at least (1 − γ) times the size of the largest B^S_x with x ∈ I.
    Add C^o_i to L.
    For all x, set B^S_x = B^S_x \ C^o_i.
Testing phase:
  When a new point z arrives, assign it to the cluster C^o_i that minimizes d(z, x_i).

4 The Correlation Clustering Problem

The correlation clustering setup introduced in [11] is as follows. We are given a fully connected graph G with edges labeled +1 (similar) or −1 (different), and the goal is to find a partition of the vertices into clusters that agrees as much as possible with the edge labels. (Note that the problem is not trivial, since we might have inconsistencies: in particular, it is possible to have x, y, z such that the edge (x, y) is labeled +1, the edge (y, z) is labeled +1, and the edge (x, z) is labeled −1.) In particular, the Min-Disagreement correlation clustering objective (Min-Disagreement CC) asks to find a clustering C = {C_1, C_2, ..., C_k} that minimizes the number of disagreements: the number of −1 edges inside clusters plus the number of +1 edges between clusters. In this clustering formulation one does not need to specify the number of clusters k as a separate parameter, as in measures such as k-median or min-sum clustering. Instead, in correlation clustering, the optimal number of clusters can take any value between 1 and n, depending on the edge labels. The currently best approximation algorithm for minimizing disagreements is a 2.5-approximation [3], and the problem is known to be APX-hard [13]. We can show that the (c, ε) assumption does not make optimizing the Min-Disagreement CC objective easier.

Theorem 15 For the Min-Disagreement CC objective, the problem of finding a c-approximation can be reduced to the problem of finding a c-approximation under the (c, ε) assumption. Therefore, the problem of finding a c-approximation under the (c, ε) assumption is as hard as the problem of finding a c-approximation in general.

We now show that if our input satisfies the (1 + α, ε)-property for the Min-Disagreement CC objective, then the data satisfies the (2.5, (49/α + 1)ε) property as well. Specifically:
Theorem 16 For the Min-Disagreement CC objective, if the instance (S, d) satisfies the (1 + α, ε)-property with respect to the target clustering C_T, then the instance (S, d) also satisfies the (2.5, (49/α + 1)ε) property with respect to the target clustering C_T.

Interpretation: This means that under the (1 + α, ε)-property we can use a state-of-the-art 2.5-approximation algorithm for minimizing disagreements in order to get a (49/α + 1)ε-accurate clustering.

Proof: We prove the contrapositive: we show that if the instance does not satisfy the (2.5, (49/α + 1)ε) property with respect to the target clustering, then the instance does not satisfy the (1 + α, ε) property with respect to the target clustering. Recall that C* is the optimal Min-Disagreement CC clustering. Assume that the instance (S, d) does not satisfy the (2.5, (49/α + 1)ε) property with respect to the target clustering. This means that there exists a clustering C' = {C'_1, C'_2, ..., C'_k} such that cost(C') ≤ 2.5 OPT and dist(C_T, C') ≥ (49/α + 1)ε; since dist(C*, C_T) ≤ ε we have dist(C*, C') ≥ 49ε/α. For x ∈ S we denote by C*(x) its cluster in C* and by C'(x) its cluster in C'. We will call a point uninteresting if it does not change too many neighbors between the two clusterings C* and C'; formally, x is uninteresting if |C*(x) △ C'(x)| < |C*(x) ∩ C'(x)| (where △ denotes symmetric difference), and interesting otherwise. We show in the following that there are at least 49εn/α interesting points. In order to do this we exhibit a partial matching of the clusters in C* and C'; specifically, we connect two clusters C*_i and C'_j if |C*_i △ C'_j| < |C*_i ∩ C'_j|, and we let π(i) = j. We prove now that this is a partial matching of the clusters in C* and C'. Assume by contradiction that this is not the case; i.e., assume that there exist i, j, k such that C*_i ∩ C*_k = ∅ and |C*_i △ C'_j| < |C*_i ∩ C'_j| and |C*_k △ C'_j| < |C*_k ∩ C'_j|, which implies

|C*_i △ C'_j| + |C*_k △ C'_j| < |C*_i ∩ C'_j| + |C*_k ∩ C'_j|. (3)

However, since C*_i ∩ C*_k = ∅ we have |C*_i ∩ C'_j| + |C*_k ∩ C'_j| ≤ |C'_j|, and we also have both

|C*_i △ C'_j| + |C*_i ∩ C'_j| ≥ |C'_j| and |C*_k △ C'_j| + |C*_k ∩ C'_j| ≥ |C'_j|,

which together imply

|C*_i △ C'_j| + |C*_k △ C'_j| ≥ |C*_i ∩ C'_j| + |C*_k ∩ C'_j|,

thus contradicting (3). This proves that π is a partial matching of the clusters in C* and C'.
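The partial matching π can be read off directly from the two clusterings. The sketch below is our own illustration of the rule used in the proof (the two clusterings are invented toy data, not from the paper):

```python
def partial_matching(c_star, c_prime):
    """Match C*_i to C'_j when |C*_i symmetric-diff C'_j| < |C*_i intersect C'_j|.

    As the argument above shows, each C'_j can be matched to at most one C*_i,
    so the result is a partial matching of cluster indices.
    """
    pi = {}
    for i, ci in enumerate(c_star):
        for j, cj in enumerate(c_prime):
            if len(ci ^ cj) < len(ci & cj):   # |C*_i △ C'_j| < |C*_i ∩ C'_j|
                pi[i] = j
    return pi

# Two clusterings of {0,...,5} that differ on a single point.
c_star  = [{0, 1, 2}, {3, 4, 5}]
c_prime = [{0, 1}, {2, 3, 4, 5}]
print(partial_matching(c_star, c_prime))  # {0: 0, 1: 1}
```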
Let σ be an arbitrary permutation of the cluster indices that matches all uninteresting points according to π; i.e., σ is defined so that if x is uninteresting and C*(x) = C*_i, then C'(x) = C'_π(i). By definition we have dist_σ(C*, C') ≥ dist(C*, C') ≥ 49ε/α, which implies that there exists a set I of at least 49εn/α interesting points.

We now compute the cost of isolating an interesting point x. Let us denote by w(x) the contribution of x to the Min-Disagreement CC cost of C* and by w'(x) the contribution of x to the Min-Disagreement CC cost of C'. We clearly have

w(x) + w'(x) ≥ |C*(x) \ C'(x)| + |C'(x) \ C*(x)|,

which for an interesting point implies

2(w(x) + w'(x)) ≥ max(|C*(x)|, |C'(x)|).

So, for an interesting point x we get that

|{y : R(x, y) = +1}| ≤ |C*(x)| + w(x) ≤ 3(w(x) + w'(x)).

So, the cost of isolating an interesting point x is at most 3(w(x) + w'(x)). Since cost(C') ≤ 2.5 OPT, cost(C*) = OPT, and |I| ≥ 49εn/α, we have:

(1/|I|) Σ_{x ∈ I} (w(x) + w'(x)) ≤ 3.5 OPT / |I| ≤ 3.5 OPT α / (49εn).

This implies that for any set size s ≤ |I| there exists a set A ⊆ I of size s such that

(1/|A|) Σ_{x ∈ A} (w(x) + w'(x)) ≤ 3.5 OPT α / (49εn).

Note also that for any interesting point x we have w(x) + w'(x) ≥ 1, and therefore

49εn/α ≤ |I| ≤ Σ_{x ∈ I} (w(x) + w'(x)) ≤ 3.5 OPT,

which implies

OPT ≥ 14εn/α. (4)

Let A ⊆ I be of size s = 4εn such that (1/|A|) Σ_{x ∈ A} (w(x) + w'(x)) ≤ 3.5 OPT α / (49εn). Let A_s ⊆ A be the set of singleton points in the target clustering, i.e., x ∈ A_s if C_T(x) = {x}, and let A_ns = A \ A_s. We produce a new clustering C'' from C* by isolating the points in A_ns and by pairing up the points in A_s and merging the two points in each pair. By Fact 1 we get dist(C*, C'') ≥ 2ε, so dist(C_T, C'') ≥ ε. Also, as shown above, the cost of isolating all the points in A_ns is at most (10.5 α OPT / (49εn)) · |A| ≤ (42/49) α OPT; moreover, the total cost of merging the singleton interesting pairs is at most |A_s|/2 ≤ 2εn, which by (4) is at most (7/49) α OPT. This implies that the cost of isolating all the points in A_ns plus the cost of merging the singleton pairs is at most α OPT. So the Min-Disagreement CC cost of C'' is within a (1 + α) factor of OPT, and yet C'' is ε-far from the target.
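The constants in this final accounting can be sanity-checked mechanically. The sketch below simply replays the arithmetic of the last few bounds with exact rationals; it checks the bookkeeping only, not the combinatorial claims:

```python
from fractions import Fraction as F

# |A| = 4εn, and avg_{x in A}(w(x) + w'(x)) <= 3.5 * OPT * α / (49εn).
# Isolation cost <= 3 * |A| * avg, so its coefficient as a multiple of α*OPT is:
isolation = 3 * 4 * F(35, 10) / 49      # 3 * 4εn * 3.5/(49εn) = 42/49
# Merging cost <= 2εn, and (4) gives εn <= α*OPT/14, so its coefficient is:
merging = 2 * F(1, 14)                  # 2/14 = 7/49
print(isolation, merging, isolation + merging)  # 6/7 1/7 1, i.e. total <= α*OPT
```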
Thus our clustering instance does not satisfy the (1 + α, ε) property with respect to the target clustering, which is a contradiction. This completes the proof.

Note: The other correlation clustering objective is that of maximizing agreements: the number of +1 edges inside clusters plus the number of −1 edges between clusters. For maximizing agreements there exists a PTAS [11], so this objective is not interesting in our framework.
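For concreteness, the Min-Disagreement CC objective used throughout this section can be computed directly from its definition. The sketch below is our own encoding (labels as a function over pairs, a clustering as a point-to-cluster map); the toy instance is the inconsistent triangle from the footnote above:

```python
from itertools import combinations

def min_disagreement_cost(points, label, clustering):
    """Count -1 edges inside clusters plus +1 edges between clusters.

    `label(x, y)` returns +1 or -1; `clustering` maps each point to a cluster id.
    """
    cost = 0
    for x, y in combinations(points, 2):
        same = clustering[x] == clustering[y]
        if (same and label(x, y) == -1) or (not same and label(x, y) == +1):
            cost += 1
    return cost

# Toy instance: a +1 triangle with one -1 edge (the inconsistency is unavoidable).
pts = ["x", "y", "z"]
lab = lambda a, b: -1 if {a, b} == {"x", "z"} else +1
print(min_disagreement_cost(pts, lab, {"x": 0, "y": 0, "z": 0}))  # one big cluster pays for the -1 edge
print(min_disagreement_cost(pts, lab, {"x": 0, "y": 0, "z": 1}))  # splitting z off pays for the +1 edge (y, z)
```

Every clustering of this triangle incurs at least one disagreement, which is exactly why the problem is non-trivial.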
4.1 The Non-Complete Graph Case

In the case where the graph G is not fully connected, we do not get as strong a result as in Theorem 16. On the contrary, we can show the following:

Theorem 17 For any α, β < 1/6, there exists a family of graphs G and target clusterings that satisfy the (1 + α, 0) property for the Min-Disagreement CC objective and yet do not satisfy even the (1 + α + β, 1/2) property for that objective.

Proof: Consider a set of n points such that the target clustering consists of one cluster C_1 with n/2 points and one cluster C_2 with n/2 points. We set both C_1 and C_2 to be fully connected, with all the edges inside C_1 and C_2 labeled +1. Also, we designate a single vertex in C_1 which is connected with n/3 vertices in C_2 by edges all labeled +1, and a vertex in C_2 connected with (n/3)(1 + α + β) vertices in C_1 by edges all labeled −1. It is easy to verify that the instance satisfies the (1 + α, 0) property: we have OPT = n/3, and any other solution has cost at least (n/3)(1 + α + β). However, the instance does not even satisfy the (1 + α + β, 1/2) property: the clustering with all the points in one big cluster has cost (n/3)(1 + α + β), and yet its distance from the target is 1/2.

5 Conclusions and Open Questions

In this work we get around inherent inapproximability results for the min-sum objective in the case where a good approximation to the min-sum objective indeed implies an accurate clustering. We derive strong structural properties from this assumption, and use them to give efficient algorithms that produce accurate clusterings. In the minimizing-disagreements setting for correlation clustering, we show that the same assumption allows us to find an accurate clustering using existing approximation algorithms. One concrete open question remaining is dealing with a non-complete graph in the context of correlation clustering for the minimizing-disagreements objective. More generally, it would be interesting to further explore and analyze in this framework other natural classes of commonly used clustering objective functions.
It would also be interesting to consider an agnostic version of the model, where the (c, ε) property is satisfied only after some small number of outliers or ill-behaved data points have been removed.

Acknowledgments: We thank Avrim Blum for numerous useful discussions.

References

[1] D. Achlioptas and F. McSherry. On spectral learning of mixtures of distributions. In Proceedings of the Eighteenth Annual Conference on Learning Theory.
[2] M. Ackerman and S. Ben-David. Which data sets are clusterable? A theoretical study of clusterability. In NIPS.
[3] N. Ailon, M. Charikar, and A. Newman. Aggregating inconsistent information: ranking and clustering. In Proceedings of the 37th ACM Symposium on Theory of Computing.
[4] N. Alon, S. Dar, M. Parnas, and D. Ron. Testing of clustering. In Proceedings of STOC.
[5] S. Arora and R. Kannan. Learning mixtures of arbitrary Gaussians. In STOC.
[6] V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for k-median and facility location problems. SIAM J. Comput., 33(3).
[7] M.-F. Balcan, A. Blum, and A. Gupta. Approximate clustering without the approximation. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms.
[8] M.-F. Balcan, A. Blum, and S. Vempala. A discriminative framework for clustering via similarity functions. In STOC.
[9] Y. Bartal, M. Charikar, and D. Raz. Approximating min-sum k-clustering in metric spaces. In Proceedings of the 33rd Annual ACM Symposium on Theory of Computing.
[10] S. Ben-David. A framework for statistical clustering with constant time approximation for k-median and k-means clustering. Machine Learning, 66(2-3).
[11] A. Blum, N. Bansal, and S. Chawla. Correlation clustering. Machine Learning, 56:89-113.
[12] M. Charikar, S. Guha, E. Tardos, and D. B. Shmoys. A constant-factor approximation algorithm for the k-median problem. In STOC.
[13] M. Charikar, V. Guruswami, and A. Wirth. Clustering with qualitative information. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS).
[14] A. Czumaj and C. Sohler. Sublinear-time approximation for clustering via random samples.
In Proceedings of the 31st International Colloquium on Automata, Languages and Programming (ICALP).
[15] A. Czumaj and C. Sohler. Small space representations for metric min-sum k-clustering and their applications. In Proceedings of the 24th International Symposium on Theoretical Aspects of Computer Science.
[16] S. Dasgupta. Learning mixtures of Gaussians. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science.
[17] W. Fernandez de la Vega, M. Karpinski, C. Kenyon, and Y. Rabani. Approximation schemes for clustering problems. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing.
[18] L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer-Verlag.
[19] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley.
[20] P. Indyk. Sublinear time algorithms for metric space problems. In STOC.
[21] K. Jain, M. Mahdian, and A. Saberi. A new greedy approach for facility location problems. In 34th STOC.
[22] T. Joachims and J. Hopcroft. Error bounds for correlation clustering. In Proceedings of the International Conference on Machine Learning.
[23] R. Kannan, H. Salmasian, and S. Vempala. The spectral method for general mixture models. In 18th COLT.
[24] J. Kleinberg. An impossibility theorem for clustering. In NIPS.
[25] A. Kumar, Y. Sabharwal, and S. Sen. A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. In 45th FOCS.
[26] M. Meila. Comparing clusterings by the variation of information. In COLT.
[27] M. Meila. Comparing clusterings: an axiomatic view. In International Conference on Machine Learning.
[28] L. J. Schulman. Clustering for edge-cost minimization. In Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing.
[29] S. Vempala and G. Wang. A spectral algorithm for learning mixture models. JCSS, 68(2), 2004.
More informationQuantum Runge-Lenz Vector and the Hydrogen Atom, the hidden SO(4) symmetry
Quantum Runge-Lenz ector and the Hydrogen Atom, the hdden SO(4) symmetry Pasca Szrftgser and Edgardo S. Cheb-Terrab () Laboratore PhLAM, UMR CNRS 85, Unversté Le, F-59655, France () Mapesoft Let's consder
More informationGraph Reconstruction by Permutations
Graph Reconstructon by Permutatons Perre Ille and Wllam Kocay* Insttut de Mathémathques de Lumny CNRS UMR 6206 163 avenue de Lumny, Case 907 13288 Marselle Cedex 9, France e-mal: lle@ml.unv-mrs.fr Computer
More informationLecture 4: November 17, Part 1 Single Buffer Management
Lecturer: Ad Rosén Algorthms for the anagement of Networs Fall 2003-2004 Lecture 4: November 7, 2003 Scrbe: Guy Grebla Part Sngle Buffer anagement In the prevous lecture we taled about the Combned Input
More informationAffine transformations and convexity
Affne transformatons and convexty The purpose of ths document s to prove some basc propertes of affne transformatons nvolvng convex sets. Here are a few onlne references for background nformaton: http://math.ucr.edu/
More informationprinceton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora
prnceton unv. F 13 cos 521: Advanced Algorthm Desgn Lecture 3: Large devatons bounds and applcatons Lecturer: Sanjeev Arora Scrbe: Today s topc s devaton bounds: what s the probablty that a random varable
More informationVQ widely used in coding speech, image, and video
at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng
More informationThe Second Anti-Mathima on Game Theory
The Second Ant-Mathma on Game Theory Ath. Kehagas December 1 2006 1 Introducton In ths note we wll examne the noton of game equlbrum for three types of games 1. 2-player 2-acton zero-sum games 2. 2-player
More informationA General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization
Journa of Machne Learnng Research 18 17 1-5 Submtted 9/16; Revsed 1/17; Pubshed 1/17 A Genera Dstrbuted Dua Coordnate Optmzaton Framework for Reguarzed Loss Mnmzaton Shun Zheng Insttute for Interdscpnary
More informationCommunication Complexity 16:198: February Lecture 4. x ij y ij
Communcaton Complexty 16:198:671 09 February 2010 Lecture 4 Lecturer: Troy Lee Scrbe: Rajat Mttal 1 Homework problem : Trbes We wll solve the thrd queston n the homework. The goal s to show that the nondetermnstc
More informationSampling Self Avoiding Walks
Samplng Self Avodng Walks James Farbanks and Langhao Chen December 3, 204 Abstract These notes present the self testng algorthm for samplng self avodng walks by Randall and Snclar[3] [4]. They are ntended
More informationLecture 12: Discrete Laplacian
Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly
More informationWhich Separator? Spring 1
Whch Separator? 6.034 - Sprng 1 Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng Whch Separator? Mamze the margn to closest ponts 6.034 - Sprng 3 Margn of a pont " # y (w $ + b) proportonal
More informationHMMT February 2016 February 20, 2016
HMMT February 016 February 0, 016 Combnatorcs 1. For postve ntegers n, let S n be the set of ntegers x such that n dstnct lnes, no three concurrent, can dvde a plane nto x regons (for example, S = {3,
More informationClustering Affine Subspaces: Algorithms and Hardness
Clusterng Affne Subspaces: Algorthms and Hardness Thess by Euwoong Lee In Partal Fulfllment of the Requrements for the Degree of Master of Scence Calforna Insttute of Technology Pasadena, Calforna 01 (Submtted
More informationSingle-Source/Sink Network Error Correction Is as Hard as Multiple-Unicast
Snge-Source/Snk Network Error Correcton Is as Hard as Mutpe-Uncast Wentao Huang and Tracey Ho Department of Eectrca Engneerng Caforna Insttute of Technoogy Pasadena, CA {whuang,tho}@catech.edu Mchae Langberg
More informationFormulas for the Determinant
page 224 224 CHAPTER 3 Determnants e t te t e 2t 38 A = e t 2te t e 2t e t te t 2e 2t 39 If 123 A = 345, 456 compute the matrx product A adj(a) What can you conclude about det(a)? For Problems 40 43, use
More informationInterference Alignment and Degrees of Freedom Region of Cellular Sigma Channel
2011 IEEE Internatona Symposum on Informaton Theory Proceedngs Interference Agnment and Degrees of Freedom Regon of Ceuar Sgma Channe Huaru Yn 1 Le Ke 2 Zhengdao Wang 2 1 WINLAB Dept of EEIS Unv. of Sc.
More informationA new construction of 3-separable matrices via an improved decoding of Macula s construction
Dscrete Optmzaton 5 008 700 704 Contents lsts avalable at ScenceDrect Dscrete Optmzaton journal homepage: wwwelsevercom/locate/dsopt A new constructon of 3-separable matrces va an mproved decodng of Macula
More informationAssortment Optimization under MNL
Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.
More informationLearning Theory: Lecture Notes
Learnng Theory: Lecture Notes Lecturer: Kamalka Chaudhur Scrbe: Qush Wang October 27, 2012 1 The Agnostc PAC Model Recall that one of the constrants of the PAC model s that the data dstrbuton has to be
More informationAppendix for An Efficient Ascending-Bid Auction for Multiple Objects: Comment For Online Publication
Appendx for An Effcent Ascendng-Bd Aucton for Mutpe Objects: Comment For Onne Pubcaton Norak Okamoto The foowng counterexampe shows that sncere bddng by a bdders s not aways an ex post perfect equbrum
More information8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS
SECTION 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS 493 8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS All the vector spaces you have studed thus far n the text are real vector spaces because the scalars
More informationLecture 3. Ax x i a i. i i
18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest
More informationSpectral Graph Theory and its Applications September 16, Lecture 5
Spectral Graph Theory and ts Applcatons September 16, 2004 Lecturer: Danel A. Spelman Lecture 5 5.1 Introducton In ths lecture, we wll prove the followng theorem: Theorem 5.1.1. Let G be a planar graph
More information