The Per-character Cost of Repairing Word Languages

Size: px

Start display at page:

Download "The Per-character Cost of Repairing Word Languages"

Sibyl Hawkins
5 years ago
Views:

1 The Per-character Cost of Reparng Word Languages Mchael Benedkt a, Gabrele Pupps b, Crstan Rveros c a Department of Computer Scence Unversty of Oxford b CNRS / LaBRI Unversty of Bordeaux c Department of Computer Scence Pontfca Unversdad Catolca de Chle Abstract We show how to calculate the maxmum number of edts per character needed to convert any strng n one regular language to a strng n another language. Our algorthm makes use of a local determnzaton procedure applcable to a subclass of dstance automata. We then show how to calculate the same property when the edtng needs to be done n streamng fashon, by a fnte state transducer, usng a reducton to mean-payoff games. In ths case, we show that the optmal streamng edtor can be produced n P. Keywords: Repar, asymptotc normalzed cost, edt dstance, dstance automata, regular languages.. Introducton Edt dstance s a well-studed metrc between strngs, measurng how many operatons are needed to get from one strng to another. In ths paper we look for natural (asymmetrc) analogs for regular languages: how many edts does t requre to get from a word n regular language R to a word n regular language T, n the worst case? Our notaton s motvated by consderng R to be a restrcton a constrant that the nput s guaranteed to satsfy and T to be a target a constrant that we want to enforce. In a pror work [], we consdered the basc queston of whether one can get from a word n R to a word n T wth a fnte, unformly bounded number Emal addresses: mchael.benedkt@cs.ox.ac.uk (Mchael Benedkt), gabrele.pupps@labr.fr (Gabrele Pupps), crveros@ng.puc.cl (Crstan Rveros) Preprnt submtted to Theoretcal Computer Scence Aprl 4, 204

2 of edts. One of the man results of [] was a characterzaton of the pars (R, T ) for whch such a unform bound exsts. Example. Consder the regular languages R = a b and T = a c b. Clearly, any strng n R can be converted to a strng n T wth at most edt operaton. Such a bound, when t exsts, shows that the language R s qute close to beng a subset of T the gap between strngs n R and strngs n T s small. However, havng a unform bound on the number of edts s a strong requrement. In ths paper we look not at the absolute number of edts requred to get from R to T, but rather at the percentage of letters that need to be edted. Example 2. Consder the languages R = (a + b) and T = (ab). Roughly, for any par of consecutve occurrences of the letter a n the nput, we wll have to perform one edt n order to ensure alternaton n the output. In partcular, the number of edts requred to get from a strng n R to a strng n T s unbounded. On the other hand, t s clear that, n the worst case (.e., a 2n ), one needs to edt approxmately half of the letters n order to produce a strng n T n ths case, we say that all strngs n R can be repared nto T wth normalzed cost at most 2. We measure the gap from a restrcton language R to a target language T va the worst case, over all strngs u R, of the number of edts needed to brng u nto T dvded by the length of u. Snce we want the defnton to be robust to a fnte number of outlers, we take the lmt of ths quantty as the strngs are of larger and larger length ths s the asymptotc (normalzed) cost n gettng from R to T. Ths gves us a measure of the dstorton needed to get from R to T, lyng always between 0 and. In our pror work [] we have gven algorthms for determnng when the absolute cost of reparng R nto T s unformly bounded, and n ths case compute a bound. Smlarly, the man result of the present paper s an algorthm that computes the asymptotc cost of reparng R nto T. The technques used for the asymptotc cost analyss are radcally dfferent from those used for the bounded repar problem. Specfcally, they rely on deas from the theory of dstance automata [2], and n partcular on an applcaton of determnzaton of dstance automata, closely related to Mohr s determnzaton procedure [3]. 2

3 We then turn to the settng where the repar s requred to be done n streamng fashon, producng the edts mmedately on seeng the nput letter. We measure a streamng repar processor by the number of edts per character t requres to get from any strng n R to a strng n T, agan lookng at the lmt as the strng length gets large. We accordngly defne the streamng asymptotc cost to be the optmal cost of a streamng processor. We show that ths quantty can also be calculated effectvely, usng technques from mean-payoff games. Example 3. Consder R = (a+b) c (a + +b + ) and T = a c a + +b c b +. One can get from R to T by only edtng the ntal letter: so the asymptotc cost s 0. However, a streamng strategy must commt to changng the ntal letter or leavng t be: f t makes the ncorrect choce, t wll have to edt an unbounded fnal segment; thus the streamng asymptotc cost s. The above two results gve us the ablty to compare the cost one should pay n edtng strngs n R to strngs n T wth an arbtrary processor wth the cost when we are restrcted to use a streamng processor. If these are the same, t shows that streamng processors that edt strngs n R to T can approxmate arbtrary processors n worst-case behavor. In summary our contrbutons are: ˆ ˆ We present an algorthm for calculatng the asymptotc normalzed cost of reparng strngs n regular language R to strngs n regular language T, based on locally determnzng a subclass of dstance automata. We gve an algorthm for calculatng the optmal asymptotc normalzed cost acheved usng a streamng edtng algorthm. Organzaton. Secton 2 gves the prelmnares, whle Secton 3 defnes the basc problems. Secton 4 studes the problem of computng the asymptotc cost n the non-streamng case, whle Secton 5 deals wth the streamng case. Secton 6 gves conclusons. 2. Prelmnares Gven a word w over an alphabet Σ, we denote by w ts length and, gven two postons w, we denote by w[] (resp., w[... ]) the -th symbol of w (resp., the nfx of w startng at poston and endng at poston ). 3

4 Automata. Non-determnstc fnte state automata (shortly, NFA) wll be represented by tuples of the form A = (Σ, Q, E, I, F ), where Σ s a fnte alphabet, Q s a fnte set of states, E Q Σ Q s a transton relaton, and I, F Q are sets of ntal and fnte states. The notons of run and accepted word are the usual ones. L (A) s the language recognzed by A. If A s a determnstc fnte state automaton (DFA), then we usually denote the unque ntal state by q 0 and turn ts transton relaton E nto a partal functon δ from Q Σ to Q defned by δ(q, ε) = q and δ(q, a u) = δ(q, u) ff (q, a, q ) E. For techncal reasons, t s convenent to assume that an automaton s trmmed, namely, all ts states are reachable from some ntal states (.e., they are accessble) and they can reach some fnal states (.e., they are coaccessble). It s worth notcng that, snce the decson problems we are gong to deal wth are at least NLogSpace-hard and snce states of automata that are not accessble or not co-accessble can be pruned usng some smple NLogSpace reachablty analyss, ths assumpton wll have no mpact on our complexty results. Snce automata can be vewed as drected (labeled) graphs, we nhert the standard defntons and constructons n graph theory. In partcular, gven an automaton A = (Σ, Q, E, I, F ) and a state q Q, we denote by C(q) the strongly connected component (shortly, SCC) of A that contans all states mutually reachable from q. We say that a component C of A s fnal f t can reach a fnal state (possbly outsde C). Gven a set C of states of A (e.g., a SCC), we denote by A C the NFA obtaned by restrctng A to the set C and by lettng the new ntal and fnal states be all and only the states n C (note that f C conssts of a sngle transent state, then the language L (A C) recognzed by the subautomaton A C s empty). Fnally, we denote by dag(a) the drected acyclc (unlabeled) graph of the SCCs of A and by dag (A) the graph obtaned from the symmetrc and transtve closure of the edges of dag(a). Transducers. A (real-tme sub-sequental) transducer s a tuple S = (Σ,, Q, δ, q 0, Ω), where Σ s a fnte nput alphabet, s a fnte output alphabet, Q s a fnte set of states, δ s a partal transton functon from Q Σ to Q, q 0 s an ntal state, and Ω s a partal functon from Q to. For every nput word u = a... a n Σ, there s at most one run of S on u of the form a q /v a 0 2 /v 2 q... a n/v n ε/v n+ qn 4

5 where δ(q, a ) = (v, q + ) for all 0 < n and Ω(q n ) = v n+. In such a case, we defne the output of S on u to be the word S(u) = v v 2... v n v n+ (observe that the transducer outputs an addtonal, possbly empty, word to be added on at the end of the computaton). Transducers as above produce an output word mmedately on readng an nput character. We wll also consder transducers wth a bounded amount of delay. A k-lookahead transducer, wth k N, s as above, but where the transton functon δ now has nput n Q Σ k+, where Σ = Σ { } and / Σ. Gven an nput word u and a poston u n t, we denote by u the (k + )-character subword of u k that starts at poston and ends at poston + k. The output of a k-lookahead transducer S on an nput u of length n s the unque word v = v v 2... v n v n+ for whch there exsts a sequence of states q 0,..., q n satsfyng δ(q, u ) = (v, q + ), for all n, and Ω(q n ) = v n+. Clearly, a 0-lookahead transducer s smply a standard (real-tme sub-sequental) transducer. 3. Problem settng Gven two words u Σ and v, we denote by dst(u, v) the Levenshten dstance (henceforth, edt dstance) between u and v, whch s defned as the length of a shortest sequence s of edt operatons (e.g., deletng a sngle character, modfyng a sngle character, and nsertng a sngle character) that transforms u nto v [4]. We are nterested n quantfyng how dffcult t s to edt a word n one language to obtan a word n another language. That s, we have fnte alphabets Σ and and regular languages R Σ and T, called the restrcton and target languages, respectvely. We would lke to edt any strng that s known to belong to the restrcton language R nto a strng n the target language T. We wll also consder the specal case where R = Σ, whch we denote as the unrestrcted case. A repar strategy for two languages R and T s any functon from R to T. For a repar strategy f and a word u R, we defne the (absolute) cost of f on u, denoted cost(u, f), as the edt dstance between u and f(u). In [] we have gven a characterzaton of those pars (R, T ) of languages for whch there exst a repar strategy f whose (absolute) cost s fnte and unformly 5

6 bounded. In ths paper we wll not be concerned wth the absolute cost of reparng a word, snce the worst case of ths s often nfnte. We consder nstead a noton of repar cost between words that looks at the percentage of symbols n a word that need to be edted. More precsely, gven a repar strategy f for R and T and gven a word u R, we defne the normalzed cost of f on u as the rato between the cost of f on u and the length of u: ncost(u, f) = def cost(u, f). u In order to measure the asymptotc behavor of the normalzed cost, we defne the asymptotc cost of f as the lmt superor of the normalzed cost when the length of words n the restrcton language tends to nfnty: acost(r, f) = def lm sup u R u n ncost(u, f). Accordngly, we defne the asymptotc repar cost acost(r, T ) for two languages R and T as the mnmum of acost(r, f) taken over all repar strateges f for R and T. Note that acost(r, T ) can be equally defned by acost(r, T ) = lm sup u R u n mn v T dst(u, v). u Example 2 (contnued). Consder agan the languages R = (a + b) and T = (ab) of Example 2 n the ntroducton. Recall that, n the worst case (.e., a 2n ), one needs to edt approxmately half of the letters n order to produce a strng n T. Ths shows that acost(r, T ) = 2. Remark. It s easy to see that the asymptotc repar cost acost(r, T ) of any par of languages ranges over the nterval [0, ] of the real numbers. Indeed, for large words n the restrcton language R, one can modfy and delete the letters to create shorter words n the target language T, and thus the resultng edtng cost s always less than the length of the nput word. Ideally, we are nterested n computng the asymptotc cost acost(r, T ) for any par of regular languages R and T, provded that ths number s ratonal. We wll ndeed show that ths s the case and descrbe a procedure that computes the asymptotc cost. 6

7 Streamng vs non-streamng. We know from [4] that there s a dynamc programmng algorthm that, gven a word u and a regular target language T represented by means of a fnte state automaton T, computes n tme O( u T ) an optmal edt sequence that transforms u nto some word n T. In partcular, ths shows that optmal repar strateges can be descrbed by functons of farly low complexty. Sometmes t s desrable to have repar strateges that are n even more lmted classes. Perhaps the deal case s when we can repar R nto T wth a one-pass algorthm, that s, usng a real-tme sub-sequental transducer. Recall that a real-tme sub-sequental transducer defnes a word-to-word functon; f ths functon happens to produce a word n T for every nput u R, then we say that t s a streamng repar strategy for R and T. Smlarly, we can consder k-lookahead transducers, wth k N: ths type of transducer outputs words on the bass of ts current state and an nput (k +)-character wndow that represents a substrng of u of the form u[]... u[ + k], where u[] s ether the -th symbol of w, f u, or a dummy symbol, f > u. Accordngly, we talk about a k-lookahead streamng repar strategy for R and T. Gven a k-lookahead streamng edt strategy S for R and T and gven a word u R, we can defne the (absolute) cost of S on u n two ways: a. lettng q /v a 0 2 /v 2 q... a n/v n v qn n+ be the run of S on u, we defne the aggregate cost of S on u, denoted cost aggr S (u), to be the length of the fnal output v n+ plus the sum, over all ndces n, of dst(a, v ), where dst(a, v ) s f v s empty, v f a occurs n v, and v otherwse; 2. consderng the transducer S as a repar strategy, we defne the edt cost of S on u, denoted cost edt S (u), to be smply the edt dstance between u and the output S(u). The frst noton of cost consders the dstortons performed n producng the nput from the output t s equvalent to consderng the transducer as producng edt sequences rather than strngs and countng the number of edts produced. The second noton of cost s global and t consders only the output and not ts producton (clearly, the edt cost never exceeds the aggregate cost). These two models of cost can be very dfferent n general. As an example, consder a transducer S on the nput alphabet Σ = {a, b} that swaps a s and b s. On the strng u n = (ab) n, the aggregate cost s 2n snce S 7

8 changes each letter, but the edt dstance between u and S(u) (.e., the edt cost of S on u n our sense) s only 2. In the streamng settng, we wll manly focus on the model of aggregate cost, as for the model of edt cost we do not have any nterestng result. Formally, we defne the asymptotc (normalzed aggregate) cost of a k-lookahead streamng strategy S for R and T, as acost aggr S (R, T ) = def lm sup u R u n cost aggr S (u) u where cost aggr S (u) s defned above. Smlarly, we defne the asymptotc k- lookahead streamng cost of R and T, denoted acost aggr k lookahead (R, T ), as the nfmum of acost aggr S (R, T ) taken over all k-lookahead streamng repar strateges S for R and T. We remark that, a pror, the nfmum n the prevous defnton cannot be replaced by a mnmum: t s concevable that the asymptotc aggregate costs of the k-lookahead streamng repar strateges for R and T are arbtrary close to acost aggr k lookahead (R, T ), but never acheve ths value. In fact, n Secton 5 we wll show that ths s not the case, as we can enforce, wthout loss of generalty, a unform bound to the memory of the k-lookahead streamng repar strateges. Example 2 (contnued). Consder agan the languages R = (a + b) and T = (ab) of Example 2 and recall that acost(r, T ) = 2. We show that the asymptotc 0-lookahead streamng aggregate cost acost aggr 0 lookahead (R, T ) s hgher than the asymptotc cost n the nonstreamng case. Suppose that S s a 0-lookahead streamng repar strategy for R and T. One can nductvely construct arbtrarly long words u n = a a 2... a n R such that f a q /v a 0 2 /v 2 q... a n/v n qn s a partal run of S on u n, then each next letter a n s equal to the last letter of the prefx v v 2... v n of the output of S (f v v 2... v n s empty, then a n can be chosen arbtrarly). It s easy to see that the aggregate cost nduced by the run of S on u n s at least n, whence acost aggr 0 lookahead (R, T ) =. However, f we consder the model of edt cost, then we have acost edt 0 lookahead (R, T ) = 2 (n fact, any streamng strategy for R and T acheves ths asymptotc edt cost). 8

9 Observe that, n general, the k-lookahead streamng asymptotc cost acost aggr k lookahead (R, T ) assocated wth two languages R and T s a nonncreasng functon of the lookahead parameter k N and t s bounded from below by the non-streamng asymptotc cost acost(r, T ). Towards the end of Secton 5 we wll consder the problem of computng the lmt of the streamng asymptotc cost acost aggr k lookahead (R, T ) as the lookahead parameter k gets bgger. We are not able to compute the exact value of ths lmt, nor to prove that acost aggr k lookahead (R, T ) stablzes for suffcently large k, as t seems reasonable. However, we wll show how to solve a smpler problem, that conssts of decdng gven two DFA R and T and a ratonal threshold ν whether there s k N such that acost aggr k lookahead (R, T ) < ν. 4. Asymptotc cost n the non-streamng case In ths secton, we study the problem of computng the asymptotc cost n the non-streamng settng. We begn wth some background on dstance automata, whch wll play a key role n the man characterzaton result. 4.. Dstance automata computng the edt cost Intutvely, a dstance automaton [2] s a transducer D that receves as nput a fnte word u and outputs a correspondng cost D(u) n N { }. Dstance automata can be equvalently defned usng two dfferent presentatons based, respectvely, on matrces of transton costs and on transton relatons. Here, we adopt the latter type of presentaton, whch s more convenent for our purposes (e.g., t eases the defnton of a run of a dstance automaton). Formally, a dstance automaton s a tuple D = (Σ, Q, E, I, F ), where Q s a fnte set of states, E Q Σ N Q s a fnte transton relaton, I and F are some ntal and fnal condtons descrbed by partal functons from Q to N and representng the costs of begnnng and endng a run wth certan states. A run of D on a word u Σ s a sequence γ = (q 0, a, c, q ) (q, a 2, c 2, q 2 )... (q n, a n, c n, q n ) of parwse adacent transtons n E that spell the nput word u = a a 2... a n. The cost of the run γ s naturally defned by cost(γ) = def n c. 9

10 We denote by D(u) the mnmum value I(q 0 ) + cost(γ) + F (q n ) among all states q 0 n the doman dom(i) of I, all states q n n the doman dom(f ) of F, and all runs γ of D on u that start n q 0 and end n q n. We let D(u) = f there are no such states q 0 and q n, or f there s no run from q 0 to q n. When consderng the edt dstance of a word u Σ to a regular language T, t s farly natural to express ths value n terms of the cost computed by a dstance automaton. By default, we assume that the target language T s recognzed by an NFA T = (, Q, E, I, F ). Gven two states p, q of T, we let T p,q be the NFA obtaned from T by lettng p be the new ntal state and q the new unque fnal state. The dstance automaton that computes the edt dstance of a word u Σ to the target language L (T ) s defned as D edt T = (Σ, Q, E edt, I edt, F edt ), where ˆ E edt s the set of all transtons of the form (p, a, c, q), wth p, q Q, a Σ, q reachable from p n T, and c = mn {dst(a, v) v L (T p,q )}, ˆ ˆ I edt s the partal functon that maps a state q Q to the mnmum among the values dst(ε, v), wth v p I L (T p,q ) (f q s not reachable from some ntal state of T, then I edt (q) s undefned), F edt s the partal functon that maps a state p Q to the mnmum among the values dst(ε, v), wth v q F L (T p,q ) (f p cannot reach a fnal state of T, then F edt (p) s undefned). One can easly show that DT edt computes exactly the edt dstance of a word u Σ to the regular language L (T ): Proposton. For every word u Σ, we have D edt T (u) = mn v L (T ) dst(u, v). Proof. Let T = (, Q, E, I, F ) be an NFA and let DT edt = (Σ, Q, E edt, I edt, F edt ) be the correspondng dstance automaton, as defned above. For ths proof, t s convenent to ntroduce the noton of locally optmal run. We say that a run γ = (q, a, c, q 2 )... (q n, a n, c n, q n+ ) of DT edt s locally optmal f t has the mnmum cost among all runs of DT edt on the same word u = a... a n that start n q and end n q n+. Note that there can exst several locally optmal runs on the same word u wth dfferent costs (and, of course, wth dfferent begnnng and endng states). We have the followng characterzaton of the mnmum cost: 0

11 Clam. For every word u Σ and every locally optmal run γ of DT edt u that starts n p and ends n q, we have that cost(γ) = mn v L (T p,q) dst(u, v). Proof. The proof of the above clam s by nducton on the length of the word u. If u = ε, then the clam follows easly. As for the nductve step, let us assume that the clam holds for u and let us prove t for u a, wth a Σ. Let γ be a locally optmal run of DT edt on u a that starts n p and ends n q. Moreover, for every state r Q, let γ r be a locally optmal run of DT edt on u from p to r. Snce γ s locally optmal, we have that ts cost s the mnmum among the cost of a run γ r, wth r Q, plus the cost of an a-labeled transton from r to q. From the nductve hypothess, we have that cost(γ r ) = Ths shows that cost(γ) = mn (r,a,c,q) E edt = mn (r,a,c,q) E edt mn v L (T p,r) dst(u, v). cost(γ r ) + c mn v L (T p,r) dst(u, v) + c. We now look at the defnton of the transton relaton E edt. It contans all quadruples (r, a, c, q) such that c = mn {dst(a, w) w L (T r,q )}. Ths shows that cost(γ) = mn r Q mn v L (T p,r) = mn v w L (T p,q) mn w L (T r,q) dst(u a, v w). dst(u, v) + dst(a, w) on To complete the proof of the proposton, t s suffcent to recall that for every word u Σ, DT edt (u) s the mnmum among the values cost(γ) + I edt (p) + F edt (q), for all states p dom(i edt ) and q dom(f edt ) and all (locally optmal) runs γ of D edt on u that start from p and end n q. We also recall that I edt (p) = mn {dst(ε, v) p I, v L (T p,p)} and F edt (q) = mn {dst(ε, v) q F, v L (T q,q )}. Ths mples that DT edt (u) s the mnmum among the values dst(ε, v I ) + dst(u, v) + dst(ε, v F ) = dst(u, v I v v F ), where v I p I L (T p,p), v L (T p,q ), v F q F L (T q,q ), (hence v I v v F L (T )), and p, q Q. Ths concludes the proof of the proposton.

12 4.2. Shortcut property and determnzable components Dstance automata of the form DT edt are a proper sub-class of all dstance automata. In partcular, they satsfy the shortcut property, formalzed ust below. Gven a symbol a Σ and two states p, q of a dstance automaton a D, we wrte p q to denote the exstence n D of a transton (p, a, c, q) wth some cost c N. Defnton. A dstance automaton D satsfes the shortcut property f for a b a all symbols a, b and all states p, q, r, p q r mples p r and b p r. Lemma. For every DFA T, D edt T satsfes the shortcut property. Proof. The proof follows almost mmedately from the defnton of DT edt. Let us consder two consecutve transtons (p, a, c, q) and (q, b, c, r) n DT edt. We know from the defnton of the transton relaton of DT edt that there exst some words v L (T p,q ) and w L (T q,r ). It follows that v w L (T p,r ). Agan from the defnton of DT edt, we derve the exstence of a transton (p, a, c, r), for some c dst(a, v w). Smlarly, t follows that (p, b, c, r) s a transton n DT edt, for some c dst(b, v w). As wth NFA, we call a strongly connected component (SCC) of a dstance automaton D any maxmal set of mutually reachable states. Gven a SCC C of D, we denote by D C the sub-automaton obtaned from D by restrctng the set of states and transtons to C and by lettng the ntal and fnal condtons map any state of C to 0. Note that the transton graph of D C s a clque when D satsfes the shortcut property. A crucal property entaled by the shortcut property s the followng one. Consder two runs γ and γ of D C that spell the same word u, but end n dfferent states q and q. If γ and γ have optmal cost among all runs on D C on u that end n q and q respectvely, then one can show that the dfference n cost between γ and γ s unformly bounded by a constant. Ths mples that we can determnze D C by usng a subset constructon, mantanng the dfference between the optmal cost of reachng each state q and the overall optmal cost (the same dea underles Mohr s determnzaton procedure [3]). Snce ths dfference s always unformly bounded by a constant, we obtan a fnte state determnstc dstance automaton: 2

13 Proposton 2. For every dstance automaton D that satsfes the shortcut property and every SCC C of D, there s a determnstc dstance automaton det(d C) that s equvalent to D C, namely, such that, for all words u, det(d C)(u) = D C(u). In addton, one can construct det(d C) so as to satsfy the followng property: f u = u k and u2 = u k 2 are two repettons of the same word and ρ and ρ 2 are runs of det(d C) on u and u 2, respectvely, that form cycles, then cost(ρ ) k = cost(ρ 2) k 2. Proof. Let D = (Σ, Q, E, I, F ) be a dstance automaton satsfyng the shortcut property and let C be a SCC of t. As a prelmnary remark, we observe that, by defnton, D C = (Σ, C, E, I, F ), where E s obtaned from E by restrctng the set of states to C and I (q) = F (q) = 0 for all q C. Below, we consder runs of the dstance automaton D C on a gven word u that end n a gven state q and have the mnmum cost among all runs of the same type. We call these runs (u, q)-optmal (note that these are smlar to the locally optmal runs used n the proof of Proposton, wth the only excepton that the startng state s not fxed). We also say that a run s u-optmal f t s (u, q)-optmal for some state q C. The basc dea underlyng the determnzaton of the sub-automaton D C stems from the followng property: Clam 2. Gven a word u Σ, the costs of any two u-optmal runs of D C dffer for at most c max, where c max s the maxmum cost that appears n the transtons of D C. Proof. Let us fx a word u = a... a n and let us consder two u-optmal runs γ = (q, a, c, q 2 )... (q n, a n, c n, q n+ ) and γ = (q, a, c, q 2 )... (q n, a n, c n, q n+ ) of D C on t. Observe that the states q n+ and q n+ belong to the same SCC C of D and, n partcular, q n+ s reachable from q n+, namely, q n+ v q n+ v where denotes the natural extenson of the transton relaton a from v symbols to words (.e., p q ff v = ε and p = q, or v = v a, p v r, r a q, for some v Σ and some r C). Usng a basc nducton on 3

14 v and the shortcut property, one can prove that D contans a transton of the form (q n, a n, c, q n+ ), for some c {0,..., c max }. Ths mples that the followng s also a vald run of D C on u: γ = (q, a, c, q 2) (q 2, a 2, c 2, q 3)... (q n, a n, c, q n) (q n, a n, c, q n+ ) Usng the u-optmalty of γ, we derve cost(γ) cost(γ ) cost(γ ) + c max. By symmetrc arguments, one derves the nequalty cost(γ ) cost(γ) + c max. We now construct a determnstc dstance automaton det(d C) that turns out to be equvalent to D C. Intutvely, det(d C) parses an nput word u and t outputs the mnmal cost of a u-optmal run, keepng track, at the same tme, of the dfferences between ths cost and the costs of the (u, q)-optmal runs, for any q C (these dfferences are called resdual costs and, n vew of the prevous clam, are unformly bounded). We formally defne the determnstc dstance automaton det(d C) equvalent to D C as the tuple (Σ, Q, δ, r 0, F ), where ˆ Q s the set of vectors wth entres ndexed by states n C and values rangng over the fnte set {0,..., c max }, where c max s the maxmum cost that appears n the transtons of D C (ntutvely, these vectors represent the resdual costs of (u, q)-locally optmal runs, for each state q C and for some fxed word u); ˆ δ s the partal functon from Q Σ to N Q defned by δ( r, a) = (c, r ), where c = mn { r[p]+c p C, (p, a, c, q) E} and r [q] = mn { r[p]+ c c p C, (p, a, c, q) E}; ˆ r 0 s the ntal vector defned by r 0 [q] = 0 for all q C; ˆ F s the constant functon that maps any vector r Q to 0 (note that there always exst q C for whch the correspondng resdual r[q] n r s 0). We show that det(d C) s equvalent to D C, namely, that det(d C)(u) = D C(u) for all u Σ. Let us consder a word u = a... a n. By explotng a smple nducton on the length of u, one can prove that 4

15 . there exsts a run of det(d C) on u f, and only f, there exsts a run of D C on u, - 2. f ρ = ( r, a, c, r 2 )... ( r n, a n, c n, r n+ ) s the unque run of det(d C) on u = a... a n startng from state r (recall that det(d C) s determnstc), then the cost of ρ s equal to the cost of a u-optmal run γ augmented wth the resdual r [p], where p s the ntal state of γ. That s: cost(ρ) = cost(γ) + r [p]. () We omt the formal proof of the above propertes and we observe that they mmedately mply that det(d C)(u) = D C(u) for all words u Σ. Towards a concluson, we can use equaton () to prove the addtonal property concernng the cycles n det(d C). Consder two repettons of the same word, that s, u = u k and u2 = u k 2 and suppose that ρ and ρ 2 are runs of det(d C) on u and u 2 that form cycles (note that the two runs do not need to start from the same state). We denote by c max be the maxmal cost n D C and, for every n N, we let γ (n) be a u n -optmal run of D C. We now consder the costs of sutable repettons of the cycles ρ and ρ 2 : k 2 n cost(ρ ) = cost(ρ k 2 n ) cost(γ (k k 2 n) ) + c max (by ()) cost(ρ k n 2 ) + 2 c max (by ()) k n cost(ρ 2 ) + 2 c max. As the above nequalty holds for all natural numbers n, we conclude that cost(ρ ) k cost(ρ 2) k 2. Fnally, a symmetrc argument shows that cost(ρ ) k cost(ρ 2) k 2. Hereafter, gven a dstance automaton D satsfyng the shortcut property and a SCC C n t, we denote by det(d C) the determnstc dstance automaton that satsfes Proposton 2. A close nspecton to the proof of the above proposton shows that det(d C) can be constructed n exponental tme from D and C. Example 4. Consder the dstance automaton D of Fgure, whch computes the edt dstance of any word to the target language T = (ab + b) a. As D satsfes the shortcut property and conssts of two SCCs C and C 2, the two sub-automata D C and D C 2 can be turned nto equvalent determnstc dstance automata det(d C ) and det(d C 2 ), depcted to the rght of Fgure. 5

16 D : a/ a/0, b/ b/ a/, b/0 b/0 a/ a/0, b/ a/, b/ C det(d C ) : b/0 0 0 a/0 a/ 0 b/0 a/ b/0 a/0 0 b/ a/0 b/ C 2 det(d C 2 ) : 0 a/0, b/ Fgure : A dstance automaton wth two SCCs and ts determnzed sub-automata. We remark that the above result does not mply that the entre dstance automaton D s determnzable. Consder, for nstance, a dstance automaton D that computes the edt dstance of a word u to the target language T = a + b. Ths dstance s gven by the mnmum between the number of occurrences of a and the number of occurrences of b and hence any determnstc devce that computes dst(u, L (T )) must use unbounded memory Asymptotc cost n the unrestrcted case We now look at a specal case of the asymptotc cost problem, where the source restrcton s trval. Thanks to Proposton and Lemma, we can reduce the problem of computng the asymptotc repar cost acost(σ, L (T )) n the unrestrcted case to the problem of computng the asymptotc cost of a dstance automaton D satsfyng the shortcut property: acost(d) = def lm sup u Σ u n D(u). u Ths secton s devoted to provde an effectve characterzaton of the asymptotc cost acost(d) that wll mply that the value s ratonal and computable from D. Before turnng to the characterzaton, we prove that, n general, t s not possble to compute the asymptotc cost acost(d) for an arbtrary dstance automaton D: Proposton 3. The problem of decdng, gven an arbtrary dstance automaton D, whether or not acost(d) 2 s undecdable. 6

17 Proof. We use the undecdablty of the 2-threshold problem for normalzed costs nduced by dstance automata [5], whch conssts of decdng, gven a dstance automaton D, whether D(u) u 2 holds for all words u Σ. Let us consder a dstance automaton D = (Σ, Q, E, I, F ). We compute a varant of the Kleene closure of D, denoted D#, by ntroducng a fresh symbol # / Σ and by addng 0-cost #-labeled transtons from all states p dom(f ) to all states q dom(i). The new automaton D# satsfes the followng property: m N, u,..., u m Σ. D # (u #... #u m ) = m D(u ). Clearly, ths mples that acost(d# ) sup u Σ D(u) u. As for the converse nequalty, we consder a famly of words u (n) = u (n) #... #u (n) that lm D # (u(n) ) n D # (u(n) ) n = acost(d# ) and we observe that = mn D(u (n) ) mn u (n) + m n sup u Σ m n of length n such D(u). u In partcular, the above nequaltes mply that acost(d# ) = sup D(u) and u Σ u hence they reduce the 2-threshold problem for D to the problem of decdng whether acost(d# ) 2. From prevous remarks about the undecdablty of the 2-threshold problem, t follows that t s not possble to compute the asymptotc cost for generc dstance automata. The above proof gves a reducton from the 2-threshold problem for the normalzed cost of a dstance automaton D to the 2-threshold problem for the asymptotc normalzed cost of a dstance automaton D#. It s worth remarkng that ths reducton does not preserve the shortcut property. Ths means that, even though t s possble to compute the asymptotc normalzed cost for the sub-class of dstance automata satsfyng the shortcut property, the decdablty of the analogous 2-threshold problem for the normalzed cost cannot be mmedately derved from that. The problem of computng the normalzed cost for a dstance automaton satsfyng the shortcut property remans, to our knowledge, open. An approxmate varant of the threshold problem for dstance automata was solved n [6] by an algorthm that, gven any ratonal number ɛ > 0, tells 7

18 apart the case D(u) u ( ɛ) D(u) 2 from the case u the algorthm may return any output). 2 (n the reamng cases Next we explan how the shortcut property helps n computng the asymptotc cost. One can show that the problem of computng acost(d) for a dstance automaton D that s determnstc s reducble to the problem of computng normalzed costs of smple cycles. Formally, a smple cycle s a run that s a cycle (.e., that starts and ends n the same state) but that does not contan proper sub-cycles. It s then easy to show that for a determnstc dstance automaton D, acost(d) concdes wth the maxmum of cost(l) L among all smple cycles L of D, where cost(l) denotes the cost of the smple cycle L and L ts length (.e., number of transtons n t). Thus by Proposton 2, calculaton wth smple cycles suffces to compute the asymptotc cost of any dstance automaton satsfyng the shortcut property and havng a sngle SCC. We consder now the more general case of a dstance automaton D satsfyng the shortcut property and havng many SCCs, say C,..., C k. The stuaton n ths case s slghtly more complcated, as acost(d) cannot be expressed as a functon of acost(d C ),..., acost(d C k ). For ths we defne D as the determnstc mult-dstance automaton obtaned from the synchronous product of det(d C ),..., det(d C k ) and we denote by L,..., L m the smple cycles of D. Moreover, gven m and k, we denote by cost (L ) the cost of the proecton of the smple cycle L nto the -th component of D. Assumng that D s trmmed, namely, all ts states are reachable from some states n dom(i) and they can reach some states n dom(f ), we can characterze the asymptotc cost of D as follows: Theorem. For every (trmmed) dstance automaton D satsfyng the shortcut property, acost(d) = max α,...,α m 0 mn k m α cost (L ) m α L where C,..., C k are the SCCs of the dstance automaton D, L,..., L m are the smple cycles of the mult-dstance automaton D = det(d C )... det(d C k ), and cost (L ) s the cost of the proecton of the smple cycle L nto the -th component of D. The dea underlyng the above characterzaton s that the asymptotc cost acost(d) s acheved by repettons of smple cycles n the mult-dstance 8 (2)

19 automaton D. The parameters α,..., α m represent a correlaton between the numbers of repettons of the varous smple cycles and the ndex represents the SCC of D that optmzes the normalzed cost of these repettons. Before turnng to the proof of ths characterzaton, we llustrate by means of an example the ratonale behnd the use of cycles n D. Example 5. Consder agan the dstance automaton D of Fgure, wth the two SCCs C and C 2. The determnzed sub-automaton det(d C ) has four dfferent smple cycles: one spellng aa wth cost, one spellng ab wth cost 0, one spellng b wth cost 0, and one spellng aab wth cost. Smlarly, the determnzed sub-automaton det(d C 2 ) has two smple cycles: one spellng a wth cost 0, and the other spellng b wth cost. Hence (aa) n s a famly of words achevng a worst-case asymptotc cost of lm n 2n = 2 for the sub-automaton D C, and b n s a famly of words achevng a worst-case asymptotc cost of lm n n = for the sub-automaton D C 2. However, a 2n s not a worst-case for D C 2 (as t can be repared wth asymptotc cost 0) and, symmetrcally, b n s not a worst-case for D C. Ths means that the asymptotc cost for D must be acheved by a sutable combnaton of both famles of words. To fnd the correct combnaton that wtnesses the asymptotc cost for D t s convenent to consder cycles n the mult-dstance automaton D = det(d C ) detdet(d C 2 ) and lnear combnatons of ther costs n the components C and C 2. For the consdered example, we notce that the smple cycle n D that spells repettons of the form (aab) n acheves maxmal normalzed cost n both components, that s, 3, whch ndeed concdes wth the worst-case asymptotc cost acost(d). The proof of Theorem conssts of establshng two nequaltes, whch are gven by Lemma 3 and Lemma 4 below. For the frst nequalty, we argue that all words can be approxmated n cost by repettons of smple cycles, and that the cost of parsng these words s at most the cost of a homogeneous run,.e., a run lyng entrely nsde a sngle component of D. The frst part of the proof reles on the followng property, whch s also present n [7]. We state t n a graph-theoretc settng, n such a way that t can be later reused n several proofs. In partcular, we consder a drected graph G, whch can be understood as the mult-dstance automaton D, and a path ρ n t, that s, a sequence of edges of the form e e 2... e n, where the target vertex of each edge e concdes wth source vertex of the next edge e +. Intutvely, ths property state n the followng 9

20 lemma eases the calculaton of the cost wthn a SCC C of a run ρ of D that does not show any partcular cyclc structure. Lemma 2 (Smple cycle decomposton [7]). Let G be a fnte graph and let L,..., L m be all the smple cycles n t. Gven a path ρ n G, one can fnd a partton the doman of ρ nto (possbly non-convex) subsets X 0, X,..., X m such that. X 0 K, where K s the number of vertces of G, 2. for all m, the sub-sequence ρ X s a repetton of L. Proof. We fnd the sets X 0, X,..., X m by explotng an nducton. At the begnnng we defne X 0,0 to be the entre doman of the path ρ and X 0, = for all m. At each nducton step on n N, we subtract a sutable convex subset Y n from X n,0 and we add t to one of the subsets X n,, wth m. More precsely, f X n,0 K, where K s the number of vertces of G, then we termnate the nducton wth the current sets X n,0, X n,,..., X n,m. Otherwse, we contnue the nducton by specfyng the sets X n+,0, X n+,,..., X n+,m n terms of X n,0, X n,,..., X n,m as follows. We frst clam that there s an nterval Y n contaned n X n,0 for whch the sub-sequence ρ Y n s an occurrence of a smple cycle L, for some m. Indeed, snce the length of the sub-sequence ρ X n,0 exceeds the number K of states of G, we know that ρ X n,0 contans two repeated occurrences of the same state, and hence a cycle L. In ts turn, the cycle L must contan an occurrence of a smple cycle among L,..., L m (ths follows from the fact that the contanment relaton between cycles s a well-founded partal order). We choose such an occurrence of the smple cycle L n ρ X n,0 and we denote by Y n the set of the postons n X n,0 that carry the chosen occurrence. Accordngly, we defne ˆ X n+,0 = X n,0 Y n, ˆ X n+, = X n, Y n, ˆ X n+, = X n, for all ndces m dfferent from. If n s the last step of the nducton, then we defne X = X n, for all 0 m. Note that we have X 0 = X n,0 K. Moreover, t s easy to verfy (e.g., by nducton on n) that each sub-sequence ρ X n, (and hence, n partcular, the sub-sequence ρ X ) s a repetton of the correspondng smple cycle L. Ths concludes the proof of the clam. 20

21 Hereafter, for the sake of brevty, we tactly assume that C,..., C k are the SCCs of the dstance automaton D and L,..., L m are the smple cycles of the mult-dstance automaton D = det(d C )... det(d C k ). Moreover, we say that a run γ = (q 0, a, c, q )... (q n, a n, c n, q n ) of D = (Σ, Q, E, I, F ) s successful f t starts n a state q 0 dom(i) and t ends n a state q n dom(f ). Lemma 3. For every dstance automaton D satsfyng the shortcut property, acost(d) max α,...,α m 0 mn k m α cost (L ). m α L Proof. Let (u (n) ) n N be a famly of words over the alphabet Σ such that acost(d) = lm sup D(u (n) ). u (n) Wthout loss of generalty, we can assume that the lmt of the sequence D(u (n) ), for arbtrarly large numbers n, exsts, and hence t concdes wth u (n) acost(d). Indeed, f ths were not the case, we could restrct ourselves to a proper sub-famly (u (n) ) n N of words, where N s a an nfnte subset of the natural numbers, n such a way that the sequence D(u(n) ) converges for u (n) D(u n rangng over N. Assumng that lm (n) ) s defned wll allow us to further restrct, f necessary, to sub-famles of words wthout compromsng the u (n) above equalty. To prove the lemma, t s suffcent to fnd some parameters α,..., α m 0 that satsfy the followng nequalty: lm sup D(u (n) ) u (n) mn k m α cost (L ). (3) m α L Let us fx n N and denote by ρ (n) the (unque) successful run of D on the word u (n), and by ρ (n) the proecton of t nto the -th component C, for any k. Frst, we compare the cost D(u (n) ) wth the cost D C (u (n) ) for each SCC C. Consder a successful run γ (n) of D on u (n) whch starts n some state p and ends n some state q and that mnmzes the value cost(γ (n) )+I(p)+F (q). Clearly, we have D(u (n) ) cost(γ (n) )+I max +F max, where I max s the maxmum value taken by the ntal condton of D and F max s the maxmum value taken by the fnal condton of D. Consder now a run γ (n) of D C on 2

22 the same word u (n), but entrely nsde the SCC C, whch starts n p and ends n q and that mnmzes the relatve cost. Snce the ntal and fnal condtons of D C map every state to 0, we have D C (u (n) ) = cost(γ (n) ). Moreover, snce D s trmmed, we know that p s reachable from p and q s reachable from q. Thus, usng the shortcut property, one can easly verfy that cost(γ (n) ) cost(γ (n) ) + 2c max, where c max s the maxmum cost that appears n the transtons of D (the addtve constant 2c max accounts for the cost dscount n consderng a run of D C rather than a run of D). Ths shows that D(u (n) ) cost(γ (n) ) + I max + F max mn k cost(γ(n) ) + 2c max + I max + F max = mn k D C (u (n) ) + 2c max + I max + F max From Proposton 2, we also know that D C (u (n) ) = det(d C )(u (n) ). Moreover, snce det(d C ) s a determnstc dstance automaton, the proecton ρ (n) of ρ (n) can be vewed as the unque run of det(d C ) on u (n). We thus obtan D C (u (n) ) = det(d C )(u (n) ) = cost(ρ (n) ). Below, we explctly compute the cost of each run ρ (n) usng the costs of the smple cycles L n the component C of D. The problem s that the run ρ (n) may not contan factors consstng of entre repettons of these smple cycles. We overcome ths problem by vewng the mult-dstance automaton D as a fnte graph and the run ρ (n) of D as a path n t. The smple cycle decomposton lemma (Lemma 2) mples the exstence of a partton of the doman of ρ (n) nto (possbly non-convex) subsets X (n) 0, X (n),..., X m (n) such that. X (n) 0 s unformly bounded by the number K of states of D, 2. for all m, the sub-sequence ρ (n) X (n) cycle L of D. s a repetton of the smple For every ndex m, we denote by occ (n) the number of repettons of the smple cycle L n the sub-sequence ρ (n) X (n) (by a slght abuse of termnology we say that these are also repettons n the run ρ (n) ). We are 22

23 now ready to bound the cost of ρ (n) n the component C n terms of the costs of the repettons of each smple cycle L n ρ (n). The frst, straghtforward, nequalty s as follows (recall that the sets X (n) 0, X (n),..., X m (n) form a partton of the doman of ρ (n) and X (n) 0 K): cost(ρ (n) ) = m cost(ρ (n) m cost(ρ (n) X (n) X (n) ) + cost(ρ (n) ) + K c max X (n) 0 ) where c max s the maxmum cost that appears n the transtons of D. Moreover, t easly follows from the fact that the sub-sequence ρ (n) X (n) s an -fold repetton of the smple cycle L, that occ (n) cost(ρ (n) X (n) ) = occ (n) cost (L ). Now, n order to fnd the parameters α,..., α m 0 that satsfy Equaton (3), we consder the asymptotc behavour of the sequence occ(n) n. Wthout loss of generalty, we can assume that the lmt of occ(n) n for arbtrarly large numbers n N exsts. Indeed, we can always fnd an nfnte set N of natural numbers such that the sequence occ(n) n converges for n rangng over N. Note that restrctng to the correspondng sub-famly of words u (n) and runs γ (n) and ρ (n), for n N, does not affect the prevously establshed equaltes (n partcular, acost(d) = lm cost(γ (n) ) u (n) ). For the sake of smplcty, we shall not explctly menton the set N hereafter and we use, for nstance, lm f(n) to denote the lmt of a certan functon f for arbtrarly large numbers n N. Accordngly, for every ndex m, we defne Clearly, we have that occ (n) α = def lm occ (n) n. cost (L ) = (n α +occ (n) n α ) cost (L ) = n α cost (L )+g, (n) where g, (n) = def n α ) cost (L ) s a functon whose lmt tends to 0 (hence g, O(), usng the bg-o notaton). Puttng all together, we get an upper bound to the cost of the run γ (n) : (occ (n) D(u (n) ) mn k D C (u (n) ) + 2c max + I max + F max 23

24 = mn k cost(ρ(n) ) + 2c max + I max + F max mn k ( m = mn k = mn k cost(ρ (n) X (n) ) + K c max) + 2c max + I max + F max (occ (n) cost (L )) + 2c max + I max + F max + K c max m (n α cost (L ) + g, (n)) + 2c max + I max + F max + K c max m = n mn k m α cost (L ) + O(). Smlarly, we obtan a lower bound to the length of the word u (n) : u (n) m u (n) X (n) = m occ (n) L = n m α L O(). Towards a concluson, we prove Equaton (3) as follows: lm sup D(u (n) ) u (n) lm sup n mn k m α cost (L ) + O() n m α L O() = mn k m α cost (L ). m α L For the converse nequalty, we present a large famly of words for whch the optmal runs are nearly homogeneous (n the sense that they le almost entrely nsde a sngle component of D). The words wll consst of nested repettons of smple cycles n such a way that any optmal run stablzes n the same component. Lemma 4. For every dstance automaton D satsfyng the shortcut property, acost(d) max α,...,α m 0 mn k 24 m α cost (L ). m α L

25 Proof. Let us fx arbtrarly some parameters α,..., α m 0. We prove the above nequalty by constructng a famly of cyclc words u (n) that depend on α,..., α m and n and such that the normalzed cost of any run of D on. For the moment, we assume that all the states of the cycles L,..., L m are mutually reachable n D. Towards the end of the proof, we wll show how to drop ths assumpton. We fx () a run σ 0 of D that starts from the ntal state of D and ends n the frst/last state of L, () a run σ m of D that starts from the frst/last state of L m and ends n the frst/last state of L, and () for all < m, a run σ of D that starts from the frst/last state of L and ends n the frst/last state of L +. Wthout loss of generalty, we can assume that the lengths of the runs σ 0, σ,..., σ m do not exceed the number K of states of D. Now, we thnk of each smple cycle L as a run of the mult-dstance automaton D that starts and ends n the same state and we construct, for every natural number n, a cyclc run ρ (n) of D as follows: u (n) domnates, n the lmt, the cost m α cost (L ) m α L where ρ (n) = def σ 0 (ρ (n) cycles )n ρ (n) cycles = def L n α σ L n α σ m L n αm m σ m. Accordngly, we denote by u (n) the word spelled out by the run ρ (n). To prove the clam of the lemma, t s suffcent to prove that the followng nequalty holds for all n N and for all choces of runs γ (n) of D on u (n) : lm sup cost(γ (n) ) u (n) mn k m α cost (L ). (4) m α L Let us now fx further a run γ (n) of D on u (n) for each n N. We ntroduce some addtonal notaton. Gven n N, m, and k, we defne: ˆ X (n) to be the set of postons of γ (n) that carry occurrences of transtons whose states belong to the same SCC C (note that X (n) s an nterval); ˆ ˆ Y (n) to be the maxmal subset of X (n) s a repetton of the block ρ (n) cycles Z (n), to be the maxmal subset of Y (n) such that the sub-run ρ (n) Y (n) s also an nterval); (note that Y (n) such that ρ (n) Z (n),,..., Z(n) of the smple cycle L (note that the sets Z (n),,m of Y (n) and they contan possbly non-contguous postons). s a repetton form a partton 25

26 ˆ occ (n) to be the number of repettons of ρ (n) cycles n the sub-run ρ(n) Y (n), namely, occ (n) = (n) Y (note that ths mples ρ (n) Z (n) ρ (n) cycles, = L n α occ (n) ). The frst nequalty s straghtforward (the sets Z (n), are parwse dsont): cost(γ (n) ) k m cost(γ (n) Z (n), ). Gven an ndex k, we also denote by ρ (n) the proecton of ρ (n) nto the -th component (we can thnk of t as a run of the determnstc dstance automaton det(d C ) on u (n) ). Below, we fx some ndces m and k and we compare the cost of each sub-sequence γ (n) Z (n), wth the cost of the correspondng sub-run ρ (n) Z (n), of D C. We observe that γ (n) Z (n), s not necessarly a run of D C, snce the set Z (n), s not an nterval of the doman of γ (n) (we can stll compute ts cost though). We frst turn γ (n) Z (n), nto a run γ (n), of D C on u (n) Z (n), of smlar cost, as follows. Suppose that there exst two postons x < y n Z (n), such that z / Z (n), for all x < z < y and the correspondng transtons γ (n) [x] = (p x, a x, c x, q x ) and γ (n) [y] = (p y, a y, c y, q y ), whch are consecutve n γ (n) Z (n),, do not match (.e., q x p y ). We call such a par (x, y) of postons a gap of Z (n),. The states q x and p y belong to the same SCC C of D and hence t follows from the shortcut property that D C contans a transton of the form (p x, a x, c x, p y ), for some c x N, connectng p x to p y. The descrbed operaton of connectng states through shortcuts can be appled to every gap (x, y) of Z (n), of D C on the subword u (n) Z (n),, thus resultng n a correct run γ(n),. We observe two crucal propertes about ths constructon. Frst, the number of requred operatons s at most occ (n) the fact that the gaps of Z (n), (ths follows from can only appear between the postons that n ρ (n) and from the fact that there exst at most occ (n) such occurrences). Second, the dfference n cost that results from one applcaton of ths operaton never exceeds the maxmum cost c max of the transtons n D. In vew of these propertes, we have correspond to two non-consecutve occurrences of L n α cost(γ (n) Z (n), ) cost( γ(n), ) c max occ (n). 26

More metrics on cartesian products

More metrics on cartesian products More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of