Edit-Distance between Visibly Pushdown Languages

Size: px

Start display at page:

Download "Edit-Distance between Visibly Pushdown Languages"

Andrea Berry
5 years ago
Views:

1 Edt-Dstance between Vsbly Pushdown Languages Yo-Sub Han 1 and Sang-K Ko 2 1 Department of Computer Scence, Yonse Unversty 50, Yonse-Ro, Seodaemun-Gu, Seoul , Republc of Korea emmous@yonse.ac.kr 2 Department of Computer Scence, Unversty of Lverpool Ashton Street, Lverpool, L69 3BX, Unted Kngdom sangkko@lverpool.ac.uk Abstract. We study the edt-dstance between two vsbly pushdown languages. It s well-known that the edt-dstance between two contextfree languages s undecdable. The class of vsbly pushdown languages s a robust subclass of context-free languages snce t s closed under ntersecton and complementaton whereas context-free languages are not. We show that the edt-dstance problem s decdable for vsbly pushdown languages and present an algorthm for computng the edt-dstance based on the constructon of an algnment PDA. Moreover, we show that the edt-dstance can be computed n polynomal tme f we assume that the edt-dstance s bounded by a fxed nteger k. Keywords: vsbly pushdown languages, edt-dstance, algorthm, decdablty 1 Introducton The edt-dstance between two words s the smallest number of operatons requred to transform one word nto the other [9]. We can use the edt-dstance as a smlarty measure between two words; the shorter dstance mples that the two words are more smlar. We can compute ths by usng the bottom-up dynamc programmng algorthm [15]. The edt-dstance problem arses n many areas such as computatonal bology, text processng and speech recognton [11, 13, 14]. Ths problem can be extended to a problem of computng the smlarty or dssmlarty between languages [4, 5, 11]. The edt-dstance between two languages s defned as the mnmum edtdstance of two words, where one word s from the frst language and the other word s from the second language. Mohr [11] consdered the problem of computng the edt-dstance between two regular languages gven by fnte-state automata (FAs of szes m and n and showed that t s computable n O(mn log mn tme. He also proved that t s undecdable to compute the edt-dstance between two context-free languages [11] usng the undecdablty of the ntersecton emptness of two context-free languages. As an ntermedate result, Han et al. [5]

2 2 Yo-Sub Han and Sang-K Ko consdered the edt-dstance between a regular language and a context-free language. Gven an FA and a pushdown automaton (PDA of szes m and n, they proposed a polynomal-tme algorthm that computes the edt-dstance between ther languages [5]. Here we study the edt-dstance problem between two vsbly pushdown languages. Vsbly pushdown languages are recognzable by vsbly pushdown automata (VPAs, whch are a specal type of pushdown automata for whch stack behavor s drven by the nput symbols accordng to a partton of the alphabet. Some lterature call these automata nput-drven pushdown automata [10]. The class of vsbly pushdown languages les n between the class of regular languages and the class of context-free languages. Recently, there have been many results about vsbly pushdown languages because of nce closure propertes. Note that context-free languages are not closed under ntersecton and complement and determnstc context-free languages are not closed under unon, ntersecton, concatenaton, and Kleene-star. On the other hand, vsbly pushdown languages are closed under all these operatons. Moreover, language ncluson, equvalence and unversalty are all decdable for vsbly pushdown languages whereas undecdable for context-free languages. Vsbly pushdown automata are useful n processng XML documents [1, 12]. For example, a vsbly pushdown automaton can process a SAX representaton of an XML document, whch s a lnear sequence of characters along wth a herarchcally nested matchng of open-tags wth closng tags. Note that the edt-dstance between two vsbly pushdown languages s undecdable f two languages are defned over dfferent vsbly pushdown alphabets snce the ntersecton emptness s undecdable [8]. Therefore, we always assume that two vsbly pushdown languages are defned over the same vsbly pushdown alphabet. We show that the edt-dstance between vsbly pushdown languages s decdable and present an algorthm for computng the edt-dstance. Moreover, the edt-dstance can be computed n polynomal tme f we assume that the edtdstance s bounded by a fxed nteger k. 2 Prelmnares Here we recall some basc defntons and fx notaton. For background knowledge n automata theory, the reader may refer to textbooks [6, 16]. The sze of a fnte set S s S. Let Σ denote a fnte alphabet and Σ denote the set of all fnte words over Σ. For m N, Σ m s the set of words over Σ havng length at most m. The th character of a word w s denoted by w for 1 w, and the subword of a word w that begns at poston and ends at poston j s denoted by w,j for 1 j w. A language over Σ s a subset of Σ. Gven a set X, 2 X denotes the power set of X. The symbol λ denotes the null word. A (nondetermnstc fnte automaton (FA s specfed by a tuple A = (Q, Σ, δ, s, F, where Q s a fnte set of states, Σ s an nput alphabet, δ : Q Σ 2 Q s a mult-valued transton functon, s Q s the start state and

3 Edt-Dstance between Vsbly Pushdown Languages 3 F Q s a set of fnal states. If F conssts of a sngle state f, we use f nstead of {f} for smplcty. When q δ(p, a, we say that state p has an out-transton to state q (p s a source state of q and q has an n-transton from p (q s a target state of p. The transton functon δ s extended n the natural way to a functon Q Σ 2 Q. A word x Σ s accepted by A f there s a labeled path from s to a state n F such that ths path spells out the word x, namely, δ(s, x F. The language L(A s the set of words accepted by A. A (nondetermnstc pushdown automaton (PDA s specfed by a tuple P = (Q, Σ, Γ, δ, s, Z 0, F, where Q s a fnte set of states, Σ s a fnte nput 2 Q Γ alphabet, Γ s a fnte stack alphabet, δ : Q (Σ {λ} Γ 2 s a transton functon, s Q s the start state, Z 0 s the ntal stack symbol and F Q s the set of fnal states. Our defnton restrcts that each transton of P has at most two stack symbols, that s, each transton can push or pop at most one symbol. We use δ to denote the number of transtons n δ. We defne the sze P of P as Q + δ. The language L(P s the set of words accepted by P. A (nondetermnstc vsbly pushdown automaton (VPA [2, 10] s a restrcted verson of a PDA, where the nput alphabet Σ conssts of three dsjont classes, Σ c, Σ r, and Σ l. Namely, Σ = Σ c Σ r Σ l. The class of the nput alphabet determnes the type of stack operaton. For example, the automaton always pushes a symbol onto the stack when t reads a call symbol n Σ c. If the nput symbol s a return symbol n Σ r, the automaton pops a symbol from the stack. Fnally, the automaton nether uses the stack nor even examne the content of the stack for the local symbols n Σ l. Formally, the nput alphabet s defned as a trple Σ = (Σ c, Σ r, Σ l, where three components are fnte dsjont sets. A VPA s formally defned by a tuple A = ( Σ, Γ, Q, s, F, δ c, δ r, δ l, where Σ = Σ c Σ r Σ l s the nput alphabet, Γ s the fnte set of stack symbols, Q s the fnte set of states, s Q s the start state, F Q s the set of fnal states, δ c : Q Σ c 2 Q Γ s the transton functon for the push operatons, δ r : Q (Γ { } Σ r 2 Q s the transton functon for the pop operatons, and δ l : Q Σ l 2 Q s the local transton functon. We use / Γ to denote the top of an empty stack. A confguraton of A s a trple (q, w, v, where q Q s the current state, w Σ s the remanng nput, and v Γ s the content on the stack. Denote the set of confguratons of A by C(A and we defne the sngle step computaton wth the relaton A C(A C(A. Push operaton: (q, aw, v A (q, w, γv for all a Σ c, (q, γ δ c (q, a, γ Γ, w Σ and v Γ. Pop operaton: (q, aw, γv A (q, w, v for all a Σ r, q δ r (q, γ, a, γ Γ, w Σ and v Γ ; furthermore, (q, aw, ϵ A (q, w, ϵ, for all a Σ r, q δ r (q,, a and w Σ. Local operaton: (q, aw, v A (q, w, v, for all a Σ l, q δ l (q, a, w Σ and v Γ. An ntal confguraton of a VPA A = ( Σ, Γ, Q, s, F, δ c, δ r, δ l s (s, w, ϵ, where s s the start state, w s an nput word and ϵ mples an empty stack. A

4 4 Yo-Sub Han and Sang-K Ko VPA accepts a word f A arrves at the fnal state by readng the word from the ntal confguraton. Formally, we wrte the language recognzed by A 1 as L(A = {w Σ (s, w, ϵ A (q, ϵ, v for some q F, v Γ }. We call the languages recognzed by VPAs the vsbly pushdown languages. The class of vsbly pushdown languages s a strct subclass of determnstc context-free languages and a strct superclass of regular languages. Whle many closure propertes such as complement and ntersecton do not hold for contextfree languages, vsbly pushdown languages are closed under most operatons ncludng other basc operatons such as concatenaton, unon, and Kleene-star. 3 Edt-Dstance The edt-dstance between two words s the smallest number of operatons that transform a word to the other. People use dfferent edt-operatons accordng to own applcatons. We consder three basc operatons, nserton, deleton and substtuton for smplcty. Gven an alphabet Σ, let Ω = {(a b a, b Σ {λ}} be a set of edt-operatons. Namely, Ω s an alphabet of all edtoperatons for deletons (a λ, nsertons (λ a and substtutons (a b. We say that an edt-operaton (a b s a trval substtuton f a = b. We call a word ω Ω an edt strng [7] or an algnment [11]. Let the morphsm h between Ω and Σ Σ be h((a 1 b 1 (a n b n = (a 1 a n, b 1 b n, where a, b Σ {λ} for 1 n. For example, a word ω = (a λ(b b(λ c(c c over Ω s an algnment of abc and bcc, and h(ω = (abc, bcc. Thus, from an algnment ω of two words x and y, we can retreve x and y usng h: h(ω = (x, y. We assocate a non-negatve edt cost to each edt-operaton ω Ω as a functon C : ω R +. We can extend the functon to the cost C(ω of an algnment ω = ω 1 ω n as follows: C(ω = n =1 C(ω. Defnton 1. The edt-dstance d(x, y of two words x and y over Σ s the mnmal cost of an algnment ω between x and y: d(x, y = mn{c(ω h(ω = (x, y}. We say that ω s optmal f d(x, y = C(ω. Note that we use the Levenshten dstance [9] for edt-dstance. Thus, we assgn cost 1 to all edt-operatons (a λ, (λ a, and (a b for all a b Σ. We can extend the edt-dstance defnton to languages. Defnton 2. The edt-dstance d(l 1, L 2 between two languages L 1, L 2 Σ s the mnmum edt-dstance of two words, one s from L 1 and the other s from L 2 : d(l 1, L 2 = mn{d(x, y x L 1 and y L 2 }. Mohr [11] revealed that the edt-dstance between two context-free languages s undecdable from the undecdablty of the ntersecton emptness of two context-free languages. Moreover, he presented an algorthm to compute the edt-dstance between two regular languages gven by FAs of sze m and n n

5 Edt-Dstance between Vsbly Pushdown Languages 5 O(mn log mn tme. Han et al. [5] consdered the ntermedate case where one language s regular and the other language s context-free. Gven a PDA and an FA for the languages, t s shown that the problem s computable n polynomal tme whle computng an optmal algnment correspondng to the edt-dstance requres an exponental runtme. 4 Edt-Dstance between Two VPLs Frst we study the decdablty of the edt-dstance between two vsbly pushdown languages. We notce that the edt-dstance between two context-free languages s undecdable, whereas the edt-dstance between two regular languages can be computed n polynomal tme. Interestngly, t turns out that the edt-dstance between two vsbly pushdown languages s decdable. Here we show that we can compute the edt-dstance between two vsbly pushdown languages L 1 and L 2 gven by two VPAs by constructng the algnment PDA [5], whch accepts all the possble algnments between two languages of length up to k. By settng k to be the upper bound of the edt-dstance between gven two vsbly pushdown languages L 1 and L 2, we can compute the edt-dstance between L 1 and L 2 by choosng the mnmum-cost algnment accepted by the algnment PDA. The algnment PDA s frst proposed by Han et al. [5] to compute the edtdstance between a regular language and a context-free language. Gven an FA A and a PDA P, we can construct an algnment PDA A(A, P that accepts all possble algnments that transform a word accepted by the FA A to a word accepted by the PDA P. After we construct an algnment PDA for two languages, we can compute the edt-dstance between the two languages by computng the length of the shortest word accepted by the algnment PDA. Furthermore, we can obtan an optmal algnment between two languages by takng the shortest word. An nterestng pont to note s that the constructon of the algnment PDA does not mply that we can compute the edt-dstance. For example, the edtdstance between two context-free languages s undecdable even though t s possble to construct an algnment PDA wth two stacks that accepts all possble algnments between two PDAs [11]. Ths s because we cannot compute the length of the shortest word accepted by a two-stack PDA, whch can smulate a Turng machne [6]. Here we start from an dea that we do not need to consder all possble algnments between two vsbly pushdown languages to compute the edt-dstance and an optmal algnment. If we know the upper bound k of the edt-dstance between two vsbly pushdown languages, we can compute the edt-dstance and an optmal algnment by constructng an algnment PDA that accepts all possble algnments of cost up to k between two vsbly pushdown languages. Ths can be done because the stack operaton of VPAs s determned by the nput character. Assume that we have two VPAs A 1 and A 2 over the same alphabet Σ. Then, the algnment PDA A(A 1, A 2 of A 1 and A 2 smulates all possble algnments

6 6 Yo-Sub Han and Sang-K Ko between the two languages L(A 1 and L(A 2. Note that A(A 1, A 2 smulates two stacks from two VPAs smultaneously by readng each edt-operaton. Whenever A(A 1, A 2 smulates a trval edt-operaton (a a, a Σ whch substtutes a character nto the same one, the dfference n heght of two stacks does not change snce two VPAs read characters n the same class. The heght of two stacks becomes dfferent when A(A 1, A 2 reads nsertons of call (or return symbols, deletons of call (or return symbols, or substtutons between two characters n dfferent classes. For example, the heght of two stacks becomes dfferent by 1 when A(A 1, A 2 read an nserton (λ a of a call symbol a Σ c snce A 1 does not change ts stack whle A 2 s pushng a stack symbol onto ts stack. The heght of two stacks becomes dfferent the most when A(A 1, A 2 reads an edtoperaton that substtutes a call (resp. return symbol nto a return (resp. call symbol snce A 1 pushes (resp. pops a stack symbol whle A 2 pops (resp. pushes a stack symbol. Therefore, f the upper bound of the edt-dstance between two vsbly pushdown languages s k, the maxmum heght dfference between two stacks can be at most 2k whenever we smulate an algnment that costs up to k. Note that we can easly compute the upper bound of the edt-dstance between two vsbly pushdown languages by computng the shortest words from each vsbly pushdown language. Let lsw(l be the length of the shortest word n L. Proposton 3. Let L Σ and L Σ be the languages over Σ. Then, d(l, L max{lsw(l, lsw(l } holds. Now we gve the algnment PDA constructon for computng the edt-dstance between two vsbly pushdown languages. The basc dea of the constructon s to remember the top stack symbols from two stacks by usng the states of the algnment PDA. Intutvely, we store the nformaton of the top 2k stack symbols from both stacks n the states nstead of pushng nto the stack of the algnment PDA. Let A = ( Σ, Γ, Q, s, F, δ,c, δ,r, δ,l for = 1, 2 be two VPAs. We construct the algnment PDA A(A 1, A 2 = (Q E, Ω, Γ E, s E, F E, δ E, where Q E = Q 1 Q 2 Γ 2k 1 Γ 2k 2 s the set of states, Ω = {(a b a, b Σ {λ}} s the alphabet of edt-operatons, Γ E = (Γ 1 {λ} (Γ 2 {λ} \ {(λ, λ} s a fnte stack alphabet, s E = (s 1, s 2, λ, λ s the start state, and F E = F 1 F 2 Γ 2k 1 Γ 2k 2 s the set of fnal states. Now we defne the transton functon δ E. There are seven cases to consder as follows. The algnment PDA A(A 1, A 2 reads an edt-operaton that ( pushes on two stacks smultaneously, ( pushes on the frst stack, ( pushes on the second stack, (v pops from two stacks smultaneously, (v pops from the frst stack, (v pops from the second stack, and

7 Edt-Dstance between Vsbly Pushdown Languages 7 (v not perform stack operaton. Assume that x Γ 2k 1 and y Γ 2k 2. Smply, x and y are words over the stack alphabet of A 1 and A 2 whose lengths are at most 2k. For a non-empty word x over Γ 1, recall that we denote the frst character and the last character of x by x 1 and x x, respectvely. We also denote the subword that conssts of characters from x to x j by x,j, where < j. Suppose that there are transtons defned n VPAs A 1 and A 2 as follows: (q, γ δ 1,c (q, a c, [push operaton n A 1 ] q δ 1,l (q, a l, [local operaton n A 1 ] q δ 1,r (q, γ, a r, [pop operaton n A 1 ] (p, µ δ 2,c (p, b c, [push operaton n A 2 ] p δ 2,l (p, b l, and [local operaton n A 2 ] p δ 2,r (p, µ, b r. [pop operaton n A 2 ] By readng an edt-operaton (a c b c, we defne δ E to operate as follows: (q, p, xγ, yµ δ E ((q, p, x, y, (a c b c f x < 2k and y < 2k, ((q, p, x 2, x γ, y 2, y µ, (x 1, y 1 δ E ((q, p, x, y, (a c b c otherwse. By the above transtons, we smulate the push operatons on the stacks of A 1 and A 2 at the same tme. We store the nformaton of the top 2k stack symbols n the states nstead of usng real stack. If a state already contans the nformaton of 2k symbols, we start usng the stack by pushng the bottommost par of stack symbols onto the stack. We also defne δ E to operate as follows by readng an edt-operaton (a c b l : (q, p, xγ, y δ E ((q, p, x, y, (a c b l f x < 2k, ((q, p, x 2, x γ, y 2, y, (x 1, y 1 δ E ((q, p, x, y, (a c b l otherwse. Smlarly, we defne δ E for an edt-operaton (a c λ as follows: (q, p, xγ, µ δ E ((q, p, x, y, (a c λ f x < 2k, ((q, p, x 2, x γ, y 2, y, (x 1, y 1 δ E ((q, p, x, y, (a c λ otherwse. Note that the cases of readng (a l b c or (λ b c are completely symmetrc to the prevous two cases. We also consder the cases when we have to pop at least one of two stacks by readng an edt-operaton. Frst, we defne δ E for the case when we read an edt-operaton (a r b r that pops from both stacks at the same tme. (q, p, γ top x 1, x 1, µ top y 1, y 1 δ E ((q, p, x, y, (γ top, µ top, (a r b r f x x = γ, y y = µ and the top of the stack s (γ top, µ top, (q, p, x 1, x 1, y 1, y 1 δ E ((q, p, x, y, (a r b r f x x = γ, y y = µ and the stack s empty. When we smulate pop operatons on A(A 1, A 2, we remove the last stack symbols stored n the state. Then, we pop off the top of the stack, say (γ top, µ top, and move the par to the front of the stored stack symbols n the state. For the case when we have to pop only from the frst stack, we defne δ E to be as follows:

8 8 Yo-Sub Han and Sang-K Ko (q, p, γ top x 1, x 1, µ top y δ E ((q, p, x, y, (γ top, µ top, (a r b l f x = 2k, y < 2k, and the top of the stack s (γ top, µ top, (q, p, x 1, x 1, y δ E ((q, p, x, y, (a r b l otherwse. Note that x x = γ should hold for smulatng pop operatons on the frst stack. We also defne the transtons for the edt-operatons of the form (a r λ smlarly. Agan, the pop operatons on the second stack are completely symmetrc. Lastly, we defne δ E for the case when we do not touch the stack at all. There are three possble cases as follows: (q, p, x, y δ E ((q, p, x, y, (a l b l, (q, p, x, y δ E ((q, p, x, y, (a l λ, and (q, p, x, y δ E ((q, p, x, y, (λ b l. Now we prove that the constructed algnment PDA A(A 1, A 2 smulates all possble algnments of cost up to k between L(A 1 and L(A 2. Lemma 4. Gven two VPAs A 1 and A 2, t s possble to construct the algnment PDA A(A 1, A 2 that accepts all possble algnments between L(A 1 and L(A 2 of cost up to the constant k. Proof. Let us consder the smulatons of two VPAs A 1 and A 2 for an algnment ω. Gven h(ω = (x, y, the VPA A 1 has to smulate a word x whle the VPA A 2 s smulatng a word y. Suppose that x has 2k call symbols and y has k call symbols, where k < 2k, and no return symbols. Then, the stack contents of A 1 and A 2 should be γ 1 γ 2 γ 2k and µ 1 µ 2 µ k, respectvely, where γ Γ 1 for 1 2k and µ j Γ 2 for 1 j k. See Fg. 1 for llustraton of ths example. On state q n VPA A 1 γ 2k γ 2k 1. γ 2 γ 1 On state p n VPA A 2 On state (q, p, γ 1γ 2 γ 2k, µ 1µ 2 µ k n algnment PDA A(A 1, A 2 µ k. µ 2 µ 1 Fg. 1. Illustraton of how we store the nformaton of two stacks n the states of the algnment PDA A(A 1, A 2. After ths, we push the par (γ 1, µ 1 of two stack symbols n the bottom of two stacks onto the stack of A(A 1, A 2 f we read an edt-operaton (a b where a s a call symbol of A 1. Let us assume that A 1 pushes γ 2k+1 by readng a and A 2 pushes µ k +1 by readng b. Then, we push two stack symbols γ 2k+1

9 Edt-Dstance between Vsbly Pushdown Languages 9 and µ k +1 onto the smulated stacks stored n the state. See Fg. 2 to see what happens after readng (a b. q q a In VPA A 1 In VPA A 2 p γ 2k+1 γ 2k. γ 2 γ 1 p b µ k +1 µ k. µ 2 µ 1 (q, p, γ 1 γ 2k, µ 1 µ k (a b (q, p, γ 1 γ 2k+1, µ 1 µ k +1 In algnment PDA A(A 1, A 2 γ 1, µ 1 Fg. 2. Illustraton of how we store the nformaton of two stacks n the states of the algnment PDA A(A 1, A 2. Underlned stack symbols γ 2k+1 and µ k +1 are pushded by the current transtons of VPAs. In ths way, we can make the stack of A(A 1, A 2 to be always synchronzed. Note that we only push stack symbols onto the stack of A(A 1, A 2 when we need to push a stack symbol onto the stack where the smulated stack stored n the state s full. Therefore, the stack of A(A 1, A 2 should be empty f the maxmum heght of two stacks stored n the state s less than 2k. When we read an edtoperaton that pops a stack symbol from A 1 or A 2, we pop the stack symbols from the stack contents stored n the states. If the heght of the stack n the state s 2k before we pop a stack symbol, we pop a stack symbol from the top of the real stack of A(A 1, A 2 and move to the bottom of the smulated stack stored n the state of A(A 1, A 2. By storng stack nformaton n the states nstead of real stacks, A(A 1, A 2 accepts algnments where the heght dfference of two stacks durng the smulaton can be at most 2k. We menton that A(A 1, A 2 also accepts algnments of cost hgher than k f the smulaton of the algnments does not requre the stack heght dfference to be larger than 2k. Now we prove that the algnment PDA A(A 1, A 2 accepts an algnment ω, where C(ω k f and only f ω s an algnment satsfyng C(ω k and h(ω = (x, y, where x L(A 1 and y L(A 2. (= Snce A(A 1, A 2 accepts an algnment ω, where C(ω k, there exsts an acceptng computaton X ω of A(A 1, A 2 on ω endng n a state (f 1, f 2 where f 1 F 1, f 2 F 2. We assume that ω = w 1 w l, where w Ω for l. Denote the sequence of states of A(A 1, A 2 appearng n the computaton X ω just before readng the lth symbol of ω as C 0,..., C l 1, and denote the state (f 1, f 2 where the computaton ends as C l. Consder the frst component q Q 1 of the state C Q E, for 0 l, and the frst component a j Σ {λ} of the edt-operaton w j, for 1 j l. From the constructon of A(A 1, A 2, t follows that

10 10 Yo-Sub Han and Sang-K Ko (q +1, γ δ A,c (q, a +1, f a +1 Σ c q +1 δ A,l (q, a +1, f a +1 Σ l q +1 δ A,r (q, γ, a +1, and f a +1 Σ r q +1 = q. f a +1 = λ for 0 l 1. Note that the transtons of δ E readng an nserton operaton (λ b do not change the frst components of the states. Thus, the frst components of the state C 0,..., C l spell out an acceptng computaton of A 1 on the word x = a 1 a l obtaned by concatenatng the frst components of the edt-operatons of ω. Usng a smlar argument for the word y obtaned by concatenatng the second components of ω, we can show that the computaton yelds an acceptng computaton of A 2 on y. ( = Let ω = (ω L(1 ω R(1 (ω L(2 ω R(2 (ω L(l ω R(l be an algnment of length l for x = ω L(1 ω L(2 ω L(l L(A 1 and y = ω R(1 ω R(2 ω R(l L(A 2. Let C = C 0 C 1 C m, m N, be a sequence of confguratons of the VPA A 1 that traces an acceptng computaton X A,x on the word x. Assumng that C s (q, γ, the confguraton C +1 s obtaned from C by applyng a transton (q +1, γ +1 δ A (q, a, γ, where a Σ {λ}. Note that ω L(j or ω R(j may be the empty word f (ω L(j ω R(j s an edt-operaton representng an nserton or a deleton operaton. Suppose that the computaton step C C +1 consumes h empty words. Then n the sequence C after the confguraton C, we add h 1 dentcal copes of C. Then, we denote the modfed sequence of confguratons C = C 0C 1 C l, l N. Analogously, let D = D 0D 1 D l be a sequence of confguratons of the VPA A 2 that traces an acceptng computaton X B,y on the word y. From the sequences C and D, we obtan a sequence of confguratons of A(A 1, A 2 descrbng an acceptng computaton on the algnment ω. For the deleton operatons (ω L(j λ, the state s changed just n the frst component and the confguraton of A 2 remans unchanged. Recall that we have added the dentcal copes of confguratons for smulatng ths case. The deleton operatons can be smulated symmetrcally. For the substtuton operatons (ω L(j ω R(j, the computaton step smulates both a state transton of A 1 on ω L(j and a state transton of A 2 on ω R(j. Ths mples that two modfed sequences of confguratons of A 1 and A 2 can be combned to yeld an acceptng computaton of A(A 1, A 2 on ω. Let us consder the sze of the constructed algnment PDA A(A 1, A 2. Let m = Q, n = δ,c + δ,r + δ,l, and l = Γ for = 1, 2. Note that each state contans an nformaton of a par of states from A 1 and A 2, and a stack nformaton of at most 2k stack symbols of A 1 and A 2. If we represent a stack where the heght of the stack s restrcted to 2k wth a word over the stack alphabet Γ 1, we have l 1 = 2k l 1 = 1 + l 1 l2k 1 1 l 1 1

11 Edt-Dstance between Vsbly Pushdown Languages 11 possble words. Smlarly, we defne l 2 to be the number of possble words over Γ 2. Therefore, the number of states n A(A 1, A 2 s n m 1 m 2 l 1l 2 = m 1 m 2 l1 l2 The sze of the stack alphabet Γ E s l 1 l 2 snce we use all pars of the stack symbols where the frst stack symbol s from Γ 1 and the second stack symbol s from Γ 2. The sze of the transton functon δ E s m 1 m 2 n 1 n 2 l1 l2 By Lemma 4, we can obtan an algnment PDA from two VPAs A 1 and A 2 to compute the edt-dstance d(l(a 1, L(A 2. Recently, Han et al. [5] studed the edt-dstance between a PDA and an FA. The basc dea s to construct an algnment PDA from a PDA and an FA and compute the length of the shortest algnment from the algnment PDA. As a step, they present an algorthm for obtanng a shortest word and computng the length of the shortest word from a PDA. For the sake of completeness, we nclude the followng proposton. Proposton 5 (Han et al. [5]. Gven a PDA P = (Q, Σ, Γ, δ, s, Z 0, F P, we can obtan a shortest word n L(P whose length s bounded by 2 m2 l+1 n O(n 2 m2l worst-case tme and compute ts length n O(m 4 nl worst-case tme, where m = Q, n = δ and l = Γ. Now we establsh the followng runtme for computng the edt-dstance between two VPAs. Theorem 6. Gven two VPAs A = (Σ, Γ, Q, s, F, δ,c, δ,r, δ,l for = 1, 2, we can compute the edt-dstance between L(A 1 and L(A 2 n O((m 1 m 2 5 n 1 n 2 (l 1 l 2 10k worst-case tme, where m = Q, n = δ,c + δ,r + δ,l, l = Γ for = 1, 2 and k = max{lsw(l(a 1, lsw(l(a 2 }. If we replace k wth the length 2 Q 2 Γ +1 of a shortest word from a VPA from the tme complexty obtaned n Theorem 6, we have double exponental tme complexty for computng the edt-dstance between two VPAs. It s stll an open problem to fnd a polynomal algorthm for the problem or to establsh any hardness result. Instead, we can observe that the edt-dstance problem for VPAs can be computed n polynomal tme f we lmt the edt-dstance to a fxed nteger k. Corollary 7. Let A 1 and A 2 be two VPAs and k be a fxed nteger such that d(l(a 1, L(A 2 k.then, we can compute the edt-dstance between L(A 1 and L(A 2 n polynomal tme...

12 12 Yo-Sub Han and Sang-K Ko As a corollary of Theorem 6, we can also establsh the followng result. A vsbly counter automaton (VCA [3] can be regarded as a VPA wth a sngle stack symbol. It s nterestng to see that we can compute the edt-dstance between two VCAs n polynomal tme when we are gven t Corollary 8. Gven two VCAs A 1, A 2 and a postve nteger k N n unary such that d(l(a 1, L(A 2 k, we can compute the edt-dstance between L(A 1 and L(A 2 n polynomal tme. References 1. R. Alur. Marryng words and trees. In Proceedngs of the 26th ACM SIGMOD- SIGACT-SIGART Symposum on Prncples of Database Systems, PODS 07, pages , R. Alur and P. Madhusudan. Vsbly pushdown languages. In Proceedngs of the 36th Annual ACM Symposum on Theory of Computng, pages , V. Bárány, C. Lödng, and O. Serre. Regularty problems for vsbly pushdown languages. In Proceedngs of the 23rd Annual Symposum on Theoretcal Aspects of Computer Scence, pages , C. Choffrut and G. Pghzzn. Dstances between languages and reflexvty of relatons. Theoretcal Compututer Scence, 286(1: , Y.-S. Han, S.-K. Ko, and K. Salomaa. The edt-dstance between a regular language and a context-free language. Internatonal Journal of Foundatons of Computer Scence, 24(7: , J. Hopcroft and J. Ullman. Introducton to Automata Theory, Languages, and Computaton. Addson-Wesley, Readng, MA, 2nd edton, L. Kar and S. Konstantnds. Descrptonal complexty of error/edt systems. Journal of Automata, Languages and Combnatorcs, 9: , J. Leke. VPL ntersecton emptness. Bachelor s Thess, Unversty of Freburg, V. I. Levenshten. Bnary codes capable of correctng deletons, nsertons, and reversals. Sovet Physcs Doklady, 10(8: , K. Mehlhorn. Pebblng mountan ranges and ts applcaton to DCFL-recognton. In Automata, Languages and Programmng, volume 85, pages M. Mohr. Edt-dstance of weghted automata: General defntons and algorthms. Internatonal Journal of Foundatons of Computer Scence, 14(6: , B. Mozafar, K. Zeng, and C. Zanolo. From regular expressons to nested words: Unfyng languages and query executon for relatonal and xml sequences. Proceedngs of the VLDB Endowment, 3(1-2: , P. A. Pevzner. Computatonal molecular bology - an algorthmc approach. MIT Press, K. Thompson. Regular expresson search algorthm. Communcatons of the ACM, 11(6: , R. A. Wagner and M. J. Fscher. The strng-to-strng correcton problem. Journal of the ACM, 21: , D. Wood. Theory of Computaton. Harper & Row, 1987.

13 Edt-Dstance between Vsbly Pushdown Languages 13 Appendx Context-free grammar (CFG. A context-free grammar (CFG G s a fourtuple G = (V, Σ, R, S, where V s a set of varables, Σ s a set of termnals, R V (V Σ s a fnte set of productons and S V s the start varable. Let αaβ be a word over V Σ, where A V and A γ R. Then, we say that A can be rewrtten as γ and the correspondng dervaton step s denoted αaβ αγβ. A producton A t R s a termnatng producton f t Σ. The reflexve, transtve closure of s denoted by and the context-free language generated by G s L(G = {w Σ S w}. We say that a varable A V s nullable f A λ. Proposton 3. Let L Σ and L Σ be the languages over Σ. Then, holds. d(l, L max{lsw(l, lsw(l } Proof. Assume that lsw(l = m and lsw(l = n where n m. It s easy to see that the edt-dstance between two shortest words can be at most m snce we can substtute all characters of the shortest word of length n wth any subsequence of the longer word and nsert the remanng characters. Proposton 5. (Han et al. [5] Gven a PDA P = (Q, Σ, Γ, δ, s, Z 0, F P, we can obtan a shortest word n L(P whose length s bounded by 2 m2 l+1 n O(n 2 m2l worst-case tme and compute ts length n O(m 4 nl worst-case tme, where m = Q, n = δ and l = Γ. Proof. Recall that we can convert a PDA nto a CFG by the trple constructon [6]. Let us denote the CFG obtaned from P by G P. Then, G P has Q 2 Γ +1 varables and Q 2 δ productons. Moreover, each producton of G P s of the form A σbc, A σb, A σ or A λ, where σ Σ and A, B, C V. Snce we want to compute the shortest word from G P, we can remove the occurrences of all nullable varables from G P. Then, we pck a varable A that generates the shortest word t Σ among all varables and replace ts occurrence n G P wth t. We can compute the shortest word of L(P by teratvely removng occurrences of such varables. We descrbe the algorthm n Algorthm 1. Snce a producton of G P has at most one termnal followed by two varables, the length of the word to be substtuted s at most 2 m 1 when we replace mth varable. Snce we replace at most Q 2 Γ varables to have the shortest word, the length of the shortest word n L(P can be at most 2 Q 2 Γ +1. Snce there are at most 2 R occurrences of varables n R and V varables, we replace 2 R V occurrences of a gven varable on average. Therefore, the worst-case tme complexty for fndng a shortest word s O(n 2 m2l. We also note that we can compute only the length of the shortest word n O(m 4 nl worst-case tme by encodng a shortest word to be substtuted wth a bnary number.

14 14 Yo-Sub Han and Sang-K Ko Algorthm 1: ShortestLength(P Input: A PDA P = (Q, Σ, Γ, δ, s, Z 0, F P Output: lsw(l(p 1: convert P nto a CFG G P = (V, Σ, R, S by the trple constructon 2: elmnate all nullable varables 3: for B t R, where t Σ and t s mnmum among all such t n R do 4: f B = S then 5: return t 6: else 7: replace all occurrences of A n R wth t 8: remove A from V and ts productons from R 9: end f 10: end for Theorem 6. Gven two VPAs A = (Σ, Γ, Q, s, F, δ,c, δ,r, δ,l for = 1, 2, we can compute the edt-dstance between L(A 1 and L(A 2 n O((m 1 m 2 5 n 1 n 2 (l 1 l 2 10k worst-case tme, where m = Q, n = δ,c + δ,r + δ,l, l = Γ for = 1, 2 and k = max{lsw(l(a 1, lsw(l(a 2 }. Proof. In the proof of Lemma 4, we have shown that we can construct an algnment PDA A(A 1, A 2 = (Q E, Ω, Γ E, s E, F E, δ E that accepts all possble algnments between two VPAs A 1 and A 2 of length up to k. From Proposton 5, we can compute the edt-dstance n O(m 4 nl tme, where m = Q E, n = δ E and l = Γ E. Recall that m = m 1 m 2 l1 and l = l 1 l 2. Note that l 1 l2 O(l 2k 1 and, n = m 1 m 2 n 1 n 2 l2 l1 O(l 2k 2 l2 f l 1, l 2 > 0. Therefore, the tme complexty of computng the edt-dstance between two VPAs A 1 and A 2 s (m 1 m 2 5 n 1 n 2 l 1 5 l 2 5 l 1 l 2 O((m 1 m 2 5 n 1 n 2 (l 1 l 2 10k, where k s the maxmum of the length of the two shortest words from L(A 1 and L(A 2. Corollary 8. Gven two VCAs A 1, A 2 and a postve nteger k N n unary such that d(l(a 1, L(A 2 k, we can compute the edt-dstance between L(A 1 and L(A 2 n polynomal tme.,

15 Edt-Dstance between Vsbly Pushdown Languages 15 Proof. If l 1 = l 2 = 1, l 1 O(k and l2 O(k. If we replace l 2k 1 and l 2k 2 by k from the tme complexty, we obtan the tme complexty O((m 1 m 2 5 n 1 n 2 k 10 whch s polynomal n the sze of nput.

Basic Regular Expressions. Introduction. Introduction to Computability. Theory. Motivation. Lecture4: Regular Expressions

Basic Regular Expressions. Introduction. Introduction to Computability. Theory. Motivation. Lecture4: Regular Expressions Introducton to Computablty Theory Lecture: egular Expressons Prof Amos Israel Motvaton If one wants to descrbe a regular language, La, she can use the a DFA, Dor an NFA N, such L ( D = La that that Ths