Automata for Analyzing and Querying Compressed Documents Barbara FILA, LIFO, Orléans (Fr.) Siva ANANTHARAMAN, LIFO, Orléans (Fr.) Rapport No



Barbara Fila, Siva Anantharaman
LIFO - Université d'Orléans (France), e-mail: {fila, siva}@univ-orleans.fr

Abstract. In a first part of this work, tree/dag automata are defined as extensions of (unranked) tree automata which can run indifferently on trees or dags; they can thus serve as tools for analyzing or querying any semi-structured document, whether or not given in a compressed format. In a second part of the work, we present a method for evaluating positive unary queries, expressed in terms of Core XPath axes, on any dag t representing an XML document possibly given in a compressed form; the evaluation is done directly on t, without unfolding it into a tree. To each Core XPath query of a certain basic type, we associate a word automaton; these automata run on the graph of dependency between the non-terminals of the minimal straightline regular tree grammar associated to the given dag t, or along complete sibling chains in this grammar. Any given positive Core XPath query can be decomposed into queries of the basic type, and the answer to the query, on the dag t, can then be expressed as a sub-dag of t whose nodes are suitably labeled under the runs of such automata.

Keywords: Tree automata, Tree grammars, Dags, XML, Core XPath.

1 Introduction

Several algorithms have been optimized in the past, by using structures over dags instead of over trees. Tree automata are widely used for querying XML documents (e.g., [8, 9, 15, 16]); on the other hand, the notion of a compressed XML document has been introduced in [2, 7, 12], and a possible advantage of using dag structures for the manipulation of such documents has been brought out in [12]. It is legitimate then to investigate the possibility of using automata over dags instead of over trees, for querying compressed XML documents.

Dag automata (DA) were first introduced and studied in [5]; a DA was defined there as a natural extension of a tree automaton, i.e., as a bottom-up tree automaton running on dags; and the language of a DA was defined as the set of dags that get accepted under (bottom-up) runs, defined in the usual sense; the emptiness problem for DAs was shown there to be NP-complete, and the membership problem proved to be in NP; but the problem of stability under complementation of the class of dag automata, closely linked with that of determinization, was left open. These two issues have since been settled negatively in [1]: the reason is that the set of all terms (trees) represented by the set of dags accepted by a non-deterministic DA is not necessarily a regular tree language; a consequence is that the class of tree languages recognized by DAs (as sets of accepted dags) is a strict superclass of the class of regular tree languages. It is well-known however, that answers to MSO-definable queries on (semi-)structured trees form regular tree languages ([18]); it is thus necessary to define the languages of DAs in a manner different from that of [5, 1], if they are to serve as tools for analyzing and querying a document, independently of whether it is given in a (partially or fully) compressed format, or as a tree. Our first aim in this work is therefore to redefine the notion of the language of a DA suitably, with such an objective.

For achieving that, we first present (in Section 2) the notion of a compressed document as a tree/dag (trdag, for short), designating a directed acyclic graph that may be partially or fully compressed. The terminology trdag has been chosen to distinguish it from that of tdag employed in [1]; this latter term will be employed in this paper when referring to a fully compressed dag. A Tree/Dag automaton (TDA, for short) is then defined as an automaton which runs on trdags. The essential differences with the DAs of [1] are the following: (i) our TDAs can be unranked, and (ii) although the transition rules of a TDA look quite like those of the DAs in [1], or those of TAs, a run of a TDA on any given trdag t will carry with it not only assignments of states to the nodes of t, but also to the edges of t; runs will be so defined that a TDA accepts any given trdag t if and only if it accepts the tree $\hat{t}$ obtained by uncompressing t, as a tree automaton running on the tree $\hat{t}$, in the usual sense.

In the second part of the paper, we present an approach based on word automata for evaluating queries on trdags that represent XML documents in a partially or fully compressed format; the terms trdag and document will therefore be considered synonymous in the sequel. Any given trdag t is first seen as equivalent to a minimal straightline regular tree grammar $L_t$, that one can naturally associate with t, cf. e.g., [3, 4]. From the grammar $L_t$, we construct the graph of dependency $D_t$ between its non-terminals, and also the chiblings (linear graphs formed of complete chains of sibling non-terminals) of $L_t$. The word automata that we build below will run on $D_t$ or the chiblings of $L_t$, rather than on the document t itself. We shall only consider positive unary queries expressed in terms of Core XPath axes. (The view we adopt allows us to define the various axes of Core XPath on compressed documents, in a manner which does not modify their semantics on trees.) For evaluating any such query on any document (trdag) t, we proceed as follows.
We first break up the given query into basic sub-queries of the form Q = //*[axis::b] where axis is a Core XPath axis of a certain type. To each such basic query Q, we associate a word automaton $A_Q$. The automaton $A_Q$ runs on the graph $D_t$ when axis is non-sibling, and on the chiblings of $L_t$ when axis is a sibling axis. An essential point in our method is that the runs of $A_Q$ are guided by some well-defined semantics for the nodes traversed, indicating whether the current node answers Q, or is on a path leading to some other node answering Q. The automaton, though not deterministic, is made effectively unambiguous by defining a suitable priority relation between its transitions, based on the semantics. A basic query Q can then be evaluated in one single top-down pass of $A_Q$, under such an unambiguous run. An arbitrary positive unary Core XPath query can be evaluated on t by combining the answers to its various basic sub-queries, and its answer set is expressed as a sub-trdag of t, whose nodes get labeled in conformity with the semantics. It is important to note that the evaluation is performed on the given trdag t; as such, on two different trdags corresponding to two different compressions of a same XML tree, the answers obtained may not be the same, in general.

The paper is structured as follows: Section 2 presents the notions of trdags, and of Tree/Dag automata. In Section 3, we construct from any trdag t its normalized straightline regular tree grammar $L_t$, as well as the dependency graph $D_t$ and the chiblings of $L_t$; these will be seen as rooted labeled acyclic graphs (rlags, for short); the basic notions of Core XPath are also recalled. Section 4 is devoted to the construction of the word automata for any basic Core XPath query, based on the semantics, and an illustrative example. In Section 5 we prove that the runs of these automata, uniquely and effectively determined under a maximal priority condition, generate the answers to the queries. Section 6 shows how a non basic (composite, or imbricated) Core XPath query can be evaluated in a stepwise fashion. In Section 7, we show how to refine our approach, so as to derive, from the answer for any given Core XPath query Q on a trdag t, the answer set for the same query Q on the tree-equivalent $\hat{t}$ of t. In the appendices, we show how to translate the usual Core XPath queries into one in a standard form on which our approach is applicable (the translation is done in linear time on the size of the given query); we also present a polynomial time algorithm for constructing the maximal priority run, for any basic query automaton over any given document (trdag), with a complexity bound of $O(n^3)$, where n is the number of nodes of the trdag; the bound reduces to $O(n^2)$ on trees where the relation Parents is trivial; a complete illustrative example, on a composite imbricated query, is given in the last appendix.

2 Tree/Dag Automata

Definition 1 A tree/dag (trdag for short) over a not necessarily ranked alphabet $\Sigma$ is a rooted dag (directed acyclic graph) $t = (Nodes(t), Edges(t))$, where, for any node $u \in Nodes(t)$:
- u has a name $name_t(u) = name(u) \in \Sigma$;
- the edges going out of any node are ordered;
- and if name(u) is ranked, then the number of outgoing edges at u is the rank of name(u).

Given any node u on a trdag t, the notion of the sub-trdag of t rooted at u is defined as usual, and denoted as $t|_u$. If v is any node, $\gamma(v) = u_1 \dots u_n$ will denote the string of all its not necessarily distinct children nodes; for every $1 \le i \le n$, the i-th outgoing edge from v to its i-th child node $u_i \in \gamma(v)$ will be denoted as $e(v, i)$; we shall also write then $v \xrightarrow{i} u_i$; the set of all outgoing (resp. incoming) edges at any node v will be denoted as $Out_v(t)$, or $Out_v$ (resp. $In_v(t)$, or $In_v$); and for any node u, we set: $Parents(u) = \{v \in Nodes(t) \mid u \text{ is a child of } v\}$. A trdag t will be said to be a tree iff for every node u on t other than the root, Parents(u) is a singleton.
For any trdag t, we define the set Pos(t) as the set of all the positions $pos_t(u)$ of all its nodes u, these being defined recursively, as follows: if u is the root node on t, then $pos_t(u) = \epsilon$, otherwise, $pos_t(u) = \{\alpha.i \mid \alpha \in pos_t(v),\ v \text{ is a parent of } u,\ u \text{ is an } i\text{-th child of } v\}$. The set Pos(t) consists of (some of the) words over natural integers. To any edge $e : u \xrightarrow{i} v$ on a trdag t, is naturally associated the subset $pos_t(e) = pos_t(u).i$ of Pos(t). The function $name_t$ is extended naturally to the positions in Pos(t) as follows: for every $u \in Nodes(t)$ and $\alpha \in pos_t(u)$, we set $name_t(\alpha) = name_t(u)$.

Given a trdag t, we define its tree-equivalent as a tree $\hat{t}$ such that: $Pos(\hat{t}) = Pos(t)$, and for every $\alpha \in Pos(t)$ we have $name_t(\alpha) = name_{\hat{t}}(\alpha)$. It is immediate that $\hat{t}$ is uniquely determined, up to a tree isomorphism; it can actually be constructed canonically (cf. [7]), by taking for nodes the set Pos(t), and for directed edges the set $\{(\alpha, \alpha.i) \mid \alpha, \alpha.i \in Pos(t)\}$, each node $\alpha$ being named with $name_t(\alpha)$. There is then a natural, name preserving, surjective map from $Nodes(\hat{t})$ onto $Nodes(t)$; it will be referred to in the sequel as the compression map, and denoted as c. A trdag is said to be a tdag, or fully compressed, iff for any two different nodes $u, u'$ on t, the two sub-dags $t|_u$ and $t|_{u'}$ have non-isomorphic tree-equivalents; otherwise, the trdag is said to be partially compressed when it is not a tree. For example, the tree to the left of Figure 1 is the tree-equivalent of the partially compressed trdag to the right, and also to the fully compressed tdag to the middle.

We define now the notion of a Tree/Dag automaton, first over a ranked alphabet $\Sigma$, to facilitate understanding. The definition is then easily extended to the unranked case.

Fig. 1. A tree, a tdag, and a trdag (panels: Tree, Fully Compressed, Partially Compressed).

Definition 2 A Tree/Dag automaton (TDA, for short) over a ranked alphabet $\Sigma$ is a tuple $(\Sigma, Q, F, \Delta)$, where Q is a finite non-empty set of states, $F \subseteq Q$ is the set of final (or accepting) states, and $\Delta$ is a set of transition rules of the form: $f(q_1, \dots, q_k) \to q$, where $f \in \Sigma$ is of rank k, and $q_1, \dots, q_k, q \in Q$.

It will be convenient to write the transition rules of a TDA in a different (but equivalent) form: a transition of the form $f(q_1, \dots, q_k) \to q$ is also written as $(f, q_1 \dots q_k) \to q$, where $q_1 \dots q_k$ is seen as a word in $Q^*$, of length = rank(f) in the ranked case. The notion of TDA is then extended easily to the unranked case, i.e., where the signature symbols naming the nodes are not assumed to be of fixed rank: it suffices to define the transitions to be of the form $(f, \omega) \to q$, where $\omega \subseteq Q^*$; we may assume wlog that $\omega$ is a $*$-regular expression on Q not involving $+$, by replacing a rule $(f, \omega + \omega') \to q$ by the two rules $(f, \omega) \to q$, $(f, \omega') \to q$. A TDA is said to be bottom-up deterministic iff whenever there are two transition rules of the form $(f, \omega) \to q$, $(f, \omega') \to q'$, with $q \ne q'$, we have necessarily $\omega \cap \omega' = \emptyset$; otherwise it is said to be non-deterministic. We also agree to denote the transitions of the form $(f, \epsilon) \to q$ simply as $f \to q$, and refer to them as initial transitions.

For defining the notion of runs of TDAs on a trdag in a bottom-up style, we need some preliminaries. Let A be a TDA with state set Q and transition set $\Delta$. Suppose t is a trdag and assume given a map $M : Edges(t) \to Q$. If u is any node on t with $u_1 \dots u_n$ as the string of all its (not necessarily distinct) children, the string $M(e(u, 1)) \dots M(e(u, n))$, formed of states assigned by M to the outgoing edges at u, will be denoted as $M(Out_u)$. We then define, recursively in a bottom-up style, a binary relation at u on the states of Q, with respect to (w.r.t. or wrt, for short) the given map M; this relation, denoted as $\leadsto^M_u\ =\ \leadsto_u$, is defined as follows:

Definition 3 Let A, t, M be as above, and u any given node on the trdag t. If u is a leaf with name(u) = a, then $q \leadsto_u q'$ iff whenever $a \to q \in \Delta$ we also have $a \to q' \in \Delta$; otherwise $q \leadsto_u q'$ iff:
(i) $(name(u), M(Out_u)) \to q$ is an instance of a transition rule in $\Delta$; i.e., $\Delta$ has a rule $(name(u), \omega) \to q$ such that $M(Out_u)$ is in $\omega$;
(ii) there exists a map $\sigma_{q'} : Q \to Q$, such that:
- $\sigma_{q'}(q) = q'$, and the rule $(name(u), \sigma_{q'}(M(Out_u))) \to q'$ is also an instance of a transition rule in $\Delta$;
- for any edge $e : u \xrightarrow{i} u' \in Out_u$, we have: $M(e) \leadsto_{u'} \sigma_{q'}(M(e))$.

Definition 4 Let $A = (\Sigma, Q, F, \Delta)$ be any given TDA, and t any given trdag. A run of A on t is a pair (r, M), where $r : Nodes(t) \to Q$ and $M : Edges(t) \to Q$ are maps such that the following conditions hold, at any node u on t:

(1) if name(u) = f, then the rule $(f, M(Out_u)) \to r(u)$ is an instance of a transition rule in $\Delta$;
(2) there is an incoming edge $e \in In_u$ with $M(e) = r(u)$; and for every $e' \in In_u$ such that $M(e') = q' \ne q = r(u)$, we have $q \leadsto^M_u q'$.

A run (r, M) is accepting on a trdag t iff $r(\epsilon) \in F$, i.e., r maps the root-node of t to an accepting state. A trdag t is accepted by a TDA iff there is an accepting run on t. The language of a TDA is the set of all trdags that it accepts.

Remark 1. i) Note that if t is a tree, then $In_u$ is a singleton at every non-root node u on t, so a run (r, M) of any TDA on t can be identified with its first component r; we get then the usual notion of runs of tree automata on trees.

Example 1. Over the unranked signature {, f, g} consider the TDA A, with the following transitions: p, q, p, q, (, p) q, (, q) p, (, q ) q, (g, q Q ) q, (g, p q) p, (f, q p q) q fin, (f, p Q ) q fin, with Q = {p, q, q, q fin }, and q fin as the unique accepting state. An accepting bottom-up run of A on a tdag is depicted on the left of Figure 2, and on its right, the same run as seen on the tree equivalent of the tdag.

Fig. 2. A bottom-up accepting run of the TDA of Example 1 on a trdag, and the same seen on its tree equivalent.

A few comments on the above run may be of help: we start with assigning state q to the leaf node, under r; the assignment of state q under M to all the incoming edges at this node poses no problem; we can then assign state p to the next node up, and subsequently also p to the node g, under r, via the transition rule $(g, pq) \to p$; we then assign p under M to the first incoming edge at g; to assign state q under M to the second incoming edge at g, we just need to check that:
- for a map $\sigma : Q \to Q$ such that $\sigma(p) = q$, $\sigma(q) = p$, the rule $(g, \sigma(p)\sigma(q)) \to q$ is an instance of a transition rule of the TDA;
- for the outgoing edge of g labeled with p by M, we have $p \leadsto q = \sigma(p)$;
- for the outgoing edge of g labeled with q by M, we do have $q \leadsto p = \sigma(q)$;
reaching $q_{fin}$ at the root-node is trivial via the last transition rule. (Note that we could have as well assigned p under M to the second incoming edge at g, with no conditions to check, and then reached $q_{fin}$.)

Remark 1 (contd.). ii) Unlike the DAs of [5] or [1], the following bottom-up non-deterministic TDA: $a \to q_1$, $a \to q_2$, $f(q_1, q_2) \to q$, with $q_1, q_2, q$ as states where q is accepting, has a non-empty language: as a TDA it accepts f(a, a).

For deterministic TDA, we have the following result (as expected):

Proposition 1 Let A be a bottom-up deterministic TDA, and t any given trdag; then there is at most one run of A on t.

Proof. Let Q be the set of states of A, and $M : Edges(t) \to Q$ any given map assigning states to the edges on t. We shall show by induction that the hypothesis of determinism on A implies that, at any node u on t, the binary relation $\leadsto^M_u\ =\ \leadsto_u$ defined above (Definition 3), w.r.t. the map M, is the identity relation on the set Q. The proposition will then follow from conditions (1) and (2) on runs, cf. Definition 4; we will get, in particular, that for every incoming edge e at u, M(e) must be the same as r(u); so the run can be identified with its first component r (as on a tree). The induction will be on a non-negative integer $d_u$, that we define at any node u of t and refer to as its height on t: the maximal number of arcs on t from u to the leaf nodes. If $d_u = 0$, then u is a leaf node; that $\leadsto_u$ is the identity relation on Q in this case is immediate, from the determinism of A, and the definition of $\leadsto_u$. So, assume that $d_u > 0$, and let $v_1 \dots v_n$ be the string of all the children nodes of u on t. By the inductive hypothesis, for every i, $1 \le i \le n$, the relation $\leadsto_{v_i}$ is the identity relation on Q; it follows then, from the conditions (i) and (ii) on the relation $\leadsto_u$ (Definition 3), that this latter must also be the identity relation on Q.

We may now formulate the principal result of the first part of this paper:

Proposition 2 i) A TDA accepts a trdag t if and only if it accepts the tree equivalent of t.
ii) The emptiness problem for a TDA is decidable in time P w.r.t. its number of states.
iii) The uniform membership problem for a TDA is decidable in time NP (resp. time P) w.r.t. its number of states, and the number of edges (resp. and the number of positions) on the given trdag.

Proof. Let $\hat{t}$ be the tree equivalent of the trdag t, and c the natural surjective compression map from $Nodes(\hat{t})$ onto $Nodes(t)$. Property i): For proving the only if part, one uses the following reasoning, coupled with induction on the height function at the nodes of t (defined in the proof of the previous proposition): Let (r, M) be an accepting run of the given TDA on the trdag t; consider a node s on the tree equivalent $\hat{t}$, of which the node u on t is the image under the compression map c; let r(u) = q under the given run of the TDA on t; then, for every state q' of the TDA such that $q \leadsto^M_u q'$, one can construct a partial run of the TDA seen as a usual tree automaton on the tree $\hat{t}$, climbing up from a leaf below s on $\hat{t}$ to the node s, and assigning the state q' to this node (for an illustrative example, see the tree to the right of Figure 2).

Proving the if part of Property i) is a little more complex. We start with a given accepting run $\hat{\rho}$ of the given TDA, as a bottom-up tree automaton running in the usual sense on the tree $\hat{t}$; from this run $\hat{\rho}$, we shall construct a run (r, M) of the TDA on the trdag t, by an inductive, top-down traversal of the trdag t; for this top-down traversal, we will be using an integer valued function defined at any node u of t and referred to as its depth on t: the maximal number of arcs on t from the root node on t to the node u. We shall also use the fact that the nodes of $\hat{t}$ are in natural bijection with the set Pos(t) of positions on t. The top-down construction of the run (r, M) is done by the following pseudo-algorithm, where d stands for the maximal depth on t at its leaf nodes.

    BEGIN
      /* define first r at the root node on t, and M on its outgoing edges */
      r($\epsilon_t$) = $\hat{\rho}(\epsilon_{\hat{t}})$;
      For every outgoing edge $e_j$, $1 \le j \le k$, at $\epsilon_t$, set M($e_j$) = $\hat{\rho}(\epsilon.j)$;
      i = 1;
      /* Now go down */
      while (i $\le$ d) do {
        For every node u at depth i do {
          choose $e \in In_u(t)$, and $\alpha \in pos_t(e)$ such that M(e) = $\hat{\rho}(\alpha)$;
          set r(u) = M(e);
          For every $e_j \in Out_u(t)$, $1 \le j \le m$, outgoing from u, set M($e_j$) = $\hat{\rho}(\alpha.j)$;
        }
        i = i + 1;
      }
    END.

It is not difficult to check then, that by construction, the pair of maps (r, M) gives an accepting run of the TDA on the trdag t. (The reasoning is illustrated below.)

Properties ii) and iii) follow, in the ranked case, from the proof of i) and the results of TATA ([6]), Chapter 1; in the unranked case, one can either employ a reasoning based on reduction to the ranked case as in [10], or appeal directly to the results of [13]. (Note: the number of positions on a trdag is the same as the size of its tree equivalent.)

We illustrate here the reasoning employed in the proof of the if part of assertion i) of the above proposition, with the tdag t of Example 1. We start with the run $\hat{\rho}$ on its tree-equivalent $\hat{t}$, as depicted to the right of Figure 2.
At start, to the root node on t (at depth 0) is assigned the state $q_{fin}$, and to its three outgoing edges are assigned the three states p, q, q respectively; at g, which is the only node on t at depth 1, we choose the first incoming edge (of position 1, and labeled with p by M), and set $r(u) = \hat{\rho}(1) = p$; the two outgoing edges at g on t have as positions the sets {11, 21}, {12, 22} respectively; to these two outgoing edges at g on t, we assign the states that $\hat{\rho}$ assigns to the two sons of the node g at position 1 on $\hat{t}$, namely p, q respectively (this means in essence that we have selected the positions 11 and 12 on the two outgoing edges at g on t); next, we go to depth 2 on t, where there is a unique node, to which we then have to assign the state $\hat{\rho}(11)$ that M has already assigned to its incoming edge; the rest of the reasoning is obvious, so left out.

Remark 2. Let t, t' be two given trdags such that Pos(t') = Pos(t), and there is a name preserving surjective map c from Nodes(t') onto Nodes(t). We can then define t to be a compression, or a compressed form, of t'; and refer to t' as an uncompressed equivalent of t, and to the surjective map c on Nodes(t') as a compression map. It is easily checked that t and t' have then the same tree-equivalent; and it follows from Proposition 2 above that any given TDA A accepts t if and only if it accepts t'. This means that it is legitimate to define the language of a TDA as the set of all tdags that it accepts (or trees that it accepts), or as the set of all trdags accepted, up to tree-equivalence.
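Proposition 2 i) suggests a simple membership test in the ranked case, sketched below in Python. It does not build a run (r, M) in the sense of Definition 4; instead it uses the fact that, on the tree-equivalent, the set of states a bottom-up tree automaton can reach at a position depends only on the subtree below it, so that set can be computed once per trdag node. The encoding of rules as triples (symbol, tuple of child states, state) and the names `reachable_states` and `accepts` are assumptions for illustration.

```python
def reachable_states(t, root, rules):
    """For each node u reachable from root, the states reachable at any
    position of u on the tree-equivalent.

    t:     node id -> (name, ordered list of child ids)   (assumed encoding)
    rules: set of (symbol, tuple_of_child_states, state)  (ranked case only)
    """
    memo = {}

    def visit(u):
        if u not in memo:
            name, kids = t[u]
            child_sets = [visit(v) for v in kids]
            # On a tree, each child position is chosen independently, even
            # when two child positions come from the same shared dag node.
            memo[u] = {q for (f, ws, q) in rules
                       if f == name and len(ws) == len(kids)
                       and all(w in s for w, s in zip(ws, child_sets))}
        return memo[u]

    visit(root)
    return memo


def accepts(t, root, rules, final):
    """True iff the automaton accepts the tree-equivalent of t."""
    return bool(reachable_states(t, root, rules)[root] & final)


# Remark 1 ii): a -> q1, a -> q2, f(q1, q2) -> q, on f(a, a) with a shared.
rules = {("a", (), "q1"), ("a", (), "q2"), ("f", ("q1", "q2"), "q")}
shared = {0: ("f", [1, 1]), 1: ("a", [])}
print(accepts(shared, 0, rules, {"q"}))
```

The shared leaf is visited once, yet both $q_1$ and $q_2$ remain available to the two edges coming from f, which is the behaviour that edge assignments M are designed to capture.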

3 Querying Compressed Documents: Preliminaries

Given a trdag t, one can naturally construct a regular tree grammar associated with t, which is straightline (cf. [4]), in the sense that there are no cycles on the dependency relations between its non-terminals, and each non-terminal produces exactly one sub-trdag of t. Such a grammar will be denoted as $L_t$, if it is normalized in the following sense:
(i) for every non-terminal $A_i$ of $L_t$, there is exactly one production of the form $A_i \to f(A_{j_1}, \dots, A_{j_k})$, where $i < j_r$ for every $1 \le r \le k$; we shall then set $Sons(A_i) = \{A_{j_1}, \dots, A_{j_k}\}$, and $sym_{L_t}(A_i) = f$;
(ii) the number of non-terminals is the number of nodes on t.

Such a normalized grammar $L_t$ is uniquely defined up to a renaming of the non-terminals. For instance, for the trdag t to the left of Figure 3 we get the following normalized grammar: $A_1 \to f(A_2, A_3, A_4, A_5, A_2)$, $A_2 \to c$, $A_3 \to \_(A_5)$, $A_4 \to \_$, $A_5 \to \_$. Such a grammar is easily constructed from t, for instance by using a standard algorithm which computes the depth of any node (as the maximal distance from the root), to number the non-terminals so as to satisfy condition (i) above.

Fig. 3. A trdag t, the associated rlag $D_t$, and the chiblings of $L_t$.

The dependency graph of the normalized grammar $L_t$ associated with t, and denoted as $D_t$, consists of nodes named with the non-terminals $A_i$, $1 \le i \le n$, and one single directed arc from any node $A_i$ to a node $A_j$ whenever $A_j$ is a son of $A_i$. The root of $D_t$ is by definition the node named $A_1$. The notion of Sons of the nodes on $D_t$ is derived in the obvious way from that defined above on $L_t$. Furthermore, to any production $A_i \to f(A_{j_1}, \dots, A_{j_k})$ of $L_t$, we associate a rooted linear graph composed of k nodes respectively named $A_{j_1}, \dots, A_{j_k}$, with root at $A_{j_1}$ and such that for all $l \in \{2, \dots, k\}$ the node named $A_{j_l}$ is the son of the node named $A_{j_{l-1}}$.
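The construction of the normalized grammar and of its dependency graph, as described above, can be sketched as follows in Python. The trdag encoding (node id mapped to a pair of name and ordered child list), the function names, and the small example document are assumptions for illustration; ties between nodes of equal depth are broken arbitrarily, which is consistent with the grammar being unique only up to renaming.

```python
def normalized_grammar(t, root):
    """Number the nodes by maximal depth from the root and return
    (index, productions), with productions[i] = (symbol, [j1, ..., jk])."""
    indeg = {u: 0 for u in t}
    for _, kids in t.values():
        for v in kids:
            indeg[v] += 1                  # counted with multiplicity
    depth = {u: 0 for u in t}
    order, ready = [], [root]
    while ready:                           # topological traversal
        u = ready.pop()
        order.append(u)
        for v in t[u][1]:
            depth[v] = max(depth[v], depth[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    ranked = sorted(order, key=lambda u: depth[u])
    index = {u: i for i, u in enumerate(ranked, start=1)}
    prods = {index[u]: (t[u][0], [index[v] for v in t[u][1]]) for u in ranked}
    return index, prods


def dependency_graph(prods):
    """D_t: a single arc A_i -> A_j whenever A_j is a son of A_i."""
    return {i: sorted(set(sons)) for i, (_, sons) in prods.items()}


def sibling_chains(prods):
    """For each production with sons, the linear chain A_j1 -> ... -> A_jk."""
    return {i: list(sons) for i, (_, sons) in prods.items() if sons}


# Hypothetical document f(c, g(c), c): the first and third children are shared.
t = {"r": ("f", ["x", "y", "x"]), "x": ("c", []), "y": ("g", ["z"]), "z": ("c", [])}
index, prods = normalized_grammar(t, "r")
print(prods)
print(dependency_graph(prods))
print(sibling_chains(prods))
```

Numbering by maximal depth guarantees condition (i): every arc goes from a node to a strictly deeper one, hence from a smaller index to a larger one.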
The rooted linear graph associated in this way with the (unique) $A_i$-production will be called the chibling of $L_t$ for that production; it is denoted as $F_i$. We also define a further chibling denoted $F_0$, as the linear graph with a single node named $A_1$, where $A_1$ is the axiom of $L_t$. In the sequel, we designate by G either $D_t$ or any of the chiblings F of $L_t$. We complete any of these acyclic graphs G into a rooted labeled acyclic graph (rlag, for short), by attaching to each node u on G, with name(u) = $A_i$, a label denoted label(u), and defined as label(u) = $(sym_{L_t}(A_i), -)$; cf. Figure 3.

3.1 Positive Core XPath Queries on trdags

In this paper we restrict our study to positive Core XPath queries on trdags. Recall that Core XPath is the navigational segment of XPath, and is based on the following axes of XPath (cf. [10, 19]): self, child, parent, ancestor, descendant, following-sibling, preceding-sibling. A location expression is defined as a predicate of the form [axis::b], where axis is one of the above axes, and b is a symbol of $\Sigma$. Given any trdag t over $\Sigma$, a context node u on t and $b \in \Sigma$, the semantics for axis is defined by evaluating this predicate at u. The semantics for the axes self, child, descendant are easily defined, exactly as on trees (cf. [19]). For defining the semantics of the remaining axes, we first recall that $Parents(u) = \{v \in Nodes(t) \mid u \text{ is a child of } v\}$.

Definition 5 Given a context node u on a trdag t, and $b \in \Sigma$:
i) [parent::b] evaluates to true at u, if and only if there exists a b-named node in Parents(u);
ii) [ancestor::b] evaluates to true at u, iff either [parent::b] evaluates to true at u, or there exists a node $v \in Parents(u)$ such that [ancestor::b] evaluates to true at v;
iii) [following-sibling::b] evaluates to true at u, iff there exists a b-named node u', and a node v on t such that $\gamma(v)$ is of the form $\dots u \dots u' \dots$;
iv) [preceding-sibling::b] evaluates to true at u, iff there exists a b-named node u', and a node v on t such that $\gamma(v)$ is of the form $\dots u' \dots u \dots$.

For the composite axes descendant-or-self and ancestor-or-self, the semantics are then deduced in an obvious manner. We shall also need position predicates of the form [position()= i]; their semantics is that the expression [child::b[position()= i]] evaluates to true at a context node u, iff: [child::b] evaluates to true at u, and u is an i-th child of some parent.

Positive Core XPath query expressions are usually defined in the literature (cf. e.g., [7]), as those generated by the following grammar:

    A       ::= self | child | descendant | parent | ancestor | preceding-sibling | following-sibling
    S_can   ::= A::b | position()= i | S_can and S_can | S_can or S_can
    E_can   ::= A::b[S_can] | E_can[E_can]
    Q_can   ::= /S_can | /E_can | Q_can/Q_can

We shall refer to the query expressions generated by this grammar as canonical; they can be shown to be of the type $/C_1/C_2/\dots/C_n$, where each $C_i$ is of the form A::b[$X_{can}$], or of the form A::b[$X_{can}$] conn A'::b'[$X'_{can}$], with conn $\in$ {and, or}, and $X_{can}, X'_{can} \in \{S_{can}, E_{can}, true\}$; we agree here to identify A::b[true] with A::b.
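The axis semantics of Definition 5 can be checked on small documents with the following Python sketch. The trdag encoding (node id mapped to a pair of name and ordered child list) and the function names are assumptions for illustration; the sketch evaluates the predicates naively on the trdag and is not the automaton-based method developed in the sequel.

```python
def parents(t):
    """Parents(u) for every node u of the trdag t."""
    ps = {u: set() for u in t}
    for v, (_, kids) in t.items():
        for u in kids:
            ps[u].add(v)
    return ps


def parent_axis(t, u, b):
    """[parent::b] at u: some node of Parents(u) is named b."""
    return any(t[v][0] == b for v in parents(t)[u])


def ancestor_axis(t, u, b):
    """[ancestor::b] at u: a b-named node is reachable by parent steps."""
    ps = parents(t)
    seen, stack = set(), list(ps[u])
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        if t[v][0] == b:
            return True
        stack.extend(ps[v])
    return False


def following_sibling_axis(t, u, b):
    """[following-sibling::b] at u: some gamma(v) reads ...u...u'...
    with u' named b."""
    for _, kids in t.values():
        for i, w in enumerate(kids):
            if w == u and any(t[x][0] == b for x in kids[i + 1:]):
                return True
    return False


# Hypothetical document f(a, b, a), the two a-leaves being one shared node.
t = {0: ("f", [1, 2, 1]), 1: ("a", []), 2: ("b", [])}
print(following_sibling_axis(t, 1, "b"))   # true via the first occurrence of a
print(following_sibling_axis(t, 2, "a"))
print(parent_axis(t, 1, "f"), ancestor_axis(t, 2, "g"))
```

On the uncompressed tree f(a, b, a), only the first a-leaf has a following sibling named b; on the trdag, the single shared node answers the predicate. This is one way to see why answers depend on the chosen compression.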
Any such positive Core XPath query expression can be translated into one that is in standard form, i.e., where the format of the sub-queries is of the type axis::b; we formalize this idea now. We shall refer to the axes self, child, descendant, parent, ancestor, preceding-sibling, following-sibling as basic. A basic Core XPath query is a query of the form //*[axis::b], where axis is a basic axis. More generally, the queries we propose to evaluate on trdags are defined formally as the expressions $Q_{std}$ generated by the following grammar, where b stands for any node name on the documents, or for * (meaning any):

    A       ::= self | child | descendant | parent | ancestor | preceding-sibling | following-sibling
    S       ::= A::b | position()= i | S and S | S or S | Root
    E       ::= A::b[S] | E[E]
    Q_std   ::= //* | //*[S] | //*[E]

Core XPath queries $Q_{std}$ of the format generated by this grammar are said to be in standard form; to be able to handle any positive Core XPath query with such a grammar, we have introduced a special predicate called Root, deemed true only at the root node of the trdag considered. By the evaluation of a given query expression Q on any trdag t, we mean the assignment: t $\mapsto$ the set of all context nodes on t where the expression Q evaluates to true (following the conventions of Definition 5); this latter set is also called the answer for Q on t. Two given queries $Q_1, Q_2$ are said to be equivalent iff, on any trdag t, the answer sets for $Q_1$ and $Q_2$ are the same. Any positive Core XPath query $Q_{can}$ can be translated into an equivalent one in standard form; e.g., /c[following-sibling::g]/d is equivalent to //*[self::d and parent::*[Root and self::c[following-sibling::g]]] in standard form. An inductive procedure performing such a translation in the general case (of linear complexity w.r.t. the number of location steps in $Q_{can}$) is given in Appendix I.

The following proposition results from Definition 5.

Proposition 3 (1) For any set of nodes X on a trdag t, and any axis A, we have:

$A(X) = \bigcup_{x \in X,\ \alpha \in pos_t(x),\ \alpha = i_1 \dots i_k}$ {/child::*[position()= $i_1$]/.../child::*[position()= $i_k$]/A::*}

(2) For any trdag t, and any node with name b on t, we have:
(i) //*[preceding::b] = {descendant-or-self(following-sibling($\bigcup_u$ //*[self::u and (descendant::b or self::b)]))}
(ii) //*[following::b] = {descendant-or-self(preceding-sibling($\bigcup_u$ //*[self::u and (descendant::b or self::b)]))}

Finally, following [2], for any set S of nodes on t, the sets of nodes following(S) and preceding(S) can now be defined formally, as follows:
following(S) = descendant-or-self(following-sibling(ancestor-or-self(S))),
preceding(S) = descendant-or-self(preceding-sibling(ancestor-or-self(S))).

Note: Unlike on a tree, the ancestor, descendant, following, self and preceding axes do not partition the set of nodes on a trdag t, in general.

4 Automata for the Basic Core XPath Queries

4.1 The Semantics of the Approach

We first consider basic Core XPath queries. Composite or imbricated queries will subsequently be evaluated in a stepwise fashion; see Section 6. To any basic query Q = //*[axis::b], we shall associate a word automaton (actually a transducer), referred to as $A_Q$. It will run top-down, on the rlag $D_t$ if axis is non-sibling, and on each of the chiblings F of $L_t$ otherwise.
In either case, a run will attach, to any node traversed, a pair of the form $(l', x)$, where the component $l'$ of the pair has the intended semantics of selection or not, by Q, of the corresponding node on t, and the component x will be 1 or 0, with the intended semantics that x = 1 iff the corresponding node on t has a descendant answering Q. At the end of the run, label(u), at any node u of $D_t$, will be replaced by a new label derived from the llab-pairs attached to u by the run.

To formalize these ideas, we introduce a set of new symbols $L = \{s, \eta, \top, \bot\}$ referred to as llabels (the term llabel is used so as to avoid confusion with the term label). We define llab-pairs as elements of the set $L \times \{0, 1\}$, and the states of $A_Q$ as elements of the set $\{init\} \cup (L \times \{0, 1\})$. For any Q, the automaton $A_Q$ is over the alphabet $\Sigma \cup \{s, \eta\}$, has init as its initial state, and has no final state. The set $\Delta_Q$ of transitions of $A_Q$ will consist of rules of the form $(q, \tau) \to q'$ where $q \in \{init\} \cup (L \times \{0, 1\})$, $q' \in (L \times \{0, 1\})$, and $\tau \in \Sigma \cup \{s, \eta\}$. For any rlag G, we define a function llab: $Nodes(G) \to \Sigma \cup \{s, \eta\}$, by setting llab(u) = $\pi_1$(label(u)), the first component of label(u).

The automaton $A_Q$ associated to a basic query Q = //*[axis::b] will run top-down on the rlag G, where G is $D_t$ if axis is a basic non-sibling axis, and G is any chibling F of $L_t$ if axis is a basic sibling axis. A run of $A_Q$ on G is a map r: $Nodes(G) \to L \times \{0, 1\}$, such that, for every $u \in Nodes(G)$, the following holds:
- if u is $root_G$, then the rule (init, llab(u)) $\to$ r(u) is in $\Delta_Q$;
- otherwise, for every $v \in \gamma(u)$ the rules (r(u), llab(v)) $\to$ r(v) are all in $\Delta_Q$.
(Note: when axis is non-sibling, this amounts to requiring that, for any node v, the state r(v) must be in conformity with the states r(u) for every parent node u of v, with respect to the rules in $\Delta_Q$.)

From the run of the automaton $A_Q$ and from the states it attaches to the nodes of $D_t$, we will deduce, at every node u of t, a well-determined llab-pair as a (new) label at u, via the natural bijection between Nodes(t) and Nodes($D_t$). The llab-pairs thus attached to the nodes of t will have the following semantics (where x stands for the name of the node u on t, corresponding to the current node on $D_t$):
- $(\top, 1)$ : x = b, current node on t is selected by (i.e., is an answer for) Q;
- $(\bot, 1)$ : x = b, current node is not selected, but has a selected descendant;
- $(\bot, 0)$ : x = b, current node is not selected, and has no selected descendant;
- $(s, 1)$ : x $\ne$ b, current node is selected;
- $(\eta, 1)$ : x $\ne$ b, current node is not selected, but has a selected descendant;
- $(\eta, 0)$ : x $\ne$ b, current node is not selected, and has no selected descendant.

Only the nodes on $D_t$, to which the run of $A_Q$ associates the labels (s, 1) or $(\top, 1)$, correspond to the nodes of t that will get selected by the query Q. The llab-pairs with boolean component 1 will label the nodes of $D_t$ corresponding to the nodes of t which are on a path to an answer for the query Q; thus the automata $A_Q$ will have no transitions from any state with boolean component 0 to a state with boolean component 1. Moreover, with a view to define runs of such automata which are unique (or unambiguous in a sense that will be presently made clear), we define the following priority relations between the llab-pairs: $(\eta, 0) > (\eta, 1) > (s, 1)$, and $(\bot, 0) > (\bot, 1) > (\top, 1)$.
A run of the automaton A_Q will label any node u on G with an llab-pair either from the group {(⊤, 0), (⊤, 1), (⊤̄, 1)} or from the group {(η, 0), (η, 1), (s, 1)}; and this group is determined by llab(u). For ease of presentation, we agree to set η̄ := s, and often denote either of the above two groups of llab-pairs under the uniform notation {(l, 0), (l, 1), (l̄, 1)}, where l ∈ {η, ⊤}, with the ordering (l, 0) > (l, 1) > (l̄, 1). We shall construct a run r of A_Q on G that will be uniquely determined by the following maximal priority condition:

(MP): at any node v on G, r(v) is the maximal llab-pair (ℓ, x) for the ordering > in the group {(l, 0), (l, 1), (l̄, 1)} determined by llab(v), such that A_Q contains a transition rule of the form (r(u), llab(v)) → (ℓ, x), for every parent u of v.

Such a run will assign a label with boolean component 1 only to the nodes corresponding to those of the minimal sub-trdag of t containing the root of t and all the answers to Q on t.

4.2 Re-labeling of D_t by the Runs of A_Q

We first consider a non-sibling basic query Q on a given document t, and a given run r of the automaton A_Q on D_t; at the end of the run, the nodes on D_t will get re-labeled with new llab-pairs, computed as below for every u ∈ Nodes(D_t):
l_r(u) = (s, 1) iff r(u) ∈ {(s, 1), (⊤̄, 1)},
l_r(u) = (η, 1) iff r(u) ∈ {(η, 1), (⊤, 1)},
l_r(u) = (η, 0) iff r(u) ∈ {(η, 0), (⊤, 0)}.
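The re-labeling function l_r is a fixed table on the six llab-pairs, which a short sketch can spell out. The string encoding of the llabels (`top`, `top_bar`, `eta`, `s`) and the sample run are assumptions of this illustration, not notation from the report.

```python
# l_r collapses the two groups of llab-pairs onto the group {(eta,0), (eta,1), (s,1)}.
RELABEL = {
    ("s", 1): ("s", 1),
    ("top_bar", 1): ("s", 1),
    ("eta", 1): ("eta", 1),
    ("top", 1): ("eta", 1),
    ("eta", 0): ("eta", 0),
    ("top", 0): ("eta", 0),
}


def relabel(run):
    """Apply l_r to a run given as a dict: node -> llab-pair."""
    return {node: RELABEL[pair] for node, pair in run.items()}


# Hypothetical run on a four-node rlag.
SAMPLE_RUN = {
    "r": ("s", 1),
    "x": ("top_bar", 1),
    "y": ("top", 0),
    "z": ("eta", 0),
}
RELABELED = relabel(SAMPLE_RUN)
```

After re-labeling, the name of the node no longer matters: only the information "selected", "above a selected node", or "neither" survives, which is what lets a later automaton treat s as an ordinary name test.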

The rlag obtained in this manner from D_t, following the run r and the associated re-labeling function l_r, will be denoted as r(D_t).

For a basic query Q over a sibling axis, the situation is a little more complex, because several different nodes on one chibling of L_t can have the same name (non-terminal), or several different chiblings can have nodes named by the same non-terminal, or both. Thus, to any node of D_t named with a non-terminal A will correspond in general a set of llab-pairs, assigned by the various runs of A_Q to the A-named nodes on the various chiblings of L_t. We therefore proceed as follows: for every complete set r of runs of A_Q, formed of one run r_F on each chibling F, we will define r(D_t) as the re-labeled rlag derived from D_t, under r. With that purpose we associate to r and any u ∈ Nodes(D_t), a set of llab-pairs:
llab_r(u) = ⋃_{r_F ∈ r} { r_F(v) | v ∈ Nodes(F), and name(v) = name(u) }.
We then derive, at each node of D_t, a unique llab-pair in conformity with the semantics of our approach, by using the following function:
λ_r(u) = s iff llab_r(u) ∩ {(s, 1), (⊤̄, 1)} ≠ ∅,
λ_r(u) = η iff llab_r(u) ∩ {(s, 1), (⊤̄, 1)} = ∅.
From D_t and this function λ_r, we next derive an rlag λ_r(D_t) by re-labeling each node u on D_t with the pair (λ_r(u), _). And finally we define r(D_t) as the rlag obtained from λ_r(D_t), by running on it the automaton for the basic non-sibling query //*[self::s], as indicated at the beginning of this subsection. In practical terms, such a run amounts in essence to setting, as the second component of label(u) at any node u, the boolean 1 iff u is on a path to some node with llabel s, and 0 otherwise. All these details are illustrated with an example in the following subsection.

4.3 The Automata

We first present the automata for the basic queries //*[self::a] and for //*[following-sibling::a], and give an illustrative example using the former for a = s, and the latter for a = b. The automata for the other basic queries are given after the example.
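The two-stage re-labeling for sibling axes can be sketched as follows: first collect, per non-terminal, the llab-pairs assigned by the runs on all chiblings and apply λ_r, then set the boolean component by checking whether a node lies on a path to a node with llabel s. The data layout (each run as a list of (non-terminal, llab-pair) entries, the dag as a dict of child lists), the string encoding of the llabels and the sample values are all assumptions of this sketch; in particular, the second stage is computed by plain reachability rather than by actually running the automaton for //*[self::s].

```python
SELECTED = {("s", 1), ("top_bar", 1)}


def collect(runs):
    """Gather, per non-terminal, the set of llab-pairs assigned by all runs."""
    pairs = {}
    for run in runs:
        for name, pair in run:
            pairs.setdefault(name, set()).add(pair)
    return pairs


def lam(pairs_at_u):
    """lambda_r: 's' iff some occurrence of the non-terminal was selected."""
    return "s" if pairs_at_u & SELECTED else "eta"


def relabel_sibling(dag, runs):
    """dag maps a non-terminal to the list of its children (repeats allowed)."""
    pairs = collect(runs)
    llabel = {u: lam(pairs.get(u, set())) for u in dag}
    memo = {}

    def reaches_s(u):
        # True iff u itself, or some node below u, carries the llabel 's'.
        if u not in memo:
            memo[u] = llabel[u] == "s" or any(reaches_s(v) for v in dag[u])
        return memo[u]

    return {u: (llabel[u], 1 if reaches_s(u) else 0) for u in dag}


# Hypothetical input: three chiblings, with A2 occurring twice on the second.
DAG = {"A1": ["A2", "A3", "A4", "A5", "A2"], "A2": [], "A3": ["A5"],
       "A4": [], "A5": []}
RUNS = [
    [("A1", ("eta", 0))],
    [("A2", ("s", 1)), ("A3", ("s", 1)), ("A4", ("top_bar", 1)),
     ("A5", ("top", 0)), ("A2", ("eta", 0))],
    [("A5", ("top", 0))],
]
RESULT = relabel_sibling(DAG, RUNS)
```

Note that A2 is re-labeled s even though one of its two occurrences is not selected: a node of D_t is kept as soon as at least one of its occurrences on some chibling answers the query.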
[Figure: automata for the queries //*[self::a] and //*[following-sibling::a].]

Figure 4 below illustrates the evaluation of Q = //*[following-sibling::b], on the trdag t of Figure 3. We first use the automaton for the basic query //*[following-sibling::a] with a = b, and then the automaton for //*[self::a] with a = s. The sub-trdag of t, formed of nodes corresponding to those of r(D_t) with labels having boolean component 1, contains all the answers to Q on t.

[Fig. 4: the runs r_0 on F_0, r_1 on F_1 and r_3 on F_3; the labels they induce on D_t; the rlag λ_r(D_t); the run of the automaton for //*[self::s] on λ_r(D_t); and the final re-labeled rlag r(D_t).]

[Figure: automaton for the query //*[parent::a].]

[Figure: automaton for the query //*[ancestor::a].]

[Figure: automaton for the query //*[child::a].]

[Figure: automaton for the query //*[preceding-sibling::a].]

[Figure: automaton for the query //*[descendant::a].]

A few words on some of the automata by way of explanation. First, the reason why the automaton for self does not have the states (⊤, 0), (⊤, 1), (s, 1): for (⊤, 0), (⊤, 1), by the semantics of subsection 4.1 we must have x = a, where x is the name of the current node on t, but then the query //*[self::a] should select the current node, so one cannot be at such a state; as for (s, 1), the reasoning is just the opposite. Next, the reason why the automaton for descendant does not have the states (η, 1), (⊤, 1): if the semantics attribute one of these pairs to any node u, that would mean the node u has a selected descendant u′; which means that u′ has some a-descendant node, which would then be an a-descendant for u too, so Q should select u.

5 Maximal Priority Runs of Basic Query Automata

Note that the following properties, required by our semantics of subsection 4.1, hold on the automata A_Q constructed above, for any basic Core XPath query Q = //*[axis::a]:
i) There are no transitions from any state with boolean component 0 to a state with boolean component 1;
ii) The a-transitions have all their target states in {(⊤, 0), (⊤, 1), (⊤̄, 1)}; and for any γ ≠ a, the target states of γ-transitions are all in {(η, 0), (η, 1), (s, 1)}.

Theorem 1 Let Q be any basic Core XPath query, t any given trdag, and let G denote either the rlag D_t, or any given chibling F of L_t. Assume given a labeling function L from Nodes(G) into the set of llab-pairs, which is correct with respect to Q, i.e., in conformity with the semantics of subsection 4.1. Then there is a run r of the automaton A_Q on G, such that:
i) r is compatible with L; i.e., r(u) = L(u) for every node u on G;
ii) r satisfies the maximal priority condition (MP) of subsection 4.1.

Proof. We first construct, by induction, a complete run (i.e., defined at all the nodes of G) satisfying property i). For that, we shall employ reasonings that will be specific to the axis of the basic query Q. We give here the details only for the axis parent; they are similar for the other axes.

Q = //*[parent::a]: (The axis considered is non-sibling, so G = D_t here.) At the root node u of D_t, we set r(u) = L(u); we have to show that there is a transition rule in A_Q of the form (init, llab(u)) → L(u). Obviously, for the axis parent, the root node u cannot correspond to a node on t selected by Q, so the only llab-pairs possible for L(u) are (l, 0), (l, 1), with l ∈ {η, ⊤}; for each of these choices, we do have a transition rule of the needed form, on A_Q. Consider then a node v on D_t such that, at each of its ancestor nodes u on D_t, the part of the run r of A_Q has been constructed such that r(u) = L(u); assume that the run cannot be extended at the node v by setting r(v) = L(v). This means that there exists a parent node w of v, such that (L(w), llab(v)) → L(v) is not a transition rule of A_Q; we shall then derive a contradiction.
We only have to consider the cases where the boolean component of L(w) is greater than or equal to that of L(v). The possible couples L(w), L(v) are then respectively:
L(w) : (⊤, 0) (⊤, 1) (⊤, 1) (⊤̄, 1) (⊤̄, 1)
L(v) : (η, 0) (⊤, 1) (η, 1) (⊤, 1) (η, 1)
In all cases, we have llab(w) = a because of the semantics, so the node (on t corresponding to the node) v has an a-parent, so must be selected; thus the above choices for L(v) are not in conformity with the semantics; contradiction.

We now prove that the complete run r thus constructed satisfies property ii). For this part of the proof, the reasoning does not need to be specific for each Q; so, write Q more generally as //*[axis::a] for some given a. Suppose the run r does not satisfy the maximal priority condition at some node v on G; assume, for instance, that the run r made the choice, say, of the llab-pair (l, 1), although the maximal labeling of the node v, in a manner compatible with the llab-pairs of all its parents, was the llab-pair (l, 0). Since L is assumed correct, and r is compatible with L, the maximal possible labeling (l, 0) would mean that the node (on t corresponding to the node) v has no descendant selected by Q; whereas the choice that r is assumed to have made at v, namely the llab-pair (l, 1), has the opposite semantics whether or not llab(v) = a; in other words, the labeling L would not be correct with respect to Q; contradiction. The other possibilities for the bad labelings under r also get eliminated in a similar manner.

Theorem 2 Let Q, t, D_t, F, G be as above. Let r be a (complete) run of the automaton A_Q on G, which satisfies the maximal priority condition (MP) of subsection 4.1. Then the labeling function L on Nodes(G), defined as L(u) = r(u) for any node u, is correct with respect to the semantics of subsection 4.1.

Proof. Let us suppose that the labeling L deduced from r is not correct with respect to Q; we shall then derive a contradiction. The reasoning will be by case analysis, which will be specific to the axis of the basic query Q considered. We give the details here for Q = //*[descendant::a]. The axis is non-sibling, so we have G = D_t here. The sets Nodes(t), Nodes(D_t) are in natural bijection, so for any node u on D_t we shall also denote by u the corresponding node on t, in our reasonings below. We saw that the automaton A_Q for the descendant axis does not have the states (η, 1), (⊤, 1). Consider then a node u on D_t such that: for all ancestor nodes w of u, the llabel r(w) is in conformity with the semantics, but the llab-pair r(u) is not in conformity. Now, A_Q has only 5 states: (init), (⊤̄, 1), (s, 1), (⊤, 0), (η, 0), of which only the last four can llabel the nodes. So the possible bad choices that r is assumed to have made at our node u are as follows:
(a) r(u) = (⊤̄, 1), but the node u is not an answer to the query Q. Here name(u) must be a, so the choice of r ought to have been (⊤, 0);
(b) r(u) = (s, 1), but the node u is not an answer to the query Q. Here name(u) ≠ a, so the choice of r ought to have been (η, 0);
(c) r(u) = (η, 0), but the node u is an answer to the query Q. Here name(u) ≠ a, so the choice of r ought to have been (s, 1);
(d) r(u) = (⊤, 0), but the node u is an answer to the query Q. Here name(u) must be a, so the choice of r ought to have been (⊤̄, 1).
In all four cases, we have to show: i) that the ought-to-have-been choice llab-pair is reachable from all the parent nodes of u; ii) and that, with such a new and correct choice made at u, r can be completed from u into a run on the entire dag D_t. The reasoning will be similar for cases (a), (b), and for the cases (c), (d). Here are the details for case (a): That u is not an answer to Q means that u has no a-descendant node, so for all nodes v below u on D_t, we have llab(v) ≠ a.
Therefore, assertions i) and ii) above follow from the following observations on the automaton for Q = //*[descendant::a]: i) if r could reach the state (⊤̄, 1) at node u (via an a-transition) from any parent node of u, then (⊤, 0) is also reachable thus at u, from any of them; ii) if, from the state (⊤̄, 1), r could reach all the nodes on D_t below u (with state (η, 0)), via transitions over γ ≠ a, then it can do exactly the same now, with the correct choice llab-pair (⊤, 0) at u.

As for case (c): Node u is an answer to Q here, so u has an a-descendant; let v be an a-node below u on D_t; the llab-pair r(v) that r assigns to v must then be either (⊤̄, 1) or (⊤, 0); this implies that r passed from the state (η, 0), supposedly assigned by r to u, to (⊤̄, 1) or (⊤, 0) somewhere between u and v; which is impossible, as is easily seen on the automaton A_Q for the axis descendant considered. The reasoning for case (d) is even easier: from state (⊤, 0), no state with an outgoing a-transition is reachable.

6 Evaluating Composite Queries

A composite query is a query in standard form, but is not basic. We propose to evaluate such a query incrementally. For this, it suffices to consider queries that are of the form //*[A::x conn A′::x′], where conn ∈ {and, or}, or of the form //*[A_1::*[A_2::a]]. For those of the former type, we observe first that the components in disjunction (resp. conjunction) under * can be evaluated separately. Indeed, the answer for Q = //*[A::x conn A′::x′] can be obtained as the union (resp. intersection) of the answers for the two component queries //*[A::x] and //*[A′::x′], when conn is an or (resp. an and). We apply the method described earlier, separately for Q_1 = //*[A::x] and for Q_2 = //*[A′::x′], thus getting two respective evaluating runs r_1, r_2. Any node u of the dag D_t will then be re-labeled, by the composite query Q, with llab-pairs computed by a function AND when conn = and (resp. OR when conn = or), in conformity with the semantics presented in Section 4.1:
AND(u) = (s, 1) iff r_1(u) = (l̄, 1) = r_2(u);
AND(u) = (η, 0) iff r_1(u) = (l, 0) or r_2(u) = (l, 0);
AND(u) = (η, 1) otherwise.
OR(u) = (s, 1) iff r_1(u) = (l̄, 1) or r_2(u) = (l̄, 1);
OR(u) = (η, 0) iff r_1(u) = (l, 0) = r_2(u);
OR(u) = (η, 1) otherwise.
Figure 5 below illustrates the above reasoning, for the evaluation of the composite query Q = //*[self::b and parent::a], on the trdag t of Figure 3:

[Fig. 5: the runs for //*[self::b] and //*[parent::a] on D_t, and the resulting rlag and(D_t).]

We next consider the queries of the form Q = //*[A_1::*[A_2::a]], with imbricated predicates. For their evaluation, we first consider a maximal priority evaluating run r_2 (resp. set of runs r_2) of the automaton associated to the inner query //*[A_2::a], on D_t (resp. the set of all chiblings of L_t). This run (resp. set of runs) will output the rlag r_2(D_t), as described in Section 4.2. Evaluating the imbricated query Q on the dag t is then done by running the automaton for the basic outer query //*[A_1::s] on r_2(D_t). Finally, the answer for a query of the type Q = //*[child::x[position()= k]] is the subset of the nodes answering //*[child::x], which correspond to a k-th node on some chibling.
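The combination functions AND and OR of this section act pointwise on the two llab-pairs attached to a node by the runs r_1 and r_2, so they can be sketched as two small functions. The string encoding of the llabels, and the reading of "(l̄, 1)" as "the llabel is s or ⊤̄", are assumptions of this illustration.

```python
def is_selected(pair):
    """A pair marks a selected node iff its llabel is 's' or 'top_bar'."""
    return pair[0] in ("s", "top_bar")


def AND(p1, p2):
    """Combine the llab-pairs of one node under a conjunction."""
    if is_selected(p1) and is_selected(p2):
        return ("s", 1)
    if p1[1] == 0 or p2[1] == 0:
        return ("eta", 0)
    return ("eta", 1)


def OR(p1, p2):
    """Combine the llab-pairs of one node under a disjunction."""
    if is_selected(p1) or is_selected(p2):
        return ("s", 1)
    if p1[1] == 0 and p2[1] == 0:
        return ("eta", 0)
    return ("eta", 1)


def combine(run1, run2, op):
    """Apply AND or OR at every node of two runs over the same node set."""
    return {u: op(run1[u], run2[u]) for u in run1}
```

For instance, `AND(("top_bar", 1), ("eta", 0))` gives `("eta", 0)`, while `OR` on the same arguments gives `("s", 1)`. Since a selected pair always carries boolean component 1, the order of the tests inside each function does not change the result.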
7 Deriving the Answer on the Tree-equivalent

Given a Core XPath query Q and its answer set on a trdag t, we show here how to derive the answer for the same query Q on the tree-equivalent t̂ of t; this is of importance, since the standard model for an XML document (even when given in a compressed form) is generally considered as the tree representation of the document. We observe, to start with, that the answer set for Q on t is in general a superset of the answer set for Q on the tree-equivalent t̂. This can be so for the following two reasons:

(i) If a certain node u on t is selected by Q, not all of the nodes u′ on t̂, that are lifts of u under the compression map c on Nodes(t̂), may answer the query Q on the tree t̂, even when Q is a basic query. For instance, consider the basic query //*[parent::a]; on the fully compressed tdag f(a(c), b(c)), the (unique) node named c is an answer; it has two c-named nodes as lifts on the tree-equivalent t̂, of which only one is an answer for the query.

(ii) A node u on a trdag t may answer a composite query Q, but none among the lifts of u on t̂ may answer the same query Q on the tree t̂. For instance, the unique c-named node on the compressed tdag f(a(c), b(c)) answers the query //*[parent::a and parent::b], but there is no node on the tree-equivalent answering this query.

Actually, such situations arise only for queries involving the upward axes parent, ancestor, which define relations that are less trivial on trdags than on trees. We can formulate this observation more precisely, as follows:

Lemma 1. Let A be one of the axes self, child, descendant, Q the basic query //*[A::x], t any given trdag, t̂ its tree-equivalent, u any given node on t, and u′ ∈ c⁻¹(u) any node lift of u on t̂. Then: wrt the maximal priority runs of the automaton for the axis A, respectively on D_t and D_t̂, the nodes u on t, and u′ on t̂, get labeled by the same llab-pair; in particular, the node u answers Q on t if and only if the node u′ answers the same query Q on the tree t̂.

Proof. Follows by observing that the semantics of Section 4.1 have been defined in a manner which is top-down, and that the compression map c : Nodes(t̂) → Nodes(t) maps the set Nodes(t̂_u′), of nodes below u′ on t̂, onto the set of nodes of the sub-trdag t_u.

The above lemma is a first step towards the objective of this section. As a second step, we propose to distinguish, on the automata constructed above for the two queries //*[parent::a], //*[ancestor::a], the transitions that will never be fired on a tree; such as, e.g., the one from state (η, 1) to state (⊤̄, 1).
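The gap between answers on a trdag and on its tree-equivalent, as in reasons (i) and (ii) above, can be checked on the compressed dag f(a(c), b(c)) with a short sketch. It evaluates the parent axis directly from its definition, without any automaton; the node identifiers, the dictionary encoding, and the use of concatenated child indices as tree positions are assumptions of this illustration.

```python
def parent_answers_on_dag(dag, a):
    """Nodes of the dag having at least one parent named a."""
    return {v for name, kids in dag.values() if name == a for v in kids}


def parent_answers_on_tree(dag, root, a):
    """Positions, in the unfolded tree, of the nodes whose parent is named a."""
    answers = set()

    def walk(u, pos, parent_name):
        if parent_name == a:
            answers.add(pos)
        name, kids = dag[u]
        for i, v in enumerate(kids, start=1):
            walk(v, pos + str(i), name)  # one copy of v per incoming path

    walk(root, "", None)
    return answers


# The fully compressed dag f(a(c), b(c)): the node nc is shared.
DAG = {
    "nf": ("f", ["na", "nb"]),
    "na": ("a", ["nc"]),
    "nb": ("b", ["nc"]),
    "nc": ("c", []),
}

# Reason (i): nc answers //*[parent::a] on the dag, but only the lift at
# position 11 answers it on the tree (the lift at position 21 does not).
DAG_A = parent_answers_on_dag(DAG, "a")
TREE_A = parent_answers_on_tree(DAG, "nf", "a")

# Reason (ii): nc answers //*[parent::a and parent::b] on the dag, while no
# position of the tree answers it.
DAG_AB = DAG_A & parent_answers_on_dag(DAG, "b")
TREE_AB = TREE_A & parent_answers_on_tree(DAG, "nf", "b")
```

Keeping, for each selected dag node, the set of tree positions at which it is really selected is what turns the dag answer into the tree answer.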
(Note: for this transition to be firable, we have to reach a node corresponding to an a-named node on the trdag, which must then also have as (unique) parent on the tree, i.e., the node from which the transition is to be fired, an a-named node; this parent node cannot correspond then to a node labeled with (η, 1).) Such transitions (that are not firable on a tree) will be depicted with dotted arrows on the automaton; the transitions with full arrows are then the ones that are firable both on trdags and on trees. The two automata thus revised are as follows:

[Figure: automaton for the query //*[parent::a], revised.]

[Figure: automaton for the query //*[ancestor::a], revised.]

The next step towards our objective of this section consists in completing a maximal priority run r of the automaton for any given basic query, by associating to a current node u on D_t a subset of Pos_t(u) (remember: Nodes(t) and Nodes(D_t) are in natural bijection), denoted as P_r(u), and defined as follows:
- case u selected (i.e. the label of u under r is (s, 1) or (⊤̄, 1)): we set P_r(u) = Pos_t(u) \ ⋃_{α,i} {α.i | α.i ∈ Pos_t(u)}, the union being taken over the positions α of parent nodes v on t such that the transition from v to u is dotted;
- case u not selected (i.e. the label of u under r is neither (s, 1) nor (⊤̄, 1)): we set here P_r(u) = Pos_t(u).
A run r completed in this manner will be denoted in boldface type as 𝐫, giving thus a map 𝐫 : Nodes(t) → L × {0, 1} × 2^Pos(t), defined by u ↦ (r(u), P_r(u)).

In order to derive the answer to a composite query Q on the tree-equivalent of t, from the answer for Q on t, we need naturally to complete the functions AND and OR of Section 6, by adding a component giving the selected positions for a query Q which is a conjunction or disjunction of two sub-queries Q_1, Q_2. These completed functions, again denoted boldface as 𝐀𝐍𝐃 and 𝐎𝐑, are defined below in a rather obvious manner (the indices 1, 2 correspond to the runs wrt the two queries, and l_1, l_2 stand for ⊤ or η; recall that η̄ stands for the llabel s):
𝐀𝐍𝐃(u) = ((s, 1), P_r1(u) ∩ P_r2(u)) iff 𝐫_1(u) = ((l̄_1, 1), P_r1(u)) and 𝐫_2(u) = ((l̄_2, 1), P_r2(u));
𝐀𝐍𝐃(u) = ((η, 0), Pos_t(u)) iff r_1(u) = (l_1, 0) and r_2(u) = (l_2, 0);
𝐀𝐍𝐃(u) = ((η, 1), Pos_t(u)), otherwise.
𝐎𝐑(u) = ((s, 1), P_r1(u) ∪ P_r2(u)) iff 𝐫_1(u) = ((l̄_1, 1), P_r1(u)) and 𝐫_2(u) = ((l̄_2, 1), P_r2(u));
𝐎𝐑(u) = ((s, 1), P_r1(u)) iff r_1(u) = (l̄_1, 1) and r_2(u) ≠ (l̄_2, 1);
𝐎𝐑(u) = ((s, 1), P_r2(u)) iff r_2(u) = (l̄_2, 1) and r_1(u) ≠ (l̄_1, 1);
𝐎𝐑(u) = ((η, 0), Pos_t(u)) iff r_1(u) = (l_1, 0) and r_2(u) = (l_2, 0);
𝐎𝐑(u) = ((η, 1), Pos_t(u)), otherwise.

Example 2.
We evaluate the query Q = //*[ancestor:: [parent::c]] on the trdag t presented to the left of Figure 6; the standard form of this query is Q = //*[ancestor::*[self:: and parent::c]]; and its answer consists of all the nodes having an ancestor with parent c. But we want here to obtain the same answer for Q on t and on its tree-equivalent t̂ presented to the right of Figure 6. To find such an answer, it is necessary to use the revised automata for the parent and ancestor axes. For ease of comprehension, we illustrate the evaluation of Q directly on the trdag t (and not on the rlag D_t); it is possible because, for this document, the trdag t and its rlag D_t are isomorphic. Note that each node u of t is represented by its name and the set of positions of the nodes on t̂ that are lifts of u. First, look at Figure 7 where we have presented the evaluation of Q using the non-revised automata. We obtain then an answer on t selecting nodes and g (which are the nodes of t having an ancestor with parent c); but, if we unfold this answer, we obtain the tree with as selected nodes: at positions 11, 211, and

Coalgebra, Lecture 15: Equations for Deterministic Automata

Coalgebra, Lecture 15: Equations for Deterministic Automata Colger, Lecture 15: Equtions for Deterministic Automt Julin Slmnc (nd Jurrin Rot) Decemer 19, 2016 In this lecture, we will study the concept of equtions for deterministic utomt. The notes re self contined

More information

Minimal DFA. minimal DFA for L starting from any other

Minimal DFA. minimal DFA for L starting from any other Miniml DFA Among the mny DFAs ccepting the sme regulr lnguge L, there is exctly one (up to renming of sttes) which hs the smllest possile numer of sttes. Moreover, it is possile to otin tht miniml DFA

More information

Lecture 08: Feb. 08, 2019

Lecture 08: Feb. 08, 2019 4CS4-6:Theory of Computtion(Closure on Reg. Lngs., regex to NDFA, DFA to regex) Prof. K.R. Chowdhry Lecture 08: Fe. 08, 2019 : Professor of CS Disclimer: These notes hve not een sujected to the usul scrutiny

More information

Convert the NFA into DFA

Convert the NFA into DFA Convert the NF into F For ech NF we cn find F ccepting the sme lnguge. The numer of sttes of the F could e exponentil in the numer of sttes of the NF, ut in prctice this worst cse occurs rrely. lgorithm:

More information

Closure Properties of Regular Languages

Closure Properties of Regular Languages Closure Properties of Regulr Lnguges Regulr lnguges re closed under mny set opertions. Let L 1 nd L 2 e regulr lnguges. (1) L 1 L 2 (the union) is regulr. (2) L 1 L 2 (the conctention) is regulr. (3) L

More information

Formal Languages and Automata

Formal Languages and Automata Moile Computing nd Softwre Engineering p. 1/5 Forml Lnguges nd Automt Chpter 2 Finite Automt Chun-Ming Liu cmliu@csie.ntut.edu.tw Deprtment of Computer Science nd Informtion Engineering Ntionl Tipei University

More information

AUTOMATA AND LANGUAGES. Definition 1.5: Finite Automaton

AUTOMATA AND LANGUAGES. Definition 1.5: Finite Automaton 25. Finite Automt AUTOMATA AND LANGUAGES A system of computtion tht only hs finite numer of possile sttes cn e modeled using finite utomton A finite utomton is often illustrted s stte digrm d d d. d q

More information

1 Nondeterministic Finite Automata

1 Nondeterministic Finite Automata 1 Nondeterministic Finite Automt Suppose in life, whenever you hd choice, you could try oth possiilities nd live your life. At the end, you would go ck nd choose the one tht worked out the est. Then you

More information

Designing finite automata II

Designing finite automata II Designing finite utomt II Prolem: Design DFA A such tht L(A) consists of ll strings of nd which re of length 3n, for n = 0, 1, 2, (1) Determine wht to rememer out the input string Assign stte to ech of

More information

Parse trees, ambiguity, and Chomsky normal form

Parse trees, ambiguity, and Chomsky normal form Prse trees, miguity, nd Chomsky norml form In this lecture we will discuss few importnt notions connected with contextfree grmmrs, including prse trees, miguity, nd specil form for context-free grmmrs

More information

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages Deprtment of Computer Science, Austrlin Ntionl University COMP2600 Forml Methods for Softwre Engineering Semester 2, 206 Assignment Automt, Lnguges, nd Computility Smple Solutions Finite Stte Automt nd

More information

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014 CMPSCI 250: Introduction to Computtion Lecture #31: Wht DFA s Cn nd Cn t Do Dvid Mix Brrington 9 April 2014 Wht DFA s Cn nd Cn t Do Deterministic Finite Automt Forml Definition of DFA s Exmples of DFA

More information

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true. York University CSE 2 Unit 3. DFA Clsses Converting etween DFA, NFA, Regulr Expressions, nd Extended Regulr Expressions Instructor: Jeff Edmonds Don t chet y looking t these nswers premturely.. For ech

More information

The University of Nottingham SCHOOL OF COMPUTER SCIENCE A LEVEL 2 MODULE, SPRING SEMESTER LANGUAGES AND COMPUTATION ANSWERS

The University of Nottingham SCHOOL OF COMPUTER SCIENCE A LEVEL 2 MODULE, SPRING SEMESTER LANGUAGES AND COMPUTATION ANSWERS The University of Nottinghm SCHOOL OF COMPUTER SCIENCE LEVEL 2 MODULE, SPRING SEMESTER 2016 2017 LNGUGES ND COMPUTTION NSWERS Time llowed TWO hours Cndidtes my complete the front cover of their nswer ook

More information

Lecture 09: Myhill-Nerode Theorem

Lecture 09: Myhill-Nerode Theorem CS 373: Theory of Computtion Mdhusudn Prthsrthy Lecture 09: Myhill-Nerode Theorem 16 Ferury 2010 In this lecture, we will see tht every lnguge hs unique miniml DFA We will see this fct from two perspectives

More information

Bases for Vector Spaces

Bases for Vector Spaces Bses for Vector Spces 2-26-25 A set is independent if, roughly speking, there is no redundncy in the set: You cn t uild ny vector in the set s liner comintion of the others A set spns if you cn uild everything

More information

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1 Chpter Five: Nondeterministic Finite Automt Forml Lnguge, chpter 5, slide 1 1 A DFA hs exctly one trnsition from every stte on every symol in the lphet. By relxing this requirement we get relted ut more

More information

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz University of Southern Cliforni Computer Science Deprtment Compiler Design Fll Lexicl Anlysis Smple Exercises nd Solutions Prof. Pedro C. Diniz USC / Informtion Sciences Institute 4676 Admirlty Wy, Suite

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Finite Automt Theory nd Forml Lnguges TMV027/DIT321 LP4 2018 Lecture 10 An Bove April 23rd 2018 Recp: Regulr Lnguges We cn convert between FA nd RE; Hence both FA nd RE ccept/generte regulr lnguges; More

More information

Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers. Mehryar Mohri Courant Institute and Google Research

Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers. Mehryar Mohri Courant Institute and Google Research Speech Recognition Lecture 2: Finite Automt nd Finite-Stte Trnsducers Mehryr Mohri Cournt Institute nd Google Reserch mohri@cims.nyu.com Preliminries Finite lphet Σ, empty string. Set of ll strings over

More information

Nondeterminism and Nodeterministic Automata

Nondeterminism and Nodeterministic Automata Nondeterminism nd Nodeterministic Automt 61 Nondeterminism nd Nondeterministic Automt The computtionl mchine models tht we lerned in the clss re deterministic in the sense tht the next move is uniquely

More information

p-adic Egyptian Fractions

p-adic Egyptian Fractions p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction

More information

Converting Regular Expressions to Discrete Finite Automata: A Tutorial

Converting Regular Expressions to Discrete Finite Automata: A Tutorial Converting Regulr Expressions to Discrete Finite Automt: A Tutoril Dvid Christinsen 2013-01-03 This is tutoril on how to convert regulr expressions to nondeterministic finite utomt (NFA) nd how to convert

More information

Chapter 2 Finite Automata

Chapter 2 Finite Automata Chpter 2 Finite Automt 28 2.1 Introduction Finite utomt: first model of the notion of effective procedure. (They lso hve mny other pplictions). The concept of finite utomton cn e derived y exmining wht

More information

CM10196 Topic 4: Functions and Relations

CM10196 Topic 4: Functions and Relations CM096 Topic 4: Functions nd Reltions Guy McCusker W. Functions nd reltions Perhps the most widely used notion in ll of mthemtics is tht of function. Informlly, function is n opertion which tkes n input

More information

Lecture 9: LTL and Büchi Automata

Lecture 9: LTL and Büchi Automata Lecture 9: LTL nd Büchi Automt 1 LTL Property Ptterns Quite often the requirements of system follow some simple ptterns. Sometimes we wnt to specify tht property should only hold in certin context, clled

More information

Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers

Speech Recognition Lecture 2: Finite Automata and Finite-State Transducers Speech Recognition Lecture 2: Finite Automt nd Finite-Stte Trnsducers Eugene Weinstein Google, NYU Cournt Institute eugenew@cs.nyu.edu Slide Credit: Mehryr Mohri Preliminries Finite lphet, empty string.

More information

Regular expressions, Finite Automata, transition graphs are all the same!!

Regular expressions, Finite Automata, transition graphs are all the same!! CSI 3104 /Winter 2011: Introduction to Forml Lnguges Chpter 7: Kleene s Theorem Chpter 7: Kleene s Theorem Regulr expressions, Finite Automt, trnsition grphs re ll the sme!! Dr. Neji Zgui CSI3104-W11 1

More information

First Midterm Examination

First Midterm Examination Çnky University Deprtment of Computer Engineering 203-204 Fll Semester First Midterm Exmintion ) Design DFA for ll strings over the lphet Σ = {,, c} in which there is no, no nd no cc. 2) Wht lnguge does

More information

3 Regular expressions

3 Regular expressions 3 Regulr expressions Given n lphet Σ lnguge is set of words L Σ. So fr we were le to descrie lnguges either y using set theory (i.e. enumertion or comprehension) or y n utomton. In this section we shll

More information

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata CS103B ndout 18 Winter 2007 Ferury 28, 2007 Finite Automt Initil text y Mggie Johnson. Introduction Severl childrens gmes fit the following description: Pieces re set up on plying ord; dice re thrown or

More information

CS 275 Automata and Formal Language Theory

CS 275 Automata and Formal Language Theory CS 275 utomt nd Forml Lnguge Theory Course Notes Prt II: The Recognition Prolem (II) Chpter II.5.: Properties of Context Free Grmmrs (14) nton Setzer (Bsed on ook drft y J. V. Tucker nd K. Stephenson)

More information

CSCI 340: Computational Models. Kleene s Theorem. Department of Computer Science

CSCI 340: Computational Models. Kleene s Theorem. Department of Computer Science CSCI 340: Computtionl Models Kleene s Theorem Chpter 7 Deprtment of Computer Science Unifiction In 1954, Kleene presented (nd proved) theorem which (in our version) sttes tht if lnguge cn e defined y ny

More information

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2016

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2016 CS125 Lecture 12 Fll 2016 12.1 Nondeterminism The ide of nondeterministic computtions is to llow our lgorithms to mke guesses, nd only require tht they ccept when the guesses re correct. For exmple, simple

More information

Theory of Computation: Regular Languages (NTU EE)

(NTU EE) Regular Languages, Fall 2017. Figure: Schematic of Finite Automata. A finite automaton has a finite set of control

Model Reduction of Finite State Machines by Contraction

Alessandro Giua, Dip. di Ingegneria Elettrica ed Elettronica, Università di Cagliari, Piazza d'Armi, 09123 Cagliari, Italy. Phone: +39-070-675-5892, Fax: +39-070-675-5900

More on automata. Michael George. March 24 April 7, 2014

1 Automata constructions. Now that we have a formal model of a machine, it is useful to make some general constructions. 1.1 DFA Union / Product construction. Suppose

Finite Automata-cont'd

Automata Theory and Formal Languages, Professor Leslie Lander. Lecture #6: Finite Automata-cont'd, The Pumping Lemma. WEB SITE: http://bingweb.binghamton.edu/~lander/cs573.html. September 18, 2000. Example 1: Consider L = {ww

CS 301. Lecture 04 Regular Expressions. Stephen Checkoway. January 29, 2018

Review from last time: an NFA N = (Q, Σ, δ, q_0, F), where δ : Q × Σ_ε → P(Q) maps a state and an alphabet symbol (or ε) to a set of states. We run an NFA

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University

Degree project in mathematics, 5 hp. Supervisor: Andreas Strömbergsson. Examiner: Jörgen Östensson. June 2017.

Tutorial Automata and formal Languages

Notes for the tutorial in the summer term 2017. Sebastian Küpper, Christine Mika. 8 August 2017. 1 Introduction: Notations and basic Definitions. At the beginning of the tutorial we

Talen en Automaten Test 1, Mon 7th Dec, 2015, 15h45-17h30

This test consists of four exercises over 5 pages. Explain your approach, and write your answer to each exercise on a separate page. You can score a maximum of 100

Lecture 3: Equivalence Relations

Mathcamp Crash Course. Instructor: Padraic Bartlett. Week 1, Mathcamp 2014. In our last three talks of this class, we shift the focus of our talks from proof techniques to proof concepts

The size of subsequence automaton

Theoretical Computer Science 341 (2005) 379-384, www.elsevier.com/locate/tcs. Note. Zdeněk Troníček, Ayumi Shinohara. Department of Computer Science and Engineering, FEE CTU in Prague,

Formal Languages and Automata Theory. D. Goswami and K. V. Krishna

November 5, 2010. Contents: 1 Mathematical Preliminaries; 2 Formal Languages; 2.1 Strings; 2.2 Languages

CHAPTER 1 Regular Languages. Contents

Finite Automata (FA or DFA): definitions, examples, designing, regular operations. Non-deterministic Finite Automata (NFA): definitions, equivalence of NFAs and DFAs, closure under regular

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

York University CSE 2 Unit 3. DFA Classes: Converting between DFA, NFA, Regular Expressions, and Extended Regular Expressions. Instructor: Jeff Edmonds. Don't cheat by looking at these answers prematurely. 1. For each

1 From NFA to regular expression

Note 1: How to convert DFA/NFA to a regular expression. Version: 1.0. CS/ECE 374, Fall 2017. September 11, 2017. In this note, we show that any DFA can be converted into a regular expression. Our construction would work

Theory of Automata CS402

Table of contents: Lecture No. 1: Summary; What does automata mean?; Introduction to languages; Alphabets; Strings; Defining Languages. Lecture No. 2

Formal languages, automata, and theory of computation

Mälardalen University, TEN1 DVA337 2015, School of Innovation, Design and Engineering. Thursday, November 5, 14:10-18:30. Teacher: Daniel Hedin, phone 021-107052. The exam

Harvard University Computer Science 121 Midterm October 23, 2012

This is a closed-book examination. You may use any result from lecture, Sipser, problem sets, or section, as long as you quote it clearly. The alphabet is

DFA minimisation using the Myhill-Nerode theorem

Johanna Högberg, Lars Larsson. Abstract: The Myhill-Nerode theorem is an important characterisation of regular languages, and it also has many practical implications. In this chapter,

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

First Programming Project: Instruction Scheduling. Project has been posted

Theory of Computation Regular Languages

Bow-Yaw Wang, Academia Sinica, Spring 2012. Figure: Schematic of

Deterministic Finite Automata

H. Geuvers and J. Rot, Institute for Computing and Information Sciences. Version: fall 2016. Talen en Automaten. Outline: Finite Automata; Finite

Java II Finite Automata I

Bernd Kiefer, Bernd.Kiefer@dfki.de, Deutsches Forschungszentrum für künstliche Intelligenz. Processing Regular Expressions: We already learned about Java's regular expression

Foundations of XML Types: Tree Automata

Pierre Genevès, CNRS (slides mostly based on slides by W. Martens and T. Schwentick). University of Grenoble Alpes, 2017-2018. Why Tree Automata? Foundations of XML

5. (ΣΣ)* = {w | w is a string of even length} 6. 11 ∪ 00 = {11, 00} 7. (11 ∪ 00)Σ* = {w | w begins with either 11 or 00} 8. (0 ∪ ε)1* = 01* ∪ 1* 9.

Regular Expressions, Pumping Lemma, Right Linear Grammars. Ling 106, March 25, 2002. 1 Regular Expressions. A regular expression describes or generates a language: it is a kind of shorthand for listing the members of a language.

Intermediate Math Circles, Wednesday, November 14, 2018. Finite Automata II. Nickolas Rollick

Nickolas Rollick, nrollick@uwaterloo.ca. Regular Languages: Last time, we were introduced to the idea of a DFA (deterministic finite automaton), one

Lexical Analysis Finite Automate

CMPSC 470 Lecture 04. Topics: Deterministic Finite Automata (DFA); Nondeterministic Finite Automata (NFA); Regular Expression NFA DFA. A. Finite Automata (FA): FA are a graph, like transition

CS 330 Formal Methods and Models

Dana Richards, George Mason University, Spring 2017. Quiz Solutions. Quiz 1, Propositional Logic. Date: February 2. 1. Prove ((( p q) q) p) is a tautology (a) (3pts) by truth table.

CS 373, Spring Solutions to Mock midterm 1 (Based on first midterm in CS 273, Fall 2008.)

Problem 1: Short answer (8 points). The answers to these problems should be short and not complicated. (a) If an NFA M accepts

Homework Solution - Set 5 Due: Friday 10/03/08

CE 96 Introduction to the Theory of Computation, Fall 2008. 1. Textbook, Page 86, Exercise 1.21. (a) Add a new start state and final state. Make original final state non-final.

Table of contents: Lecture No. 1: Summary; What does automata mean?; Introduction to languages; Alphabets; Strings

Defining Languages. Lecture No. 2: Summary; Kleene Star Closure; Recursive

CS375: Logic and Theory of Computing

Fuhua (Frank) Cheng, Department of Computer Science, University of Kentucky. Table of Contents: Week 1: Preliminaries (set algebra, relations, functions) (read Chapters 1-4). Weeks

Grammar. Languages. Content 5/10/16. Automata and Languages. Regular Languages. Regular Languages

Grammar: Regular Grammar, Context-free Grammar, Context-sensitive Grammar. Prof. Mohamed Hamada, Software Engineering Lab., The University of Aizu, Japan. Regular Languages, Context Free Languages, Context Sensitive

CHAPTER 1 Regular Languages. Contents. definitions, examples, designing, regular operations. Non-deterministic Finite Automata (NFA)

Finite Automata (FA or DFA): definitions, examples, designing, regular operations. Non-deterministic Finite Automata (NFA): definitions, equivalence of NFAs and DFAs, closure under regular

Homework 4. CS 341: Foundations of Computer Science II, Prof. Marvin Nakayama

1. Use the procedure described in Lemma 1.55 to convert the regular expression (((00)*(11)) ∪ 01)* into an NFA. Answer: (NFA diagram not recoverable)

ɛ-closure, Kleene's Theorem,

1 A nice paper relevant to this course is titled "The Glory of the Past". 2 NICTA Researcher, Adjunct at the Australian National University and Griffith University. ɛ-closure, Kleene

State Minimization for DFAs

Read K & S 2.7. Do Homework 10. Consider: Is this a minimal machine? Step (1): Get rid of unreachable states. State is unreachable. Step (2): Get rid

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3

2 The Parallel Circuit. Electric Circuits: Figure 2- below shows a battery and multiple resistors arranged in parallel. Each resistor receives a portion of the current from the battery based on its resistance. The split is

Automata Theory 101. Introduction. Outline. Introduction Finite Automata Regular Expressions ω-automata. Ralf Huuck.

Session 1, 2006, Ralf Huuck. Acknowledgement: Some slides are based on Wolfgang Thomas' excellent

Quadratic Forms

Recall the Simon & Blume excerpt from an earlier lecture which said that the main task of calculus is to approximate nonlinear functions with linear functions. It's actually more accurate to say that we approximate

Homework 3 Solutions

CS 341: Foundations of Computer Science II, Prof. Marvin Nakayama. 1. Give NFAs with the specified number of states recognizing each of the following languages. In all cases, the alphabet is Σ = {0,1}.

Context-Free Grammars and Languages

(Based on Hopcroft, Motwani and Ullman (2007) & Cohen (1997)). Introduction: Consider an example sentence: "A small cat eats the fish". English grammar has rules for constructing sentences;

First Midterm Examination

24-25 Fall Semester. 1) Give the state diagram of a DFA that recognizes the language A over alphabet Σ = {, } where A = {w | w contains or } 2) The following DFA recognizes the language B over alphabet

A negative answer to a question of Wilke on varieties of ω-languages

Jean-Eric Pin. Abstract. In a recent paper, Wilke asked whether the boolean combinations of ω-languages of the form L^ω, for L in a given +-variety of languages,

GNFA

Definition (GNFA): A generalized nondeterministic finite automaton (GNFA) is a graph whose edges are labeled by regular expressions, with a unique start state with in-degree 0, and a unique final state with

Revision Sheet. (a) Give a regular expression for each of the following languages:

Theoretical Computer Science (Bridging Course). Dr. G. D. Tipaldi, F. Boniardi. Winter Semester 2014/2015. University of Freiburg, Department of Computer Science. Question 1 (Finite Automata, 8 + 6 points)

Chapter 2. DFA's, NFA's, Regular Languages. 2.6 Finite State Automata With Output: Transducers

So far, we have only considered automata that recognize languages, i.e., automata that do not produce any output on any input

Non-deterministic Finite Automata

Eliminating non-determinism. Radboud University Nijmegen. H. Geuvers and T. van Laarhoven, Institute for Computing and Information Sciences, Intelligent

CS 267: Automated Verification. Lecture 8: Automata Theoretic Model Checking. Instructor: Tevfik Bultan

LTL Properties, Büchi automata [Vardi and Wolper LICS 86]. Büchi automata: Finite state automata that accept infinite strings

Finite Automata. Informatics 2A: Lecture 3. John Longley. 22 September School of Informatics University of Edinburgh

Languages and Automata. John Longley, jrl@inf.ed.ac.uk, 22 September 2017. Outline: 1 Languages and Automata: What is

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2

Deterministic Finite Automata (DFA): exactly one sequence of steps for each string; all examples so far. Nondeterministic Finite Automata

CS 330 Formal Methods and Models Dana Richards, George Mason University, Spring 2016 Quiz Solutions

Quiz 1, Propositional Logic. Date: February 9. 1. (4pts) ((p q) (q r)) (p r), prove tautology using truth tables.

Let's start with an example:

Finite Automata. Let's start with an example: Here you see labeled circles that are states, and labeled arrows that are transitions. One of the states is marked "start". One of the states has a double circle; this is a terminal state

PART 2. REGULAR LANGUAGES, GRAMMARS AND AUTOMATA

RIGHT LINEAR LANGUAGES. Right Linear Grammar: Rules of the form: A → αB, A → α, where A, B ∈ V_N, α ∈ V_T^+. Left Linear Grammar: Rules of the form: A → Bα, A → α, where A, B ∈ V_N, α ∈ V_T

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb.

Deterministic Finite Automata (DFA): exactly one sequence of steps for each string; all examples so far. Nondeterministic Finite Automata

FABER Formal Languages, Automata and Models of Computation

DVA337, Lecture 5. School of Innovation, Design and Engineering, Mälardalen University, 2015. Recap of lecture 4: by definition, subset construction, DFA, NFA, state

20 MATHEMATICS POLYNOMIALS

2.1 Introduction. In Class IX, you have studied polynomials in one variable and their degrees. Recall that if p(x) is a polynomial in x, the highest power of x in p(x) is called the degree of

2.4 Linear Inequalities and Interval Notation

We want to solve equations that have an inequality symbol instead of an equal sign. There are four inequality symbols that we will look at: less than (<), greater than (>), less than or

Worked out examples Finite Automata

Example: Design a Finite State Automaton which reads a binary string and accepts only those that end with. Since we are in the topic of Non Deterministic Finite Automata (NFA), we will

1B40 Practical Skills

Combining uncertainties from several quantities: error propagation. We usually encounter situations where the result of an experiment is given in terms of two (or more) quantities. We then need

CMSC 330: Organization of Programming Languages

Finite Automata 2. Types of Finite Automata: Deterministic Finite Automata (DFA): exactly one sequence of steps for each string; all examples so far. Nondeterministic

Myhill-Nerode Theorem

Overview: Myhill-Nerode Theorem; Correspondence between DFA's and MN relations; Canonical DFA for L; Computing canonical DFA. Deepak D'Souza, Department of Computer Science and Automation, Indian Institute

Regular Expressions (RE). Kleene-*

Empty set: an RE Φ denotes the empty set. Empty string: an RE ε denotes the set {ε}. Table columns: Operation, Notation, Language, UNIX. Rows: Alternation; Symbol

BACHELOR THESIS Star height

Tomáš Svoboda. Star height. Department of Algebra. Supervisor of the bachelor thesis: doc. Štěpán Holub, Ph.D. Study programme: Mathematics. Study branch: Mathematical Methods of Information Security. Prague 2017

Analytically, vectors will be represented by lowercase bold-face Latin letters, e.g. a, r, q.

1.1 Vector Algebra. 1.1.1 Scalars. A physical quantity which is completely described by a single real number is called a scalar. Physically, it is something which has a magnitude, and is completely described by this magnitude. Examples

dx dt dy = G(t, x, y), dt where the functions are defined on I Ω, and are locally Lipschitz w.r.t. variable (x, y) Ω.

Chapter 8: Stability theory. We discuss properties of solutions of a first order two dimensional system, and stability theory for a special class of linear systems. We denote the independent variable by t in place of x, and

Advanced Calculus: MATH 410 Notes on Integrals and Integrability Professor David Levermore 17 October 2004

1. Definite Integrals. In this section we revisit the definite integral that you were introduced to when