Swinburne Research Bank

http://researchbank.swinburne.edu.au

Zhou, R., Liu, C., Wang, J., & Li, J. (2009). Containment between unions of XPath queries. Originally published in X. Zhou, H. Yokota, K. Deng, & Q. Liu (eds.) Proceedings of the 14th International Conference on Database Systems for Advanced Applications (DASFAA 2009), Brisbane, Australia, 21-23 April 2009. Lecture notes in computer science (Vol. 5463, pp. 405-420). Berlin: Springer. Available from: http://dx.doi.org/10.1007/978-3-642-00887-0_36

Copyright © Springer-Verlag Berlin Heidelberg 2009. This is the author's version of the work, posted here with the permission of the publisher for your personal use. No further distribution is permitted. You may also be able to access the published version from your library. The definitive version is available at http://www.springerlink.com/.

Containment between Unions of XPath Queries

Rui Zhou (1), Chengfei Liu (1), Junhua Wang (2), and Jianxin Li (1)
(1) Swinburne University of Technology, Melbourne, Australia, {rzhou,cliu,jianxinli}@swin.edu.au
(2) Griffith University, Gold Coast, Australia, J.Wang@griffith.edu.au

Abstract. In this paper, we address the containment problem for unions of XPath queries with and without a schema. We find that the problem can always be reduced to a containment problem between one single query and a union of queries. When a schema is not available, the problem can be further reduced to checking containment between pairwise queries (each from one union), but this only holds for some XPath subsets, such as $XP^{\{/,//,[\,]\}}$, and not for $XP^{\{/,//,[\,],*\}}$. We then show the problem is still solvable in $XP^{\{/,//,[\,],*\}}$, though no efficient algorithm exists. When a schema is at hand, we propose a strategy to rewrite a query into a union of simplified queries based on schema information, and then apply the methods developed for the case where the schema is not taken into account. The problem is then reduced to checking containment between unions of queries in $XP^{\{/,[\,]\}}$ without a schema.

1 Introduction

Testing query containment is a key technique in many database applications. In query optimization [1], it is a subtask to check whether two formulations of a query are equivalent, and hence to choose the formulation with the lower evaluation cost. In data integration scenarios, especially rewriting queries using views [2], it provides a means to find equivalent or contained rewritings, and also to detect redundant rewritten queries to save computation time. Query containment can also be used to maintain integrity constraints [3] and to determine when queries are independent of updates to the database [4].

In the relational context, containment has been studied for conjunctive queries [1] (a typical kind of query that is extensively studied in relational databases, see [5]) and for unions of conjunctive queries [6]. In [6], it is shown that, for two unions of conjunctive queries P and Q, P is contained in Q if and only if every query p in P is contained in some query q in Q. Therefore, containment for unions of conjunctive queries can be reduced to containment for (a number of) pairs of queries.

For XML queries, especially XPath [7] queries, the containment problem has mainly been studied between two single queries [8-11], but not for unions of XPath queries. Some questions still need to be answered: Can we draw a similar conclusion for unions of XPath queries as that for unions of conjunctive queries? If not, how can we determine containment between unions of XPath queries? Does it make any difference whether an XML schema is available? We will look into these questions in this paper.

Before heading into the main part, we would like to stress again the importance of determining containment between unions of queries. One typical application is rewriting queries using views, where detecting a redundant rewriting means detecting whether a rewritten query is contained in a union of other rewritten queries. This is a special case of union containment, in which one of the unions contains only one query.

In this paper, we show that the pairwise comparison property for unions of conjunctive queries holds only for some XPath subsets, such as $XP^{\{/,//,[\,]\}}$, featuring child axes, descendant axes, and branches. For a larger subset $XP^{\{/,//,[\,],*\}}$, with wildcards added, we provide an example showing that two queries can be combined to contain a third query, though neither of them alone contains the third one. Therefore we need a new strategy to detect containment relationships for unions of queries in this subset.

To make the work comprehensive, we also discuss containment under schema information. A query p that is not contained in another query q in general may be contained in q under schema constraints, because a schema confines the wildcards and descendant axes in a query to being interpreted in particular ways. To tackle this problem, we propagate schema constraints into queries to eliminate wildcards and descendant edges, and thus simplify the queries into queries in the subset $XP^{\{/,[\,]\}}$. Then, after chasing the simplified patterns in $XP^{\{/,[\,]\}}$, the established methods for unions of queries without a schema can be applied.

Our contributions are highlighted as follows:
- We are the first to investigate the containment problem for unions of queries in the XML context, particularly for XPath queries. We show that the problem can always be reduced to a containment problem between one single query and a union of queries.
- When a schema is not available, the problem can be further reduced to checking containment between two single queries (each from one union). However, this result only holds for some simple XPath subsets like $XP^{\{/,//,[\,]\}}$, not for $XP^{\{/,//,[\,],*\}}$. Fortunately, in $XP^{\{/,//,[\,],*\}}$ the problem is still solvable.
- When a schema is available, we suggest a strategy to rewrite a query into a union of simplified queries based on schema information, and then apply the methods developed for the case where the schema is not considered. The problem is then reduced to checking containment between unions of queries in $XP^{\{/,[\,]\}}$ without a schema.

The rest of this paper is organized as follows. In Section 2, we give some notations and background knowledge. We then propose two important theorems and tackle the containment problem without a schema in Section 3. Schema information is taken into account in Section 4 to eliminate wildcard nodes and expand descendant edges. In Section 5, we extend the discussion from XPath queries to general tree pattern queries. Related work is given in Section 6. Finally, we draw conclusions and propose some future work in Section 7.

2 Preliminaries

In this section, we introduce some notations and background knowledge. XML documents and XPath queries are modelled as trees and tree patterns, and evaluating an XPath query on an XML document is modelled as matching a tree pattern to a tree. We also formulate the definition of containment between unions of XPath queries at the end of this section.

2.1 XML Trees

In the literature, an XML document is modeled as an unordered tree (order is ignored in most previous research works, and so it is in this work) with nodes labeled from an infinite alphabet $\Sigma$ ($\Sigma$ is finite if a schema is available). The label of each node corresponds to an XML element, an attribute name or a data value, and the root node of the tree represents the root element of the document. We slightly modify the model by adding a new root with a unique label $r \in \Sigma$ to the tree, serving as the document node. In this way, the root node in the previous model becomes a (single) child of this document node, and every XML document starts with a root node labeled $r$, see Definition 1. We will see the aim of this modification in the next subsection. We denote the set of all possible trees over $\Sigma$ as $T_\Sigma$.

Definition 1. An XML document is a tree $t = (V_t, E_t, r_t, \Sigma)$, where
- $V_t$ is the node set, and $\forall v \in V_t$, $v$ has a label in the alphabet $\Sigma$, denoted as $label(v)$;
- $E_t$ is the edge set;
- $r_t \in V_t$ is the root node of $t$, and $label(r_t) = r$.

2.2 XPath Query

XPath is the core subclass of XML query languages. We consider a subset of XPath featuring child axes (/), descendant axes (//), branches ([ ]), and wildcards (*). It can be represented by the following grammar:

$p \rightarrow . \mid l \mid * \mid p/p \mid p//p \mid p[p]$

where . denotes the current context node, $l$ is a label from the alphabet $\Sigma$, and * represents a wildcard label. We denote this subset as $XP^{\{/,//,[\,],*\}}$. The result of evaluating an XPath expression $p \in XP^{\{/,//,[\,],*\}}$ on a tree $t \in T_\Sigma$, denoted as $p(t)$, is a set of nodes in $t$. The formal semantics are given in [12] (omitted here), where the context node is fixed on the document node if the context node is not explicitly specified. As in [8], besides allowing the usage of . immediately inside a predicate [ ], we further allow . to appear at the beginning of an expression, in order to capture XPath queries starting with / or //.

For example, the queries //a and //a//b, which do not conform to the above grammar and were therefore ignored in previous works (such as [8,9,13]), can now be rewritten into .//a and .//a//b, and are therefore captured. Since . always inspects the document node (which we intentionally added in the previous subsection) by default, the rewritten expressions correctly preserve the semantics. Moreover, XPath expressions starting with a label can safely be rewritten into expressions starting with . as well, e.g., a/b equals ./a/b. In this way, an XPath query corresponds to a unique tree pattern query, see the following definition.

Definition 2. An XPath query $q$ can be expressed as a tree pattern $(V_q, E_q, r_q, d_q, \Sigma_q)$, where
- $V_q$ is the node set, and $\forall v \in V_q$, $v$ has a label in a finite alphabet $\Sigma_q \cup \{*\}$, denoted as $label(v)$;
- $E_q$ is the edge set, and $\forall e \in E_q$, $type(e) \in \{/, //\}$. We use the term p-edge (d-edge) for an edge of type / (//);
- $r_q$ is the root node of the query, corresponding to the leading . in $q$ (if the current context node is not specified, then $label(r_q) = r$);
- $d_q$ is the answer (also called distinguished or return) node of the query, identified with a circle.

Evaluating an XPath query amounts to finding embeddings from a tree pattern query $q$ to a tree $t$, and the result can be represented as $q(t) = \{ f(d_q) \mid f \text{ is some embedding from } q \text{ to } t \}$. An embedding is defined as follows:

Definition 3. An embedding from a tree pattern $q = (V_q, E_q, r_q, d_q, \Sigma_q)$ to a tree $t = (V_t, E_t, r_t, \Sigma)$ is a mapping $f : V_q \rightarrow V_t$ satisfying:
- Root preserving: $f(r_q) = r_t$;
- Label preserving: $\forall v \in V_q$, $label(v) = *$ or $label(v) = label(f(v))$;
- Structure preserving: $\forall e = (v_1, v_2) \in E_q$, if $e$ is a p-edge, $f(v_1)$ is the parent node of $f(v_2)$; if $e$ is a d-edge, $f(v_1)$ is an ancestor node of $f(v_2)$, including the case where $f(v_1)$ is the parent of $f(v_2)$.

In this work, all XPath queries are viewed as tree patterns for ease of discussion, and we will provide some discussion of general tree patterns (queries with more than one distinguished node) in Section 5.

2.3 Containment Formulation

For any two tree pattern queries $p$ and $q$, $p$ is said to be contained in $q$, denoted as $p \subseteq q$, iff $\forall t \in T_\Sigma$, $p(t) \subseteq q(t)$. We now extend the definition to unions of queries. We use a lowercase letter for a single query and an uppercase letter for a union of queries. Let $P = \{p_1, p_2, \ldots, p_m\}$ be a union of queries. The result of this set of queries on a tree $t$, denoted as $P(t)$, is defined as $p_1(t) \cup p_2(t) \cup \cdots \cup p_m(t)$. For two unions of queries $P = \{p_1, p_2, \ldots, p_m\}$ and $Q = \{q_1, q_2, \ldots, q_n\}$, $P$ is said to be contained in $Q$ iff $\forall t \in T_\Sigma$, $P(t) \subseteq Q(t)$ (i.e. $p_1(t) \cup p_2(t) \cup \cdots \cup p_m(t) \subseteq q_1(t) \cup q_2(t) \cup \cdots \cup q_n(t)$).
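The embedding conditions of Definition 3 can be illustrated with a minimal Python sketch. It only decides whether some embedding exists (the boolean view), rather than computing the answer node set. The tuple encodings of trees and patterns, the names `matches`, `embeds` and `union_holds`, and the use of the string "r" as the document-node label are assumptions made for this illustration.

```python
ROOT = "r"  # assumed label of the added document node

# Data tree node: (label, [children]).
# Pattern node:   (label, [(edge_type, child_pattern), ...]), edge_type in {"/", "//"}.

def descendants(node):
    """Yield every proper descendant of a data-tree node."""
    for child in node[1]:
        yield child
        yield from descendants(child)

def matches(pnode, tnode):
    """True if the pattern subtree rooted at pnode embeds with pnode mapped to tnode."""
    label, edges = pnode
    if label != "*" and label != tnode[0]:       # label preserving
        return False
    for edge_type, child in edges:               # structure preserving
        candidates = tnode[1] if edge_type == "/" else descendants(tnode)
        if not any(matches(child, c) for c in candidates):
            return False
    return True

def embeds(pattern, tree):
    """Root preserving: the pattern root must be mapped to the tree root."""
    return matches(pattern, tree)

def union_holds(patterns, tree):
    """Existence check for a union: some member pattern embeds into the tree."""
    return any(embeds(q, tree) for q in patterns)

# Tree r/a/x/b and two patterns: ./a/b and ./a/*//b
tree = (ROOT, [("a", [("x", [("b", [])])])])
q1 = (ROOT, [("/", ("a", [("/", ("b", []))]))])
q2 = (ROOT, [("/", ("a", [("/", ("*", [("//", ("b", []))]))]))])
```

Because an embedding need not be injective, each child of a pattern node can be matched independently, which is why the simple recursion suffices for tree-shaped patterns.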

3 Containment without Schema

In this section, we investigate the containment problem between unions of XPath queries without schema information. We start by reducing the problem to a simplified form and introducing boolean tree patterns. After that, we solve the problem for boolean tree patterns belonging to the subset $P^{\{/,//,[\,]\}}$. Finally, a larger subset with wildcards, $P^{\{/,//,[\,],*\}}$, is discussed.

3.1 Containment Reduction

To simplify the problem, we propose a theorem that reduces containment checking between two unions of queries to containment checking between one single query and a union of queries. The theorem also holds when a schema is at hand.

Theorem 1. For two unions of XPath queries $P = \{p_1, p_2, \ldots, p_m\}$ and $Q$, we have: $P \subseteq Q$ iff $\forall p_i \in P$, $p_i \subseteq Q$.

Proof. If: Given $\forall p_i \in P$, $p_i \subseteq Q$, we have, by definition, for any tree $t \in T_\Sigma$, $p_i(t) \subseteq Q(t)$. Therefore, for any tree $t \in T_\Sigma$, $p_1(t) \cup p_2(t) \cup \cdots \cup p_m(t) \subseteq Q(t)$, that is, $P \subseteq Q$. Only if: Given $P \subseteq Q$, we have $\forall t \in T_\Sigma$, $P(t) \subseteq Q(t)$, i.e. $p_1(t) \cup p_2(t) \cup \cdots \cup p_m(t) \subseteq Q(t)$, thus $\forall p_i$, $p_i(t) \subseteq p_1(t) \cup p_2(t) \cup \cdots \cup p_m(t) \subseteq Q(t)$.

The idea conveyed by the above theorem is simple, but it lays a foundation for checking union containment, because we can always safely simplify the left side of the comparison into a single query. This leads to the further explorations and observations in Sections 3.3 and 3.4.

3.2 Boolean Tree Pattern

A boolean pattern (short for boolean tree pattern) is a tree pattern query with no distinguished node. The result of evaluating a boolean pattern $p$ on a tree $t$, $p(t)$, is a boolean value, either true or false: $p(t)$ is true if there exists an embedding from $p$ to $t$, otherwise $p(t)$ is false. For two boolean patterns $p$ and $q$, we say $p$ is contained in $q$, denoted as $p \subseteq q$, iff $\forall t \in T_\Sigma$, $p(t) \rightarrow q(t)$. Each XPath tree pattern corresponds to a unique boolean pattern, which can be obtained by adding a child node with a distinct label $x$ to the distinguished node and no longer marking the distinguished node as distinguished (shown in Fig. 1). Let the boolean patterns corresponding to XPath patterns $p$ and $q$ be $p'$ and $q'$ respectively. According to [14], $p \subseteq q$ iff $p' \subseteq q'$. Consequently, for ease of discussion, XPath tree pattern queries are considered as boolean patterns in the rest of the paper. The notations $p$ and $q$ refer to boolean patterns from now on, and we no longer use $p'$ and $q'$. The result of a union of boolean tree patterns $Q = \{q_1, q_2, \ldots, q_n\}$ on a tree $t$ can then be expressed as $Q(t) = q_1(t) \vee q_2(t) \vee \cdots \vee q_n(t)$, since $Q(t)$ is a boolean value. We also use $P^{\{/,//,[\,]\}}$ to denote the boolean pattern subset corresponding to the XPath tree patterns in $XP^{\{/,//,[\,]\}}$.
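The conversion from an XPath tree pattern to its boolean pattern can be sketched as follows. The triple encoding `(label, edges, is_answer)`, the function name `to_boolean`, and the choice of attaching the marker child by a p-edge are assumptions of this illustration; the marker label "x" is assumed not to occur elsewhere in the pattern.

```python
def to_boolean(node, marker="x"):
    """Turn a pattern with one distinguished node into a boolean pattern.

    Input nodes are (label, [(edge_type, child), ...], is_answer).
    Output nodes are (label, [(edge_type, child), ...]): the distinguished
    node loses its special status and gains a child labelled `marker`.
    """
    label, edges, is_answer = node
    new_edges = [(edge_type, to_boolean(child, marker)) for edge_type, child in edges]
    if is_answer:
        new_edges.append(("/", (marker, [])))
    return (label, new_edges)

# ./a//b with b as the distinguished node
p = ("r", [("/", ("a", [("//", ("b", [], True))], False))], False)
p_bool = to_boolean(p)
```

The output uses the same two-field node shape that a boolean embedding check would consume, so no distinguished-node bookkeeping is needed after the conversion.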

Fig. 1. An XPath Tree Pattern p and Its Corresponding Boolean Pattern p'

3.3 Patterns in $P^{\{/,//,[\,]\}}$

As shown in Theorem 1, checking containment between unions of queries can always be reduced to checking containment between one single query and a union of queries. Then one question arises: Can the problem be further reduced to checking containment for a number of query pairs? That is, if a query is contained in a union of queries, does it mean that the single query is contained in some particular query from the union? The answer is yes, but the result is restricted to a certain subset of queries. For simplicity, we illustrate this result (expressed as Theorem 2) within the query subset $P^{\{/,//,[\,]\}}$, which has branches and descendant axes, but no wildcards. After proving the theorem, we will point out a larger query subset where the property still holds.

Theorem 2. For a boolean pattern $p$ and a union of boolean patterns $Q = \{q_1, q_2, \ldots, q_n\}$ in $P^{\{/,//,[\,]\}}$, we have: $p \subseteq Q$ iff $\exists q_i \in Q$ such that $p \subseteq q_i$.

Proof. The sufficient condition is obvious. We prove the necessary condition by proving its contrapositive, i.e. we show that if $p$ is not contained in any $q_i \in Q$, then $p$ cannot be contained in $Q$.

Before starting the proof, we first introduce a technique called homomorphism, which provides a sufficient and necessary condition to decide containment between two single patterns in $P^{\{/,//,[\,]\}}$. A homomorphism from one pattern $p = (V_p, E_p, r_p, \Sigma_p)$ to another pattern $q = (V_q, E_q, r_q, \Sigma_q)$ is a function $h : V_p \rightarrow V_q$ satisfying the definition of embedding (given in Definition 3). The only difference is that a homomorphism is a mapping from one query pattern to another, while an embedding is a mapping from a pattern to a data tree. According to Theorem 3 in [8], for two boolean patterns $p$ and $q$ in $P^{\{/,//,[\,]\}}$, $p \subseteq q$ iff there exists a homomorphism from $q$ to $p$. In other words, if $p \nsubseteq q$, there must exist a node $v_i$ in $V_q$ such that we cannot find any homomorphism $h$ that has a corresponding node $h(v_i)$ in $V_p$ satisfying the label preserving and structure preserving conditions w.r.t. nodes $v_i$ and $h(v_i)$. We call such a node $v_i$ a private node of $q$ against $p$. On a path in $q$ (from root to leaf), we call the first private node a transitional node.

To prove the contrapositive of the necessary condition in Theorem 2, given $\forall q_i \in Q$, $p \nsubseteq q_i$, we construct a tree $t$ such that $p(t)$ holds while $q_i(t)$ is false for every $q_i$. Hence $p(t)$ does not imply $Q(t) = q_1(t) \vee q_2(t) \vee \cdots \vee q_n(t)$, namely $p$ is not contained in $Q$. The tree $t$ is constructed as follows: replace each d-edge in $p$ with two p-edges and an additional node with a distinct label $z$. For instance, a//b is transformed into a/z/b. Here the label $z$ does not appear in any $\Sigma_{q_i}$ (i.e. $z \in \Sigma - \bigcup_{i=1}^{n} \Sigma_{q_i}$), where $\Sigma_{q_i}$ is the alphabet of $q_i$. Since $\Sigma$ is infinite (when there is no schema available) and $\Sigma_{q_i}$ is finite (because the number of labels in a query is limited), this transformation is always possible.

After the transformation, it is straightforward that the resulting tree $t$ conforms to pattern $p$, and thus $p(t)$ is true. However, for any $q_i$, we can show that $q_i(t)$ is false. The reason is as follows: since $p \nsubseteq q_i$, there must be some transitional node $v_i$ in $q_i$ for which we cannot find a corresponding node $f(v_i)$ in $t$ under any embedding $f$ from $q_i$ to $t$. Otherwise, if such an embedding $f$ existed, we could obtain a twin homomorphism $h$ from $q_i$ to $p$ based on $f$. The twin homomorphism $h$ would have the same mapping function as the embedding $f$, because, in $f$, no node of $q_i$ can be mapped onto a z-node (a node with the distinct label $z$) in $t$. Therefore, a corresponding node $h(v_i)$ in $p$ would exist for the homomorphism. This contradicts the assumption that $v_i$ is a transitional node. Recall that a transitional node in $q_i$ cannot be mapped onto any node in $p$ by any homomorphism. As a result, $\forall q_i \in Q$, $q_i(t)$ is false, i.e. $Q(t) = q_1(t) \vee q_2(t) \vee \cdots \vee q_n(t)$ is false. In addition, $p(t)$ is true, hence $p(t) \nrightarrow Q(t)$. The contrapositive of the necessary condition is proved.

The complexity of testing containment between one pattern $p$ and a union of patterns $Q = \{q_1, q_2, \ldots, q_n\}$ is $O(|p| \sum_{i=1}^{n} |q_i|)$, bounded by $O(n |p| |q_{max}|)$, where $|q_{max}|$ is $\max\{|q_i|\}$. This follows immediately from the fact that finding a homomorphism from a pattern $q_i$ to $p$ has complexity $O(|p| |q_i|)$, where $|p|$ and $|q_i|$ are the sizes (numbers of nodes) of $p$ and $q_i$ respectively.

However, Theorem 2 only holds in $P^{\{/,//,[\,]\}}$, or in the larger subset $\hat{P}^{\{/,//,[\,],*\}}$ mentioned in [15]. $\hat{P}^{\{/,//,[\,],*\}}$ refers to an XPath query subset that additionally includes wildcards, but with two restrictions: (i) no wildcard node is incident with d-edges (//) and (ii) there is no wildcard leaf node.
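The homomorphism test, the pairwise check of Theorem 2, the reduction of Theorem 1, and the canonical tree used in the proof can be sketched together in Python. This is a naive recursive sketch for wildcard-free boolean patterns, not the O(|p||q|) procedure whose cost is quoted above; the tuple encoding and all function names are assumptions of this illustration.

```python
# Pattern node: (label, [(edge_type, child), ...]) with edge_type "/" or "//".

def pattern_descendants(node):
    """Yield every proper descendant of a pattern node."""
    for _, child in node[1]:
        yield child
        yield from pattern_descendants(child)

def hom(qnode, pnode):
    """True if the subpattern rooted at qnode maps homomorphically with qnode -> pnode."""
    if qnode[0] != pnode[0]:                      # label preserving (no wildcards)
        return False
    for edge_type, qchild in qnode[1]:
        if edge_type == "/":
            # a p-edge of q must be mapped onto a p-edge of p
            candidates = [c for t, c in pnode[1] if t == "/"]
        else:
            # a d-edge of q may be mapped onto any proper descendant in p
            candidates = pattern_descendants(pnode)
        if not any(hom(qchild, c) for c in candidates):
            return False
    return True

def contained(p, q):
    """p is contained in q iff there is a homomorphism from q to p (roots to roots)."""
    return hom(q, p)

def contained_in_union(p, Q):
    """Theorem 2: p is contained in Q iff p is contained in some q_i."""
    return any(contained(p, q) for q in Q)

def union_contained(P, Q):
    """Theorem 1: P is contained in Q iff every p_i is contained in Q."""
    return all(contained_in_union(p, Q) for p in P)

def canonical_tree(p, fresh="z"):
    """Proof witness: replace each d-edge by two p-edges through a `fresh` node.

    Returns a data tree in the form (label, [children])."""
    label, edges = p
    children = []
    for edge_type, child in edges:
        sub = canonical_tree(child, fresh)
        children.append(sub if edge_type == "/" else (fresh, [sub]))
    return (label, children)

p = ("r", [("/", ("a", [("//", ("b", []))]))])    # ./a//b
q1 = ("r", [("/", ("a", [("/", ("b", []))]))])    # ./a/b
q2 = ("r", [("//", ("b", []))])                   # .//b
```

Here ./a//b is not contained in ./a/b but is contained in .//b, so the union check succeeds through the second member alone, as Theorem 2 predicts for this subset.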
The reason that Theorem 2 holds only for a limited query subset lies in the following aspects: (1) If there are wildcard nodes in the patterns, homomorphism only serves as a sufficient (but not necessary) condition to determine containment between two patterns. We give an example and devise a strategy in the next subsection to deal with patterns in $P^{\{/,//,[\,],*\}}$. (2) Moreover, if a schema is available, the theorem does not necessarily hold even in $P^{\{/,//,[\,]\}}$, because the alphabet $\Sigma$ becomes finite, and there may not always exist a distinct label $z$ to transform a d-edge a//b into a/z/b, so the current proof is not sufficient. We consider queries conforming to a schema in Section 4.

3.4 Patterns in $P^{\{/,//,[\,],*\}}$

We first give a simple example to show that Theorem 2 is not true for the subset $P^{\{/,//,[\,],*\}}$. In Fig. 2, $p \nsubseteq q_1$ and $p \nsubseteq q_2$, but it is obvious that $p$ is equivalent to $Q = q_1 \cup q_2$, because if b is a descendant of a, then either b is a direct child of a, or b is a descendant of a child of a. And $p = Q$ implies $p \subseteq Q$. The example shows that several patterns may be combined to contain a target pattern, though none of them alone contains the target. This observation makes the problem complicated in $P^{\{/,//,[\,],*\}}$, for it is difficult to know which patterns should be combined to contain the target.

Fig. 2. Containment with Wildcard Node (p: ./a//b, q1: ./a/b, q2: ./a/*//b)

Fortunately, taking advantage of Lemma 3 in [8], the containment problem between one single pattern $p$ and a union of patterns $Q = \{q_1, q_2, \ldots, q_n\}$ can be reduced to checking containment between two single patterns as well. The two single patterns $p'$ and $q'$ can be constructed easily, as shown in Fig. 3, where the additional node label used in both $p'$ and $q'$ is a label in $\Sigma$, and $T_0 \in T_\Sigma$ is a tree such that for any $q_i$, $q_i(T_0)$ is true. Such a tree can be obtained by fusing the roots of the $q_i$ (this is possible because they share the same root label $r$), replacing all wildcards with an arbitrary label, and replacing all d-edges with p-edges. It has been proved that $p \subseteq Q$ ($Q = q_1 \cup q_2 \cup \cdots \cup q_n$) iff $p' \subseteq q'$. Since $p', q' \in P^{\{/,//,[\,],*\}}$ and deciding $p' \subseteq q'$ is coNP-complete, the containment problem between unions of patterns in $P^{\{/,//,[\,],*\}}$ is coNP-complete. Although we cannot break the intrinsic complexity result for the subset $P^{\{/,//,[\,],*\}}$, we manage to convert the problem into one for which we have a solving strategy.

Fig. 3. Constructions of Patterns p' and q'

One may realize that since $P^{\{/,//,[\,]\}}$ is a subset of $P^{\{/,//,[\,],*\}}$, we can, without loss of generality, use the construction above and check whether $p' \subseteq q'$ using the homomorphism technique in order to determine the containment relationship between $p$ and $Q$ for the subset $P^{\{/,//,[\,]\}}$. This introduces another strategy to solve the problem for $P^{\{/,//,[\,]\}}$. The observation is true, but the drawback is that this method is less efficient than the method implied by Theorem 2. We illustrate this by analyzing the algorithm complexity. The pattern $p'$ contains $2(n-1)$ copies of $T_0$, each of which has size $\sum_{i=1}^{n} |q_i|$ (see the construction of $T_0$ in the previous paragraph), and thus $|p'|$ is $|p| + 2(n-1) \sum_{i=1}^{n} |q_i|$. The size of $q'$ is easily obtained as $\sum_{i=1}^{n} |q_i|$. The algorithm complexity $O(|p'| |q'|)$ is $O((|p| + 2(n-1) \sum_{i=1}^{n} |q_i|) \sum_{i=1}^{n} |q_i|)$, bounded by $O(n |p| |q_{max}| + 2(n-1) n^2 |q_{max}|^2)$, which is larger than the $O(n |p| |q_{max}|)$ given in the last section. To conclude, if there is no wildcard node in the query patterns, it is better to compare $p$ with the queries in $Q$ one by one.

4 Containment with Schema

Given a schema $G$, if for any tree $t$ conforming to $G$ we have $p(t) \rightarrow q(t)$, then we say $p$ is contained in $q$ under $G$, denoted as $p \subseteq_G q$. A schema provides a means to define or constrain XML data. A pattern $p$ that is not contained in a pattern $q$ in general may be contained in $q$ for trees conforming to a certain schema. We show a simple example in Fig. 4: $p_1$ has a wildcard node and $p_2$ has a d-edge, and they are not contained in $Q = q_1 \cup q_2$ in general. But if the schema $G$ is available, all the queries should conform to $G$. It is not hard to see that $p_1 \subseteq_G p_2 \subseteq_G Q$, in fact $p_1 =_G p_2 =_G Q$.

Fig. 4. Containment with Schema Information

Schema information is usually modelled as regular expressions or a small number of constraints. The works [9, 10] show that containment between patterns in $P^{\{/,//,[\,],*,DTD\}}$ is EXPTIME-complete, and more theoretical results w.r.t. various pattern subsets can be found in [11]. Since the containment problem is already difficult for two single patterns, it is unlikely that there is an efficient method to determine containment between unions of patterns in $P^{\{/,//,[\,],*,DTD\}}$. The aim of our work is not to break the proved EXPTIME-complete bound for two queries, nor to provide exact complexity results for unions of queries, but to reexamine the problem from another angle and to suggest a strategy to check containment between two single patterns or unions of patterns with schema information. The idea is to propagate DTD constraints into queries so as to eliminate wildcards and descendant edges. Consequently, the problem can be converted into containment between unions of simplified queries in $P^{\{/,[\,]\}}$. Then, after chasing patterns in $P^{\{/,[\,]\}}$ with DTD constraints, we can apply Theorems 1 and 2 to evaluate the containment relationship.

In our paper, we model the schema as a directed graph $G$ (we do not consider disjunctions in the schema). If $G$ is a DAG, the schema is not recursive; otherwise $G$ has cycles. We consider $G$ as a DAG first in Sections 4.1, 4.2 and 4.3, and discuss recursive schemas in Section 4.4.

4.1 Eliminating Wildcards

With a schema available, a wildcard node can be replaced by specific labels in $\Sigma$, as long as the resulting pattern conforms to the given schema. A naive method is to pick an arbitrary label in $\Sigma$ for each wildcard node, and then to verify whether the fully specified query complies with the schema $G$. This method requires verifying $|\Sigma|^k$ queries (where $k$ is the number of wildcard nodes in a pattern), and is obviously not efficient.

Algorithm 1 Algorithm for Eliminating Wildcard Nodes
 1: for all node v that is not a *-node in pattern p do
 2:   L(v) ← {label(v)};
 3: end for
 4: for all leaf *-node v in p do
 5:   L(v) ← Σ;
 6: end for
 7: Mark all leaf nodes and all non *-nodes;
 8: repeat
 9:   for each *-node x in p whose children x_1, x_2, ..., x_k are all marked do
10:     for i = 1 to k do
11:       S_i ← ∅;
12:       for each β ∈ L(x_i) do
13:         if ((x, x_i) is a p-edge and there is some α ∈ L(x) such that (α, β) is an edge in G) or ((x, x_i) is a d-edge and there is some α ∈ L(x) such that there is a path from α to β in G) then
14:           S_i ← S_i ∪ {α};
15:         end if
16:       end for
17:     end for
18:     L(x) ← ∩_{i=1..k} S_i;
19:     Mark x;
20:   end for
21: until all *-nodes in p are marked
22: Unmark root r_p and all non *-nodes;
23: repeat
24:   for each *-node x in p whose parent x_p is unmarked do
25:     for each β ∈ L(x) do
26:       if ((x_p, x) is a p-edge and there is some α ∈ L(x_p) such that (α, β) is an edge in G) or ((x_p, x) is a d-edge and there is some α ∈ L(x_p) such that there is a path from α to β in G) then
27:         Add (β, α) into P(x);
28:       else
29:         Remove β from L(x);
30:       end if
31:     end for
32:     Unmark x;
33:   end for
34: until all *-nodes are unmarked
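The bottom-up phase of Algorithm 1 (lines 8-21) can be sketched in Python as a single post-order recursion. This is one possible reading of the pseudocode, under the assumption that a candidate label α for an inner wildcard node ranges over the whole alphabet; the top-down refinement phase and the P(x) bookkeeping are not included. The schema is assumed to be a dict from a label to the set of its child labels, and the names `reachable` and `bottom_up` are hypothetical.

```python
def reachable(G, a):
    """Labels reachable from a by a non-empty path in the schema graph G."""
    seen, stack = set(), list(G.get(a, ()))
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(G.get(x, ()))
    return seen

def bottom_up(p, G, sigma):
    """Compute candidate label sets bottom-up.

    p is a pattern node (label, [(edge_type, child), ...]).
    Returns a mirror of p: (L(v), [(edge_type, mirrored child), ...]).
    """
    label, edges = p
    kids = [(t, bottom_up(c, G, sigma)) for t, c in edges]
    if label != "*":
        return ({label}, kids)            # lines 1-3: fixed label
    L = set(sigma)                        # lines 4-6: a leaf wildcard keeps all of sigma
    for t, (child_labels, _) in kids:
        if t == "/":
            # labels having a schema edge to some candidate label of the child
            S = {a for a in sigma if G.get(a, set()) & child_labels}
        else:
            # labels having a schema path to some candidate label of the child
            S = {a for a in sigma if reachable(G, a) & child_labels}
        L &= S                            # line 18: intersect over all children
    return (L, kids)

G = {"a": {"b", "c"}, "b": {"d"}, "c": set(), "d": set()}
sigma = {"a", "b", "c", "d"}
pattern = ("a", [("/", ("*", [("/", ("d", []))]))])   # a/*/d
result = bottom_up(pattern, G, sigma)
```

In the example, only the label b has a schema edge to d, so the wildcard in a/*/d is narrowed to {b} before any top-down refinement.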

We propose an improved algorithm, shown in Algorithm 1. The basic idea is to use the existing structural information in the query to avoid wild guesses. It is inspired by [16], in which a similar idea was used to test the satisfiability of a tree pattern under a schema $G$. In our scenario, we need to record detailed label relationships (parent-child or ancestor-descendant) for adjacent node pairs, because these relationships can be further utilized to transform one query with wildcards into a union of queries without wildcards. Algorithm 1 scans the wildcard nodes in $p$ twice: bottom-up (lines 8-21) and then top-down (lines 23-34). The bottom-up phase calculates a set of possible labels $L(x)$ for each wildcard node $x$, using information about the possible labels of its children. The top-down phase further refines the set $L(x)$ using information about the parent label of $x$. The parent-child and ancestor-descendant relationships are recorded as label pairs in $P(x)$. Note that in the algorithm we omit, for brevity, the step that checks whether $L(x) = \emptyset$ after lines 18 and 29 ($L(x) = \emptyset$ means that pattern $p$ does not conform to $G$), since it is not difficult to implement.

The algorithm runs in $O(|p| |\Sigma|^2)$: lines 1-7 run in $O(|p|)$; the bottom-up phase visits each node in $p$ at most twice, and within the loop, lines 12-16 can be done in $O(|\Sigma|^2)$. Thus lines 8-21 run in $O(|p| |\Sigma|^2)$. The top-down phase also runs in $O(|p| |\Sigma|^2)$. If there is an index on the schema $G$, the parent-child test $p(\alpha, \beta)$ or the ancestor-descendant test $d(\alpha, \beta)$ can be checked efficiently.

4.2 Eliminating d-edges

Now we have obtained unions of queries without wildcards. However, Theorem 2 is still not sufficient to decide containment between two sets of queries under a schema (recall the example in Fig. 4). We need to replace all the d-edges with concrete paths comprising only p-edges, because d-edges must be interpreted in specific ways constrained by the schema. A naive method to expand a d-edge $(v_1, v_2)$, similar to eliminating wildcard nodes, is to find all the paths between the two labels $label(v_1)$ and $label(v_2)$ in the schema $G$, and to replace the d-edge with one of these concrete paths. Obviously, there may be many ways to replace a d-edge, and thus a pattern containing d-edges will be transformed into a union of a large number of patterns in $P^{\{/,[\,]\}}$. Then, with the follow-up treatment in Section 4.3, one can determine the containment relationship for unions of queries under a schema.

To avoid generating a possibly exponential number of patterns, a better solution is to replace a d-edge $(v_1, v_2)$ with the subgraph between $label(v_1)$ and $label(v_2)$ in $G$, denoted as $G_s(label(v_1), label(v_2))$. Formally, $G_s(label(v_1), label(v_2)) = \{ v \mid v \in G \wedge v \text{ is reachable from } label(v_1) \wedge label(v_2) \text{ is reachable from } v \}$. As long as the given pattern conforms to $G$, i.e. $label(v_2)$ is reachable from $label(v_1)$ in $G$, the subgraph $G_s(label(v_1), label(v_2))$ always exists, whether or not $G$ has cycles. In addition, finding $G_s(label(v_1), label(v_2))$ is not expensive. It involves a top-down traversal of $G$ from $label(v_1)$ to $label(v_2)$, and a bottom-up traversal from $label(v_2)$ to $label(v_1)$. The idea is similar to Algorithm 1, and hence we omit the details.
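The subgraph $G_s$ can be computed with one forward and one backward traversal, as the following minimal Python sketch shows. The schema encoding (a dict from a label to the set of its child labels), the names `reach` and `subgraph_between`, and the convention that a label counts as reachable from itself are assumptions of this illustration.

```python
def reach(G, start):
    """All labels reachable from start (start included) in the directed graph G."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in G.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def subgraph_between(G, a, b):
    """Nodes v with a ->* v ->* b, together with the schema edges among them."""
    forward = reach(G, a)                      # top-down traversal from a
    reverse = {}
    for u, vs in G.items():                    # build the reversed graph
        for v in vs:
            reverse.setdefault(v, set()).add(u)
    backward = reach(reverse, b)               # bottom-up traversal from b
    nodes = forward & backward
    return {u: G.get(u, set()) & nodes for u in nodes}

G = {"a": {"b", "x"}, "b": {"c"}, "x": set(), "c": {"d"}, "d": set()}
gs = subgraph_between(G, "a", "c")
```

Both traversals visit each schema edge at most once, and the sketch terminates on cyclic schemas as well because visited labels are never revisited.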

4.3 Chsing Pttens in P{/, []} Now we hve wildds nd d-edges eliminted, nd ll the pttens tnsfomed into P {/,[]}. To edue the polem into one without shem, we hve the lst step to hse the pttens in P {/,[]} s muh s possile with siling onstints nd funtionl onstints [9]. When G is not eusive, the poess is not diffiult, sine the esult pttens (fte hsing) should e finite. The polem then onvets into heking ontinment fo unions of queies in P {/,[]} without shem. Theefte, we n pply Theoem 1 nd 2 to solve it. Note tht, fte expnding d-edges, ptten my eome DAG the thn igoous tee ptten in P {/,[]}. ut the hse poess is the sme exept tht we my not pply siling onstints t some node whose hild nodes e following o-semntis, euse these hild nodes e expnded fom n d-edge expessed s ltentive pths (o sugphs), mking siling onstints not stisfied on suh node. On the othe hnd, when homomophism is used to detet ontinment etween suh o-semnti pttens, finl step needs to e dded. Fo exmple, when we ondut mpping fom ptten p to q, we n dw the onlusion q p with two futhe onditions holding: (1) fo evey sugph hsed fom G s (lel(v 1 ),lel(v 2 )) in p, one of its sugph onneting v 1 nd v 2 must e mpped on to q; (2) fo evey two nodes v 1 nd v 2 with n d-edge in q, if v 2 is mpped, then evey v 2 s nesto (on ll ltentive sugphs) should e mpped y some node in p. 4.4 Reusive Shem One hllenge ises: if the shem is eusive, ptten n e hsed ontinuously without stop, nd /-pth (pth only onsisting of p-edges) my ontin ile epeted fo ny times. In suh ses, we llow the loop to ppe one in the hsed ptten to keep tk of the nodes in ile, nd we lso tg the loop stt node nd loop end node. Now we e le to ewite quey in P {/,//,[],,DTD} into union of finite nume of queies in P {/,[]}. Theoem 2 will then e suffiient to deide the ontinment. 
A condition needs to be added when finding a homomorphism $h$ from pattern $q$ to pattern $p$ ($p$ and $q$ are in $P^{\{/,[]\}}$ with the loop start node and loop end node tagged): if $v_1$ and $v_2$ are the loop start node and loop end node in $q$, there must exist a cycle in $G$ with labels $label(h(v_1))$ and $label(h(v_2))$ as start and end respectively. Here, $h(v_1)$ and $h(v_2)$ need not be loop start and end nodes in pattern $p$. Fig. 5 shows an example. $p_0$ and $q_0$ are two queries involving ad-edges, and $G$ is a schema containing a cycle. After expanding ad-edges and chasing with schema $G$ for $p_0$ and $q_0$, we get patterns $p$ and $q$. In pattern $q$, an ad-edge of $q_0$ has been expanded and chased, and nodes $v_1$ and $v_2$ are tagged as the loop start and loop end. Similarly, an ad-edge of $p_0$ has been expanded and chased in $p$. Considering the resulting patterns $p$ and $q$, there is a homomorphism from $q$ to $p$; moreover, nodes $h(v_1)$ and $h(v_2)$, the nodes in $p$ corresponding to loop start $v_1$ and loop end $v_2$ in $q$, have labels $label(h(v_1))$ and $label(h(v_2))$ that are the start and end nodes of a cycle in $G$. Therefore, pattern $p$ is contained in pattern $q$, and thus the original pattern $p_0$ is contained in $q_0$.
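The additional cycle condition can be sketched in Python as a check on the schema graph. The reading used here is an assumption: two labels are taken to be the start and end of a cycle in $G$ when the end label can be reached from the start label and the start label can be reached back from the end label along at least one edge. The edge-list representation and the names `strictly_reachable` and `on_common_cycle` are hypothetical.

```python
def strictly_reachable(edges, start):
    """Nodes reachable from start along one or more edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen = set()
    stack = list(adj.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj.get(node, ()))
    return seen


def on_common_cycle(edges, loop_start, loop_end):
    """Assumed reading of the condition: both labels lie on one cycle of G."""
    # The cycle must close: loop_start is reached again from loop_end.
    back = loop_start in strictly_reachable(edges, loop_end)
    # The cycle must lead from loop_start to loop_end (trivial if equal).
    forth = loop_start == loop_end or loop_end in strictly_reachable(edges, loop_start)
    return back and forth
```

In a containment test this check would be applied to `label(h(v1))` and `label(h(v2))` after a homomorphism `h` has been found.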

Note that the condition does not require $h(v_1)$ and $h(v_2)$ to be the loop start and loop end nodes in $p$.

Fig. 5. Expanding Edges under Recursive Schema (panels: $p_0$, $q_0$, $G$, $p$, $q$)

In the above discussion, we assume that there are no intersecting cycles in $G$, i.e. the recursive loops have no overlaps. This assumption obviously simplifies the problem, and it is still interesting and challenging to investigate the containment problem of unions of patterns under a complex recursive schema.

5 Discussions on General Tree Pattern Queries

Unlike XPath tree patterns, general tree pattern queries may contain more than one distinguished node. This may add difficulty to containment checking in some circumstances. In fact, it is because of multiple distinguished nodes that containment between general tree patterns is not the same as containment between boolean patterns. We illustrate this observation by first reviewing Proposition 1 in [8] and then showing when the proposition does not hold and why. To restate it: given two general tree patterns $p$ and $q$, we can obtain two boolean patterns $p'$ and $q'$ by adding distinct labels $l_1, \ldots, l_k$ to the $k$ distinguished nodes in $p$ and $q$ respectively; then $p \subseteq q$ iff $p' \subseteq q'$. However, the proposition only holds for output-order-sensitive queries. In other words, the distinguished nodes should have a fixed order so that we can label them in a unique way. See Fig. 6 for an example. Pattern $q$ has two distinguished nodes. If there is no predefined order for the distinguished nodes, then to transform $q$ into a boolean pattern we have two labelling schemes, shown as $q_1$ and $q_2$ respectively. Obviously, $q_1$ and $q_2$ are not identical. Therefore, in such a situation, a general tree pattern query should be transformed into a union of boolean patterns, rather than a single pattern. Hence if we have $k$ distinguished nodes, there are $k!$ ways to label them, resulting in $k!$ boolean patterns to represent a general tree pattern. This will significantly complicate containment detection if the number of distinguished nodes is large. Luckily, for XPath queries, there is only one distinguished node, so the boolean pattern derived from its corresponding XPath pattern is unique.
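The $k!$ growth can be made concrete with a short Python sketch that enumerates every way of assigning the labels $l_1, \ldots, l_k$ to $k$ unordered distinguished nodes. The node names `u` and `v` in the usage below and the function name `labelling_schemes` are hypothetical, chosen only for illustration.

```python
from itertools import permutations


def labelling_schemes(distinguished_nodes):
    """Yield every assignment of the labels l1..lk to the distinguished nodes."""
    k = len(distinguished_nodes)
    labels = [f"l{i}" for i in range(1, k + 1)]
    for perm in permutations(labels):
        # One scheme: the i-th distinguished node receives perm[i].
        yield dict(zip(distinguished_nodes, perm))


# Two distinguished nodes give the two schemes illustrated by q1 and q2.
schemes = list(labelling_schemes(["u", "v"]))
```

With a single distinguished node, as in XPath, the generator yields exactly one scheme.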
The conclusions drawn in the preceding sections therefore remain correct for XPath queries. In some real applications, general tree pattern queries do recognize the distinguished nodes in a fixed order, such as tree pattern queries induced from XQuery queries. But we should keep in mind that containment between general tree patterns and containment between boolean patterns are not always the same.

Fig. 6. Different Labeling Scheme for General Tree Pattern Query (panels: $q$, $q_1$, $q_2$)

6 Related Work

Query containment was first put forward together with query equivalence in order to optimize query evaluation in the relational context [1], where the containment problem is studied for two queries containing select, project and join operators. Later, containment of unions of queries was discussed in [6], which provided a sufficient and necessary condition showing that containment between unions of queries can be reduced to containment between a number of pairwise queries. In [17], the authors showed that if relations are modelled as multisets of tuples, the previous sufficient and necessary condition holds only for one type of label system, while for another type of label system the containment problem is undecidable.

Unfortunately, the established theory for relational queries cannot be applied in the XML context. The containment problem between unions of XPath queries is still open, though fruitful results have been produced for containment between two single queries. In some pioneering works, the problem was shown to be in PTIME for the XPath subsets $XP^{\{/,[],*\}}$, $XP^{\{/,//,[]\}}$ and $XP^{\{/,//,*\}}$, and furthermore proved to be coNP-complete for $XP^{\{/,//,[],*\}}$ in [8]. When a schema is available, the problem turns out to be more difficult, because data trees are constrained according to a particular pattern, and thus XPath queries with wildcard nodes and ad-edges cannot be interpreted arbitrarily. Wood [9] and Neven and Schwentick [10] independently showed that containment between two XPath queries in $XP^{\{/,//,[],*,DTD\}}$ is decidable, and in fact EXPTIME-complete. Neven and Schwentick [10] also discussed disjunction and variables in XPath. More theoretical results with respect to various XPath query subsets are summarized in [11]. A richer XPath fragment, XPath 2.0, was recently examined in [18].

7 Conclusions and Future Work

In this paper, we have addressed the containment problem between unions of XPath queries. We showed that the problem can always be reduced to containment between one query and a union of queries. We also proved that, for XPath
subset $XP^{\{/,//,[]\}}$, the problem can be reduced to checking containment between two single queries, each from one union. For the larger subset $XP^{\{/,//,[],*\}}$, we utilize an existing technique to develop an effective strategy to solve the problem. When a schema is available, we can use the schema to eliminate wildcard nodes and expand ad-edges in the query so that our theorems can be applied thereafter to decide the containment relationship between unions of queries under schema information. One direction for future work is to consider more complicated recursive relationships in the schema, e.g. two cycles may intersect. This is always a difficult problem and may make chasing patterns in $P^{\{/,[]\}}$ rather challenging.

Acknowledgments. This work was supported by the Australian Research Council Discovery Project under the grant number DP0878405.

References

1. Aho, A.V., Sagiv, Y., Ullman, J.D.: Equivalences among relational expressions. SIAM J. Comput. 8(2) (1979) 218-246
2. Halevy, A.Y.: Answering queries using views: A survey. VLDB J. 10(4) (2001) 270-294
3. Gupta, A., Sagiv, Y., Ullman, J.D., Widom, J.: Constraint checking with partial information. In: PODS. (1994) 45-55
4. Levy, A.Y., Sagiv, Y.: Queries independent of updates. In: VLDB. (1993) 171-181
5. Chandra, A.K., Merlin, P.M.: Optimal implementation of conjunctive queries in relational data bases. In: STOC. (1977) 77-90
6. Sagiv, Y., Yannakakis, M.: Equivalences among relational expressions with the union and difference operators. J. ACM 27(4) (1980) 633-655
7. Clark, J., DeRose, S.: XML path language (XPath) 1.0. In: W3C Recommendation, http://www.w3.org/TR/xpath (November 1999)
8. Miklau, G., Suciu, D.: Containment and equivalence for a fragment of XPath. J. ACM 51(1) (2004) 2-45
9. Wood, P.T.: Containment for XPath fragments under DTD constraints. In: ICDT. (2003) 297-311
10. Neven, F., Schwentick, T.: XPath containment in the presence of disjunction, DTDs, and variables. In: ICDT. (2003) 312-326
11. Schwentick, T.: XPath query containment. SIGMOD Rec. 33(1) (2004) 101-109
12. Wadler, P.: A formal semantics of patterns in XSLT and XPath. Markup Lang. 2(2) (2000) 183-202
13. Amer-Yahia, S., Cho, S., Lakshmanan, L.V.S., Srivastava, D.: Tree pattern query minimization. The VLDB Journal 11(4) (2002) 315-331
14. Wang, J., Yu, J.X., Liu, C.: On tree pattern query rewriting using views. In: WISE. (2007) 1-12
15. Wang, J., Yu, J.X., Liu, C.: Contained rewritings of XPath queries using views revisited. In: WISE. (2008) 410-425
16. Lakshmanan, L.V.S., Ramesh, G., Wang, H., Zhao, Z.J.: On testing satisfiability of tree pattern queries. In: VLDB. (2004) 120-131
17. Ioannidis, Y.E., Ramakrishnan, R.: Containment of conjunctive queries: Beyond relations as sets. ACM Trans. Database Syst. 20(3) (1995) 288-324
18. ten Cate, B., Lutz, C.: The complexity of query containment in expressive fragments of XPath 2.0. In: PODS. (2007) 73-82