Data Exchange with Arithmetic Comparisons

Size: px
Start display at page:

Download "Data Exchange with Arithmetic Comparisons"

Transcription

1 Data Exchange with Arithmetic Comparisons Foto Afrati 1 Chen Li 2 Vassia Pavlaki 1 1 National Technical University of Athens (NTUA), Greece {afrati,vpavlaki}@softlab.ntua.gr 2 University of California, Irvine, USA chenli@ics.uci.edu Abstract In this paper, the problem of data exchange in the presence of arithmetic comparisons is studied. For this purpose, tuple generating dependencies with arithmetic comparisons (tgd-ac) and arithmetic comparison generating dependencies (acgd) are introduced to capture the need to use arithmetic comparisons in source-totarget and in target constraints (both in their antecedent and in their consequent). Arithmetic comparisons are interpreted over densely totally ordered domains. A chase procedure called AC-chase is introduced to handle such constraints. AC-chase is a tree rather than a sequence and its depth is polynomial on the size of the source instance if the set of target constraints is the union of a set of target acgds with a weakly acyclic set of target tgd-acs. It is shown that AC-chase computes a universal solution which can be used to compute certain answers for unions of conjunctive queries with arithmetic comparisons (UCQAC). The complexity of existence of a solution is shown to be in NP. The complexity of computing certain answers of UCQAC queries is shown to be conp-complete. In order to identify polynomial cases, we introduce the succinct AC-chase which however does not always produce a solution. We identify cases where its result is a universal solution and we investigate the syntactic conditions of the query under which we can use the result of succinct AC-chase to compute the certain answers in polynomial time. We show that the latter is also feasible in cases where the result of succinct chase is not a universal solution. 1 Introduction Data exchange has been recognized as a fundamental problem decades ago and has recently attracted the interest of many researchers (e.g., [AL05, Ber03, FKMP03, FKP03, Got05, GN06, KPT06, Lib06, VR07]). In the data exchange problem, we want to take data structured under a schema, called the source schema, and transform it into data structured under another schema, called the target schema, for the purpose of being able to answer queries posed over the target schema. In this paper we investigate the data exchange problem in the presence of arithmetic comparisons which is admittedly a much harder problem. The following is an example of such a problem. A real estate company, called A, has a database S that stores information about apartments and owners in Orange County, California. Fig. 1 shows its two tables, namely Home and Owner. Due to business reasons, the company needs to send some of its data to another real estate company, called B, which has a database T with two tables: Apartment(aID, apartaddr, price, zip, name); Person(name, Addr). The table Apartment contains information about apartments, including the ID (aid), address (apartaddr), price (price), zip code (zip), and owner s name (name). The table Person stores information about the names (name) and addresses (Addr) of owners. Thus, in order to facilitate the data transfer from the database of company A to the database of company B, we need to transform automatically the data from the source schema to the Work partially done when all authors were visiting Stanford University. The project is co - funded by the European Social Fund (75%) and National Resources (25%) - Operational Program for Educational and Vocational Training II (EPEAEK II) and particularly the Program HERAKLEITOS.

2 target schema. Fagin et al. [FKMP03] showed that tuple-generating dependencies (tgds) and equality-generating dependencies (egds) are very powerful to define many kinds of semantic correspondences between the schemas and, hence, they can be used to transform data in a data exchange problem. Intuitively, a tuple generating dependency expresses that if some facts exist in a database, then some other facts should also exist. Thus, in our example, the following two tgds could facilitate the data transformation: d st1 : Home(I, A, P) Apartment(I, A, P, Z, N) and d st2 : Owner(I, N, A) Person(N, A). Home: aid apartaddr price Main St Alton Ave Alton Ave 800 Owner: aid name Addr 13 Tom Smith Highland Ave 54 John Parker Franklin Ave 43 Ken Johnson Culver Dr Figure 1: Source database at company A. However, suppose we want to transfer data only about apartments with price 500 thousand dollars. Suppose also that we know that the zip code of apartments in company A is less than (e.g., because we know that all apartments are in Orange County), and we should store this information in the database of company B. Tgds and egds cannot handle either of these two cases. In this paper, we introduce tuple generating dependencies that use also arithmetic comparisons, called tuple-generating dependencies with arithmetic comparisons (tgd-acs in short), by extending the class of tgds to include comparison predicates. We also extend the class of egds to allow for any arithmetic comparison on the consequent and we call them arithmetic-comparison-generating dependencies (acgds in short). The need for including arithmetic comparisons to data dependencies has been observed in earlier work [BCW99, IO93, MS96, Mah97]. In our example, the following two tgd-acs will transform the data while satisfying also the two conditions above: d st1 : Home(I, A, P) P 500 Apartment(I, A, P, Z, N) Z > and d st2 : Owner(I, N, A) Person(N, A). Finally, besides using tgd-acs and acgds to declare how we want the data to be transformed from the source schema to the target schema (source-to-target constraints), we may also use them to declare that some constraints are required to be satisfied on the target data (target constraints). The data exchange problem that we investigate is the following: We are given a source schema S, a target schema T and a set of source-to-target constraints and target constraints. For any source instance, we want to find a target instance which satisfies the constraints (existence of solution problem). Moreover, for a given query q, we want to find the certain answers of q (query answering problem). These problems and their complexity are investigated in [FKMP03] for the case of constraints that are tgds and egds and for conjunctive queries and conjunctive queries with inequalities ( ). In the present paper, we investigate the complexity of these problems for the case of constraints that are tgd-acs and acgds and for conjunctive queries with arithmetic comparisons (<,, >,,, =). Summary of Results: It is known that reasoning on logical formulas with arithmetic comparisons is more complicated than logical formulas without them. More specifically, many of the technical tools used to solve data exchange problems are closely related to query containment tests. Conjunctive query containment is recognized as a much simpler problem (NP-complete) than containment of conjunctive queries in the presence of arithmetic comparisons (Π P 2 -complete) [CM77, Klu88, Mey92]. Thus, in our setting we have to deal with all additional complications that are introduced by the presence of arithmetic comparisons. The organization of the paper and our contributions are as follows: In Section 2, we define the concepts that we use. Since arithmetic comparisons are interpreted over totally ordered domains, the concept of a tgd-ac being satisfied on a database instance which includes labeled nulls is not straightforward. Hence, new notions are needed. The notion of universal solution (which is used to compute certain answers of queries) is extended to be a set of solutions rather than a single solution. Interestingly, also in our setting universal solutions have the property that any two of them are equivalent, if viewed as unions of conjunctive queries with arithmetic comparisons, (Proposition 2.1) which is an extension of the property noticed in [FKMP03]. In Section 3, we define a novel chase procedure called AC-chase which is a tree rather than a sequence and can handle tgd-acs and acgds. We show that the result of AC-chase is a universal solution (Theorem 3.1). We

3 Constraints Query Language Complexity Reference tgds and egds UCQ with two per CQ conp-hard [FKMP03] tgds and egds CQ with seven conp-hard [AD98] tgds and egds CQ with two conp-hard [Mad05] tgds and egds UCQ with inequalities ( ) conp [FKMP03] tgd-acs and acgds UCQAC conp Section 4 tgds and egds UCQ, at most one per CQ PTIME [FKMP03] full tgd-acs UCQAC, each CQAC has the PTIME Section 5 and acgds homomorphism property tgd-acs and acgds UCQAC, each CQAC has the PTIME Section 5 no target constraints homomorphism property tgds and egds UCQAC, each CQAC uses open PTIME Section 5 (closed) LSI (RSI) comparisons only Table 1: Complexity results on query answering in data exchange for weakly acyclic dependencies. CQ = conjunctive query; UCQ = union of conjunctive queries; (U)CQAC = (union of) conjunctive query (queries) with arithmetic comparisons. extend the definition of weakly acyclic set of tgds to define weakly acyclic set of tgd-acs. We prove that for a set of weakly acyclic set of tgd-acs and acgds, the AC-chase always produces a tree of polynomial length (on the size of the input source instance). From that, we derive the complexity results, which show that the problem of existence of solution is in NP (Theorem 3.2). In Section 4, we show that we can compute certain answers of unions of conjunctive queries with arithmetic comparisons using the result of the AC-chase (Theorem 4.1(1)). Again, since arithmetic comparisons are interpreted over totally ordered domains, computing such queries on a database instance with labeled nulls is a challenge and the definition of some new concepts is needed. In particular, we first define the concept of computing certain answers on an instance. We prove that the data complexity of the problem of computing certain answers for the data exchange problem with arithmetic comparisons is co-np complete (Theorem 4.2). Interestingly, we have been able to extend another observation made in [FKMP03] about properties of universal solution (Theorem 4.1(2)). In Section 5, we identify polynomial cases of the problem. For that, we define another chase procedure called succinct AC-chase which is a sequence and whenever it uses acgds and weakly acyclic tgd-acs, its length is polynomial (on the size of the source instance) (Theorem 5.1). However, it does not produce always a universal solution. We show that universal solutions are produced by the succinct AC-chase in two cases: (a) when there are no target constraints and (b) when we use full tgd-acs (i.e., without existentially quantified variables)(theorem 5.2). On query answering we show that whenever the query has the homomorphism property then we can compute certain answers in polynomial time (Lemma 5.1) on a given instance. A consequence of that is Theorem 5.4 which settles the problem of computing certain answers in polynomial time for the cases where succinct AC-chase produces a universal solution. However, we identify an additional case where certain answers of queries with the homomorphism property can be computed in polynomial time on the result of the succinct AC-chase, although the latter is not a universal solution it is when we use only tgds and egds and the query has a certain property (Theorem 5.5) 1. A summary of all our polynomial results can be found in Section 5.5. Table 1 summarizes existing complexity results on computing certain answers. The usefulness of the homomorphism property has been observed for reducing the complexity of the query containment problem [ALM04, Klu88] in the presence of arithmetic comparisons. It contains large classes of queries such as queries which use only comparisons of the form X < c where X is a variable and c is a constant. 1 Note our notion of solution does not coincide with that in [FKMP03]. Hence, although when we use tgds and egds the succinct AC-chase reduces to the chase, its result does not qualify as solution. This remark is implicitly made in [FKMP03] by noticing that this result cannot be used to compute certain answers of queries with and hence the disjunctive chase was introduced there.

4 2 Data Exchange with Arithmetic Comparisons. In this section we define a) dependencies with arithmetic comparisons, b) data exchange setting with arithmetic comparisons, and c) the concepts of solution and universal solution. We prove the first important result which says that universal solutions as defined here have good properties in that any pair of universal solutions are equivalent if viewed as queries (Proposition 2.1). For that, we review the query containment problem for conjunctive queries with arithmetic comparisons. 2.1 Dependencies with Arithmetic Comparisons. We first define the class of arithmetic comparison generating dependencies, which extends equality-generating dependencies (egds). We then define the class of tuple-generating dependencies with arithmetic comparisons, which extends tuple-generating dependencies (tgds). We consider arithmetic comparison predicates of the form AθB, where A is a variable, B is a variable or a constant, and θ is in {,, <, >,, =}. They are interpreted over densely totally ordered domains. We call an arithmetic comparison open if its operator is < or >, and closed if its operator is or. We call an arithmetic comparison AθB semi-interval (SI in short) if either A or B is a constant. If B is a constant we call it left semi-interval and if A is a constant we call it right semi-interval. definition 2.1. (acgd & tgd-ac) Let D be a database schema. An arithmetic-comparison-generating dependency (acgd in short) is a first-order formula of the form x ( φ(x) β φ (x 1 θ x 2 )). x 1 is a variable from x and x 2 is either a variable from x or constant. A tuple-generating dependency with arithmetic comparisons (tgd-ac in short) is a first-order formula of the form x ( φ(x) β φ yψ(x, y) β ψ ). In the formulas, φ(x) is a conjunction of atomic relational formulas over D, with variables in x. Each variable in x occurs in at least one formula in φ(x). In addition, ψ(x, y) is a conjunction of atomic relational formulas with variables in x and y, and each variable in y occurs in at least one formula in ψ(x, y). Also β φ and β ψ are each a conjunction of comparison predicates with variables from φ(x) and ψ(x, y), respectively. In general, a conjunctive formula with arithmetic comparisons is denoted as F = F 0 + β F, where F 0 contains the relational atoms and β F the comparisons of F. A conjunctive formula is in compact form if the conjunction of its comparisons β F do not imply equalities. 2.2 Preliminary Definitions. Let S ={S 1,..., S n } and T = {T 1,..., T m } be two disjoint schemas, where each element in the sets is a relation. A source-to-target dependency in Σ st is a tgd-ac of the form: x ( ) φ S (x) β φ yψ T (x, y) β ψ. A target dependency is a dependency over the target schema T. Every target dependency in Σ t is either a tgd-ac of the form x ( ) φ T (x) β φ yψ T (x, y) β ψ, or an acgd of the form x ( φ T (x) β φ (x 1 θx 2 )). In source-to-target and target dependencies, φ S (x), ψ T (x, y), and φ T (x) are each a conjunction of atomic relational formulas over S or T respectively, and β φ and β ψ are conjunctions of comparison predicates. A data exchange setting with arithmetic comparisons, denoted M = (S, T, Σ st, Σ t ), consists of a source schema S, a target schema T, a set Σ st of source-to-target dependencies, and a set Σ t of target dependencies. We use Const to denote the set of constants that may occur in the database. We assume constants are from an infinite totally ordered dense domain such as the rationals or reals. We assume an infinite set Var of values, called labeled nulls, such that Const Var =. We define instances as conjunctive formulas with arithmetic comparisons using constants and labeled nulls. For an instance K, we use Var(K) and Const(K) to denote the set of labeled nulls and the set of constants in K, respectively. An instance K where Var(K) is empty, is called ground instance. We use Const(Σ st Σ t ) to denote the set of constants appearing in source-to-target and target dependencies. If M = (S, T, Σ st, Σ t ) is a data exchange setting with arithmetic comparisons, and I is a ground instance of S, then an instance K is called a t-instance with respect to M and I, or simply a t-instance (if M and I are understood from the context), if the arithmetic comparisons of K define a total order with respect to C = Const(K) Var(K) Const(Σ st Σ t ) Const(I). A t-instance K is called a t-instance induced by an instance K if K results from K by adding to K additional arithmetic comparisons so that a total order is defined among the labeled nulls and constants in C, possibly with some labeled nulls or constants equated. We define the

5 concept of t-homomorphism as a homomorphism which a) satisfies the arithmetic comparisons and b) it maps a conjunctive formula on a t-instance. The formal definition follows. definition 2.2. (t-homomorphism) Let F = F 0 + β F be a conjunctive formula and K = K 0 + β K be a t-instance in compact form over the same schema. A t-homomorphism from F to K is a mapping h from Const(F ) Var(F ) to Const(K) Var (K) with the following properties: Every constant in F is mapped to itself. For every fact R(x 1,..., x k ) in F we have R(h(x 1 ),..., h(x k )) in K. β K logically implies h(β F ), denoted by β K h(β F ). We say F t-homomorphically maps to K if there is a t-homomorphism from F to K. Note that a t-homomorphism h from F to K directly implies a homomorphism h 0 from F 0 to K 0. We call h 0 the underlined homomorphism of h. There might not be a ground instance corresponding to an instance. We say an instance K is consistent if there is a non-empty ground instance K such that there is a t-homomorphism from K to K. We check whether an instance K is consistent by constructing the inequality graph using the arithmetic comparisons and finding cycles in this graph [Klu88]. Using the cycles in the inequality graph, we can also convert a t-instance to a compact t-instance. It can be done in polynomial time. Example 2.1 illustrates the concepts defined here such as instance, t-instance, ground instance. example 2.1. Let M = (S, T, Σ st, Σ t ) be a data exchange setting with arithmetic comparisons in which S has a relation E(A, B), T a relation H(C, D), Σ t =, and Σ st = {E(X, Y), X < Y Z(H(X, Z), H(Z, Y), Z < X)}. Let I = {E(2, 5), E(4, 7)} be a ground instance of the source S. Consider the following instances. J 1 = {H(2, z 1), H(z 1, 5), H(4, z 2), H(z 2, 7), z 2 < z 1 < 2}; J 2 = {H(2, z 1 ), H(z 1, 5), H(4, z 2 ), H(z 2, 7), z 1 = z 2 < 2}; J 3 = {H(2, z 1 ), H(z 1, 5), H(4, z 2 ), H(z 2, 7), z 1 < z 2 < 2}; J 4 = {H(2, z 1), H(z 1, 5), H(4, z 2), H(z 2, 7), z 1 < 2 = z 2}; J 5 = {H(2, z 1 ), H(z 1, 5), H(4, z 2 ), H(z 2, 7), z 1 < 2 < z 2 < 4}; J 6 = {H(2, z 1), H(z 1, 5), H(4, z 2), H(z 2, 7), z 1 < 2, z 2 > 7}; J 7 = {H(2, z 1 ), H(z 1, 5), H(4, z 2 ), H(z 2, 7), z 2 < z 1 < 3}; J 8 = {H(2, z 1), H(z 1, 5), H(4, z 2), H(z 2, 7), z 1 < 2, z 2 2}; J 9 = {H(2, z 1 ), H(z 1, 5), H(4, z 2 ), H(z 2, 7), z 1 < 2, z 2 < 4}; J 10 = {H(2, 1), H(1, 5), H(4, 0), H(0, 7)}; J 11 = {H(2, 2), H(2, 5), H(4, 0), H(0, 7), H(8, 7)}. J 1 is a t-instance, since its arithmetic comparisons define a total order of its labeled nulls (z 1 and z 2 ), its constants (2, 4, 5, and 7), and the constants in I (there is no constant in the tgd-ac). Similarly, J 2, J 3, J 4, J 5, and J 6 are also t-instances. J 7, J 8, and J 9 are instances, but not t-instances, since the arithmetic comparisons of each of them do not define a total order of its values (constants and labeled nulls). Both J 10 and J 11 are ground instances over T. There is a t-homomorphism h from J 7 to J 1, where h is the identity mapping from Const Var(J 7 ) to Const Var(J 1 ). In particular, the mapping turns z 2 < z 1 < 3 to z 2 < z 1 < 3, which is implied by z 2 < z 1 < 2. Similarly, J 1 t-homomorphically maps to the ground instance J 10. There is no t-homomorphism from J 6 to J 1. Finally, J 1, J 2, J 3, and J 4 are all the t-instances induced by J 8. J 1 is a compact t-instance, since its comparisons do not imply equalities. J 2 is not, since it has an equality z 1 = z 2. A compact t-instance of J 2 is J 2 = {H(2, z 1 ), H(z 1, 5), H(4, z 1 ), H(z 1, 7), z 1 < 2}, which is obtained from J 2 by replacing z 2 with z 1. Let d be a tgd-ac or an acgd, and K be a t-instance of the corresponding schema. K satisfies d if whenever there is a t-homomorphism h from the left-hand side (lhs in short) of d to K, there exists an extension h of h, where h is a t-homomorphism from the conjunction of the lhs and the right-hand side (rhs in short) of d to K. Notice that for an egd or a tgd, there is always a non-empty database on which the dependency is satisfied. This property does not hold for tgd-acs and acgds. We call a tgd-ac or an acgd consistent if there exists a non-empty database instance on which this dependency is satisfied.

6 2.3 T-solutions, universal t-solutions, universal solutions. Let M = (S, T, Σ st, Σ t ) be a data exchange setting with arithmetic comparisons, and I a ground instance of the source schema S. We say an instance J over T is a t-solution for I under M (or, simply a t-solution if I and M are clear from the context), if J is a t-instance, and the instance I, J = I J satisfies Σ st Σ t. A t-solution that is a ground instance is called a ground solution. We say an instance J of T is a solution for I under M (or, simply a solution if I and M are clear from the context), if every t-instance induced by J is a t-solution. A universal solution for I under M, or simply a universal solution is a set J = {J 1,..., J m } of solutions for I, such that for every ground solution K, there is a solution J i J that t-homomorphically maps to K. A universal t-solution is a universal solution which contains only t-solutions. example 2.2. In Example 2.1, the set J 1 = {J 1,..., J 5 } is a universal t-solution. J 2 = {J 5, J 8 } and J 3 = {J 9 } are two universal solutions. It is easy to check that the source-to-target constraint is satisfied on the union of I and every t-instance in J 1. Also, since J 1,..., J 5 are all the induced instances of J 8, the constraint is also satisfied on J 8. How to prove that J 1 is a universal solution is explained in the next section (Example 3.3). How to prove that J 3 is a universal solution is explained in Section 5 (Example 5.2). If J 3 is a universal solution, then J 2 is also a universal solution because if there is a t-homomorphism from an instance F to a t-instance K, then there is some induced t-instance of F which t-homomorphically maps on K (the proof can be found in Lemma 3.2). To every instance K we associate a boolean conjunctive query with arithmetic comparisons q K, whose body contains exactly all the facts in K as subgoals (labeled nulls are replaced by variables), and the (possibly partial) order is added to q K as arithmetic comparison predicates. The predicate in the head uses a fresh predicate name. We call q K the corresponding boolean query of K. To a set of instances we associate a corresponding boolean query by creating a boolean query for each instance with the same head predicate and taking the union. Proposition 2.1 shows that the universal solution has good properties. In what follows we use SOL(M, I) to denote the set of all solutions for I. Proposition 2.1. Let M = (S, T, Σ st, Σ t ) be a data exchange setting with arithmetic comparisons. The following hold. 1. If J and J are two universal solutions for a source instance I, then the corresponding boolean queries q J and q J are equivalent as queries. 2. Let I and I be two source instances, J be a universal solution for I, and J be a universal solution for I under the same data exchange setting M. Then, SOL(M, I) SOL(M, I ) iff q J is contained in q J. Moreover, SOL(M, I) = SOL(M, I ) iff q J and q J are equivalent as queries. The proof of this proposition uses techniques from query containment tests. Subsections 2.4 and 2.5 review the techniques for checking containment between two conjunctive query with arithmetic comparisons (and between unions of them) and in Subsection 2.6 we prove Proposition 2.1. We consider query containment over densely totally ordered domains. The following is an example which illustrates why containment is different if interpreted over discrete domains. example 2.3. Consider queries Q 1 () : r(x), 3 < X < 4 and Q 2 () : r(x), 13 < X < 14. If the comparisons are interpreted over the integers then they are both empty on all databases, hence they are equivalent. However if comparisons are interpreted over the reals then on database {r(3.5)}, query Q 1 computes to true whereas query Q 2 computes to false. 2.4 Containment of Conjunctive Queries with Arithmetic Comparisons (CQACs). In this section we review the problem of checking containment between two conjunctive queries with arithmetic comparisons (CQACs). Kolaitis [Kol05] observed that tgds can be seen as containment between two conjunctive queries. In that sense the intuition behind the check for containment between two CQACs appears in many proofs in the present paper. A conjunctive query with arithmetic comparisons (CQAC) is of the form h(x) : e 1 (X 1 ),..., e k (X k ), C 1,..., C m where each C i is an arithmetic comparison. In general, testing containment between two CQACs is more complex than checking containment between two conjunctive queries. The main reason is that when checking containment of two conjunctive queries with

7 arithmetic comparisons (CQACs), a single containment mapping in general does not suffice, as it is the case for conjunctive queries. The following example illustrates the complexities that arise when checking containment of two CQACs. example 2.4. Consider the following two queries, which are graphically illustrated in Figure 2. Q 1 () : r(x 1, X 2 ), r(x 2, X 3 ), r(x 3, X 4 ), r(x 4, X 5 ), r(x 5, X 1 ), X 1 < X 2. Q 2 () : r(x 1, X 2 ), r(x 2, X 3 ), r(x 3, X 4 ), r(x 4, X 5 ), r(x 5, X 1 ), X 1 < X 3. ½ ½ Ö Ö ¾ Ö Ö ¾ Ö Ö Ö Ö Ö Ö É ½ ½ ¾ É ¾ ½ Figure 2: Graph representation of Q 1 and Q 2. Although the two queries have different comparisons, surprisingly, Q 1 Q 2. To show Q 1 Q 2, we consider the five mappings from the five ordinary subgoals of Q 2 to the five of Q 1. Each mapping corresponds to a rotation of the variables. Under these mappings, the ACs of Q 2 become X 1 < X 3, X 2 < X 4, X 3 < X 5, X 4 < X 1, and X 5 < X 2, respectively. According to the containment test between CQACs of Gupta and Zhang- Ozsoyoglu [Gup94, ZO93] (see below), 2 it is easy, to see that Q 1 Q 2 since if the right hand side of the following implication is false, then X 1 = X 2 : (X 1 < X 2 ) (X 1 < X 3 ) (X 2 < X 4 ) (X 3 < X 5 ) (X 4 < X 1 ) (X 5 < X 2 ). Similarly, we can prove that Q 2 Q 1. Notice that a single containment mapping does not suffice to check containment of two CQACs. Let Q 1 and Q 2 be two CQACs. There are two algorithms in the literature to test whether Q 2 Q 1 : a) the test of canonical databases and b) the test based on logical implication [GSUW94, Klu88]. We briefly present each of them for completeness (in our proofs we make explicit use of the test based on the canonical databases). a) Containment Test based on Canonical Databases. This test is a generalization of the Levy-Sagiv test [LS93], and it is due to Klug [Klu88]. The test produces an exponential number of canonical databases, each of which could be a counterexample to the containment. Suppose we want to test Q 1 Q 2. We do the following: 1. Consider all partitions of the variables of Q 1. For each partition, consider all possible total orders of the blocks of this partition, and assign to each block of the partition a unique constant so that the assigned constants satisfy the total order. Then substitute the variables in each block of the partition by the corresponding constant. Thus we obtain a number of canonical databases D 1, D 2,..., D k, one database for each different order in each partition. Each D i consists of the frozen subgoals of Q 1 excluding the subgoals having comparison predicates. 2. For all D i (together with the corresponding total order) that make the body of Q 1 true, test whether Q 2 (D i ) includes the frozen head of Q 1. The frozen head of Q 1 is obtained by making the same substitution of constants for variables that yielded D i. 3. Q 1 Q 2 if and only if (2) holds. 2 Technically the containment testing requires normalization of the two queries. We can show that in this example the argument also holds after the normalization.

8 b) Containment Test based on Logical Implication. This test was introduced by Gupta [Gup94] and Zhang-Ozsoyoglu [ZO93] in parallel work. First we need to normalize both queries Q 1 and Q 2 to Q 1, Q 2 respectively as follows: 1. Replace each occurrence of a shared variable X in the ordinary subgoals except the first occurrence, by a new distinct variable X i, and add X = X i to the comparisons of the query. 2. Replace each constant c in the query by a new distinct variable Z, and add Z = c to the comparisons of the query. The following theorem states the main claim of the second algorithm. Theorem 2.1. [GSUW94, Klu88] Let Q 1, Q 2 be CQACs and Q 1 = Q 1,0 + β 1, Q 2 = Q 2,0 + β 2 be the queries after normalization. Let µ 1,..., µ k be all the mappings (homomorphisms) from Q 1,0 to Q 2,0. Then Q 2 Q 1 if and only if the following logical implication ϕ is true: ϕ : β 2 µ 1 (β 1)... µ k (β 1). That is, the comparisons in the normalized query Q 2 logically imply (denoted ) the disjunction of the images of the comparisons of the normalized query Q 1 under these mappings. Notice that in Theorem 2.1, the or operation ( ) in the implication is critical, since there might not be a single mapping µ i from Q 1,0 to Q 2,0, such that β 2 µ i (β 1 ). Another important observation is the following. To test the containment of two queries Q 1 and Q 2, using the result in Theorem 2.1, we need to normalize them first. Introducing more comparisons to the queries in the normalization can make the implication test computationally more expensive. Thus, we may want to have a containment result that does not require the queries to be normalized. [ALM04] presented two cases, in which even if Q 1 and Q 2 are not normalized, we still have Q 2 Q 1 if and only if β 2 µ 1 (β 1 )... µ k (β 1 ), where the µ i s are all the homomorphisms from the normal subgoals in Q 1 to the normal subgoals in Q Containment test for unions of CQACs. To prove Proposition 2.1 we need to use a test for checking containment between two unions of CQACs. Figure 3 shows a procedure to test if Q 1 Q 2, where Q 1, Q 2 are unions of CQACs, which was first presented in [Klu88]. Input: Two unions of CQACs: Q 1 = Q i 1, Q 2 = Q j 2. Output: Boolean value to show if Q 1 Q 2. Method: (1) For every CQAC query Q i 1 in Q 1 do: (a) Consider all partitions of the variables of Q i 1. For each partition, consider all possible total orders of the blocks of this partition, and assign to each block of the partition a unique constant so that the assigned constants satisfy the total order. Then substitute the variables in each block of the partition by the corresponding constant. Thus we obtain a number of canonical databases D 1, D 2,..., D k, one database for each different order in each partition. Each D j consists of the frozen subgoals of Q i 1 excluding the subgoals having comparison predicates. (b) For all D j (together with the corresponding total order) that make the body of Q i 1 true, test whether Q 2 (D j ) includes the frozen head of Q i 1. The frozen head of Q i 1 is obtained by making the same substitution of constants for variables that yielded D j. (2) If step (1) holds for all Q i 1 in Q 1, then output yes, otherwise, output no. Figure 3: Procedure for testing containment of unions of CQACs Theorem 2.2. The procedure in Figure 3 can correctly test if Q 1 Q 2 [Klu88]. Proof. Suppose the test output yes. Let D be a database and let t be a tuple in the answers of query Q 1 on D, i.e., t is in Q 1 (D). Then, there is a CQAC query Q i 1 in Q 1, such that t is in Q i 1(D). Hence there is a

9 homomorphism from the ordinary subgoals in the body of Q i 1 to D, which maps the head of Q i 1 on t and satisfies the arithmetic comparisons. The image of this homomorphism is isomorphic to one of the databases D j that make the body of Q i 1 true. Thus the answers of Q 2 on D also contain the tuple t. Therefore, Q 1 Q 2. Suppose Q 1 Q 2. Then every database D j that makes the body of some Q i 1 true is such that the frozen head of Q i 1 is in the answers of Q 1. Since Q 1 Q 2, the frozen head of Q i 1 is in the answers of Q 2 too. Hence the test will report yes. 2.6 Proof of Proposition 2.1. First we need the following lemma which proves a property of q J. Lemma 2.1. Let K be an instance and q K its corresponding boolean query. Then the following hold: a) for every canonical database D of q K such that q K (D) is true, there is a t-instance induced by K such that there is a surjective and injective t-homomorphism from K to D; b) for every t-instance K i induced by K, there is a canonical database D of q K, where q K (D) is true, and there is a surjective and injective t-homomorphism from K i to D. Let J be a universal solution and q J its corresponding boolean query (a union of CQACs). Then the following hold: a) for every CQAC query qi J in q J it holds: for every canonical database D of qi J such that qi J (D) is true, there is a t-instance J j k induced by a solution J j in J, such that there is a surjective and injective t-homomorphism from J jk to D; and b) for every solution J j in J it holds: for every t-instance J jk induced by J j, there is a CQAC qi J in q J, such that there is a canonical database D of qi J where qi J (D) is true, and there is a surjective and injective t-homomorphism from J jk to D. Proof. We will prove each item separately. (a) Let D be a canonical database of q K. Hence there is a t-homomorphism h from q K to D which is surjective. Since q K is the corresponding boolean query of K, K essentially is obtained after renaming the variables of q K. Hence t-homomorphism h implies a t-homomorphism h from K to D, which is surjective. (b) Let K D be a t-instance induced by K. Hence there is a t-homomorphism h from K to K D which is surjective. Since q K is the corresponding boolean query of K, for the same reason as above, t-homomorphism h implies a t-homomorphism h from q K to K D, which is surjective. This is a consequence of the claims in the first item and the way we extend the definition of the corresponding boolean query to refer also to unions of solutions. We proceed to the proof of Proposition 2.1. Proof. First item: By the definition of universal solution, the following holds: For every t-solution J 1 induced from a solution in J, there exists a solution J 1 in J such that there exists a t-homomorphism from J 1 to J 1. By Lemma 2.1, this statement is equivalent to the containment test for unions of CQACs presented in Section 2.5. Hence q J is contained in q J. The same reasoning also applies to the other direction to prove q J q J. Second item: Suppose SOL(I) SOL(I ). For every t-solution J i induced from a solution in J, since J i is in SOL(I ), there is a t-homomorphism from a solution in J to J i, because J is a universal solution for I. Hence, according to the containment test of union of CQACs presented in Section 2.5, q J is contained in q J. For the other direction, assume q J is contained in q J. By Lemma 2.1, for every t-solution J j induced by a solution in J, there is a solution J j in J and a t-homomorphism h that maps J j to J j (denoted by (I) in Figure 4). Let J be a solution for I. We must show that J is a solution for I, which amounts to showing that I, J satisfies Σ st and J satisfies Σ t. Since J is a solution for I, we know J satisfies Σ t. So it suffices to show that I, J satisfies Σ st. Consider a tgd-ac d : ϕ(x) β ϕ ψ(x, y) β ψ in Σ st. To show I, J satisfies d, we need to show that for every t-solution Ji induced from J, I, Ji satisfies d. For every t-solution J i induced from a solution in J, I, J i satisfies d since we deal with source-to-target constraints (i.e. the schemata are disjoint), it follows that for every vector a of constants such that I satisfies

10 ϕ( a) β ϕ, there is a vector b of elements of J i such that J i satisfies ψ( a, b) β ψ (denoted by (II) in Figure 4). Since J is a universal solution for I, by definition of universal solution, for t-solution Ji induced from J there is a t-homomorphism h from a t-solution J j in J to Ji (denoted by (III) in Figure 4). Also, as we observed at the beginning of this direction, for J j there is a solution J j in J such that there is a t-homomorphism h from J j to J j (denoted by (I) in Figure 4). Note that atomic formulas are preserved under t-homomorphisms. So applying t-homomorphism h followed by h (i.e, the composition h h) to a we get again a because a contains constants only. It follows that I, Ji satisfies the tgd-ac d because J i satisfies ψ( a, h h b) β ψ (denoted by (IV) in Figure 4). This argument applies to every t-solution induced from J, hence J satisfies the tgd-ac d. II III I II IV Figure 4: Proof of Proposition Computing Universal Solutions 3.1 AC-Chase. We define a chase procedure, called AC-chase, which deals with arithmetic comparisons and computes a universal solution. The procedure shares the main idea of the chase procedure in the literature [MMS79, BV84b, BV84a]. However, AC-chase produces a tree rather than a sequence for the following reason. Since the result of an AC-chase step on a t-instance may not be a t-instance, whenever necessary, we need to replace this resulting instance with the set of its induced t-instances. definition 3.1. (AC-Chase Step) Let K be a t-instance. We have three stages: Stage I: We construct an instance K by adding relational facts and comparisons to K. We have two possible cases: a) (acgd) Let d be an acgd φ(x) β φ (x 1 θx 2 ). Let h be a t-homomorphism from φ(x) β φ to K = K 0 + β K which cannot be extended. We say that d can be applied to K with t-homomorphism h. We add h(x 1 θx 2 ) to K and produce K. b) (tgd-ac) Let d : φ(x) β φ ψ(x, y) β ψ be a tgd-ac. Let h be a t-homomorphism from φ(x) β φ to K, such that there is no extension of h to a t-homomorphism from φ(x) β φ ψ(x, y) β ψ to K. We say that d can be applied to K with t-homomorphism h. Let h 0 be the underlined homomorphism of h. Let K 0 be the union of K 0 with the set of facts obtained by: (a) extending h 0 to a homomorphism h 0, such that each variable in y is assigned a fresh labeled null, followed by (b) taking the image of the atoms on the rhs of d under h 0. We add to K 0 all the comparisons h 0(x 1 θx 2 ) for each comparison x 1 θx 2 in β ψ and obtain K. Stage II: We check K for consistency. We distinguish two cases: 1. If K is not consistent we say that the result of applying d to K with h is failure and write K d,h. 2. Otherwise, we say that the result of applying d to K with h is K, and write K d,h K. Stage III: We produce all induced t-instances K 1, K 2,..., K n of K in compact form and write also (equivalently) K d,h {K 1, K 2,..., K n}. The following example illustrates an AC-chase step. example 3.1. Consider the following tgd-ac d and a t-instance K: d: R(A) S(B), A < B; K: {R(x), T (y, 3), x < y < 3}.

11 x and y are labeled nulls. There is a t-homomorphism h from the lhs of d to K which maps A to x. There is no extension of this mapping to a t-homomorphism from the rhs of d to K. Applying d to K under h derives one fact S(b) with a comparison x < b, where b is a fresh labeled null. The new instance is K : {R(x), T (y, 3), S(b), x < y < 3, x < b}. Stage II checks that K is consistent, which it is, because there are the following values that satisfy the comparisons: x = 1, y = 2, b = 5. Hence this is a successful AC-chase step. K is not a t-instance, since it does not define a total order of b w.r.t. the labeled null b and the constant 3. Hence in stage III, we produce all induced t-instances of K. In this case, we have five induced t-instances K i where each is as follows: K i : {R(x), T (y, 3), S(b), x < y < 3, x < b, C i}, where C i is a conjunction of comparisons: C 1 : b < y, C 2 : b = y, C 3 : y < b < 3, C 4 : b = 3, C 1 : b > 3. definition 3.2. (AC-Chase) Let Σ be a set of tgd-acs and acgds, and J be a t-instance on the same database schema. An AC-chase tree of J with Σ is a tree (finite or infinite) such that: The root is J. Each node in the tree is a t-instance. For every node K in the tree, let {K 1,..., K l } be the set of its children. Then there must exist a tgd-ac or an acgd d in Σ, a t-homomorphism h from the lhs of d to K, and an instance K, such that K d,h K, and K 1,..., K l are all the t-instances induced by K. A finite AC-chase of J with Σ is a finite chase tree such that each leaf is either, or satisfies Σ. The result of a finite AC-chase is the union of all the leaf nodes that are not. If the result is not empty, we refer to it as the result of a successful AC-chase. Theorem 3.1 states that AC-chase can be used to compute a universal solution. Theorem 3.1. Let M = (S, T, Σ st, Σ t ) be a data exchange setting with arithmetic comparisons, and I be a source ground instance. 1. Let L = { I, J 1, I, J 2,..., I, J k } be the result of a successful finite AC-chase of I, with Σ = Σ st Σ t, then J = {J 1, J 2,..., J k } is a universal t-solution. 2. If the result of a finite chase of I, with Σ = Σ st Σ t is empty, then there is no solution. The proof of the Theorem 3.1 makes use of a basic property of an AC-chase step, as described in the following lemma. This property has been used and proven in [BV84b, MUV84, FKMP03] in different contexts or settings that are more restricted than ours. Lemma 3.1 essentially proves that the instance K produced in Stage II of the AC-chase Step t-homomorphically maps on a solution E as long as the instance K that created K in this step, t-homomorphically maps on E. However, in Stage III of the AC-chase Step we replace K by its induced instances. Thus, we need to prove that at least one of those induced instances t-homomorphically maps to E. This is done in Lemma 3.2. Lemma 3.1. Let d be a tgd-ac of the form, K be a t-instance, and K d,h K be an AC-chase step (where K ) with a t-homomorphism h from the lhs of d to K. Let E be a t-instance such that: (i) E satisfies d; and (ii) K t-homomorphically maps to E. Then, K t-homomorphically maps to E. Proof. Let d be: φ(x) β φ ψ(x, y) β ψ. Since the result of composing two t-homomorphisms is still a t- homomorphism, µ h is a t-homomorphism from φ(x) β φ to E, as illustrated in Figure 5. Since E satisfies d, there exists an extension of µ h, denoted by h, which is a t-homomorphism from φ(x) β φ ψ(x, y) β ψ to E. For each variable y y, denote by Λ y the labeled null replacing y in the AC-chase step. Define ν on Var(K ) to E as follows: ν(λ) = µ(λ) if Λ Var(K), and ν(λ y ) = h (y) for y y. Now we show that ν is a t-homomorphism from K to E. 1. ν maps the relational facts of K to relational facts of E. This claim is true for those relational facts of K coming from K, since µ is a t-homomorphism. Let T (x 0, y 0 ) be an arbitrary relational atom in the conjunction ψ(x, y). (Here x 0 and y 0 contain variables in x and y, respectively.) Then K contains, in addition to those relational facts of K, a relational fact T (h(x 0 ), Λ y0 ). The image under ν of this fact is, by definition of ν, the fact T (µ(h(x 0 )), h (y 0 )). Since h (x 0 ) = µ(h(x 0 )), this is the same as T (h (x 0 ), h (y 0 )). But h homomorphically maps all relational atoms in φ(x) ψ(x, y), in particular, T (x 0, y 0 ), into relational facts of E.

12 d: φ( x) β yψ( x, y) ϕ β ψ h µoh h,d K K' h': extension of µoh µ ν t-instance: E Figure 5: Proof of Lemma ν preserves the order of K. We use the symbol θ to denote a comparison operator, including <,, =,, >, or. Let α 1 θα 2 be an arbitrary comparison in the conjunction β ψ. Without loss of generality, we have to consider the following cases: α 1 x and α 2 x. Then K contains the comparison h(α 1 )θh(α 2 ). Similarly, the image under ν of this comparison is, by definition of ν, the comparison µ(h(α 1 ))θµ(h(α 2 )). This is the same as h (α 1 )θh (α 2 ). By the definition of h, this image is true in E. α 1 x and α 2 y. Then K contains the comparison h(α 1 )θh(λ α2 ). Then the image under ν of this comparison is, by definition of ν, the comparison µ(h(α 1 ))θh (α 2 ), which is still the same as h (α 1 )θh (α 2 ). Using the same reasoning above, this image is also true in E. α 1 y and α 2 y. The reasoning is similar to the above. Thus, ν is a t-homomorphism from K to E. Lemma 3.2. Let K be an instance (not necessarily t-instance) and E be a t-instance. If there is a t-homomorphism h from K to E, then there is a t-instance K i induced by K, such that h is also a t-homomorphism from K i to E. Proof. For every two values (constant or labeled null) α 1 and α 2 in K, let h(α 1 )θh(α 2 ) be the order between their images in E under the mapping h. For each comparison C in K, we can see that α 1 θα 2 is not contradictory to C, since otherwise the image of the comparison C under h will be contradictory to the comparison h(α 1 )θh(α 2 ) in E. If α 1 θα 2 is not in K, we add it to K. We repeat this addition step for every pair of values in K, and get a t-instance induced by K. Clearly h is also a t-homomorphism from this t-instance to E. We proceed to the proof of Theorem 3.1. Proof. Claim 1: By Definition 3.2, every leaf node I, J i in the finite AC-chase tree is a t-instance that satisfies Σ. By definition of t-solution, J i is also a t-solution. Now consider every ground solution E. Notice that the identity mapping is a t-homomorphism from I, to I, E. By Lemmas 3.1 and 3.2, there exists a J i in J, such that there is a t-homomorphism from I, J i to I, E. Thus, J i can t-homomorphically map to E. Therefore, J is a universal t-solution. Claim 2: If the result of a finite AC-chase of I, with Σ = Σ st Σ t is empty, then each leaf node in the corresponding AC-chase tree, which is, corresponds to a set of derived facts that contain contradictory facts. Suppose there exists a t-solution E. Following the same argument as in the proof of Claim 1, there is a leaf node L, such that there exists a t-homomorphism ν from I, F L to I, E, in which F L is the set of facts corresponding to the leaf node L. Since the image of the pair of contradictory comparisons under ν is still a pair of contradictory comparisons, we have that E contains a pair of contradictory comparisons, which is a contradiction to the fact that E is a ground instance. The following is an example that a) illustrates an AC-chase tree and b) is used in Proposition 3.1 to prove that in this simple case, there is no universal solution which contains a single instance.

13 example 3.2. Consider the following data exchange setting. Target acgd d 1 : E(X, Y, Z), X < Z X = Y Target acgd d 2 : E(X, Y, Z), X = Z X = 5 Source-to-target tgd-ac d 3 : A(Y ) E(X, Y, Z). Let I = {A(8)} be the source instance. Then, after chasing with d 3, we get E(X, 8, Z). When we chase with d 1, we get the solution {E(8, 8, Z), 8 < Z}, whereas when we chase with d 2 we get the solution {E(5, 8, 5)}. These two solutions together with {E(X, 8, Z), X > Z} comprise a universal solution. Still, there is no singleton universal solution as Proposition 3.1 proves. In Figure 6 we have the AC-chase tree for this example. The three t-solutions that comprise the universal solution are the leaf nodes in the oval. A(8) d3 E(X,8,Z) E(X,8,Z),X<Z E(X,8,Z),X=Z E(X,8,Z),X>Z d1 d2 E(8,8,Z), 8<Z E(5,8,5) Figure 6: AC-Chase Tree for Example 3.1. Proposition 3.1. There is a data exchange setting with arithmetic comparisons which uses a single tgd in the source-to-target constraints and two acgds as target constraints such that there is no universal solution which contains one instance. Proof. We use the Example 3.2. We have proved that any two universal solutions are equivalent if viewed as Boolean queries. Hence if there exists a single instance universal solution then its corresponding Boolean query q J is equivalent to the union of Boolean queries: Q 1 : E(8, 8, Z), 8 < Z, Q 2 : E(5, 8, 5) and Q 3 : E(X, 8, Z), X > Z. Let J be this single universal solution. Then J must have no E atom with a constant in the first argument because otherwise there is either no mapping on Q 1 or no mapping on Q 2 since these two have different constants in the first arguments. Thus let E(A, C, B) be an atom of J. Then (using the containment test with canonical databases), there is a canonical database D of q J in which A < B. Hence, there is no mapping from Q 3. However, there is no mapping from Q 1 or Q 2 either since constants should map to the same constants. The following is an example continued from Example 2.2, where we mentioned that J 1 is a universal solution of the setting of Example 2.1. example 3.3. To prove that J 1 is a universal solution of the setting of Example 2.1, apply an AC-chase step to the source instance I = {E(2, 5), E(4, 7)} with t-homomorphism: X 2, Y 5. This step will produce a number of t-instances (as a consequence of stage III). To each of these t-instances, apply an AC-chase step with t-homomorphism: X 4, Y Complexity. We investigate the conditions under which an AC-chase procedure terminates and we study the complexity of the problem. We show that an AC-chase tree is finite when the target dependencies is a set of

14 weakly acyclic tgd-acs and a set of acgds. The definition of weak acyclicity for tgd-acs is an extension of the definition of weak acyclicity for tgds as in [FKMP03] 3. definition 3.3. (Weakly acyclic set of tgd-acs) Let Σ be a set of tgd-acs over a database schema in compact form. We construct a directed graph (called the dependency graph) as follows: (1) There is a node for every pair (R, A) with a relation symbol R of the schema and an attribute A of R. We call such a pair (R, A) a position. (2) Add edges as follows: for every tgd-ac φ(x) β φ ψ(x, y) β ψ in Σ and for every x in x that occurs in ψ(x, y) β ψ, we call x a propagated variable. For such a propagated variable x, for every occurrence of x in φ(x) in position (R, A i ), do the following: (i) For every occurrence of x in ψ(x, y) in position (S, B j ), add an edge (R, A i ) (S, B j ) (if it does not already exist); (ii) In addition, for every existentially quantified variable y in y and for every occurrence of y in ψ(x, y) in position (T, C k ), add a special edge (R, A i ) (T, C k ) (if it does not already exist). We say Σ is weakly acyclic if the dependency graph has no cycle going through a special edge. The above definition differs from the one in [FKMP03] in that a variable from the lhs of a tgd-ac that appears in an arithmetic comparison on the rhs affects acyclicity even if this variable does not appear in any relational atom on the rhs. Example 3.4 shows such a case. Theorem 3.2. Let M = (S, T, Σ st, Σ t ) be a data exchange setting with arithmetic comparisons, such that Σ t is the union of a weakly acyclic set of tgd-acs and a set of acgds and let I be a ground source instance. Then the following hold: (1) There is a polynomial in the size of I that bounds the depth of every AC-chase beginning with I. (2) There is a nondeterministically polynomial-time algorithm such that it tests whether a t-solution for I exists. (3) AC-chase takes exponential time to compute a universal t-solution for I. Before proceeding to the proof of Theorem 3.2 we start with an example about weak acyclicity which shows why we had to modify the definition in the presence of arithmetic comparisons. example 3.4. Consider a database schema with a relation R(A) and a relation S(B). We consider first the following set of tgds: Σ = {d 1, d 2 }, where d 1 : R(X) S(Y ) and d 2 : S(Y ) R(Y ). The dependency graph is weakly acyclic so and a chase on any input terminates. Now we change d 1 slightly, i.e., we consider the following set of tgd-acs: Σ = {d 1, d 2 }, where d 1 : R(X) S(Y ), X < Y. The dependency graph of Σ is not weakly acyclic. In fact, it is easy to see that every AC-chase on the instance {R(1)} using Σ does not terminate because every time we apply tgd-ac d 1 we must add an atom with a value for its argument which is larger than the value in the atom we added in the previous application of d 1. Theorem 3.2 covers interesting cases that arise in many applications. For instance, the class of weakly acyclic tgd-acs includes the class of full tgd-acs (no existentially quantified variables). To illustrate the technical part of the proof of Theorem 3.2, we first prove part (1) of the theorem (as Theorem 3.3) and then the other parts can be easily proved from part (1), and are included again in separate statements (Theorem 3.4 and Theorem 3.5). Theorem 3.3. Let Σ be a set of weakly cyclic tgd-acs. Then, there is a polynomial in the size of a t-instance K that bounds the depth of every AC-chase tree of K with Σ. Proof. We start with introducing some notations used in the proof. Every node in the dependency graph of Σ is of the form (R, A), where R is a relation symbol of the schema and A an attribute of R. We call such a pair (R, A) a position. An incoming path for (R, A) is a path (finite or infinite) ending in (R, A). We define the rank of (R, A), denoted by rank(r, A), as the maximum number of special edges on any incoming path. rank(r, A) is finite since Σ is weakly acyclic. We use the following notations in the proof. r: maximum rank rank(r, A) over all positions (R, A); n: number of distinct values (constants or labeled nulls) in the t-instance K; D: number of dependencies in Σ. 3 The concept of weak acyclicity for the case of tgds also appears in [DT03] as constraints with stratified witness.

The Structure of Inverses in Schema Mappings

The Structure of Inverses in Schema Mappings To appear: J. ACM The Structure of Inverses in Schema Mappings Ronald Fagin and Alan Nash IBM Almaden Research Center 650 Harry Road San Jose, CA 95120 Contact email: fagin@almaden.ibm.com Abstract A schema

More information

arxiv: v1 [cs.db] 21 Sep 2016

arxiv: v1 [cs.db] 21 Sep 2016 Ladan Golshanara 1, Jan Chomicki 1, and Wang-Chiew Tan 2 1 State University of New York at Buffalo, NY, USA ladangol@buffalo.edu, chomicki@buffalo.edu 2 Recruit Institute of Technology and UC Santa Cruz,

More information

1 First-order logic. 1 Syntax of first-order logic. 2 Semantics of first-order logic. 3 First-order logic queries. 2 First-order query evaluation

1 First-order logic. 1 Syntax of first-order logic. 2 Semantics of first-order logic. 3 First-order logic queries. 2 First-order query evaluation Knowledge Bases and Databases Part 1: First-Order Queries Diego Calvanese Faculty of Computer Science Master of Science in Computer Science A.Y. 2007/2008 Overview of Part 1: First-order queries 1 First-order

More information

Özgür L. Özçep Data Exchange 2

Özgür L. Özçep Data Exchange 2 INSTITUT FÜR INFORMATIONSSYSTEME Özgür L. Özçep Data Exchange 2 Lecture 6: Universal Solutions, Core, Certain Answers 23 November, 2016 Foundations of Ontologies and Databases for Information Systems CS5130

More information

Online Appendix to The Recovery of a Schema Mapping: Bringing Exchanged Data Back

Online Appendix to The Recovery of a Schema Mapping: Bringing Exchanged Data Back Online Appendix to The Recovery of a Schema Mapping: Bringing Exchanged Data Back MARCELO ARENAS and JORGE PÉREZ Pontificia Universidad Católica de Chile and CRISTIAN RIVEROS R&M Tech Ingeniería y Servicios

More information

Approximate Rewriting of Queries Using Views

Approximate Rewriting of Queries Using Views Approximate Rewriting of Queries Using Views Foto Afrati 1, Manik Chandrachud 2, Rada Chirkova 2, and Prasenjit Mitra 3 1 School of Electrical and Computer Engineering National Technical University of

More information

Locally Consistent Transformations and Query Answering in Data Exchange

Locally Consistent Transformations and Query Answering in Data Exchange Locally Consistent Transformations and Query Answering in Data Exchange Marcelo Arenas University of Toronto marenas@cs.toronto.edu Pablo Barceló University of Toronto pablo@cs.toronto.edu Ronald Fagin

More information

Core Schema Mappings: Scalable Core Computations in Data Exchange

Core Schema Mappings: Scalable Core Computations in Data Exchange Core Schema Mappings: Scalable Core Computations in Data Exchange Giansalvatore Mecca 1, Paolo Papotti 2, Salvatore Raunich 3 1 Università della Basilicata Potenza, Italy 2 Università Roma Tre Roma, Italy

More information

Composing Schema Mappings: Second-Order Dependencies to the Rescue

Composing Schema Mappings: Second-Order Dependencies to the Rescue Composing Schema Mappings: Second-Order Dependencies to the Rescue RONALD FAGIN IBM Almaden Research Center PHOKION G. KOLAITIS IBM Almaden Research Center LUCIAN POPA IBM Almaden Research Center WANG-CHIEW

More information

Equivalence of SQL Queries In Presence of Embedded Dependencies

Equivalence of SQL Queries In Presence of Embedded Dependencies Equivalence of SQL Queries In Presence of Embedded Dependencies Rada Chirkova Department of Computer Science NC State University, Raleigh, NC 27695, USA chirkova@csc.ncsu.edu Michael R. Genesereth Department

More information

Özgür L. Özçep Data Exchange 1

Özgür L. Özçep Data Exchange 1 INSTITUT FÜR INFORMATIONSSYSTEME Özgür L. Özçep Data Exchange 1 Lecture 5: Motivation, Relational DE, Chase 16 November, 2016 Foundations of Ontologies and Databases for Information Systems CS5130 (Winter

More information

3. Only sequences that were formed by using finitely many applications of rules 1 and 2, are propositional formulas.

3. Only sequences that were formed by using finitely many applications of rules 1 and 2, are propositional formulas. 1 Chapter 1 Propositional Logic Mathematical logic studies correct thinking, correct deductions of statements from other statements. Let us make it more precise. A fundamental property of a statement is

More information

Parallel-Correctness and Transferability for Conjunctive Queries

Parallel-Correctness and Transferability for Conjunctive Queries Parallel-Correctness and Transferability for Conjunctive Queries Tom J. Ameloot1 Gaetano Geck2 Bas Ketsman1 Frank Neven1 Thomas Schwentick2 1 Hasselt University 2 Dortmund University Big Data Too large

More information

Chapter 3 Deterministic planning

Chapter 3 Deterministic planning Chapter 3 Deterministic planning In this chapter we describe a number of algorithms for solving the historically most important and most basic type of planning problem. Two rather strong simplifying assumptions

More information

The Chase Revisited. Alin Deutsch UC San Diego Alan Nash IBM Almaden

The Chase Revisited. Alin Deutsch UC San Diego Alan Nash IBM Almaden The Chase Revisited Alin Deutsch UC San Diego deutsch@cs.ucsd.edu Alan Nash IBM Almaden anash3@gmail.com Jeff Remmel UC San Diego remmel@math.ucsd.edu ABSTRACT We revisit the classical chase procedure,

More information

Comp487/587 - Boolean Formulas

Comp487/587 - Boolean Formulas Comp487/587 - Boolean Formulas 1 Logic and SAT 1.1 What is a Boolean Formula Logic is a way through which we can analyze and reason about simple or complicated events. In particular, we are interested

More information

Relaxed Notions of Schema Mapping Equivalence Revisited

Relaxed Notions of Schema Mapping Equivalence Revisited Relaxed Notions of Schema Mapping Equivalence Revisited Reinhard Pichler TU Vienna pichler@dbai.tuwien.ac.at Emanuel Sallinger TU Vienna sallinger@dbai.tuwien.ac.at Vadim Savenkov TU Vienna savenkov@dbai.tuwien.ac.at

More information

CWA-Solutions for Data Exchange Settings with Target Dependencies

CWA-Solutions for Data Exchange Settings with Target Dependencies CWA-Solutions for Data Exchange Settings with Target Dependencies ABSTRACT André Hernich Institut für Informatik Humboldt-Universität Berlin, Germany hernich@informatik.hu-berlin.de Data exchange deals

More information

A Dichotomy. in in Probabilistic Databases. Joint work with Robert Fink. for Non-Repeating Queries with Negation Queries with Negation

A Dichotomy. in in Probabilistic Databases. Joint work with Robert Fink. for Non-Repeating Queries with Negation Queries with Negation Dichotomy for Non-Repeating Queries with Negation Queries with Negation in in Probabilistic Databases Robert Dan Olteanu Fink and Dan Olteanu Joint work with Robert Fink Uncertainty in Computation Simons

More information

Preliminaries. Introduction to EF-games. Inexpressivity results for first-order logic. Normal forms for first-order logic

Preliminaries. Introduction to EF-games. Inexpressivity results for first-order logic. Normal forms for first-order logic Introduction to EF-games Inexpressivity results for first-order logic Normal forms for first-order logic Algorithms and complexity for specific classes of structures General complexity bounds Preliminaries

More information

Pattern Logics and Auxiliary Relations

Pattern Logics and Auxiliary Relations Pattern Logics and Auxiliary Relations Diego Figueira Leonid Libkin University of Edinburgh Abstract A common theme in the study of logics over finite structures is adding auxiliary predicates to enhance

More information

Guaranteeing No Interaction Between Functional Dependencies and Tree-Like Inclusion Dependencies

Guaranteeing No Interaction Between Functional Dependencies and Tree-Like Inclusion Dependencies Guaranteeing No Interaction Between Functional Dependencies and Tree-Like Inclusion Dependencies Mark Levene Department of Computer Science University College London Gower Street London WC1E 6BT, U.K.

More information

Database Theory VU , SS Complexity of Query Evaluation. Reinhard Pichler

Database Theory VU , SS Complexity of Query Evaluation. Reinhard Pichler Database Theory Database Theory VU 181.140, SS 2018 5. Complexity of Query Evaluation Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien 17 April, 2018 Pichler

More information

A MODEL-THEORETIC PROOF OF HILBERT S NULLSTELLENSATZ

A MODEL-THEORETIC PROOF OF HILBERT S NULLSTELLENSATZ A MODEL-THEORETIC PROOF OF HILBERT S NULLSTELLENSATZ NICOLAS FORD Abstract. The goal of this paper is to present a proof of the Nullstellensatz using tools from a branch of logic called model theory. In

More information

Single-Round Multi-Join Evaluation. Bas Ketsman

Single-Round Multi-Join Evaluation. Bas Ketsman Single-Round Multi-Join Evaluation Bas Ketsman Outline 1. Introduction 2. Parallel-Correctness 3. Transferability 4. Special Cases 2 Motivation Single-round Multi-joins Less rounds / barriers Formal framework

More information

Bounded Query Rewriting Using Views

Bounded Query Rewriting Using Views Bounded Query Rewriting Using Views Yang Cao 1,3 Wenfei Fan 1,3 Floris Geerts 2 Ping Lu 3 1 University of Edinburgh 2 University of Antwerp 3 Beihang University {wenfei@inf, Y.Cao-17@sms}.ed.ac.uk, floris.geerts@uantwerpen.be,

More information

Query answering using views

Query answering using views Query answering using views General setting: database relations R 1,...,R n. Several views V 1,...,V k are defined as results of queries over the R i s. We have a query Q over R 1,...,R n. Question: Can

More information

Foundations of Mathematics MATH 220 FALL 2017 Lecture Notes

Foundations of Mathematics MATH 220 FALL 2017 Lecture Notes Foundations of Mathematics MATH 220 FALL 2017 Lecture Notes These notes form a brief summary of what has been covered during the lectures. All the definitions must be memorized and understood. Statements

More information

THE problem of updating a database through a set

THE problem of updating a database through a set FULL VERSION WITH COMPLETE PROOFS 1 Lossless Selection Views under Conditional Domain Constraints Ingo Feinerer, Enrico Franconi and Paolo Guagliardo name suggests, prescribes how a database relation can

More information

Homomorphism Preservation Theorem. Albert Atserias Universitat Politècnica de Catalunya Barcelona, Spain

Homomorphism Preservation Theorem. Albert Atserias Universitat Politècnica de Catalunya Barcelona, Spain Homomorphism Preservation Theorem Albert Atserias Universitat Politècnica de Catalunya Barcelona, Spain Structure of the talk 1. Classical preservation theorems 2. Preservation theorems in finite model

More information

GAV-sound with conjunctive queries

GAV-sound with conjunctive queries GAV-sound with conjunctive queries Source and global schema as before: source R 1 (A, B),R 2 (B,C) Global schema: T 1 (A, C), T 2 (B,C) GAV mappings become sound: T 1 {x, y, z R 1 (x,y) R 2 (y,z)} T 2

More information

On Chase Termination Beyond Stratification

On Chase Termination Beyond Stratification On Chase Termination Beyond Stratification Michael Meier Michael Schmidt Georg Lausen University of Freiburg Institute for Computer Science Georges-Köhler-Allee, Building 051 79110 Freiburg i. Br., Germany

More information

Definite Logic Programs

Definite Logic Programs Chapter 2 Definite Logic Programs 2.1 Definite Clauses The idea of logic programming is to use a computer for drawing conclusions from declarative descriptions. Such descriptions called logic programs

More information

arxiv: v2 [cs.db] 6 May 2009

arxiv: v2 [cs.db] 6 May 2009 Stop the Chase Michael Meier, Michael Schmidt and Georg Lausen arxiv:0901.3984v2 [cs.db] 6 May 2009 University of Freiburg, Institute for Computer Science Georges-Köhler-Allee, 79110 Freiburg, Germany

More information

Normalization and optimization of schema mappings

Normalization and optimization of schema mappings The VLDB Journal (2011) 20:277 302 DOI 10.1007/s00778-011-0226-x SPECIAL ISSUE PAPER Normalization and optimization of schema mappings Georg Gottlob Reinhard Pichler Vadim Savenkov Received: 23 August

More information

A Local Normal Form Theorem for Infinitary Logic with Unary Quantifiers

A Local Normal Form Theorem for Infinitary Logic with Unary Quantifiers mlq header will be provided by the publisher Local Normal Form Theorem for Infinitary Logic with Unary Quantifiers H. Jerome Keisler 1 and Wafik Boulos Lotfallah 2 1 Department of Mathematics University

More information

On the Expressive Power of Logics on Finite Models

On the Expressive Power of Logics on Finite Models On the Expressive Power of Logics on Finite Models Phokion G. Kolaitis Computer Science Department University of California, Santa Cruz Santa Cruz, CA 95064, USA kolaitis@cs.ucsc.edu August 1, 2003 Partially

More information

Part II. Logic and Set Theory. Year

Part II. Logic and Set Theory. Year Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 60 Paper 4, Section II 16G State and prove the ǫ-recursion Theorem. [You may assume the Principle of ǫ- Induction.]

More information

Automata theory. An algorithmic approach. Lecture Notes. Javier Esparza

Automata theory. An algorithmic approach. Lecture Notes. Javier Esparza Automata theory An algorithmic approach Lecture Notes Javier Esparza July 2 22 2 Chapter 9 Automata and Logic A regular expression can be seen as a set of instructions ( a recipe ) for generating the words

More information

About the relationship between formal logic and complexity classes

About the relationship between formal logic and complexity classes About the relationship between formal logic and complexity classes Working paper Comments welcome; my email: armandobcm@yahoo.com Armando B. Matos October 20, 2013 1 Introduction We analyze a particular

More information

On the Complexity of the Reflected Logic of Proofs

On the Complexity of the Reflected Logic of Proofs On the Complexity of the Reflected Logic of Proofs Nikolai V. Krupski Department of Math. Logic and the Theory of Algorithms, Faculty of Mechanics and Mathematics, Moscow State University, Moscow 119899,

More information

A New 3-CNF Transformation by Parallel-Serial Graphs 1

A New 3-CNF Transformation by Parallel-Serial Graphs 1 A New 3-CNF Transformation by Parallel-Serial Graphs 1 Uwe Bubeck, Hans Kleine Büning University of Paderborn, Computer Science Institute, 33098 Paderborn, Germany Abstract For propositional formulas we

More information

Efficient Reasoning about Data Trees via Integer Linear Programming

Efficient Reasoning about Data Trees via Integer Linear Programming Efficient Reasoning about Data Trees via Integer Linear Programming Claire David Université Paris-Est Claire.David@univ-mlv.fr Leonid Libkin University of Edinburgh libkin@inf.ed.ac.uk Tony Tan University

More information

Query Languages for Data Exchange: Beyond Unions of Conjunctive Queries

Query Languages for Data Exchange: Beyond Unions of Conjunctive Queries DOI 10.1007/s00224-010-9259-6 Query Languages for Data Exchange: Beyond Unions of Conjunctive Queries Marcelo Arenas Pablo Barceló Juan Reutter Springer Science+Business Media, LLC 2010 Abstract The class

More information

The Query Containment Problem: Set Semantics vs. Bag Semantics. Phokion G. Kolaitis University of California Santa Cruz & IBM Research - Almaden

The Query Containment Problem: Set Semantics vs. Bag Semantics. Phokion G. Kolaitis University of California Santa Cruz & IBM Research - Almaden The Query Containment Problem: Set Semantics vs. Bag Semantics Phokion G. Kolaitis University of California Santa Cruz & IBM Research - Almaden PROBLEMS Problems worthy of attack prove their worth by hitting

More information

VC-DENSITY FOR TREES

VC-DENSITY FOR TREES VC-DENSITY FOR TREES ANTON BOBKOV Abstract. We show that for the theory of infinite trees we have vc(n) = n for all n. VC density was introduced in [1] by Aschenbrenner, Dolich, Haskell, MacPherson, and

More information

Composing Schema Mappings: Second-Order Dependencies to the Rescue

Composing Schema Mappings: Second-Order Dependencies to the Rescue Composing Schema Mappings: Second-Order Dependencies to the Rescue RONALD FAGIN IBM Almaden Research Center PHOKION G. KOLAITIS 1 IBM Almaden Research Center LUCIAN POPA IBM Almaden Research Center WANG-CHIEW

More information

Datalog and Constraint Satisfaction with Infinite Templates

Datalog and Constraint Satisfaction with Infinite Templates Datalog and Constraint Satisfaction with Infinite Templates Manuel Bodirsky 1 and Víctor Dalmau 2 1 CNRS/LIX, École Polytechnique, bodirsky@lix.polytechnique.fr 2 Universitat Pompeu Fabra, victor.dalmau@upf.edu

More information

Querying Visible and Invisible Information

Querying Visible and Invisible Information Querying Visible and Invisible Information Michael Benedikt Pierre Bourhis Balder ten Cate Gabriele Puppis Abstract We provide a wide-ranging study of the scenario where a subset of the relations in the

More information

First-Order Logic. 1 Syntax. Domain of Discourse. FO Vocabulary. Terms

First-Order Logic. 1 Syntax. Domain of Discourse. FO Vocabulary. Terms First-Order Logic 1 Syntax Domain of Discourse The domain of discourse for first order logic is FO structures or models. A FO structure contains Relations Functions Constants (functions of arity 0) FO

More information

Exchanging More than Complete Data

Exchanging More than Complete Data Exchanging More than Complete Data Marcelo Arenas Jorge Pérez Juan Reutter PUC Chile, Universidad de Chile, University of Edinburgh Father Andy Bob Bob Danny Danny Eddie Mother Carrie Bob Father(x, y)

More information

Database Theory VU , SS Ehrenfeucht-Fraïssé Games. Reinhard Pichler

Database Theory VU , SS Ehrenfeucht-Fraïssé Games. Reinhard Pichler Database Theory Database Theory VU 181.140, SS 2018 7. Ehrenfeucht-Fraïssé Games Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien 15 May, 2018 Pichler 15

More information

CHAPTER 10. Gentzen Style Proof Systems for Classical Logic

CHAPTER 10. Gentzen Style Proof Systems for Classical Logic CHAPTER 10 Gentzen Style Proof Systems for Classical Logic Hilbert style systems are easy to define and admit a simple proof of the Completeness Theorem but they are difficult to use. By humans, not mentioning

More information

Finite Model Theory: First-Order Logic on the Class of Finite Models

Finite Model Theory: First-Order Logic on the Class of Finite Models 1 Finite Model Theory: First-Order Logic on the Class of Finite Models Anuj Dawar University of Cambridge Modnet Tutorial, La Roche, 21 April 2008 2 Finite Model Theory In the 1980s, the term finite model

More information

Quantified Boolean Formulas Part 1

Quantified Boolean Formulas Part 1 Quantified Boolean Formulas Part 1 Uwe Egly Knowledge-Based Systems Group Institute of Information Systems Vienna University of Technology Results of the SAT 2009 application benchmarks for leading solvers

More information

Tuples of Disjoint NP-Sets

Tuples of Disjoint NP-Sets Tuples of Disjoint NP-Sets (Extended Abstract) Olaf Beyersdorff Institut für Informatik, Humboldt-Universität zu Berlin, 10099 Berlin, Germany beyersdo@informatik.hu-berlin.de Abstract. Disjoint NP-pairs

More information

arxiv: v3 [cs.db] 26 Jun 2009

arxiv: v3 [cs.db] 26 Jun 2009 Equivalence of SQL Queries In Presence of Embedded Dependencies Rada Chirkova Department of Computer Science NC State University, Raleigh, NC 27695, USA chirkova@csc.ncsu.edu Michael R. Genesereth Department

More information

Equational Logic. Chapter Syntax Terms and Term Algebras

Equational Logic. Chapter Syntax Terms and Term Algebras Chapter 2 Equational Logic 2.1 Syntax 2.1.1 Terms and Term Algebras The natural logic of algebra is equational logic, whose propositions are universally quantified identities between terms built up from

More information

Computing Query Probability with Incidence Algebras Technical Report UW-CSE University of Washington

Computing Query Probability with Incidence Algebras Technical Report UW-CSE University of Washington Computing Query Probability with Incidence Algebras Technical Report UW-CSE-10-03-02 University of Washington Nilesh Dalvi, Karl Schnaitter and Dan Suciu Revised: August 24, 2010 Abstract We describe an

More information

Friendly Logics, Fall 2015, Lecture Notes 5

Friendly Logics, Fall 2015, Lecture Notes 5 Friendly Logics, Fall 2015, Lecture Notes 5 Val Tannen 1 FO definability In these lecture notes we restrict attention to relational vocabularies i.e., vocabularies consisting only of relation symbols (or

More information

On Equality-Generating Dependencies in Ontology Querying - Preliminary Report

On Equality-Generating Dependencies in Ontology Querying - Preliminary Report On Equality-Generating Dependencies in Ontology Querying - Preliminary Report Andrea Calì 1,3 and Andreas Pieris 2 1 Dept. of Computer Science and Inf. Systems, Birkbeck University of London, UK 2 Computing

More information

Decision Procedures for Satisfiability and Validity in Propositional Logic

Decision Procedures for Satisfiability and Validity in Propositional Logic Decision Procedures for Satisfiability and Validity in Propositional Logic Meghdad Ghari Institute for Research in Fundamental Sciences (IPM) School of Mathematics-Isfahan Branch Logic Group http://math.ipm.ac.ir/isfahan/logic-group.htm

More information

Encoding Deductive Argumentation in Quantified Boolean Formulae

Encoding Deductive Argumentation in Quantified Boolean Formulae Encoding Deductive Argumentation in Quantified Boolean Formulae Philippe Besnard IRIT-CNRS, Universitè Paul Sabatier, 118 rte de Narbonne, 31062 Toulouse, France Anthony Hunter Department of Computer Science,

More information

Completeness in the Monadic Predicate Calculus. We have a system of eight rules of proof. Let's list them:

Completeness in the Monadic Predicate Calculus. We have a system of eight rules of proof. Let's list them: Completeness in the Monadic Predicate Calculus We have a system of eight rules of proof. Let's list them: PI At any stage of a derivation, you may write down a sentence φ with {φ} as its premiss set. TC

More information

Definitions. Notations. Injective, Surjective and Bijective. Divides. Cartesian Product. Relations. Equivalence Relations

Definitions. Notations. Injective, Surjective and Bijective. Divides. Cartesian Product. Relations. Equivalence Relations Page 1 Definitions Tuesday, May 8, 2018 12:23 AM Notations " " means "equals, by definition" the set of all real numbers the set of integers Denote a function from a set to a set by Denote the image of

More information

Propositional and Predicate Logic - V

Propositional and Predicate Logic - V Propositional and Predicate Logic - V Petr Gregor KTIML MFF UK WS 2016/2017 Petr Gregor (KTIML MFF UK) Propositional and Predicate Logic - V WS 2016/2017 1 / 21 Formal proof systems Hilbert s calculus

More information

Propositional Logic. Testing, Quality Assurance, and Maintenance Winter Prof. Arie Gurfinkel

Propositional Logic. Testing, Quality Assurance, and Maintenance Winter Prof. Arie Gurfinkel Propositional Logic Testing, Quality Assurance, and Maintenance Winter 2018 Prof. Arie Gurfinkel References Chpater 1 of Logic for Computer Scientists http://www.springerlink.com/content/978-0-8176-4762-9/

More information

Data Exchange: Getting to the Core

Data Exchange: Getting to the Core Data Exchange: Getting to the Core RONALD FAGIN, PHOKION G. KOLAITIS, and LUCIAN POPA IBM Almaden Research Center Data exchange is the problem of taking data structured under a source schema and creating

More information

Syntactic Characterisations in Model Theory

Syntactic Characterisations in Model Theory Department of Mathematics Bachelor Thesis (7.5 ECTS) Syntactic Characterisations in Model Theory Author: Dionijs van Tuijl Supervisor: Dr. Jaap van Oosten June 15, 2016 Contents 1 Introduction 2 2 Preliminaries

More information

Lecture 2: Syntax. January 24, 2018

Lecture 2: Syntax. January 24, 2018 Lecture 2: Syntax January 24, 2018 We now review the basic definitions of first-order logic in more detail. Recall that a language consists of a collection of symbols {P i }, each of which has some specified

More information

Informal Statement Calculus

Informal Statement Calculus FOUNDATIONS OF MATHEMATICS Branches of Logic 1. Theory of Computations (i.e. Recursion Theory). 2. Proof Theory. 3. Model Theory. 4. Set Theory. Informal Statement Calculus STATEMENTS AND CONNECTIVES Example

More information

Complexity Theory VU , SS The Polynomial Hierarchy. Reinhard Pichler

Complexity Theory VU , SS The Polynomial Hierarchy. Reinhard Pichler Complexity Theory Complexity Theory VU 181.142, SS 2018 6. The Polynomial Hierarchy Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien 15 May, 2018 Reinhard

More information

Chapter 2. Assertions. An Introduction to Separation Logic c 2011 John C. Reynolds February 3, 2011

Chapter 2. Assertions. An Introduction to Separation Logic c 2011 John C. Reynolds February 3, 2011 Chapter 2 An Introduction to Separation Logic c 2011 John C. Reynolds February 3, 2011 Assertions In this chapter, we give a more detailed exposition of the assertions of separation logic: their meaning,

More information

Representability in DL-Lite R Knowledge Base Exchange

Representability in DL-Lite R Knowledge Base Exchange epresentability in DL-Lite Knowledge Base Exchange M. Arenas 1, E. Botoeva 2, D. Calvanese 2, V. yzhikov 2, and E. Sherkhonov 3 1 Dept. of Computer Science, PUC Chile marenas@ing.puc.cl 2 KDB esearch Centre,

More information

Outline. Complexity Theory EXACT TSP. The Class DP. Definition. Problem EXACT TSP. Complexity of EXACT TSP. Proposition VU 181.

Outline. Complexity Theory EXACT TSP. The Class DP. Definition. Problem EXACT TSP. Complexity of EXACT TSP. Proposition VU 181. Complexity Theory Complexity Theory Outline Complexity Theory VU 181.142, SS 2018 6. The Polynomial Hierarchy Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität

More information

Tuples of Disjoint NP-Sets

Tuples of Disjoint NP-Sets Electronic Colloquium on Computational Complexity, Report No. 123 (2005) Tuples of Disjoint NP-Sets Olaf Beyersdorff Institut für Informatik, Humboldt-Universität zu Berlin, 10099 Berlin, Germany beyersdo@informatik.hu-berlin.de

More information

Propositional Logic: Models and Proofs

Propositional Logic: Models and Proofs Propositional Logic: Models and Proofs C. R. Ramakrishnan CSE 505 1 Syntax 2 Model Theory 3 Proof Theory and Resolution Compiled at 11:51 on 2016/11/02 Computing with Logic Propositional Logic CSE 505

More information

Logic and Databases. Phokion G. Kolaitis. UC Santa Cruz & IBM Research Almaden. Lecture 4 Part 1

Logic and Databases. Phokion G. Kolaitis. UC Santa Cruz & IBM Research Almaden. Lecture 4 Part 1 Logic and Databases Phokion G. Kolaitis UC Santa Cruz & IBM Research Almaden Lecture 4 Part 1 1 Thematic Roadmap Logic and Database Query Languages Relational Algebra and Relational Calculus Conjunctive

More information

Lecture 7: The Polynomial-Time Hierarchy. 1 Nondeterministic Space is Closed under Complement

Lecture 7: The Polynomial-Time Hierarchy. 1 Nondeterministic Space is Closed under Complement CS 710: Complexity Theory 9/29/2011 Lecture 7: The Polynomial-Time Hierarchy Instructor: Dieter van Melkebeek Scribe: Xi Wu In this lecture we first finish the discussion of space-bounded nondeterminism

More information

Some algorithmic improvements for the containment problem of conjunctive queries with negation

Some algorithmic improvements for the containment problem of conjunctive queries with negation Some algorithmic improvements for the containment problem of conjunctive queries with negation Michel Leclère and Marie-Laure Mugnier LIRMM, Université de Montpellier, 6, rue Ada, F-3439 Montpellier cedex

More information

Schema Mapping Discovery from Data Instances

Schema Mapping Discovery from Data Instances Schema Mapping Discovery from Data Instances GEORG GOTTLOB 6 University of Oxford, Oxford, United Kingdom AND PIERRE SENELLART Institut Télécom; Télécom ParisTech; CNRS LTCI, Paris, France Abstract. We

More information

Löwenheim-Skolem Theorems, Countable Approximations, and L ω. David W. Kueker (Lecture Notes, Fall 2007)

Löwenheim-Skolem Theorems, Countable Approximations, and L ω. David W. Kueker (Lecture Notes, Fall 2007) Löwenheim-Skolem Theorems, Countable Approximations, and L ω 0. Introduction David W. Kueker (Lecture Notes, Fall 2007) In its simplest form the Löwenheim-Skolem Theorem for L ω1 ω states that if σ L ω1

More information

CMPS 277 Principles of Database Systems. Lecture #9

CMPS 277 Principles of Database Systems.  Lecture #9 CMPS 277 Principles of Database Systems http://www.soe.classes.edu/cmps277/winter10 Lecture #9 1 Summary The Query Evaluation Problem for Relational Calculus is PSPACEcomplete. The Query Equivalence Problem

More information

Overview of Topics. Finite Model Theory. Finite Model Theory. Connections to Database Theory. Qing Wang

Overview of Topics. Finite Model Theory. Finite Model Theory. Connections to Database Theory. Qing Wang Overview of Topics Finite Model Theory Part 1: Introduction 1 What is finite model theory? 2 Connections to some areas in CS Qing Wang qing.wang@anu.edu.au Database theory Complexity theory 3 Basic definitions

More information

Notes for Lecture 21

Notes for Lecture 21 U.C. Berkeley CS170: Intro to CS Theory Handout N21 Professor Luca Trevisan November 20, 2001 Notes for Lecture 21 1 Tractable and Intractable Problems So far, almost all of the problems that we have studied

More information

Notes. Corneliu Popeea. May 3, 2013

Notes. Corneliu Popeea. May 3, 2013 Notes Corneliu Popeea May 3, 2013 1 Propositional logic Syntax We rely on a set of atomic propositions, AP, containing atoms like p, q. A propositional logic formula φ Formula is then defined by the following

More information

Propositional Logic. Methods & Tools for Software Engineering (MTSE) Fall Prof. Arie Gurfinkel

Propositional Logic. Methods & Tools for Software Engineering (MTSE) Fall Prof. Arie Gurfinkel Propositional Logic Methods & Tools for Software Engineering (MTSE) Fall 2017 Prof. Arie Gurfinkel References Chpater 1 of Logic for Computer Scientists http://www.springerlink.com/content/978-0-8176-4762-9/

More information

First-Order Logic First-Order Theories. Roopsha Samanta. Partly based on slides by Aaron Bradley and Isil Dillig

First-Order Logic First-Order Theories. Roopsha Samanta. Partly based on slides by Aaron Bradley and Isil Dillig First-Order Logic First-Order Theories Roopsha Samanta Partly based on slides by Aaron Bradley and Isil Dillig Roadmap Review: propositional logic Syntax and semantics of first-order logic (FOL) Semantic

More information

Cooperation of Background Reasoners in Theory Reasoning by Residue Sharing

Cooperation of Background Reasoners in Theory Reasoning by Residue Sharing Cooperation of Background Reasoners in Theory Reasoning by Residue Sharing Cesare Tinelli tinelli@cs.uiowa.edu Department of Computer Science The University of Iowa Report No. 02-03 May 2002 i Cooperation

More information

On Datalog vs. LFP. Anuj Dawar and Stephan Kreutzer

On Datalog vs. LFP. Anuj Dawar and Stephan Kreutzer On Datalog vs. LFP Anuj Dawar and Stephan Kreutzer 1 University of Cambridge Computer Lab, anuj.dawar@cl.cam.ac.uk 2 Oxford University Computing Laboratory, kreutzer@comlab.ox.ac.uk Abstract. We show that

More information

5 Set Operations, Functions, and Counting

5 Set Operations, Functions, and Counting 5 Set Operations, Functions, and Counting Let N denote the positive integers, N 0 := N {0} be the non-negative integers and Z = N 0 ( N) the positive and negative integers including 0, Q the rational numbers,

More information

The Complexity of Evaluating Tuple Generating Dependencies

The Complexity of Evaluating Tuple Generating Dependencies The Complexity of Evaluating Tuple Generating Dependencies Reinhard Pichler Vienna University of Technology pichler@dbai.tuwien.ac.at Sebastian Skritek Vienna University of Technology skritek@dbai.tuwien.ac.at

More information

Lecture 8. MINDNF = {(φ, k) φ is a CNF expression and DNF expression ψ s.t. ψ k and ψ is equivalent to φ}

Lecture 8. MINDNF = {(φ, k) φ is a CNF expression and DNF expression ψ s.t. ψ k and ψ is equivalent to φ} 6.841 Advanced Complexity Theory February 28, 2005 Lecture 8 Lecturer: Madhu Sudan Scribe: Arnab Bhattacharyya 1 A New Theme In the past few lectures, we have concentrated on non-uniform types of computation

More information

Finite and Algorithmic Model Theory II: Automata-Based Methods

Finite and Algorithmic Model Theory II: Automata-Based Methods Finite and Algorithmic Model Theory II: Automata-Based Methods Anuj Dawar University of Cambridge Computer Laboratory Simons Institute, 30 August 2016 Review We aim to develop tools for studying the expressive

More information

Lecture Notes on DISCRETE MATHEMATICS. Eusebius Doedel

Lecture Notes on DISCRETE MATHEMATICS. Eusebius Doedel Lecture Notes on DISCRETE MATHEMATICS Eusebius Doedel c Eusebius J. Doedel, 009 Contents Logic. Introduction............................................................................... Basic logical

More information

Mathematical Preliminaries. Sipser pages 1-28

Mathematical Preliminaries. Sipser pages 1-28 Mathematical Preliminaries Sipser pages 1-28 Mathematical Preliminaries This course is about the fundamental capabilities and limitations of computers. It has 3 parts 1. Automata Models of computation

More information

arxiv: v1 [cs.db] 1 Sep 2015

arxiv: v1 [cs.db] 1 Sep 2015 Decidability of Equivalence of Aggregate Count-Distinct Queries Babak Bagheri Harari Val Tannen Computer & Information Science Department University of Pennsylvania arxiv:1509.00100v1 [cs.db] 1 Sep 2015

More information

Guarded resolution for Answer Set Programming

Guarded resolution for Answer Set Programming Under consideration for publication in Theory and Practice of Logic Programming 1 Guarded resolution for Answer Set Programming V.W. Marek Department of Computer Science, University of Kentucky, Lexington,

More information

A Logical Formulation of the Granular Data Model

A Logical Formulation of the Granular Data Model 2008 IEEE International Conference on Data Mining Workshops A Logical Formulation of the Granular Data Model Tuan-Fang Fan Department of Computer Science and Information Engineering National Penghu University

More information

On the Undecidability of the Equivalence of Second-Order Tuple Generating Dependencies

On the Undecidability of the Equivalence of Second-Order Tuple Generating Dependencies On the Undecidability of the Equivalence of Second-Order Tuple Generating Dependencies Ingo Feinerer, Reinhard Pichler, Emanuel Sallinger, and Vadim Savenkov Vienna University of Technology {feinerer pichler

More information