Armstrong Databases and Reasoning for Functional Dependencies and Cardinality Constraints over Partial Bags


Sven Hartmann¹, Henning Köhler², Sebastian Link³, and Bernhard Thalheim⁴

¹ Institut für Informatik, Technische Universität Clausthal, Germany
² N-Squared Software, Palmerston North, New Zealand
³ Department of Computer Science, University of Auckland, New Zealand
⁴ Institut für Informatik, Christian-Albrechts-University Kiel, Germany

Abstract. Data dependencies capture meaningful information about an application domain within the target database. The theory of data dependencies is largely a theory over relations. To make data processing more efficient in practice, partial bags are permitted as database instances to accommodate partial and duplicate information. However, data dependencies interact differently over partial bags than over the idealized special case of relations. In this paper, we study the implication problem of the combined class of functional dependencies and cardinality constraints over partial bags. We establish an axiomatic and an algorithmic characterization of the implication problem. These findings have important applications in database design and data processing. Finally, we investigate structural and computational properties of Armstrong databases for the class of data dependencies under consideration. These results can be utilized to consolidate and communicate the understanding of the application domain between different stakeholders of a database.

1 Introduction

Quality database schemata must capture both the structure and semantics of the underlying application domain. Data dependencies are classes of first-order formulae that can model semantically meaningful information in the target database. In the relational model of data, approximately 100 different classes of data dependencies have been studied [24].
Among those, functional dependencies and cardinality constraints represent two classes of data dependencies that are popular in database practice and theory. Cardinality constraints, in particular, have been studied extensively in Chen's Entity-Relationship model. In practice, however, relations represent idealized special cases in which all information is always available and no duplicate information can occur. In relational database management systems, database instances are partial bags. That is, duplicate rows can occur and columns may contain partial information in the form of null marker occurrences, unless they have been specified as NOT NULL. In this paper we are concerned with the implication problem of the combined class of functional dependencies, cardinality constraints and NOT NULL constraints over partial bags. The implication problem is to decide whether every partial bag that satisfies a given set of data dependencies also satisfies another given data dependency. The problem is essential in database design, and has found numerous applications in almost all data processing tasks. While different classes of data dependencies co-occur in practice, this co-occurrence is often the source of the intractability or even infeasibility of the associated implication problem. It is therefore a challenge to identify combined classes of data dependencies that can be reasoned about effectively and efficiently.

(T. Lukasiewicz and A. Sali (Eds.): FoIKS 2012, LNCS 7153, © Springer-Verlag Berlin Heidelberg 2012)

Example 1. Suppose that in designing an information system for a company the team of data engineers has established the following SQL table definition:

    CREATE TABLE Employment (
      Emp  VARCHAR NOT NULL,
      Dept VARCHAR,
      Mgr  VARCHAR NOT NULL);

Here, employees (Emp) work within a department (Dept) under a manager (Mgr). Null marker occurrences are only permitted in the column Dept. As interpretation of the null marker we choose the most primitive one, "no information", i.e., a total value may not exist, or may exist but is currently unknown. The team of data engineers has started to think about the semantics of the application domain. So far, they have acquired the following business rules. Employees can work for at most one department, and departments have at most one manager. Moreover, every employee can be associated with at most 4 combinations of any department and any manager, every manager can be associated with at most 2 combinations of any employee and any department, and every combination of any employee and any manager must be unique. These business rules can be expressed as functional dependencies and cardinality constraints. The team of engineers would like to consult the experts of the application domain to find out whether their current perception of the semantics captures all necessary requirements.
In particular, the team plans to create test data in order to validate their own understanding of the application domain and to facilitate the knowledge acquisition from the domain experts.

Example 1 illustrates how quality database designs require a deep understanding of the application domain's semantics. In particular, it is necessary to comprehend the interactions between different classes of data dependencies in the presence of partial and duplicate information. Such an advanced understanding can also lead to more efficient ways of data processing. For example, suppose that we want to retrieve all distinct combinations of an employee and a department from the current database instance. Since the business rules above are enforced on all instances, and since they imply the constraint that every combination of an employee and a department is unique, the DISTINCT clause in our query is superfluous. Query optimizers with built-in reasoning abilities for these constraints can therefore detect such opportunities effectively and, depending on the complexity of the associated implication problem, even efficiently. For these reasons, an in-depth investigation of the associated implication problems is both challenging and in high demand.

Contributions. So far, the combined class of functional dependencies and cardinality constraints has only been considered over relations, i.e., in the idealized special case where no duplicate rows and no null marker occurrences are permitted. In this paper we make three major contributions. Firstly, we characterize axiomatically the implication problem for the combined class of functional dependencies, cardinality constraints and NOT NULL constraints over partial bags. Secondly, we characterize the implication problem also algorithmically. Our results show how reasoning about this combined class of constraints over partial bags can be done effectively and efficiently. For our third and final contribution we investigate the concept of Armstrong databases for the combined class under discussion. We establish structural and computational properties of Armstrong tables. In particular, we characterize the structure of Armstrong tables, i.e., we provide sufficient and necessary conditions that allow us to test whether a given partial bag is Armstrong with respect to a given set of constraints in this class. This characterization enables us to derive further properties. For example, we characterize for which sets of constraints in this class Armstrong tables exist. We show that the problem of computing Armstrong tables for a given set of constraints in this class is precisely exponential in the size of the given set. Nevertheless, we are able to establish an algorithm that always computes an Armstrong table for a given set of constraints whenever it exists, and whose number of rows is at most quadratic in the number of rows of a minimum-sized Armstrong table and the number of given constraints.

Organization. We discuss related work in Section 2.
Subsequently, we introduce the data model in Section 3, which includes a definition of the syntax and semantics used in this paper. In Section 4 we characterize the implication problem axiomatically and algorithmically. The structural and computational properties of Armstrong tables are established in Section 5. Finally, we conclude in Section 6 where we also comment briefly on future work. Due to space limitations we have moved some of the proofs into the appendix.

2 Related Work

Data dependencies and Armstrong databases have been studied thoroughly in the relational model of data, cf. [1,9]. Dependencies are essential to the design of the target database, the maintenance of the database during its lifetime, and all major data processing tasks [1,26]. Armstrong databases are a useful design aid for data engineers that can help with the consolidation of data dependencies [16], the design of databases [21] and the creation of concise test data [6]. Armstrong [2] established the first axiomatization for functional dependencies. In general, axiomatizations can be applied by designers and administrators to validate the specification of explicit knowledge, to design and fine-tune databases or to optimize queries. An axiomatization ensures that all opportunities of utilizing implicit knowledge have been exploited. An analysis of the completeness argument can provide invaluable hints for finding algorithms that efficiently decide the implication problem. The implication problem of functional dependencies can be decided in time linear in the input [8]. For relations, the structural and computational properties of Armstrong relations for the class of functional dependencies are well-studied [4,21]. Cardinality constraints have mostly been investigated in conceptual models under a relational semantics [10,17,19,25], and recently in XML [13,22].

One of the most important extensions of the basic relational model [5] is incomplete information [15]. This is mainly due to the high demand for the correct handling of such information in real-world applications. Approaches to deal with incomplete information comprise incomplete relations, or-relations and fuzzy relations. In this paper we focus on incomplete relations. In the literature many kinds of null markers have been proposed; for example, "missing" or "value unknown at present", "non-existence", "inapplicable", "no information" and "open". Several works on functional dependencies in incomplete relations exist. Levene and Loizou studied classes of functional dependencies with a weak and strong possible world semantics [18]. Atzeni and Morfuni established an axiomatization of functional dependencies in the presence of NOT NULL constraints under the "no information" interpretation [3]. In this context, Hartmann and Link established an equivalence of the implication problem for this class of functional dependencies and NOT NULL constraints to that of propositional Horn clauses in Cadoli and Schaerf's family of S-3 logics [14]. Both articles consider only instances where functional dependencies subsume uniqueness constraints, and consider neither tables with duplicate rows nor cardinality constraints.
In [11] structural and computational properties of Armstrong databases have been established for the combined class of functional dependencies and NOT NULL constraints. In the present paper, we draw from this body of research and establish fundamental results for the combined class of functional dependencies, cardinality constraints and NOT NULL constraints over partial bags.

3 The Data Model

Let $\mathfrak{H} = \{H_1, H_2, \ldots\}$ be a countably infinite set of symbols, called column headers or headers for short. A table schema is a finite non-empty subset $T$ of $\mathfrak{H}$. Each header $H$ of a table schema $T$ is associated with a countably infinite domain $dom(H)$ of the possible values that can occur in column $H$. To encompass partial information every column can have a null marker, denoted by $\mathtt{ni} \in dom(H)$. The intention of $\mathtt{ni}$ is to mean "no information". We would like to stress that a null marker is different from a domain value. The inclusion of $\mathtt{ni}$ into the domain is a syntactic convenience.

For header sets $X$ and $Y$ we may write $XY$ for $X \cup Y$. If $X = \{H_1, \ldots, H_m\}$, then we may write $H_1 \cdots H_m$ for $X$. In particular, we may write simply $H$ to represent the singleton $\{H\}$. A row over $T$ ($T$-row or simply row, if $T$ is understood) is a function $r : T \to \bigcup_{H \in T} dom(H)$ with $r(H) \in dom(H)$ for all $H \in T$. The null marker occurrence $r(H) = \mathtt{ni}$ associated with a header $H$ in a row $r$ means that there is no information about $r(H)$. That is, $r(H)$ may not exist, or $r(H)$ exists but is unknown. For $X \subseteq T$ let $r(X)$ denote the restriction of the row $r$ over $T$ to $X$. A table $t$ over $T$ is a finite multi-set (bag) of rows over $T$. We sometimes use the phrase partial bag to indicate that these bags can contain partial information in the form of null marker occurrences. In this paper, the terms table and partial bag can be used interchangeably. For a row $r$ over $T$ and a set $X \subseteq T$, $r$ is said to be $X$-total if for all $H \in X$, $r(H) \neq \mathtt{ni}$. Similarly, a table $t$ over $T$ is said to be $X$-total if every row $r$ of $t$ is $X$-total. A table $t$ over $T$ is said to be a total table if it is $T$-total.

Following Atzeni and Morfuni [3], a null-free subschema (NFS) over the table schema $T$ is an expression $nfs(T_s)$ where $T_s \subseteq T$. The NFS $nfs(T_s)$ over $T$ is satisfied by a table $t$ over $T$, denoted by $\models_t nfs(T_s)$, if and only if $t$ is $T_s$-total. SQL allows the specification of column headers as NOT NULL. NFSs occur in everyday database practice: the set of headers declared NOT NULL forms the single NFS over the underlying table schema.

Following Lien [20], a functional dependency (FD) over the table schema $T$ is a statement $X \to Y$ where $X, Y \subseteq T$. The FD $X \to Y$ over $T$ is satisfied by a table $t$ over $T$, denoted by $\models_t X \to Y$, if and only if for all $r_1, r_2 \in t$ the following holds: if $r_1(X) = r_2(X)$ and $r_1, r_2$ are $X$-total, then $r_1(Y) = r_2(Y)$. FDs of the form $\emptyset \to Y$ are called non-standard, otherwise FDs are called standard. The size $\|\sigma\|$ of an FD $\sigma = X \to Y$ is defined as $|X| + |Y|$.

We now introduce the concept of a cardinality constraint into databases with partial information. Let $\mathbb{N}$ denote the positive integers. A cardinality constraint (CC) over the table schema $T$ is a statement $card(X) \le b$ where $X \subseteq T$ and $b \in \mathbb{N}$.
The CC $card(X) \le b$ over $T$ is satisfied by a table $t$ over $T$, denoted by $\models_t card(X) \le b$, if and only if for all $r_1, r_2, \ldots, r_{b+1} \in t$ the following holds: if $\forall i, j \in \{1, \ldots, b+1\}\,(r_i(X) = r_j(X))$ and $\forall i \in \{1, \ldots, b+1\}\,(r_i \text{ is } X\text{-total})$, then $r_i = r_j$ for some $i, j \in \{1, \ldots, b+1\}$ with $i \neq j$, i.e., two of the selected rows are the same row occurrence of $t$. Informally, at most $b$ row occurrences of $t$ may share the same $X$-total projection. CCs of the form $card(\emptyset) \le b$ are called non-standard, otherwise CCs are called standard. CCs subsume the concept of uniqueness constraints for the special case where $card(X) \le 1$. The size $\|\sigma\|$ of a CC $\sigma = card(X) \le b$ is defined as $|X| + \log b$.

For a set $\Sigma$ of constraints over some table schema $T$, we say that a table $t$ over $T$ satisfies $\Sigma$, denoted by $\models_t \Sigma$, if $t$ satisfies every element of $\Sigma$. If for some $\sigma \in \Sigma$ the table $t$ does not satisfy $\sigma$ we sometimes say that $t$ violates $\sigma$ (in which case $t$ also violates $\Sigma$) and write $\not\models_t \sigma$ ($\not\models_t \Sigma$). The size $\|\Sigma\|$ of a set $\Sigma$ of FDs and CCs is defined as the sum of sizes over all elements of $\Sigma$. The cardinality $|\Sigma|$ of a finite set $\Sigma$ is defined as the number of its elements.

Example 2. The SQL table definition from Example 1 can be captured in our data model as follows. The table schema $T = \text{Employment}$ consists of the column headers Emp, Dept and Mgr. The NFS $nfs(T_s)$ is defined by $T_s = \{Emp, Mgr\}$. The set $\Sigma$ consists of the FDs $Emp \to Dept$ and $Dept \to Mgr$, and the CCs $card(Emp) \le 4$, $card(Mgr) \le 2$ and $card(Emp, Mgr) \le 1$.

For the design, maintenance and applications of a relational database, data dependencies are identified as semantic constraints on the relations which are intended to be instances of the database schema. During the design process or lifetime of a database one usually needs to determine further dependencies which are logically implied by the given ones. In line with the literature on database constraints, we restrict our attention to the implication of constraints in some fixed class $\mathcal{C}$: FDs and CCs in the presence of an NFS. Let $T$ be a table schema, let $nfs(T_s)$ denote an NFS over $T$, and let $\Sigma \cup \{\varphi\}$ be a set of FDs and CCs over $T$. We say that $\Sigma$ implies $\varphi$ in the presence of $nfs(T_s)$, denoted by $\Sigma \models_{T_s} \varphi$, if every $T_s$-total table $t$ over $T$ that satisfies $\Sigma$ also satisfies $\varphi$. If $\Sigma$ does not imply $\varphi$ in the presence of $nfs(T_s)$ we may also write $\Sigma \not\models_{T_s} \varphi$. The implication problem for functional dependencies and cardinality constraints in the presence of a null-free subschema is to decide, given any table schema $T$, any NFS $nfs(T_s)$ over $T$, and any set $\Sigma \cup \{\varphi\}$ of FDs and CCs over $T$, whether $\Sigma \models_{T_s} \varphi$.

For the class of FDs and CCs, the sets $\Sigma \cup \{\varphi\}$ over a fixed table schema $T$ are not necessarily finite. While for a fixed $T$ there are only finitely many FDs, there might be infinitely many CCs by taking arbitrarily large upper bounds $b \in \mathbb{N}$. However, for a fixed $X \subseteq T$ only the least $b \in \mathbb{N}$ that occurs is relevant. Therefore, we assume without loss of generality that these sets are finite. Note that for FDs and CCs (in the presence of an NFS) it does not matter whether we restrict our tables to those that are finite, i.e., the implication problem coincides with the finite implication problem where only finite tables are considered. For this reason, we will only speak of the implication problem.

For an FD set $\Sigma$ over a table schema $T$ and an NFS $nfs(T_s)$ over $T$, let the FD set $\Sigma^*_{T_s} = \{\varphi \mid \Sigma \models_{T_s} \varphi\}$ denote the semantic closure of $\Sigma$ and $nfs(T_s)$, and for a set $X \subseteq T$ let $X^*_{\Sigma,T_s} = \{H \in T \mid \Sigma \models_{T_s} X \to H\}$ denote the closure of $X$ under $\Sigma$ and $nfs(T_s)$. For a set $\Sigma$ of FDs and CCs over $T$ let $\Sigma[FD] = \{X \to Y \mid X \to Y \in \Sigma\} \cup \{X \to T \mid card(X) \le 1 \in \Sigma\}$.
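
The satisfaction conditions for NFSs, FDs and CCs defined in this section can be checked mechanically on a concrete partial bag. The following Python sketch is only an illustration under stated assumptions: rows are encoded as dictionaries, the null marker ni is encoded as `None`, and the sample rows are hypothetical and not taken from the paper. The constraint set is the one from Example 2.

```python
from collections import Counter
from itertools import combinations

NI = None  # encodes the null marker "ni" (an encoding chosen for this sketch)

def is_total(row, X):
    """A row is X-total if it has no null marker on any header in X."""
    return all(row[h] is not NI for h in X)

def satisfies_nfs(table, Ts):
    """nfs(Ts): every row of the table is Ts-total."""
    return all(is_total(row, Ts) for row in table)

def satisfies_fd(table, X, Y):
    """X -> Y: any two X-total rows that agree on X also agree on Y.
    On Y, two null markers count as equal (syntactic equality)."""
    for r1, r2 in combinations(table, 2):
        if is_total(r1, X) and is_total(r2, X) and all(r1[h] == r2[h] for h in X):
            if any(r1[h] != r2[h] for h in Y):
                return False
    return True

def satisfies_cc(table, X, b):
    """card(X) <= b: at most b row occurrences share an X-total projection."""
    groups = Counter(tuple(row[h] for h in sorted(X))
                     for row in table if is_total(row, X))
    return all(n <= b for n in groups.values())

# Constraints of Example 2 over Employment(Emp, Dept, Mgr).
TS = {"Emp", "Mgr"}
FDS = [({"Emp"}, {"Dept"}), ({"Dept"}, {"Mgr"})]
CCS = [({"Emp"}, 4), ({"Mgr"}, 2), ({"Emp", "Mgr"}, 1)]

def satisfies_sigma(table):
    """True if the table satisfies the NFS and every FD and CC listed above."""
    return (satisfies_nfs(table, TS)
            and all(satisfies_fd(table, X, Y) for X, Y in FDS)
            and all(satisfies_cc(table, X, b) for X, b in CCS))

# Hypothetical rows, invented for this illustration only.
TABLE = [
    {"Emp": "Ada", "Dept": NI, "Mgr": "Grace"},
    {"Emp": "Ada", "Dept": NI, "Mgr": "Edsger"},
    {"Emp": "Alan", "Dept": "Research", "Mgr": "Grace"},
]
```

In this sketch `TABLE` satisfies all constraints of Example 2, while appending a copy of its first row produces a bag that violates $card(Emp, Mgr) \le 1$. This is the sense in which duplicate rows matter for CCs but not for FDs.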

For a set $\Sigma \cup \{\varphi\}$ of FDs and CCs, an NFS $nfs(T_s)$, and a set $\mathfrak{R}$ of inference rules let $\Sigma \vdash_{\mathfrak{R}} \varphi$ denote an inference of $\varphi$ from $\Sigma$ by $\mathfrak{R}$. That is, there is some sequence $\gamma = [\sigma_1, \ldots, \sigma_n]$ of FDs and CCs such that $\sigma_n = \varphi$ and every $\sigma_i$ is an element of $\Sigma$ or results from an application of an inference rule in $\mathfrak{R}$ to some elements in $\{\sigma_1, \ldots, \sigma_{i-1}\}$. For a finite set $\Sigma$ of FDs and CCs let $\Sigma^+_{\mathfrak{R}} = \{\varphi \mid \Sigma \vdash_{\mathfrak{R}} \varphi\}$ denote the syntactic closure of $\Sigma$ under inferences by $\mathfrak{R}$. $\mathfrak{R}$ is said to be sound (complete) for the implication of FDs and CCs in the presence of an NFS if for every table schema $T$, for every NFS $nfs(T_s)$ over $T$ and for every set $\Sigma$ of FDs and CCs over $T$ we have $\Sigma^+_{\mathfrak{R}} \subseteq \Sigma^*_{T_s}$ ($\Sigma^*_{T_s} \subseteq \Sigma^+_{\mathfrak{R}}$). The (finite) set $\mathfrak{R}$ is said to be a (finite) axiomatization for the implication of FDs and CCs in the presence of an NFS if $\mathfrak{R}$ is both sound and complete for the implication of FDs and CCs in the presence of an NFS.

Example 3. Consider the set $\Sigma$ with NFS $nfs(T_s)$ over table schema $T$ from Example 2. Then the following are examples of CCs implied by $\Sigma$ in the presence of $nfs(T_s)$: $card(Dept) \le 2$ and $card(Emp, Dept) \le 1$. However, neither the CC $card(Emp) \le 2$ nor the FD $Emp \to Mgr$ is implied by $\Sigma$ in the presence of $nfs(T_s)$. Indeed, the $T_s$-total table

    Emp       Dept  Mgr
    Sisyphus  ni    Trump
    Sisyphus  ni    Gates
    Sisyphus  ni    Jobs

satisfies $\Sigma$, but violates $card(Emp) \le 2$ and $Emp \to Mgr$.

4 Characterizations of the Implication Problem

The first target in our analysis is the establishment of an axiomatization for the implication of FDs and CCs in the presence of an NFS. The insights from the completeness proof will subsequently enable us to characterize the implication problem algorithmically.

4.1 Axiomatic Characterization

Let $\mathfrak{S}$ denote the set of inference rules from Table 1. It is our goal to show that $\mathfrak{S}$ forms a finite axiomatization. In our proof we will use the result by Atzeni and Morfuni that the set $\mathfrak{M}$, consisting of the reflexivity axiom, and the decomposition, union and null transitivity rules, forms a finite axiomatization for the implication of FDs [3].

Table 1. Axiomatization $\mathfrak{S}$ of FDs and CCs in the presence of an NFS

    (reflexivity)        $\dfrac{}{XY \to X}$
    (decomposition)      $\dfrac{X \to YZ}{X \to Y}$
    (union)              $\dfrac{X \to Y \quad X \to Z}{X \to YZ}$
    (null transitivity)  $\dfrac{X \to Y \quad Y \to Z}{X \to Z}$, provided $Y \subseteq XT_s$
    (weakening)          $\dfrac{card(X) \le b}{card(X) \le b+1}$
    (demotion)           $\dfrac{card(X) \le 1}{X \to T}$
    (null pullback)      $\dfrac{X \to Y \quad card(Y) \le b}{card(X) \le b}$, provided $Y \subseteq XT_s$

Lemma 1. The weakening, demotion and null pullback rules are sound for the implication of FDs and CCs in the presence of an NFS.

Note that the soundness of the reflexivity axiom and the null pullback rule also imply the soundness of the superset rule $\dfrac{card(X) \le b}{card(XY) \le b}$. Indeed, the trivial FD $XY \to X$ and the CC $card(X) \le b$ allow us to infer the CC $card(XY) \le b$ by an application of the null pullback rule since $X \subseteq XYT_s$.

Example 4. Consider the set $\Sigma$ with NFS $nfs(T_s)$ over table schema $T$ from Example 2. Then the following are examples of inferences from $\Sigma$ and $nfs(T_s)$ by $\mathfrak{S}$. An application of the null pullback rule to $Dept \to Mgr$, $card(Mgr) \le 2$, and $Mgr \in T_s$ results in the CC $card(Dept) \le 2$. That is,

$$\dfrac{Dept \to Mgr \qquad card(Mgr) \le 2}{card(Dept) \le 2}.$$

We now outline an inference of $card(Emp, Dept) \le 1$ from $\Sigma$ and $nfs(T_s)$ by $\mathfrak{S}$. Applications of the reflexivity axiom result in $Emp, Dept \to Emp$ and $Emp, Dept \to Dept$. An application of the null transitivity rule to $Emp, Dept \to Dept$ and $Dept \to Mgr$ as well as $Dept \in \{Emp, Dept, Mgr\}$ results in the FD $Emp, Dept \to Mgr$. An application of the union rule to $Emp, Dept \to Emp$ and $Emp, Dept \to Mgr$ results in the FD $Emp, Dept \to Emp, Mgr$. Finally, an application of the null pullback rule to $Emp, Dept \to Emp, Mgr$, $card(Emp, Mgr) \le 1$, and $\{Emp, Mgr\} \subseteq \{Emp, Dept, Mgr\}$ results in $card(Emp, Dept) \le 1$. The tree

$$\dfrac{\dfrac{Emp, Dept \to Emp \qquad \dfrac{Emp, Dept \to Dept \qquad Dept \to Mgr}{Emp, Dept \to Mgr}}{Emp, Dept \to Emp, Mgr} \qquad card(Emp, Mgr) \le 1}{card(Emp, Dept) \le 1}$$

illustrates this inference.

Before we turn to the completeness argument, we want to emphasize that any set of FDs alone can never imply any cardinality constraint.

Proposition 1. Let $T$ be a table schema, $nfs(T_s)$ an NFS, and $\Sigma$ a set of FDs over $T$. Then for all cardinality constraints $card(X) \le b$ over $T$ we have $\Sigma \not\models_{T_s} card(X) \le b$.

Proof. Let $t$ denote the table over $T$ that consists of $b + 1$ rows which have for every column header of $T$ the same non-null value, i.e., $t$ consists of $b + 1$ duplicate total rows. Clearly, $t$ satisfies $\Sigma$ and $nfs(T_s)$. Since $t$ violates $card(X) \le b$ it follows that $\Sigma \not\models_{T_s} card(X) \le b$.

Corollary 1. Let $T$ be a table schema. Then the FD $X \to T$ over $T$ does not imply the cardinality constraint $card(X) \le 1$.

For the completeness of $\mathfrak{S}$ the following lemma is central.

Lemma 2. Let $\Sigma$ be a set of FDs and CCs, and $nfs(T_s)$ be an NFS over table schema $T$. Then the following hold:
1. $\Sigma \models_{T_s} X \to Y$ if and only if $\Sigma[FD] \models_{T_s} X \to Y$, and
2. $\Sigma \models_{T_s} card(X) \le b$ if and only if there is some $card(Y) \le b' \in \Sigma$ such that $b' \le b$ and $Y \subseteq XT_s \cap X^*_{\Sigma[FD],T_s}$.
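
Lemma 2 can be read as a decision procedure once the closure $X^*_{\Sigma[FD],T_s}$ is available. The Python sketch below is a minimal illustration, not the authors' implementation: it computes the closure by the fixpoint iteration of Algorithm 2 (Section 4.2), and the function names and the encoding of FDs and CCs as pairs of frozensets are assumptions made for this example. The data are the constraints of Example 2.

```python
fs = frozenset

def nfs_closure(X, fds, Ts):
    """Closure of X under the FDs (U, V) in the presence of nfs(Ts): an FD fires
    only if its left-hand side lies in the current closure and in X ∪ Ts."""
    X = fs(X)
    closure, allowed = set(X), X | fs(Ts)
    changed = True
    while changed:
        changed = False
        for U, V in fds:
            if U <= closure and U <= allowed and not V <= closure:
                closure |= V
                changed = True
    return fs(closure)

def fd_part(T, fds, ccs):
    """Sigma[FD]: the given FDs plus X -> T for every card(X) <= 1 in Sigma."""
    return list(fds) + [(X, fs(T)) for X, b in ccs if b == 1]

def implies_fd(T, Ts, fds, ccs, X, Y):
    """Lemma 2, part 1: X -> Y is implied iff Y lies in the closure of X."""
    return fs(Y) <= nfs_closure(X, fd_part(T, fds, ccs), Ts)

def implies_cc(T, Ts, fds, ccs, X, b):
    """Lemma 2, part 2: some given card(Y) <= b2 with b2 <= b must have
    Y inside both X ∪ Ts and the closure of X under Sigma[FD]."""
    X = fs(X)
    reach = nfs_closure(X, fd_part(T, fds, ccs), Ts) & (X | fs(Ts))
    return any(b2 <= b and Y <= reach for Y, b2 in ccs)

# Example 2: Employment(Emp, Dept, Mgr) with Emp and Mgr declared NOT NULL.
T = fs({"Emp", "Dept", "Mgr"})
TS = fs({"Emp", "Mgr"})
FDS = [(fs({"Emp"}), fs({"Dept"})), (fs({"Dept"}), fs({"Mgr"}))]
CCS = [(fs({"Emp"}), 4), (fs({"Mgr"}), 2), (fs({"Emp", "Mgr"}), 1)]
```

With these definitions the sketch agrees with Example 3: `implies_cc(T, TS, FDS, CCS, {"Dept"}, 2)` and `implies_cc(T, TS, FDS, CCS, {"Emp", "Dept"}, 1)` hold, whereas `implies_cc(T, TS, FDS, CCS, {"Emp"}, 2)` and `implies_fd(T, TS, FDS, CCS, {"Emp"}, {"Mgr"})` do not.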

For the second part of Lemma 2 consider the special case where $\Sigma$ consists of FDs only. Then no cardinality constraint can be implied by $\Sigma$ in the presence of the NFS, which is consistent with Proposition 1.

We now have the means to verify that $\mathfrak{S}$ is a finite axiomatization for the implication of FDs and CCs in the presence of an NFS. Note that $\mathfrak{S}$ is indeed finite, since the rules apply to any given table schema $T$, any given sets $X, Y, Z, T_s \subseteq T$ of column headers, and any given $b \in \mathbb{N}$. In particular, the weakening rule applies to every given $b \in \mathbb{N}$.

Theorem 1. The set $\mathfrak{S}$ is a finite axiomatization for the implication of FDs and CCs in the presence of an NFS.

Proof. The soundness of $\mathfrak{S}$ follows from Lemma 1 and the soundness of the rules in $\mathfrak{M}$, established in previous work [3]. Let $\Sigma \cup \{\varphi\}$ denote a set of FDs and CCs, and $nfs(T_s)$ denote an NFS over table schema $T$. For the completeness of $\mathfrak{S}$ we need to show that $\Sigma \models_{T_s} \varphi$ implies $\Sigma \vdash_{\mathfrak{S}} \varphi$. We distinguish between two cases.

Firstly, let $\varphi$ denote the FD $X \to Y$. From $\Sigma \models_{T_s} X \to Y$ we conclude that $\Sigma[FD] \models_{T_s} X \to Y$ holds by the first part of Lemma 2. The completeness of $\mathfrak{M}$ for the implication of FDs in the presence of an NFS shows that $\Sigma[FD] \vdash_{\mathfrak{M}} X \to Y$ holds. Since the demotion rule is part of $\mathfrak{S}$ it follows that $\Sigma \vdash_{\mathfrak{S}} \sigma$ holds for every $\sigma \in \Sigma[FD]$. From $\mathfrak{M} \subseteq \mathfrak{S}$ we therefore conclude that $\Sigma \vdash_{\mathfrak{S}} X \to Y$ holds indeed.

Secondly, let $\varphi$ denote the CC $card(X) \le b$. From the second part of Lemma 2 it follows that $\Sigma[FD] \models_{T_s} X \to Y$, and that $card(Y) \le b' \in \Sigma$ for some $Y \subseteq XT_s$ and some $b' \le b$. The first case of this completeness proof shows that $\Sigma \vdash_{\mathfrak{S}} X \to Y$. An application of the null pullback rule yields $\Sigma \vdash_{\mathfrak{S}} card(X) \le b'$. Finally, applications of the weakening rule result in $\Sigma \vdash_{\mathfrak{S}} card(X) \le b$.

4.2 Algorithmic Characterization

In many situations it is not necessary to compute the set of all constraints implied by a given set. Instead, the question is whether a given fixed candidate constraint is implied by the given set of constraints.
We will now investigate an algorithmic characterization of the implication problem for the combined class of functional dependencies and cardinality constraints in the presence of an NFS. Lemma 2 reduces the implication problem for the combined class of FDs and CCs in the presence of an NFS to the implication problem for the class of FDs in the presence of an NFS. Indeed, $\Sigma \models_{T_s} X \to Y$ if and only if $Y \subseteq X^*_{\Sigma[FD],T_s}$, and $\Sigma \models_{T_s} card(X) \le b$ if and only if $Y \subseteq X^*_{\Sigma[FD],T_s}$ for some $card(Y) \le b' \in \Sigma$ such that $b' \le b$ and $Y \subseteq XT_s$. Therefore, the implication problem under consideration has been reduced to the computation of the closure $X^*_{\Sigma[FD],T_s}$ of a given set $X$ of column headers with respect to a given FD set $\Sigma[FD]$. This, however, has been done in previous work [3]. For reasons of completeness, we re-state the algorithm here.

Algorithm 2 (NFSClosure($X, \Sigma, T_s, T$))
Input: set $X$ of column headers, FD set $\Sigma$, NFS $nfs(T_s)$ over table schema $T$
Output: closure $X^*_{\Sigma,T_s}$ of $X$ with respect to $\Sigma$ and $nfs(T_s)$
Method:
    (A0) CLOSURE := $X$;
    (A1) repeat
             OLDCLOSURE := CLOSURE;
             for all $U \to V \in \Sigma$ do
                 if $U \subseteq$ CLOSURE $\cap\, XT_s$ then CLOSURE := CLOSURE $\cup\, V$; endif;
             enddo;
         until OLDCLOSURE = CLOSURE;
    (A2) return CLOSURE;

Theorem 3. The implication problem $\Sigma \models_{T_s} \varphi$ over table schema $T$ can be decided in time $O(|T| \cdot \|\Sigma\|)$.

Example 5. Consider the set $\Sigma$ with NFS $nfs(T_s)$ over table schema $T$ from Example 2. We have shown in Example 4 that $\Sigma \models_{T_s} card(Emp, Dept) \le 1$. Alternatively, we could confirm this fact by using the second part of Lemma 2 and Algorithm 2. Indeed, it is true that $card(Emp, Mgr) \le 1 \in \Sigma$ and $\{Emp, Mgr\}$ is a subset of the union of $\{Emp, Dept\}$ and $\{Emp, Mgr\}$, as well as a subset of the closure of $\{Emp, Dept\}$ under $\Sigma[FD]$ and $nfs(T_s)$. In fact, $\{Emp, Dept\}^*_{\Sigma[FD],T_s} = \{Emp, Dept, Mgr\}$.

5 Armstrong Tables

In this section we explore the concept of Armstrong databases for the combined class of FDs, CCs and NOT NULL constraints over partial bags. $\mathcal{C}$-Armstrong databases are sample data that perfectly represent the set $\Sigma$ of constraints from the class $\mathcal{C}$ currently perceived meaningful. Indeed, they satisfy $\Sigma$ and violate every constraint in $\mathcal{C}$ not implied by $\Sigma$. As such, Armstrong databases are an effective means to consolidate and communicate the current perceptions of an application domain's semantics between various stakeholders of the database [11,21]. We will now extend recent results on Armstrong tables for the combined class of FDs and NOT NULL constraints over partial bags [11] by the class of cardinality constraints. Note that these results also extend early work on Armstrong relations for the class of FDs, pioneered by Demetrovics, Mannila, Räihä, Beeri, Dowd, Fagin and Statman [4,7,21].
5.1 Central Concepts

In a first step we will fix various notions required to establish results on the structural and computational properties of Armstrong tables. We begin with the concept most central to this section.

Definition 1. Let $T$ be a table schema, $nfs(T_s)$ an NFS, and $\Sigma$ a set of FDs and CCs over $T$. A table $t$ over $T$ is said to be Armstrong for $\Sigma$ and $nfs(T_s)$ if and only if for every FD and CC $\varphi$ over $T$: $t$ satisfies $\varphi$ if and only if $\Sigma \models_{T_s} \varphi$, and for every $nfs(T'_s)$ over $T$: $t$ satisfies $nfs(T'_s)$ if and only if $T'_s \subseteq T_s$.

Example 6. Consider the set $\Sigma$ with NFS $nfs(T_s)$ over table schema $T$ from Example 2. Then the following table

    Emp       Dept              Mgr
    Sisyphus  ni                Trump
    Sisyphus  ni                Gates
    Sisyphus  ni                Jobs
    Sisyphus  ni                Zuckerberg
    Gödel     Computer Science  Hilbert
    Church    Computer Science  Hilbert
    Newton    Physics           Gauss
    Leibniz   Mathematics       Gauss

is an Armstrong table for $\Sigma$ and $nfs(T_s)$.

For characterising the structure of Armstrong tables we need different notions of agreement between rows of a table. The different versions are motivated by the potential occurrence of null markers on the one hand, and the different classes of constraints we consider on the other hand. For functional dependencies it suffices to compare all pairs of distinct rows. Cardinality constraints, however, require us to compare any finite number of distinct rows, essentially up to the maximum bound that occurs in the given set of constraints.

Definition 2. Let $T$ be a table schema, $t$ a table over $T$, and $r_1, r_2$ two rows over $T$. The agree set of $r_1$ and $r_2$ is defined as $ag(r_1, r_2) = (X, Y)$ where $X = \{H \in T \mid r_1(H) = r_2(H) \wedge r_1(H) \neq \mathtt{ni}\}$, and $Y = \{H \in T \mid r_1(H) = r_2(H)\}$. The strong agree set of $r_1$ and $r_2$ is defined as $ag^s(r_1, r_2) = X$ where $ag(r_1, r_2) = (X, Y)$. The weak agree set of $r_1$ and $r_2$ is defined as $ag^w(r_1, r_2) = Y$ where $ag(r_1, r_2) = (X, Y)$. The agree set of $t$ is defined as $ag(t) = \{ag(r_1, r_2) \mid r_1, r_2 \in t \wedge r_1 \neq r_2\}$. The strong agree set of $t$ is defined as $ag^s(t) = \{X \mid (X, Y) \in ag(t)\}$. The weak agree set of $t$ is defined as $ag^w(t) = \{Y \mid (X, Y) \in ag(t)\}$. For $X \in ag^s(t)$ we define $w(X) = \bigcap \{Y \mid (X, Y) \in ag(t)\}$.
For every positive integer $b > 1$ we define $ag^s_b(t) = \{\bigcap_{1 \le i < j \le b} ag^s(r_i, r_j) \mid r_1, \ldots, r_b \in t\,(\bigwedge_{1 \le i < j \le b} (r_i \neq r_j))\}$, $ag^s_1(t) = \{T\}$ and $ag^s_\infty(t) = \emptyset$.

Example 7. Let $t$ denote the table from Example 6 that is Armstrong for the set $\Sigma$ and the NFS $nfs(T_s)$ over table schema $T$ from Example 2. Let $r_1, r_2$ denote the first two rows of $t$, respectively. Then $ag(r_1, r_2) = (\{Emp\}, \{Emp, Dept\})$, $w(Emp) = \{Emp, Dept\}$ and $ag^s(t) = ag^s_2(t) = \{\emptyset, \{Emp\}, \{Mgr\}, \{Dept, Mgr\}\}$. Furthermore, $ag^s_3(t) = ag^s_4(t) = \{\emptyset, \{Emp\}\}$.

An Armstrong table must violate all the cardinality constraints not implied by the given set. It suffices, however, for any non-empty set $X$ to violate the cardinality constraint $card(X) \le b_X - 1$ where $b_X$ denotes the minimum positive integer for which $card(X) \le b_X$ is implied. Moreover, if there are two cardinality constraints $card(X) \le b_X$ and $card(Y) \le b_Y$ such that $b_X = b_Y$ and $Y \subseteq X$, then it suffices to violate $card(X) \le b_X - 1$. This motivates the following definitions.

Definition 3. Let $T$ be a table schema, $nfs(T_s)$ an NFS, and $\Sigma$ a set of FDs and CCs over $T$. For $\emptyset \neq X \subseteq T$ let

$$b_X = \begin{cases} \min\{b \in \mathbb{N} \mid \Sigma \models_{T_s} card(X) \le b\}, & \text{if } \{b \in \mathbb{N} \mid \Sigma \models_{T_s} card(X) \le b\} \neq \emptyset \\ \infty, & \text{else.} \end{cases}$$

The set $dup_{\Sigma,T_s}(T)$ of duplicate sets is defined as $dup_{\Sigma,T_s}(T) = \{\emptyset \neq X \subseteq T \mid \forall H \in T - X\,(b_{XH} < b_X)\}$.

Note that by Lemma 2 we have $b_X = \min\{b \mid card(Y) \le b \in \Sigma \wedge Y \subseteq XT_s \cap X^*_{\Sigma[FD],T_s}\}$, if $\{b \in \mathbb{N} \mid \Sigma \models_{T_s} card(X) \le b\} \neq \emptyset$.

Example 8. Consider the set $\Sigma$ with NFS $nfs(T_s)$ over table schema $T$ from Example 2. Then we have $b_{Emp} = 4$, $b_{Dept} = b_{Mgr} = b_{Dept,Mgr} = 2$, $b_{Emp,Dept} = b_{Emp,Mgr} = b_{Emp,Dept,Mgr} = 1$. Therefore, $dup_{\Sigma,T_s}(T) = \{\{Emp\}, \{Dept, Mgr\}, \{Emp, Dept, Mgr\}\}$.

An Armstrong table must also violate all the functional dependencies not implied by the given set. However, it suffices for each column header $H$ to violate all FDs $X \to H$ where $X$ is maximal with the property that $X \to H$ is not implied. Furthermore, if $X$ is maximal for some $H$ in this sense, $X \in dup_{\Sigma,T_s}(T)$ and $X$ is not maximal for any $H \in T - T_s$, then we can violate $card(X) \le b_X - 1$ such that for all $H \in T_s - X$, $X \to H$ is also violated. These arguments motivate the following definitions.

Definition 4. Let $\Sigma$ be a set of FDs and let $nfs(T_s)$ be an NFS over table schema $T$. For a column header $H \in T$ we define the maximal sets $max_{\Sigma,T_s}(H)$ of $H$ with respect to $\Sigma$ and $nfs(T_s)$ as follows:

$$max_{\Sigma,T_s}(H) := \{X \subseteq T \mid \Sigma \not\models_{T_s} X \to H \wedge \forall H' \in T - X\,(\Sigma \models_{T_s} XH' \to H)\}.$$

The maximal sets of $T$ with respect to $\Sigma$ and $nfs(T_s)$ are defined as $max_{\Sigma,T_s}(T) = \bigcup_{H \in T} max_{\Sigma,T_s}(H)$.
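
The agree sets of Definition 2 and the sets $ag^s_b(t)$ can be computed by brute force for small tables. The Python sketch below is an illustration only: it encodes the table of Example 6 as tuples, uses `None` for the null marker ni, and picks rows by position so that duplicate rows would count as distinct occurrences. These encodings and the function names are assumptions of this sketch.

```python
from itertools import combinations

NI = None  # encodes the null marker "ni" (an encoding chosen for this sketch)
HEADERS = ("Emp", "Dept", "Mgr")

# The table from Example 6.
TABLE = [
    ("Sisyphus", NI, "Trump"),
    ("Sisyphus", NI, "Gates"),
    ("Sisyphus", NI, "Jobs"),
    ("Sisyphus", NI, "Zuckerberg"),
    ("Gödel", "Computer Science", "Hilbert"),
    ("Church", "Computer Science", "Hilbert"),
    ("Newton", "Physics", "Gauss"),
    ("Leibniz", "Mathematics", "Gauss"),
]

def agree(r1, r2):
    """ag(r1, r2) = (strong, weak): headers with equal non-null values, and
    headers with equal values where two null markers also count as equal."""
    weak = frozenset(h for h, a, b in zip(HEADERS, r1, r2) if a == b)
    strong = frozenset(h for h, a, b in zip(HEADERS, r1, r2)
                       if a == b and a is not NI)
    return strong, weak

def strong_agree_sets(table, b):
    """ag^s_b(t) for b >= 2: for every choice of b distinct row occurrences,
    intersect the strong agree sets of all pairs among them."""
    result = set()
    for rows in combinations(table, b):
        common = frozenset(HEADERS)
        for r1, r2 in combinations(rows, 2):
            common &= agree(r1, r2)[0]
        result.add(common)
    return result
```

Under these assumptions, `agree(TABLE[0], TABLE[1])` yields the pair $(\{Emp\}, \{Emp, Dept\})$, and `strong_agree_sets(TABLE, b)` for $b = 2, 3, 4$ reproduces the sets listed in Example 7. The enumeration is exponential in $b$ and is meant for checking small examples rather than for practical use.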
IfΣ and nfs(t s ) are clear from the context we may simply write max(h) and max(t ), respectively. Finally, max red Σ,T s (T ):=max Σ,Ts (T ) {X dup Σ,Ts (T ) H T T s (X/ max Σ,Ts (H))}. Example 9. Consider the set Σ with NFS nfs(t s ) over table schema T from Example 2. Recall from Example 8 that dup Σ,Ts (T ) = {{Emp}, {Dept, Mgr}, {Emp, Dept, Mgr}}. As maximal sets we compute max Σ,Ts (Emp) = {{Dept, Mgr}}, max Σ,Ts (Dept) = {{Mgr}} and max Σ,Ts (Mgr) ={{Emp}}. Therefore, max red Σ,T s (T )={{Mgr}}.
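Definitions 3 and 4 can be traced on the running example with a short sketch. It takes the values $b_X$ of Example 8 and the maximal sets of Example 9 as given inputs rather than deriving them from $\Sigma$. Two points are assumptions of this sketch: only non-empty sets $X$ are considered (following the remark on non-empty sets above), and $T_s = \{Emp, Mgr\}$, which is inferred from the fact that only the Dept column carries null markers in the Armstrong table of the running example. The function names are hypothetical.

```python
from itertools import combinations

def duplicate_sets(T, b):
    """dup(T): non-empty X such that adding any further column strictly lowers b."""
    T = frozenset(T)
    dup = []
    for k in range(1, len(T) + 1):
        for cols in combinations(sorted(T), k):
            X = frozenset(cols)
            if all(b[X | {H}] < b[X] for H in T - X):
                dup.append(X)
    return dup

def reduced_max_sets(T, Ts, max_sets, dup):
    """max^red(T): drop duplicate sets that are maximal for no nullable column."""
    nullable = set(T) - set(Ts)
    all_max = {X for sets in max_sets.values() for X in sets}
    droppable = {X for X in dup if all(X not in max_sets[H] for H in nullable)}
    return all_max - droppable

fs = frozenset
T = ("Emp", "Dept", "Mgr")
Ts = {"Emp", "Mgr"}                       # inferred, see the note above
b = {                                     # values b_X from Example 8
    fs({"Emp"}): 4,
    fs({"Dept"}): 2, fs({"Mgr"}): 2, fs({"Dept", "Mgr"}): 2,
    fs({"Emp", "Dept"}): 1, fs({"Emp", "Mgr"}): 1, fs(T): 1,
}
max_sets = {                              # maximal sets from Example 9
    "Emp": {fs({"Dept", "Mgr"})},
    "Dept": {fs({"Mgr"})},
    "Mgr": {fs({"Emp"})},
}

dup = duplicate_sets(T, b)
max_red = reduced_max_sets(T, Ts, max_sets, dup)
```

With these inputs, `dup` contains exactly the three duplicate sets of Example 8 and `max_red` contains only `{Mgr}`, as in Example 9. The enumeration over all subsets is exponential in $|T|$, which is acceptable for an illustration but not meant as an efficient procedure.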

5.2 Characterization

We are now in a position to establish sufficient and necessary conditions when a given table is Armstrong for a given set $\Sigma$ of FDs, CCs and an NFS $nfs(T_s)$. This generalises recent work from the special case where $\Sigma$ consists of FDs only [11]. In turn, that result had generalised a well-known result by Mannila, Räihä, Beeri, Dowd, Fagin and Statman for FDs over total relations [4,21]. In the following theorem, the first (third) condition ensures that all FDs (CCs) not implied by the given set are violated; and the second (fourth) condition ensures that all implied FDs (CCs) are satisfied. The final condition handles the NFS.

Theorem 4. Let $T$ be a table schema, $\Sigma$ a set of standard FDs and standard CCs, and $nfs(T_s)$ an NFS over $T$. Then for all tables $t$ over $T$, $t$ is an Armstrong table for $\Sigma$ and $nfs(T_s)$ if and only if all of the following conditions hold:

1. $\forall H \in T\ \forall X \in max_{\Sigma,T_s}(H)\,(X \in ag^s(t) \wedge H \notin w(X))$,
2. $\forall X \in ag^s(t)\,(X^{+}_{\Sigma,T_s} \subseteq w(X))$,
3. $\forall X \in dup_{\Sigma,T_s}(T)\ \exists Z \in ag^s_{b_X}(t)\,(X \subseteq Z)$,
4. $\forall\, card(X) \le b \in \Sigma\ \forall Z \in ag^s_{b+1}(t)\,(X \not\subseteq Z)$,
5. $total(t) = T_s$.

Example 10. The previous examples show that the table $t$ in Example 6 is Armstrong for the set $\Sigma$ and the NFS $nfs(T_s)$ over table schema $T$ from Example 2. Indeed, the conditions of Theorem 4 are all satisfied by $t$.

5.3 Computation

For the computation of an Armstrong table, Theorem 4 suggests that the maximal and duplicate sets need to be computed. The computation of the maximal sets with respect to a set of standard FDs and an NFS $nfs(T_s)$ over table schema $T$ has been studied in [11]. For the computation of duplicate sets we now outline an algorithm that is exponential in the size of $\Sigma$. While we leave optimizations of this algorithm for future work, we note that there are sets of FDs and there are sets of CCs, respectively, where every Armstrong table for this set is exponential in the size of $\Sigma$, cf. Theorem 7.
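Before turning to the computation, the conditions of Theorem 4 that concern cardinality constraints and the NFS (conditions 3 to 5) can be checked mechanically on a concrete table. The following sketch does so under several assumptions: the table is the eight-row table that Algorithm 5 produces for the running example (Example 11), with placeholder values; $T_s = \{Emp, Mgr\}$ is inferred from that table; and, since $\Sigma$ from Example 2 is not repeated here, condition 4 is checked against the implied bounds $card(X) \le b_X$ of Example 8 as a stand-in for the constraints in $\Sigma$. All names are hypothetical.

```python
from itertools import combinations

NI = None  # stand-in for the null marker "ni"

def ag_s(t, b, schema):
    """ag^s_b(t) for a bag t given as a list of dicts (b = 1 yields {T})."""
    if b == 1:
        return {frozenset(schema)}
    out = set()
    for rows in combinations(t, b):   # yields nothing if b exceeds the row count
        common = frozenset(schema)
        for r1, r2 in combinations(rows, 2):
            common &= {h for h in schema if r1[h] is not NI and r1[h] == r2[h]}
        out.add(common)
    return out

def check_cc_conditions(t, schema, Ts, b_of):
    """Return the truth values of conditions 3, 4 (stand-in) and 5 of Theorem 4."""
    cols = set(schema)
    dup = [X for X in b_of if all(b_of[X | {H}] < b_of[X] for H in cols - X)]
    cond3 = all(any(X <= Z for Z in ag_s(t, b_of[X], schema)) for X in dup)
    # stand-in for condition 4: every implied bound card(X) <= b_X is satisfied
    cond4 = all(not X <= Z for X, b in b_of.items() for Z in ag_s(t, b + 1, schema))
    total = {H for H in schema if all(r[H] is not NI for r in t)}
    cond5 = total == set(Ts)
    return cond3, cond4, cond5

fs = frozenset
schema = ("Emp", "Dept", "Mgr")
Ts = {"Emp", "Mgr"}
b_of = {fs({"Emp"}): 4, fs({"Dept"}): 2, fs({"Mgr"}): 2, fs({"Dept", "Mgr"}): 2,
        fs({"Emp", "Dept"}): 1, fs({"Emp", "Mgr"}): 1, fs(schema): 1}
t = [
    {"Emp": "e1", "Dept": NI,   "Mgr": "m1"},
    {"Emp": "e1", "Dept": NI,   "Mgr": "m2"},
    {"Emp": "e1", "Dept": NI,   "Mgr": "m3"},
    {"Emp": "e1", "Dept": NI,   "Mgr": "m4"},
    {"Emp": "e5", "Dept": "d5", "Mgr": "m5"},
    {"Emp": "e6", "Dept": "d5", "Mgr": "m5"},
    {"Emp": "e7", "Dept": "d7", "Mgr": "m7"},
    {"Emp": "e8", "Dept": "d8", "Mgr": "m7"},
]
result = check_cc_conditions(t, schema, Ts, b_of)
```

Conditions 1 and 2 are deliberately left out, because they require the weak agree sets $w(X)$ and the attribute closure, neither of which is defined in this part of the text.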
Let $\Sigma$ denote a set of standard FDs and standard CCs, $nfs(T_s)$ an NFS, and $X$ a set of column headers over table schema $T$. We first compute $b_X$. We start with $b_X := \infty$ and compute $X^{+}_{\Sigma[FD],T_s}$ using Algorithm 2. Then, for each $card(Y) \le b \in \Sigma$ such that $Y \subseteq XT_s \cap X^{+}_{\Sigma[FD],T_s}$ and $b < b_X$ we redefine $b_X := b$.

Next we compute the duplicate sets $dup_{\Sigma,T_s}(T)$. We start with $dup_{\Sigma,T_s}(T) = \{X \mid X \subseteq T\}$. Then for each $X \in dup_{\Sigma,T_s}(T)$ and each $H \in T - X$ such that $b_{XH} = b_X$ we redefine $dup_{\Sigma,T_s}(T) := dup_{\Sigma,T_s}(T) - \{X\}$. Therefore, the time to compute $dup_{\Sigma,T_s}(T)$ and $b_X$ for each $X \in dup_{\Sigma,T_s}(T)$ is in $O(2^{|T|} \cdot |T| \cdot |\Sigma|)$.

Algorithm 5 shows the computation of an Armstrong table. The first two steps consist of the computations of the duplicate sets $X$ and their associated $b_X$, and the computation of the maximal sets covered in previous work. In step (A4), the algorithm generates for each duplicate set $X$ a block of $b_X$ rows that satisfies $card(X) \le b_X$, violates $card(X) \le b_X - 1$, and violates every FD $X \to H$ where $H \in T_s - X$. Note that in this case there cannot be any $H \in T_s - X$ such that $X \to H$ is implied by $\Sigma$ and $nfs(T_s)$. Otherwise, since $X$ is a duplicate set it would hold that $card(XH) \le b < b_X$ is an implied cardinality constraint. Due to the soundness of the null pullback rule, $card(X) \le b < b_X$ would be implied, too. This is a contradiction. In step (A5), the algorithm computes for each maximal set $X$ the set $Z$ of column headers in $T - T_s$ for which $X$ is maximal, and produces two rows which strongly agree on $X$ and disagree on each column header in $Z$; unless $X$ is also a duplicate set and each column header for which $X$ is maximal is in $T_s$. Finally, step (A6) introduces null marker occurrences in every column that is not in $T_s$, unless such a column already features a null marker occurrence.

Algorithm 5 (Armstrong table computation)
Input: set $\Sigma$ of standard FDs and standard CCs, an NFS $nfs(T_s)$ over table schema $T$ such that $\forall H \in T\ \exists b \in \mathbb{N}\,(\Sigma \models_{T_s} card(H) \le b)$
Output: Armstrong table $t$ for $\Sigma$ and $nfs(T_s)$
Method: let $c_{H,1}, c_{H,2}, \dots \in dom(H)$ be distinct
(A0) compute $dup_{\Sigma,T_s}(T)$ by the procedure outlined above;
(A1) compute $max_{\Sigma[FD],T_s}(H)$ for all $H \in T$ by Algorithm 8 in [11];
(A2) $t := \emptyset$;
(A3) $i := 1$;
(A4) for all $X \in dup_{\Sigma,T_s}(T)$ where $b_X > 1$ do
  $t := t \cup \{r_i, \dots, r_{i+b_X-1}\}$ where for all $j = i, \dots, i + b_X - 1$ and all $H \in T$
  $$r_j(H) := \begin{cases} c_{H,i}, & \text{if } H \in X \\ c_{H,j}, & \text{if } H \in T_s - X \\ \text{ni}, & \text{else} \end{cases};$$
  $i := i + b_X$;
(A5) for all $X \in max^{red}_{\Sigma,T_s}(T)$ do
  $Z := \{H \in T - T_s \mid X \in max_{\Sigma,T_s}(H)\}$;
  $t := t \cup \{r_i, r_{i+1}\}$ where for all $H \in T$
  $$r_i(H) := \begin{cases} c_{H,i}, & \text{if } H \in XZT_s \\ \text{ni}, & \text{else} \end{cases} \quad\text{and}\quad r_{i+1}(H) := \begin{cases} c_{H,i}, & \text{if } H \in X \\ c_{H,i+1}, & \text{if } H \in Z \cup (T_s - X) \\ \text{ni}, & \text{else} \end{cases};$$
  $i := i + 2$;
(A6) $total(t) := \{H \in T \mid \forall r \in t\,(r(H) \neq \text{ni})\}$;
  if $total(t) - T_s \neq \emptyset$ then return $t := t \cup \{r_i\}$ where for all $H \in T$,
  $$r_i(H) := \begin{cases} \text{ni}, & \text{if } H \in total(t) - T_s \\ c_{H,i}, & \text{else} \end{cases};$$
  else return $t$; endif;

Algorithm 5 works correctly.
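Steps (A2) to (A6) of Algorithm 5 can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: steps (A0) and (A1) are not implemented, so the duplicate sets with their bounds $b_X$ and the maximal sets are passed in as inputs (here with the values of Examples 8 and 9); $T_s = \{Emp, Mgr\}$ is inferred from the running example; the distinct domain values $c_{H,k}$ are represented by strings; and the function name and argument layout are hypothetical.

```python
NI = None  # stand-in for the null marker "ni"

def armstrong_table(T, Ts, dup_b, max_sets):
    """T: ordered column headers; Ts: NOT NULL headers;
    dup_b: list of (X, b_X) pairs; max_sets: dict H -> list of maximal sets."""
    Ts = set(Ts)
    nullable = [H for H in T if H not in Ts]
    dup = {X for X, _ in dup_b}
    rows, i = [], 1                                       # (A2), (A3)
    for X, b in dup_b:                                    # (A4)
        if b <= 1:
            continue
        for j in range(i, i + b):
            rows.append({H: f"c_{H},{i}" if H in X
                         else f"c_{H},{j}" if H in Ts
                         else NI for H in T})
        i += b
    all_max = []
    for H in T:
        for X in max_sets.get(H, []):
            if X not in all_max:
                all_max.append(X)
    for X in all_max:                                     # (A5)
        Z = {H for H in nullable if X in max_sets.get(H, [])}
        if X in dup and not Z:
            continue                                      # X is not in max^red(T)
        rows.append({H: f"c_{H},{i}" if (H in X or H in Z or H in Ts)
                     else NI for H in T})
        rows.append({H: f"c_{H},{i}" if H in X
                     else f"c_{H},{i + 1}" if (H in Z or H in Ts)
                     else NI for H in T})
        i += 2
    total = {H for H in T if all(r[H] is not NI for r in rows)}   # (A6)
    extra = total - Ts
    if extra:
        rows.append({H: NI if H in extra else f"c_{H},{i}" for H in T})
    return [tuple(r[H] for H in T) for r in rows]

fs = frozenset
T = ["Emp", "Dept", "Mgr"]
dup_b = [(fs({"Emp"}), 4), (fs({"Dept", "Mgr"}), 2), (fs(T), 1)]
max_sets = {"Emp": [fs({"Dept", "Mgr"})], "Dept": [fs({"Mgr"})], "Mgr": [fs({"Emp"})]}
table = armstrong_table(T, {"Emp", "Mgr"}, dup_b, max_sets)
```

For the running example the sketch produces eight rows: a block of four rows for the duplicate set `{Emp}`, a block of two rows for `{Dept, Mgr}`, and two rows for the only set in $max^{red}$, namely `{Mgr}`. Step (A6) adds no row here because the Dept column already contains a null marker.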

Theorem 6. For every input $(T, \Sigma, nfs(T_s))$, where $\Sigma$ is a set of standard FDs and standard CCs, and $nfs(T_s)$ is an NFS over table schema $T$ such that for all $H \in T$ there is some $b \in \mathbb{N}$ such that $\Sigma \models_{T_s} card(H) \le b$, Algorithm 5 computes an Armstrong table for $\Sigma$ and $nfs(T_s)$.

Corollary 2. Let $\Sigma$ be a set of standard FDs and CCs, and $nfs(T_s)$ an NFS over table schema $T$. Then there is a table over $T$ that is Armstrong for $\Sigma$ and $nfs(T_s)$ if and only if for all $H \in T$ there is some $b \in \mathbb{N}$ such that $\Sigma \models_{T_s} card(H) \le b$.

Proof. We show first that the condition is necessary for the existence of some Armstrong table. Assume, to the contrary, that there is some $H \in T$ such that for all $b \in \mathbb{N}$ we have $\Sigma \not\models_{T_s} card(H) \le b$. Then there is some $H \in T$ such that $b_H = \infty$. Note that $dup_{\Sigma,T_s}(T) \neq \emptyset$ since $T \in dup_{\Sigma,T_s}(T)$, and $ag^s_\infty(t) = \emptyset$. That is, the third condition of Theorem 4 is always violated. Hence, no Armstrong table over $T$ exists for $\Sigma$ and $nfs(T_s)$. The condition is also sufficient. Indeed, under the hypothesis that the condition holds, Algorithm 5 produces an Armstrong table for $\Sigma$ and $nfs(T_s)$, as verified by Theorem 6.

Example 11. Consider the set $\Sigma$ with NFS $nfs(T_s)$ over table schema $T$ from Example 2 as input to Algorithm 5. Then the algorithm generates the following Armstrong table

Emp          Dept          Mgr
c_{Emp,1}    ni            c_{Mgr,1}
c_{Emp,1}    ni            c_{Mgr,2}
c_{Emp,1}    ni            c_{Mgr,3}
c_{Emp,1}    ni            c_{Mgr,4}
c_{Emp,5}    c_{Dept,5}    c_{Mgr,5}
c_{Emp,6}    c_{Dept,5}    c_{Mgr,5}
c_{Emp,7}    c_{Dept,7}    c_{Mgr,7}
c_{Emp,8}    c_{Dept,8}    c_{Mgr,7}

for $\Sigma$ and $nfs(T_s)$. Note that after suitable substitutions, this is the Armstrong table given in Example 6.

5.4 Complexity Considerations

Corollary 3. Let $\Sigma$ be a set of standard FDs and CCs, and $nfs(T_s)$ an NFS over table schema $T$. It can be decided in time $O(|T|^2 \cdot |\Sigma|)$ whether there is an Armstrong table for $\Sigma$ and $nfs(T_s)$.

Proof. For each $H \in T$ we need to check that there is some $card(X) \le b \in \Sigma$ such that $X \subseteq HT_s \cap H^{+}_{\Sigma[FD],T_s}$, by Lemma 2.
This condition can be verified in time $O(|T| \cdot |\Sigma|)$.

Now we will analyse how well Algorithm 5 does in terms of how well one could potentially do in general. We say that an Armstrong table $t$ for $\Sigma$ and $nfs(T_s)$ is minimum-sized if there is no Armstrong table $t'$ for $\Sigma$ and $nfs(T_s)$ such that $|t'| < |t|$. First of all, the problem of computing an Armstrong table for a given set $\Sigma$ of standard FDs and standard CCs and an NFS over some table schema is precisely exponential in the size of $\Sigma$. If $\Sigma$ consists of a set of standard FDs only, then this result is known, cf. [11, Proposition 2]. We recall what we mean by precisely exponential [4]. Firstly, it means that there is an algorithm for computing an Armstrong table, given a set $\Sigma$ of standard FDs and standard CCs and an NFS $nfs(T_s)$, where the running time of the algorithm is exponential in $|\Sigma|$. Secondly, it means that there is a set $\Sigma$ of standard FDs and CCs and an NFS $nfs(T_s)$ in which the number of rows in each minimum-sized Armstrong table for $\Sigma$ and $nfs(T_s)$ is exponential in $|\Sigma|$; thus, an exponential amount of time is required in this case simply to write down the table.

Theorem 7. The problem of computing an Armstrong table for a given set $\Sigma$ of standard CCs and an NFS $nfs(T_s)$ over table schema $T$ is precisely exponential in the size of $\Sigma$.

For the remainder of this paper we show that Algorithm 5 is quite conservative in its use of time and space, even though the problem of computing Armstrong tables is computationally hard. We will show that Algorithm 5 always computes an Armstrong table whose number of rows is at most quadratic in the number of rows of a minimum-sized Armstrong table and the cardinality of the given constraint set.

Let $\Sigma$ denote a set of standard FDs and standard CCs, and $nfs(T_s)$ an NFS over table schema $T$. We say that a set $s_X$ of rows over $T$ is $X$-agreeing if all rows in $s_X$ strongly agree on $X$ and $|s_X| = b_X$. It follows from Theorem 4 that for every Armstrong table $t$ over $T$ for $\Sigma$ and $nfs(T_s)$ and every duplicate set $X \in dup_{\Sigma,T_s}(T)$ there is an $X$-agreeing set $s_X \subseteq t$.

Lemma 3. Let $\Sigma$ denote a set of standard FDs and standard CCs, and $nfs(T_s)$ an NFS over table schema $T$. Let $t$ be a table over $T$ satisfying $\Sigma$, and $X, Y \in dup_{\Sigma,T_s}(T)$ with $X \neq Y$ and $b_X = b_Y = b_{X \cap Y}$. Let further $s_X, s_Y$ be $X$- and $Y$-agreeing subsets of $t$, respectively. Then $s_X \cap s_Y = \emptyset$.

Proof. Assume $r \in s_X \cap s_Y$. Then $s_X, s_Y$ both strongly agree with $r$ on $X \cap Y$, so $s_X \cup s_Y$ strongly agrees on $X \cap Y$. Since $b_X = b_Y = b_{X \cap Y}$ and $t$ satisfies $\Sigma$ we must have $|s_X \cup s_Y| \le b_X = b_Y$, and hence $s_X = s_Y$. This in turn implies that $s_X = s_Y$ strongly agrees on $X \cup Y$, and using again that $t$ satisfies $\Sigma$ we get $b_{X \cup Y} \ge b_X = b_Y$. From $X \neq Y$ it follows that $X \cup Y$ is a proper superset of $X$ and/or $Y$. Together with $b_{X \cup Y} \ge b_X = b_Y$ this contradicts $X, Y \in dup_{\Sigma,T_s}(T)$.

Let $card(Y) \le b \in \Sigma$ and $X \to Y \in \Sigma[FD]^{+}_{T_s}$ with $Y \subseteq XT_s$. In particular, $card(X) \le b$ can be derived using the null-pullback rule. We say that $card(Y) \le b$ is a source of $card(X) \le b$. For a set $X \subseteq T$ we call $card(Y) \le b$ a source of $X$ if $card(Y) \le b$ is a source of $card(X) \le b_X$.

Corollary 4. Let $\Sigma$ denote a set of standard FDs and standard CCs, and $nfs(T_s)$ an NFS over table schema $T$. Every cardinality constraint over $T$ of the form $card(X) \le b_X$ has a source in $\Sigma$.

Proof. By Lemma 2 there is some $card(Y) \le b \in \Sigma$ with $b \le b_X$ and $Y \subseteq XT_s \cap X^{+}_{\Sigma[FD],T_s}$. The latter condition is equivalent to $X \to Y \in \Sigma[FD]^{+}_{T_s}$ and $Y \subseteq XT_s$. From this and $card(Y) \le b \in \Sigma$ we can derive $card(X) \le b$ using the null-pullback rule, and by definition of $b_X$ we have $b \ge b_X$. This shows $b = b_X$, so $card(Y) \le b_X$ is a source of $X$ in $\Sigma$.

We denote by $dup_{\Sigma,T_s}(card(Y) \le b)$ the set of all duplicate sets for which $card(Y) \le b$ is a source:

$$dup_{\Sigma,T_s}(card(Y) \le b) := \{ X \in dup_{\Sigma,T_s}(T) \mid card(Y) \le b \text{ is a source of } X \}.$$

Lemma 4. Let $\Sigma$ denote a set of standard FDs and standard CCs, and $nfs(T_s)$ an NFS over table schema $T$. Let $t$ be an Armstrong table for $\Sigma$ and $nfs(T_s)$, and $card(Y) \le b \in \Sigma$. Then $|t| \ge |dup_{\Sigma,T_s}(card(Y) \le b)| \cdot b$.

Proof. (1) For each $X \in dup_{\Sigma,T_s}(card(Y) \le b)$ we have $Y \subseteq XT_s$ and $X \to Y \in \Sigma[FD]^{+}_{T_s}$. This implies $XY \subseteq XT_s$ and $X \to XY \in \Sigma[FD]^{+}_{T_s}$. By definition of $b_{XY}$ we have $\Sigma \models_{T_s} card(XY) \le b_{XY}$. Using null-pullback we can derive $card(X) \le b_{XY}$, so $b_X \le b_{XY}$. Since $X$ is a duplicate set, we must have $Y \subseteq X$.

(2) Table $t$ contains an $X$-agreeing set $s_X$ for every $X \in dup_{\Sigma,T_s}(T)$ by Theorem 4, so in particular for every $X \in dup_{\Sigma,T_s}(card(Y) \le b)$. For every pair of distinct duplicate sets $X_1, X_2 \in dup_{\Sigma,T_s}(card(Y) \le b)$ we have $Y \subseteq X_1 \cap X_2$ by (1), and hence $b_{X_1} = b_{X_2} = b_Y = b_{X_1 \cap X_2}$. Thus, $s_{X_1}$ and $s_{X_2}$ are disjoint by Lemma 3. This gives us $|dup_{\Sigma,T_s}(card(Y) \le b)|$ disjoint sets $s_X \subseteq t$, each of which contains $b$ tuples, and shows the bound on $|t|$.

Corollary 5. Let $\Sigma$ denote a set of standard FDs and standard CCs, and $nfs(T_s)$ an NFS over table schema $T$. Let $t$ be an Armstrong table for $\Sigma$ and $nfs(T_s)$ over $T$ and $D := dup_{\Sigma,T_s}(T)$. Then

$$\sum_{X \in D} b_X \le |t| \cdot |\Sigma|.$$

Proof. By Corollary 4 every $X \in D$ has a source in $\Sigma$, so

$$\sum_{X \in D} b_X \le \sum_{card(Y) \le b \,\in\, \Sigma}\ \sum_{X \in dup_{\Sigma,T_s}(card(Y) \le b)} b_X.$$

By Lemma 4, $|t| \ge \sum_{X \in dup_{\Sigma,T_s}(card(Y) \le b)} b_X$ for any $card(Y) \le b \in \Sigma$, and thus

$$|t| \cdot |\Sigma| \ge \sum_{card(Y) \le b \,\in\, \Sigma}\ \sum_{X \in dup_{\Sigma,T_s}(card(Y) \le b)} b_X \ge \sum_{X \in D} b_X.$$
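As a quick plausibility check, the bound of Corollary 5 can be instantiated on the running example. The values $b_X$ are those of Example 8 and $|t| = 8$ is the number of rows of the Armstrong table in Example 11. The size of $\Sigma$ from Example 2 is not repeated in this part of the text, so the only assumption made here is $|\Sigma| \ge 1$.

```latex
\sum_{X \in D} b_X
  = b_{\{Emp\}} + b_{\{Dept,Mgr\}} + b_{\{Emp,Dept,Mgr\}}
  = 4 + 2 + 1
  = 7
  \le 8
  \le |t| \cdot |\Sigma|
```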

Theorem 8. Let $\Sigma$ denote a set of standard FDs and standard CCs, and $nfs(T_s)$ an NFS over table schema $T$. Let $t$ be an Armstrong table for $\Sigma$ and $nfs(T_s)$ and $t_c$ the Armstrong table for $\Sigma$ and $nfs(T_s)$ constructed in Algorithm 5. Then $|t_c| \le |t| \cdot (|t| + |\Sigma|)$.

Proof. Denote by $t_{A4}$, $t_{A5}$ and $t_{A6}$ the subsets of $t_c$ constructed in steps (A4), (A5) and (A6) of Algorithm 5, respectively. Since step (A4) adds $b_X$ rows for each duplicate set $X$ with $b_X > 1$, Corollary 5 gives

$$|t_{A4}| \le \sum_{X \in dup_{\Sigma,T_s}(T)} b_X \le |t| \cdot |\Sigma|.$$

Steps (A5) and (A6) together construct a sub-table of that computed by Algorithm 10 in [11], thus giving us the bound (Corollary 5 in [11]) $|t_{A5} \cup t_{A6}| \le |t|^2$. Combining these results yields the theorem.

Corollary 6. Algorithm 5 computes an Armstrong table for $\Sigma$ and $nfs(T_s)$ whose number of rows is at most quadratic in the number of rows of a minimum-sized Armstrong table for $\Sigma$ and $nfs(T_s)$ and the cardinality of $\Sigma$.

Finally, we show that, in general, there is no most concise way of representing the information inherent in a set of standard CCs and a null-free subschema. In fact, there are cases in which the size of a minimum-sized Armstrong table is exponential in the size of the constraint set, and there are other cases in which the size of an optimal cover of a constraint set is exponential in the size of a minimum-sized Armstrong table.

Theorem 9. Let $C$ denote the class of FDs and CCs. There is some table schema $T$, some set $\Sigma$ of CCs and some NFS $nfs(T_s)$ over $T$ such that $\Sigma$ has size $O(n)$, and the size of a minimum-sized $C$-Armstrong table for $\Sigma$ and $nfs(T_s)$ is $O(2^n)$. There is some table schema $T$, some set $\Sigma$ of CCs and some NFS $T_s$ over $T$ such that there is a $C$-Armstrong table for $\Sigma$ and $nfs(T_s)$ where the number of rows is in $O(n)$, and the optimal cover of $\Sigma$ with respect to $nfs(T_s)$ has size $O(2^n)$.

Proof. Let $T = H_1 \cdots H_{2n}$, $T_s = T$ and let $\Sigma$ consist of the following standard CCs: for all $i = 1, \dots, n$, $card(H_{2i-1}H_{2i}) \le 1$, and for all $i = 1, \dots, 2n$, $card(H_i) \le 2$. Then $dup_{\Sigma,T_s}(T)$ contains the $2^n$ sets $X \subseteq T$ where for each $i = 1, \dots, n$ either $H_{2i-1} \in X$ or $H_{2i} \in X$. According to Theorem 4 every Armstrong table for $\Sigma$ and $nfs(T_s)$ contains a number of rows that is exponential in $|\Sigma|$. A similar construction was used in [4] to show that the size of a minimum-sized Armstrong relation can be exponential in the size of a given FD set.

Let $T = H_1 H_1' \cdots H_n H_n'$, $T_s = T$, and let $\Sigma$ consist of the following standard CCs: for all $i = 1, \dots, n$, $card(H_i) \le 3$ and $card(H_i') \le 3$, and for all $X = X_1 \cdots X_n$ where $X_i \in \{H_i, H_i'\}$, $card(X) \le 2$. Then $\Sigma$ is its own optimal cover, i.e. there is no equivalent set $\Sigma'$ of standard FDs and standard CCs such that $|\Sigma'| < |\Sigma|$. The size $|\Sigma|$ is in $O(2^n)$. Furthermore, $dup_{\Sigma,T_s}(T)$ consists of the $n$ sets $T - H_iH_i'$ for $i = 1, \dots, n$, and the set $T$, and $max_{\Sigma,T_s}(T)$ consists of the $2n$ sets $T - H_i$ and $T - H_i'$ for $i = 1, \dots, n$. Thus, Algorithm 5 computes an Armstrong table for $\Sigma$ and $nfs(T_s)$ whose number of rows is in $O(n)$.

For these reasons we recommend the use of both representations. Indeed, the representation in form of constraint sets enables design teams to identify constraints they currently incorrectly perceive as semantically meaningful; and the representation in form of an Armstrong table enables design teams to identify constraints they currently incorrectly perceive as semantically meaningless.

6 Conclusion and Future Work

We have investigated the combined class of functional dependencies, cardinality constraints and NOT NULL constraints over partial bags. This framework applies to the structure of SQL tables. We have characterized the associated implication problem of this class axiomatically and algorithmically. Our results show how reasoning about this expressive class of constraints can be done effectively and efficiently. Moreover, we have established several structural and computational properties of Armstrong tables for this class of constraints. Our results show how Armstrong tables can be used effectively to consolidate the semantics of an application domain expressed by the class of constraints studied. In future work we would like to implement our algorithms within a design tool. Such a tool may also be used to conduct empirical studies on the usefulness of Armstrong tables for the acquisition of semantically meaningful constraints in our class studied, very much along the lines of [16]. It seems desirable to extend our results to even more expressive classes of constraints, e.g.
classes of multivalued and inclusion dependencies. Another challenging problem would be an extension to classes of cardinality constraints that also enforce lower bounds. Properties of Armstrong databases should also be studied in probabilistic and graph databases, and the concept of informative Armstrong databases should be investigated in non-relational models [6]. It would also be interesting to analyse interactions of cardinality constraints and functional dependencies under different interpretations of the null marker [12,18,23].

Acknowledgement. This research is supported by the Marsden fund council from Government funding, administered by the Royal Society of New Zealand.

References

1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995)
2. Armstrong, W.W.: Dependency structures of database relationships. Information Processing 74 (1974)
3. Atzeni, P., Morfuni, N.: Functional dependencies and constraints on null values in database relations. Information and Control 70(1), 1–31 (1986)
4. Beeri, C., Dowd, M., Fagin, R., Statman, R.: On the structure of Armstrong relations for functional dependencies. J. ACM 31(1) (1984)
5. Codd, E.F.: A relational model of data for large shared data banks. Commun. ACM 13(6) (1970)
6. De Marchi, F., Petit, J.-M.: Semantic sampling of existing databases through informative Armstrong databases. Inf. Syst. 32(3) (2007)
7. Demetrovics, J.: On the equivalence of candidate keys with Sperner systems. Acta Cybern. 4 (1980)
8. Diederich, J., Milton, J.: New methods and fast algorithms for database normalization. ACM Trans. Database Syst. 13(3) (1988)
9. Fagin, R.: Armstrong databases. Technical Report RJ3440(40926), IBM Research Laboratory, San Jose, California, USA (1982)
10. Hartmann, S.: On the implication problem for cardinality constraints and functional dependencies. Ann. Math. Art. Intell. 33 (2001)
11. Hartmann, S., Kirchberg, M., Link, S.: Design by example for SQL table definitions with functional dependencies. The VLDB Journal (2011)
12. Hartmann, S., Leck, U., Link, S.: On Codd families of keys over incomplete relations. The Computer Journal 54(7) (2011)
13. Hartmann, S., Link, S.: Numerical constraints on XML data. Inf. Comput. 208(5) (2010)
14. Hartmann, S., Link, S.: When data dependencies over SQL tables meet the Logics of Paradox and S-3. In: PODS Conference (2010)
15. Imielinski, T., Lipski Jr., W.: Incomplete information in relational databases. J. ACM 31(4) (1984)
16. Langeveldt, W.-D., Link, S.: Empirical evidence for the usefulness of Armstrong relations in the acquisition of meaningful functional dependencies. Inf. Syst. 35(3) (2010)
17. Lenzerini, M., Nobili, P.: On the satisfiability of dependency constraints in entity-relationship schemata. Inf. Syst. 15(4) (1990)
18. Levene, M., Loizou, G.: Axiomatisation of functional dependencies in incomplete relations. Theor. Comput. Sci. 206(1-2) (1998)
19. Liddle, S., Embley, D., Woodfield, S.: Cardinality constraints in semantic data models. Data Knowl. Eng. 11 (1993)
20. Lien, E.: On the equivalence of database models. J. ACM 29(2) (1982)
21. Mannila, H., Räihä, K.-J.: Design by example: An application of Armstrong relations. J. Comput. Syst. Sci. 33(2) (1986)
22. Sali, A., Schewe, K.-D.: Keys and Armstrong databases in trees with restructuring. Acta Cybern. 18(3) (2008)
23. Thalheim, B.: On semantic issues connected with keys in relational databases permitting null values. Elektronische Informationsverarbeitung und Kybernetik 25(1-2) (1989)
24. Thalheim, B.: Dependencies in relational databases. Teubner (1991)
25. Thalheim, B.: Fundamentals of Cardinality Constraints. In: Pernul, G., Tjoa, A.M. (eds.) ER. LNCS, vol. 645. Springer, Heidelberg (1992)
26. Thalheim, B.: Entity-Relationship modeling. Springer, Heidelberg (2000)


Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1 Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1 Background We started with schema design ER model translation into a relational schema Then we studied relational

More information

Handbook of Logic and Proof Techniques for Computer Science

Handbook of Logic and Proof Techniques for Computer Science Steven G. Krantz Handbook of Logic and Proof Techniques for Computer Science With 16 Figures BIRKHAUSER SPRINGER BOSTON * NEW YORK Preface xvii 1 Notation and First-Order Logic 1 1.1 The Use of Connectives

More information

Axiomatizing Conditional Independence and Inclusion Dependencies

Axiomatizing Conditional Independence and Inclusion Dependencies Axiomatizing Conditional Independence and Inclusion Dependencies Miika Hannula University of Helsinki 6.3.2014 Miika Hannula (University of Helsinki) Axiomatizing Conditional Independence and Inclusion

More information

Axiomatic set theory. Chapter Why axiomatic set theory?

Axiomatic set theory. Chapter Why axiomatic set theory? Chapter 1 Axiomatic set theory 1.1 Why axiomatic set theory? Essentially all mathematical theories deal with sets in one way or another. In most cases, however, the use of set theory is limited to its

More information

Pairing Transitive Closure and Reduction to Efficiently Reason about Partially Ordered Events

Pairing Transitive Closure and Reduction to Efficiently Reason about Partially Ordered Events Pairing Transitive Closure and Reduction to Efficiently Reason about Partially Ordered Events Massimo Franceschet Angelo Montanari Dipartimento di Matematica e Informatica, Università di Udine Via delle

More information

Schema Refinement and Normal Forms

Schema Refinement and Normal Forms Schema Refinement and Normal Forms Chapter 19 Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 1 The Evils of Redundancy Redundancy is at the root of several problems associated with relational

More information

A Unit Resolution Approach to Knowledge Compilation. 2. Preliminaries

A Unit Resolution Approach to Knowledge Compilation. 2. Preliminaries A Unit Resolution Approach to Knowledge Compilation Arindama Singh and Manoj K Raut Department of Mathematics Indian Institute of Technology Chennai-600036, India Abstract : Knowledge compilation deals

More information

CMPS Advanced Database Systems. Dr. Chengwei Lei CEECS California State University, Bakersfield

CMPS Advanced Database Systems. Dr. Chengwei Lei CEECS California State University, Bakersfield CMPS 4420 Advanced Database Systems Dr. Chengwei Lei CEECS California State University, Bakersfield CHAPTER 15 Relational Database Design Algorithms and Further Dependencies Slide 15-2 Chapter Outline

More information

Pairing Transitive Closure and Reduction to Efficiently Reason about Partially Ordered Events

Pairing Transitive Closure and Reduction to Efficiently Reason about Partially Ordered Events Pairing Transitive Closure and Reduction to Efficiently Reason about Partially Ordered Events Massimo Franceschet Angelo Montanari Dipartimento di Matematica e Informatica, Università di Udine Via delle

More information

Schema Refinement. Feb 4, 2010

Schema Refinement. Feb 4, 2010 Schema Refinement Feb 4, 2010 1 Relational Schema Design Conceptual Design name Product buys Person price name ssn ER Model Logical design Relational Schema plus Integrity Constraints Schema Refinement

More information

Mathematics 114L Spring 2018 D.A. Martin. Mathematical Logic

Mathematics 114L Spring 2018 D.A. Martin. Mathematical Logic Mathematics 114L Spring 2018 D.A. Martin Mathematical Logic 1 First-Order Languages. Symbols. All first-order languages we consider will have the following symbols: (i) variables v 1, v 2, v 3,... ; (ii)

More information

Relational Design Theory

Relational Design Theory Relational Design Theory CSE462 Database Concepts Demian Lessa/Jan Chomicki Department of Computer Science and Engineering State University of New York, Buffalo Fall 2013 Overview How does one design a

More information

Functional Dependencies & Normalization. Dr. Bassam Hammo

Functional Dependencies & Normalization. Dr. Bassam Hammo Functional Dependencies & Normalization Dr. Bassam Hammo Redundancy and Normalisation Redundant Data Can be determined from other data in the database Leads to various problems INSERT anomalies UPDATE

More information

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd. The Evils of Redundancy Schema Refinement and Normal Forms Chapter 19 Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 1 Redundancy is at the root of several problems associated with relational

More information

Relational-Database Design

Relational-Database Design C H A P T E R 7 Relational-Database Design Exercises 7.2 Answer: A decomposition {R 1, R 2 } is a lossless-join decomposition if R 1 R 2 R 1 or R 1 R 2 R 2. Let R 1 =(A, B, C), R 2 =(A, D, E), and R 1

More information

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram Schema Refinement and Normal Forms Chapter 19 Database Management Systems, R. Ramakrishnan and J. Gehrke 1 The Evils of Redundancy Redundancy is at the root of several problems associated with relational

More information

Topology Proceedings. COPYRIGHT c by Topology Proceedings. All rights reserved.

Topology Proceedings. COPYRIGHT c by Topology Proceedings. All rights reserved. Topology Proceedings Web: http://topology.auburn.edu/tp/ Mail: Topology Proceedings Department of Mathematics & Statistics Auburn University, Alabama 36849, USA E-mail: topolog@auburn.edu ISSN: 0146-4124

More information

Math 4603: Advanced Calculus I, Summer 2016 University of Minnesota Notes on Cardinality of Sets

Math 4603: Advanced Calculus I, Summer 2016 University of Minnesota Notes on Cardinality of Sets Math 4603: Advanced Calculus I, Summer 2016 University of Minnesota Notes on Cardinality of Sets Introduction In this short article, we will describe some basic notions on cardinality of sets. Given two

More information

Boolean Algebras. Chapter 2

Boolean Algebras. Chapter 2 Chapter 2 Boolean Algebras Let X be an arbitrary set and let P(X) be the class of all subsets of X (the power set of X). Three natural set-theoretic operations on P(X) are the binary operations of union

More information

TORIC WEAK FANO VARIETIES ASSOCIATED TO BUILDING SETS

TORIC WEAK FANO VARIETIES ASSOCIATED TO BUILDING SETS TORIC WEAK FANO VARIETIES ASSOCIATED TO BUILDING SETS YUSUKE SUYAMA Abstract. We give a necessary and sufficient condition for the nonsingular projective toric variety associated to a building set to be

More information

Herbrand Theorem, Equality, and Compactness

Herbrand Theorem, Equality, and Compactness CSC 438F/2404F Notes (S. Cook and T. Pitassi) Fall, 2014 Herbrand Theorem, Equality, and Compactness The Herbrand Theorem We now consider a complete method for proving the unsatisfiability of sets of first-order

More information

COSC 430 Advanced Database Topics. Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago

COSC 430 Advanced Database Topics. Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago COSC 430 Advanced Database Topics Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago Learning objectives and references You should be able to: define the elements of the relational

More information

Constraint Acquisition You Can Chase but You Cannot Find

Constraint Acquisition You Can Chase but You Cannot Find Constraint Acquisition You Can Chase but You Cannot Find Sven Hartmann Sebastian Link Thu Trinh Department of Information Systems, Information Science Research Centre Massey University, Palmerston North,

More information

Data Dependencies in the Presence of Difference

Data Dependencies in the Presence of Difference Data Dependencies in the Presence of Difference Tsinghua University sxsong@tsinghua.edu.cn Outline Introduction Application Foundation Discovery Conclusion and Future Work Data Dependencies in the Presence

More information

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd. The Evils of Redundancy Schema Refinement and Normal Forms INFO 330, Fall 2006 1 Redundancy is at the root of several problems associated with relational schemas: redundant storage, insert/delete/update

More information

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19 Schema Refinement and Normal Forms [R&G] Chapter 19 CS432 1 The Evils of Redundancy Redundancy is at the root of several problems associated with relational schemas: redundant storage, insert/delete/update

More information

SCHEMA NORMALIZATION. CS 564- Fall 2015

SCHEMA NORMALIZATION. CS 564- Fall 2015 SCHEMA NORMALIZATION CS 564- Fall 2015 HOW TO BUILD A DB APPLICATION Pick an application Figure out what to model (ER model) Output: ER diagram Transform the ER diagram to a relational schema Refine the

More information

Characterization of Semantics for Argument Systems

Characterization of Semantics for Argument Systems Characterization of Semantics for Argument Systems Philippe Besnard and Sylvie Doutre IRIT Université Paul Sabatier 118, route de Narbonne 31062 Toulouse Cedex 4 France besnard, doutre}@irit.fr Abstract

More information

AN EXTENSION OF THE PROBABILITY LOGIC LP P 2. Tatjana Stojanović 1, Ana Kaplarević-Mališić 1 and Zoran Ognjanović 2

AN EXTENSION OF THE PROBABILITY LOGIC LP P 2. Tatjana Stojanović 1, Ana Kaplarević-Mališić 1 and Zoran Ognjanović 2 45 Kragujevac J. Math. 33 (2010) 45 62. AN EXTENSION OF THE PROBABILITY LOGIC LP P 2 Tatjana Stojanović 1, Ana Kaplarević-Mališić 1 and Zoran Ognjanović 2 1 University of Kragujevac, Faculty of Science,

More information

Introduction to Data Management. Lecture #6 (Relational Design Theory)

Introduction to Data Management. Lecture #6 (Relational Design Theory) Introduction to Data Management Lecture #6 (Relational Design Theory) Instructor: Mike Carey mjcarey@ics.uci.edu Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Announcements v HW#2 is

More information

On Incomplete XML Documents with Integrity Constraints

On Incomplete XML Documents with Integrity Constraints On Incomplete XML Documents with Integrity Constraints Pablo Barceló 1, Leonid Libkin 2, and Juan Reutter 2 1 Department of Computer Science, University of Chile 2 School of Informatics, University of

More information

Exercises 1 - Solutions

Exercises 1 - Solutions Exercises 1 - Solutions SAV 2013 1 PL validity For each of the following propositional logic formulae determine whether it is valid or not. If it is valid prove it, otherwise give a counterexample. Note

More information

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004 Schema Refinement and Normal Forms CIS 330, Spring 2004 Lecture 11 March 2, 2004 1 The Evils of Redundancy Redundancy is at the root of several problems associated with relational schemas: redundant storage,

More information

Metainduction in Operational Set Theory

Metainduction in Operational Set Theory Metainduction in Operational Set Theory Luis E. Sanchis Department of Electrical Engineering and Computer Science Syracuse University Syracuse, NY 13244-4100 Sanchis@top.cis.syr.edu http://www.cis.syr.edu/

More information

Chapter 2 Axiomatic Set Theory

Chapter 2 Axiomatic Set Theory Chapter 2 Axiomatic Set Theory Ernst Zermelo (1871 1953) was the first to find an axiomatization of set theory, and it was later expanded by Abraham Fraenkel (1891 1965). 2.1 Zermelo Fraenkel Set Theory

More information

Data Bases Data Mining Foundations of databases: from functional dependencies to normal forms

Data Bases Data Mining Foundations of databases: from functional dependencies to normal forms Data Bases Data Mining Foundations of databases: from functional dependencies to normal forms Database Group http://liris.cnrs.fr/ecoquery/dokuwiki/doku.php?id=enseignement: dbdm:start March 1, 2017 Exemple

More information

1. Propositional Calculus

1. Propositional Calculus 1. Propositional Calculus Some notes for Math 601, Fall 2010 based on Elliott Mendelson, Introduction to Mathematical Logic, Fifth edition, 2010, Chapman & Hall. 2. Syntax ( grammar ). 1.1, p. 1. Given:

More information

Introduction to Real Analysis Alternative Chapter 1

Introduction to Real Analysis Alternative Chapter 1 Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces

More information

SPECIAL ATTRIBUTES FOR DATABASE NORMAL FORMS DETERMINATION

SPECIAL ATTRIBUTES FOR DATABASE NORMAL FORMS DETERMINATION STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LVII, Number 1, 2012 SPECIAL ATTRIBUTES FOR DATABASE NORMAL FORMS DETERMINATION VITALIE COTELEA Abstract. The article deals with the relational schemes defined

More information

Functional Dependencies. Getting a good DB design Lisa Ball November 2012

Functional Dependencies. Getting a good DB design Lisa Ball November 2012 Functional Dependencies Getting a good DB design Lisa Ball November 2012 Outline (2012) SEE NEXT SLIDE FOR ALL TOPICS (some for you to read) Normalization covered by Dr Sanchez Armstrong s Axioms other

More information

BCNF revisited: 40 Years Normal Forms

BCNF revisited: 40 Years Normal Forms Full set of slides BCNF revisited: 40 Years Normal Forms Faculty of Computer Science Technion - IIT, Haifa janos@cs.technion.ac.il www.cs.technion.ac.il/ janos 1 Full set of slides Acknowledgements Based

More information

Restricted versions of the Tukey-Teichmüller Theorem that are equivalent to the Boolean Prime Ideal Theorem

Restricted versions of the Tukey-Teichmüller Theorem that are equivalent to the Boolean Prime Ideal Theorem Restricted versions of the Tukey-Teichmüller Theorem that are equivalent to the Boolean Prime Ideal Theorem R.E. Hodel Dedicated to W.W. Comfort on the occasion of his seventieth birthday. Abstract We

More information

A New and Useful Syntactic Restriction on Rule Semantics for Tabular Data

A New and Useful Syntactic Restriction on Rule Semantics for Tabular Data A New and Useful Syntactic Restriction on Rule Semantics for Tabular Data Marie Agier 1,2, Jean-Marc Petit 3 1 DIAGNOGENE SA, 15000 Aurillac, FRANCE 2 LIMOS, UMR 6158 CNRS, Univ. Clermont-Ferrand II, FRANCE

More information

Equivalence of SQL Queries In Presence of Embedded Dependencies

Equivalence of SQL Queries In Presence of Embedded Dependencies Equivalence of SQL Queries In Presence of Embedded Dependencies Rada Chirkova Department of Computer Science NC State University, Raleigh, NC 27695, USA chirkova@csc.ncsu.edu Michael R. Genesereth Department

More information

Computational Tasks and Models

Computational Tasks and Models 1 Computational Tasks and Models Overview: We assume that the reader is familiar with computing devices but may associate the notion of computation with specific incarnations of it. Our first goal is to

More information

Schema Refinement and Normalization

Schema Refinement and Normalization Schema Refinement and Normalization Schema Refinements and FDs Redundancy is at the root of several problems associated with relational schemas. redundant storage, I/D/U anomalies Integrity constraints,

More information

TECHNISCHE UNIVERSITEIT EINDHOVEN Faculteit Wiskunde en Informatica. Final exam Logic & Set Theory (2IT61) (correction model)

TECHNISCHE UNIVERSITEIT EINDHOVEN Faculteit Wiskunde en Informatica. Final exam Logic & Set Theory (2IT61) (correction model) TECHNISCHE UNIVERSITEIT EINDHOVEN Faculteit Wiskunde en Informatica Final exam Logic & Set Theory (2IT61) (correction model) Thursday November 4, 2016, 9:00 12:00 hrs. (2) 1. Determine whether the abstract

More information

Generalized hashing and applications to digital fingerprinting

Generalized hashing and applications to digital fingerprinting Generalized hashing and applications to digital fingerprinting Noga Alon, Gérard Cohen, Michael Krivelevich and Simon Litsyn Abstract Let C be a code of length n over an alphabet of q letters. An n-word

More information

Composing Schema Mappings: Second-Order Dependencies to the Rescue

Composing Schema Mappings: Second-Order Dependencies to the Rescue Composing Schema Mappings: Second-Order Dependencies to the Rescue RONALD FAGIN IBM Almaden Research Center PHOKION G. KOLAITIS IBM Almaden Research Center LUCIAN POPA IBM Almaden Research Center WANG-CHIEW

More information

CONSTRUCTION OF THE REAL NUMBERS.

CONSTRUCTION OF THE REAL NUMBERS. CONSTRUCTION OF THE REAL NUMBERS. IAN KIMING 1. Motivation. It will not come as a big surprise to anyone when I say that we need the real numbers in mathematics. More to the point, we need to be able to

More information

Products, Relations and Functions

Products, Relations and Functions Products, Relations and Functions For a variety of reasons, in this course it will be useful to modify a few of the settheoretic preliminaries in the first chapter of Munkres. The discussion below explains

More information

Computability Theoretic Properties of Injection Structures

Computability Theoretic Properties of Injection Structures Computability Theoretic Properties of Injection Structures Douglas Cenzer 1, Valentina Harizanov 2 and Jeffrey B. Remmel 3 Abstract We study computability theoretic properties of computable injection structures

More information

First-Order Theorem Proving and Vampire

First-Order Theorem Proving and Vampire First-Order Theorem Proving and Vampire Laura Kovács 1,2 and Martin Suda 2 1 TU Wien 2 Chalmers Outline Introduction First-Order Logic and TPTP Inference Systems Saturation Algorithms Redundancy Elimination

More information

Schema Refinement and Normal Forms Chapter 19

Schema Refinement and Normal Forms Chapter 19 Schema Refinement and Normal Forms Chapter 19 Instructor: Vladimir Zadorozhny vladimir@sis.pitt.edu Information Science Program School of Information Sciences, University of Pittsburgh Database Management

More information

Chapter 2 Background. 2.1 A Basic Description Logic

Chapter 2 Background. 2.1 A Basic Description Logic Chapter 2 Background Abstract Description Logics is a family of knowledge representation formalisms used to represent knowledge of a domain, usually called world. For that, it first defines the relevant

More information

Chapter 2. Assertions. An Introduction to Separation Logic c 2011 John C. Reynolds February 3, 2011

Chapter 2. Assertions. An Introduction to Separation Logic c 2011 John C. Reynolds February 3, 2011 Chapter 2 An Introduction to Separation Logic c 2011 John C. Reynolds February 3, 2011 Assertions In this chapter, we give a more detailed exposition of the assertions of separation logic: their meaning,

More information

Probabilistic and Truth-Functional Many-Valued Logic Programming

Probabilistic and Truth-Functional Many-Valued Logic Programming Probabilistic and Truth-Functional Many-Valued Logic Programming Thomas Lukasiewicz Institut für Informatik, Universität Gießen Arndtstraße 2, D-35392 Gießen, Germany Abstract We introduce probabilistic

More information

Nested Epistemic Logic Programs

Nested Epistemic Logic Programs Nested Epistemic Logic Programs Kewen Wang 1 and Yan Zhang 2 1 Griffith University, Australia k.wang@griffith.edu.au 2 University of Western Sydney yan@cit.uws.edu.au Abstract. Nested logic programs and

More information

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram The Evils of Redundancy Schema Refinement and Normalization Chapter 1 Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus Redundancy is at the root of several problems

More information

Functional Dependencies

Functional Dependencies Functional Dependencies Functional Dependencies Framework for systematic design and optimization of relational schemas Generalization over the notion of Keys Crucial in obtaining correct normalized schemas

More information

Chapter 3. Cartesian Products and Relations. 3.1 Cartesian Products

Chapter 3. Cartesian Products and Relations. 3.1 Cartesian Products Chapter 3 Cartesian Products and Relations The material in this chapter is the first real encounter with abstraction. Relations are very general thing they are a special type of subset. After introducing

More information

Information Flow on Directed Acyclic Graphs

Information Flow on Directed Acyclic Graphs Information Flow on Directed Acyclic Graphs Michael Donders, Sara Miner More, and Pavel Naumov Department of Mathematics and Computer Science McDaniel College, Westminster, Maryland 21157, USA {msd002,smore,pnaumov}@mcdaniel.edu

More information