Armstrong Databases and Reasoning for Functional Dependencies and Cardinality Constraints over Partial Bags


Sven Hartmann (1), Henning Köhler (2), Sebastian Link (3), and Bernhard Thalheim (4)

(1) Institut für Informatik, Technische Universität Clausthal, Germany
(2) N-Squared Software, Palmerston North, New Zealand
(3) Department of Computer Science, University of Auckland, New Zealand
(4) Institut für Informatik, Christian-Albrechts-University Kiel, Germany

T. Lukasiewicz and A. Sali (Eds.): FoIKS 2012, LNCS 7153, © Springer-Verlag Berlin Heidelberg 2012

Abstract. Data dependencies capture meaningful information about an application domain within the target database. The theory of data dependencies is largely a theory over relations. To make data processing more efficient in practice, partial bags are permitted as database instances to accommodate partial and duplicate information. However, data dependencies interact differently over partial bags than over the idealized special case of relations. In this paper, we study the implication problem of the combined class of functional dependencies and cardinality constraints over partial bags. We establish an axiomatic and an algorithmic characterization of the implication problem. These findings have important applications in database design and data processing. Finally, we investigate structural and computational properties of Armstrong databases for the class of data dependencies under consideration. These results can be utilized to consolidate and communicate the understanding of the application domain between different stakeholders of a database.

1 Introduction

Quality database schemata must capture both the structure and semantics of the underlying application domain. Data dependencies are classes of first-order formulae that can model semantically meaningful information in the target database. In the relational model of data, approximately 100 different classes of data dependencies have been studied [24]. Among those, functional dependencies and cardinality constraints represent two classes of data dependencies that are popular in database practice and theory. Cardinality constraints, in particular, have been studied extensively in Chen's Entity-Relationship model.

In practice, however, relations represent idealized special cases in which all information is always available and no duplicate information can occur. In relational database management systems, database instances are partial bags. That is, duplicate rows can occur and columns may contain partial information in the form of null marker occurrences, unless they have been specified as NOT NULL. In this paper we are concerned with the implication problem of the combined class of functional dependencies, cardinality constraints and NOT NULL constraints over partial bags. The implication problem is to decide whether every partial bag that satisfies a given set of data dependencies also satisfies another given data dependency. The problem is essential in database design, and has found numerous applications in almost all data processing tasks. While different classes of data dependencies co-occur in practice, this co-occurrence is often the source of the intractability or even infeasibility of the associated implication problem. It is therefore a challenge to identify combined classes of data dependencies that can be reasoned about effectively and efficiently.

Example 1. Suppose that in designing an information system for a company the team of data engineers has established the following SQL table definition:

CREATE TABLE Employment (
  Emp VARCHAR NOT NULL,
  Dept VARCHAR,
  Mgr VARCHAR NOT NULL);

Here, employees (Emp) work within a department (Dept) under a manager (Mgr). Null marker occurrences are only permitted in the column Dept. As interpretation of the null marker we choose the most primitive one, no information, i.e., a total value may not exist or may exist but is currently unknown. The team of data engineers has started to think about the semantics of the application domain. So far, they have acquired the following business rules. Employees can work for at most one department, and departments have at most one manager. Moreover, every employee can be associated with at most 4 combinations of any department and any manager, every manager can be associated with at most 2 combinations of any employee and any department, and every combination of any employee and any manager must be unique. These business rules can be expressed as functional dependencies and cardinality constraints. The team of engineers would like to consult the experts of the application domain to find out whether their current perception of the semantics captures all the necessary requirements.
In order to validate their own understanding of the application domain and to facilitate the knowledge acquisition from the domain experts, the team plans, in particular, to create test data.

Example 1 illustrates how quality database designs require a deep understanding of the application domain's semantics. In particular, it is necessary to comprehend the interactions between different classes of data dependencies in the presence of partial and duplicate information. Such an advanced understanding can also lead to more efficient ways of data processing. For example, suppose that we want to retrieve all distinct combinations of an employee and a department from the current database instance. Since the business rules above are enforced on all instances, and since the constraint that every combination of an employee and a department is unique is implied by these business rules, it follows that the DISTINCT clause in our query is superfluous. Query optimizers with built-in reasoning abilities for these constraints can therefore detect such opportunities effectively and, depending on the complexity of the associated implication problem, even efficiently. For these reasons, an in-depth investigation of the associated implication problems is both challenging and in high demand.

Contributions. So far, the combined class of functional dependencies and cardinality constraints has only been considered over relations, i.e., in the idealized special case where no duplicate rows and no null marker occurrences are permitted. In this paper we make three major contributions. Firstly, we characterize axiomatically the implication problem for the combined class of functional dependencies, cardinality constraints and NOT NULL constraints over partial bags. Secondly, we characterize the implication problem also algorithmically. Our results show how reasoning about this combined class of constraints over partial bags can be done effectively and efficiently. For our third and final contribution we investigate the concept of Armstrong databases for the combined class under discussion. We establish structural and computational properties of Armstrong tables. In particular, we characterize the structure of Armstrong tables, i.e., we provide sufficient and necessary conditions that allow us to test whether a given partial bag is Armstrong with respect to a given set of constraints in this class. This characterization enables us to derive further properties. For example, we characterize for which sets of constraints in this class Armstrong tables exist. We show that the problem of computing Armstrong tables for a given set of constraints in this class is precisely exponential in the size of the given set. Nevertheless, we are able to establish an algorithm that always computes an Armstrong table for a given set of constraints whenever it exists and whose number of rows is at most quadratic in the number of rows of a minimum-sized Armstrong table and the number of given constraints.

Organization. We discuss related work in Section 2.
Subsequently, we introduce the data model in Section 3, which includes a definition of the syntax and semantics used in this paper. In Section 4 we characterize the implication problem axiomatically and algorithmically. The structural and computational properties of Armstrong tables are established in Section 5. Finally, we conclude in Section 6 where we also comment briefly on future work. Due to space limitations we have moved some of the proofs into the appendix.

2 Related Work

Data dependencies and Armstrong databases have been studied thoroughly in the relational model of data, cf. [1,9]. Dependencies are essential to the design of the target database, the maintenance of the database during its lifetime, and all major data processing tasks [1,26]. Armstrong databases are a useful design aid for data engineers that can help with the consolidation of data dependencies [16], the design of databases [21] and the creation of concise test data [6]. Armstrong [2] established the first axiomatization for functional dependencies. In general, axiomatizations can be applied by designers and administrators to validate the specification of explicit knowledge, to design and fine-tune databases or to optimize queries. An axiomatization ensures that all opportunities of utilizing implicit knowledge have been exploited. An analysis of the completeness argument can provide invaluable hints for finding algorithms that efficiently decide the implication problem. The implication problem of functional dependencies can be decided in time linear in the input [8]. For relations, the structural and computational properties of Armstrong relations for the class of functional dependencies are well-studied [4,21]. Cardinality constraints have mostly been investigated in conceptual models under a relational semantics [10,17,19,25], and recently in XML [13,22].

One of the most important extensions of the basic relational model [5] is incomplete information [15]. This is mainly due to the high demand for the correct handling of such information in real-world applications. Approaches to deal with incomplete information comprise incomplete relations, or-relations and fuzzy relations. In this paper we focus on incomplete relations. In the literature many kinds of null markers have been proposed; for example, missing or value unknown at present, non-existence, inapplicable, no information and open. Several works on functional dependencies in incomplete relations exist. Levene and Loizou studied classes of functional dependencies with a weak and strong possible world semantics [18]. Atzeni and Morfuni established an axiomatization of functional dependencies in the presence of NOT NULL constraints under the no information interpretation [3]. In this context, Hartmann and Link established an equivalence of the implication problem for this class of functional dependencies and NOT NULL constraints to that of propositional Horn clauses in Cadoli and Schaerf's family of S-3 logics [14]. Both articles consider only instances where functional dependencies subsume uniqueness constraints, and consider neither tables with duplicate rows nor cardinality constraints.
In [11] structural and computational properties of Armstrong databases have been established for the combined class of functional dependencies and NOT NULL constraints. In the present paper, we draw from this body of research and establish fundamental results for the combined class of functional dependencies, cardinality constraints and NOT NULL constraints over partial bags.

3 The Data Model

Let $\mathcal{H} = \{H_1, H_2, \ldots\}$ be a countably infinite set of symbols, called column headers or headers for short. A table schema is a finite non-empty subset $T$ of $\mathcal{H}$. Each header $H$ of a table schema $T$ is associated with a countably infinite domain $\mathit{dom}(H)$ of the possible values that can occur in column $H$. To encompass partial information every column can have a null marker, denoted by $\mathtt{ni} \in \mathit{dom}(H)$. The intention of $\mathtt{ni}$ is to mean no information. We would like to stress that a null marker is different from a domain value. The inclusion of $\mathtt{ni}$ into the domain is a syntactic convenience.

For header sets $X$ and $Y$ we may write $XY$ for $X \cup Y$. If $X = \{H_1, \ldots, H_m\}$, then we may write $H_1 \cdots H_m$ for $X$. In particular, we may write simply $H$ to represent the singleton $\{H\}$. A row over $T$ ($T$-row or simply row, if $T$ is understood) is a function $r : T \to \bigcup_{H \in T} \mathit{dom}(H)$ with $r(H) \in \mathit{dom}(H)$ for all $H \in T$. The null marker occurrence $r(H) = \mathtt{ni}$ associated with a header $H$ in a row $r$ means that there is no information about $r(H)$. That is, $r(H)$ may not exist or $r(H)$ exists but is unknown. For $X \subseteq T$ let $r(X)$ denote the restriction of the row $r$ over $T$ to $X$. A table $t$ over $T$ is a finite multi-set (bag) of rows over $T$. We sometimes use the phrase partial bag to indicate that these bags can contain partial information in the form of null marker occurrences. In this paper, the terms table and partial bag can be used interchangeably. For a row $r$ over $T$ and a set $X \subseteq T$, $r$ is said to be $X$-total if for all $H \in X$, $r(H) \neq \mathtt{ni}$. Similarly, a table $t$ over $T$ is said to be $X$-total if every row $r$ of $t$ is $X$-total. A table $t$ over $T$ is said to be a total table if it is $T$-total.

Following Atzeni and Morfuni [3] a null-free subschema (NFS) over the table schema $T$ is an expression $\mathit{nfs}(T_s)$ where $T_s \subseteq T$. The NFS $\mathit{nfs}(T_s)$ over $T$ is satisfied by a table $t$ over $T$, denoted by $\models_t \mathit{nfs}(T_s)$, if and only if $t$ is $T_s$-total. SQL allows the specification of column headers as NOT NULL. NFSs occur in everyday database practice: the set of headers declared NOT NULL forms the single NFS over the underlying table schema.

Following Lien [20] a functional dependency (FD) over the table schema $T$ is a statement $X \to Y$ where $X, Y \subseteq T$. The FD $X \to Y$ over $T$ is satisfied by a table $t$ over $T$, denoted by $\models_t X \to Y$, if and only if for all $r_1, r_2 \in t$ the following holds: if $r_1(X) = r_2(X)$ and $r_1, r_2$ are $X$-total, then $r_1(Y) = r_2(Y)$. FDs of the form $\emptyset \to Y$ are called non-standard, otherwise FDs are called standard. The size $|\sigma|$ of an FD $\sigma = X \to Y$ is defined as $|X| + |Y|$.

We now introduce the concept of a cardinality constraint into databases with partial information. Let $\mathbb{N}$ denote the positive integers. A cardinality constraint (CC) over the table schema $T$ is a statement $\mathit{card}(X) \le b$ where $X \subseteq T$ and $b \in \mathbb{N}$.
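The totality and FD conditions above can be checked mechanically. The following Python sketch is one possible way to do so. It assumes that a table is represented as a list of dictionaries (so that duplicate rows are kept) and that Python's `None` stands in for the null marker ni; the function names and the sample rows are illustrative and not taken from the paper.

```python
NI = None  # assumption: None plays the role of the null marker ni


def is_total(row, headers):
    """True if the row carries no null marker in any of the given headers."""
    return all(row[h] is not NI for h in headers)


def satisfies_nfs(table, ts):
    """A table satisfies nfs(T_s) iff every one of its rows is T_s-total."""
    return all(is_total(row, ts) for row in table)


def satisfies_fd(table, lhs, rhs):
    """Check X -> Y: X-total rows that agree on X must agree on Y.

    Agreement on Y is plain equality, so two null markers count as equal.
    """
    for i, r1 in enumerate(table):
        for r2 in table[i + 1:]:
            both_total = is_total(r1, lhs) and is_total(r2, lhs)
            if both_total and all(r1[h] == r2[h] for h in lhs):
                if any(r1[h] != r2[h] for h in rhs):
                    return False
    return True


# Hypothetical sample rows over the headers Emp, Dept, Mgr.
sample = [
    {"Emp": "Sisyphus", "Dept": NI, "Mgr": "Trump"},
    {"Emp": "Sisyphus", "Dept": NI, "Mgr": "Gates"},
]

print(satisfies_nfs(sample, {"Emp", "Mgr"}))    # rows are total on Emp and Mgr
print(satisfies_fd(sample, {"Dept"}, {"Mgr"}))  # no row is Dept-total
print(satisfies_fd(sample, {"Emp"}, {"Mgr"}))   # rows agree on Emp, differ on Mgr
```

Note that an FD whose left-hand side contains a null marker in both rows is satisfied vacuously by that pair, which is exactly the effect of the $X$-totality premise in the definition.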
The CC $\mathit{card}(X) \le b$ over $T$ is satisfied by a table $t$ over $T$, denoted by $\models_t \mathit{card}(X) \le b$, if and only if for all $r_1, r_2, \ldots, r_{b+1} \in t$ the following holds: if $\forall i, j \in \{1, \ldots, b+1\}\,(r_i(X) = r_j(X))$ and $\forall i \in \{1, \ldots, b+1\}\,(r_i \text{ is } X\text{-total})$, then $\exists i, j \in \{1, \ldots, b+1\}\,(i \neq j \wedge r_i = r_j)$. Here rows are compared as occurrences in the bag, so a table satisfies the CC exactly when at most $b$ of its row occurrences are $X$-total and agree on $X$; in particular, $b+1$ duplicates of an $X$-total row violate it. CCs of the form $\mathit{card}(\emptyset) \le b$ are called non-standard, otherwise CCs are called standard. CCs subsume the concept of uniqueness constraints as the special case $\mathit{card}(X) \le 1$. The size $|\sigma|$ of a CC $\sigma = \mathit{card}(X) \le b$ is defined as $|X| + \log b$.

For a set $\Sigma$ of constraints over some table schema $T$, we say that a table $t$ over $T$ satisfies $\Sigma$, denoted by $\models_t \Sigma$, if $t$ satisfies every element of $\Sigma$. If for some $\sigma \in \Sigma$ the table $t$ does not satisfy $\sigma$ we sometimes say that $t$ violates $\sigma$ (in which case $t$ also violates $\Sigma$) and write $\not\models_t \sigma$ ($\not\models_t \Sigma$). The size $\|\Sigma\|$ of a set $\Sigma$ of FDs and CCs is defined as the sum of the sizes of all elements of $\Sigma$. The cardinality $|\Sigma|$ of a finite set $\Sigma$ is defined as the number of its elements.

Example 2. The SQL table definition from Example 1 can be captured in our data model as follows. The table schema $T = \mathit{Employment}$ consists of the column headers $\mathit{Emp}$, $\mathit{Dept}$ and $\mathit{Mgr}$. The NFS $\mathit{nfs}(T_s)$ is defined by $T_s = \{\mathit{Emp}, \mathit{Mgr}\}$. The set $\Sigma$ consists of the FDs $\mathit{Emp} \to \mathit{Dept}$ and $\mathit{Dept} \to \mathit{Mgr}$, and the CCs $\mathit{card}(\mathit{Emp}) \le 4$, $\mathit{card}(\mathit{Mgr}) \le 2$ and $\mathit{card}(\mathit{Emp}, \mathit{Mgr}) \le 1$.

For the design, maintenance and applications of a relational database, data dependencies are identified as semantic constraints on the relations which are intended to be instances of the database schema. During the design process or lifetime of a database one usually needs to determine further dependencies which are logically implied by the given ones. In line with the literature on database constraints, we restrict our attention to the implication of constraints in some fixed class $\mathcal{C}$: FDs and CCs in the presence of an NFS. Let $T$ be a table schema, let $\mathit{nfs}(T_s)$ denote an NFS over $T$, and let $\Sigma \cup \{\varphi\}$ be a set of FDs and CCs over $T$. We say that $\Sigma$ implies $\varphi$ in the presence of $\mathit{nfs}(T_s)$, denoted by $\Sigma \models_{T_s} \varphi$, if every $T_s$-total table $t$ over $T$ that satisfies $\Sigma$ also satisfies $\varphi$. If $\Sigma$ does not imply $\varphi$ in the presence of $\mathit{nfs}(T_s)$ we may also write $\Sigma \not\models_{T_s} \varphi$. The implication problem for functional dependencies and cardinality constraints in the presence of a null-free subschema is to decide, given any table schema $T$, any NFS $\mathit{nfs}(T_s)$ over $T$, and any set $\Sigma \cup \{\varphi\}$ of FDs and CCs over $T$, whether $\Sigma \models_{T_s} \varphi$.

For the class of FDs and CCs, the sets $\Sigma \cup \{\varphi\}$ over a fixed table schema $T$ are not necessarily finite. While for a fixed $T$ there are only finitely many FDs, there might be infinitely many CCs by taking arbitrarily large upper bounds $b \in \mathbb{N}$. However, for a fixed $X \subseteq T$ only the least $b \in \mathbb{N}$ that occurs is relevant. Therefore, we assume without loss of generality that these sets are finite. Note that for FDs and CCs (in the presence of an NFS) it does not matter whether we restrict our tables to those that are finite, i.e., the implication problem coincides with the finite implication problem where only finite tables are considered. For this reason, we will only speak of the implication problem.

For an FD set $\Sigma$ over a table schema $T$ and an NFS $\mathit{nfs}(T_s)$ over $T$, let the FD set $\Sigma^*_{T_s} = \{\varphi \mid \Sigma \models_{T_s} \varphi\}$ denote the semantic closure of $\Sigma$ and $\mathit{nfs}(T_s)$, and for a set $X \subseteq T$ let $X^*_{\Sigma,T_s} = \{H \in T \mid \Sigma \models_{T_s} X \to H\}$ denote the closure of $X$ under $\Sigma$ and $\mathit{nfs}(T_s)$. For a set $\Sigma$ of FDs and CCs over $T$ let $\Sigma[\mathrm{FD}] = \{X \to Y \mid X \to Y \in \Sigma\} \cup \{X \to T \mid \mathit{card}(X) \le 1 \in \Sigma\}$.
For a set $\Sigma \cup \{\varphi\}$ of FDs and CCs, an NFS $\mathit{nfs}(T_s)$, and a set $R$ of inference rules let $\Sigma \vdash_R \varphi$ denote an inference of $\varphi$ from $\Sigma$ by $R$. That is, there is some sequence $\gamma = [\sigma_1, \ldots, \sigma_n]$ of FDs and CCs such that $\sigma_n = \varphi$ and every $\sigma_i$ is an element of $\Sigma$ or results from an application of an inference rule in $R$ to some elements in $\{\sigma_1, \ldots, \sigma_{i-1}\}$. For a finite set $\Sigma$ of FDs and CCs let $\Sigma^+_R = \{\varphi \mid \Sigma \vdash_R \varphi\}$ denote the syntactic closure of $\Sigma$ under inferences by $R$. $R$ is said to be sound (complete) for the implication of FDs and CCs in the presence of an NFS if for every table schema $T$, for every NFS $\mathit{nfs}(T_s)$ over $T$ and for every set $\Sigma$ of FDs and CCs over $T$ we have $\Sigma^+_R \subseteq \Sigma^*_{T_s}$ ($\Sigma^*_{T_s} \subseteq \Sigma^+_R$). The (finite) set $R$ is said to be a (finite) axiomatization for the implication of FDs and CCs in the presence of an NFS if $R$ is both sound and complete for the implication of FDs and CCs in the presence of an NFS.

Example 3. Consider the set $\Sigma$ with NFS $\mathit{nfs}(T_s)$ over table schema $T$ from Example 2. Then the following are examples of CCs implied by $\Sigma$ in the presence of $\mathit{nfs}(T_s)$: $\mathit{card}(\mathit{Dept}) \le 2$ and $\mathit{card}(\mathit{Emp}, \mathit{Dept}) \le 1$. However, neither the CC $\mathit{card}(\mathit{Emp}) \le 2$ nor the FD $\mathit{Emp} \to \mathit{Mgr}$ is implied by $\Sigma$ in the presence of $\mathit{nfs}(T_s)$. Indeed, the $T_s$-total table

Emp        Dept   Mgr
Sisyphus   ni     Trump
Sisyphus   ni     Gates
Sisyphus   ni     Jobs

satisfies $\Sigma$, but violates $\mathit{card}(\mathit{Emp}) \le 2$ and $\mathit{Emp} \to \mathit{Mgr}$.

Table 1. Axiomatization $S$ of FDs and CCs in the presence of an NFS

(reflexivity)       $\dfrac{}{XY \to X}$
(decomposition)     $\dfrac{X \to YZ}{X \to Y}$
(union)             $\dfrac{X \to Y \quad X \to Z}{X \to YZ}$
(null transitivity) $\dfrac{X \to Y \quad Y \to Z}{X \to Z}$, provided $Y \subseteq XT_s$
(weakening)         $\dfrac{\mathit{card}(X) \le b}{\mathit{card}(X) \le b+1}$
(demotion)          $\dfrac{\mathit{card}(X) \le 1}{X \to T}$
(null pullback)     $\dfrac{X \to Y \quad \mathit{card}(Y) \le b}{\mathit{card}(X) \le b}$, provided $Y \subseteq XT_s$

4 Characterizations of the Implication Problem

The first target in our analysis is the establishment of an axiomatization for the implication of FDs and CCs in the presence of an NFS. The insights from the completeness proof will subsequently enable us to characterize the implication problem algorithmically.

4.1 Axiomatic Characterization

Let $S$ denote the set of inference rules from Table 1. It is our goal to show that $S$ forms a finite axiomatization. In our proof we will use the result by Atzeni and Morfuni that the set $M$, consisting of the reflexivity axiom, and the decomposition, union and null transitivity rules, forms a finite axiomatization for the implication of FDs [3].

Lemma 1. The weakening, demotion and null pullback rules are sound for the implication of FDs and CCs in the presence of an NFS.

Note that the soundness of the reflexivity axiom and the null pullback rule also implies the soundness of the superset rule

$$\dfrac{\mathit{card}(X) \le b}{\mathit{card}(XY) \le b}.$$

Indeed, the trivial FD $XY \to X$ and the CC $\mathit{card}(X) \le b$ allow us to infer the CC $\mathit{card}(XY) \le b$ by an application of the null pullback rule since $X \subseteq XYT_s$.
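The table of Example 3 can be checked against the CC semantics of Section 3 mechanically. The Python sketch below is a minimal illustration under stated assumptions: rows are dictionaries kept in a list (so duplicates survive), `None` stands in for ni, and the definition is read as "at most $b$ row occurrences are $X$-total and agree on $X$". The function name `satisfies_cc` is hypothetical.

```python
from collections import Counter


def satisfies_cc(table, headers, bound):
    """Check card(X) <= b by counting X-total row occurrences per X-value."""
    xs = sorted(headers)
    counts = Counter(
        tuple(row[h] for h in xs)
        for row in table
        if all(row[h] is not None for h in xs)  # only X-total rows count
    )
    return all(n <= bound for n in counts.values())


# The T_s-total table of Example 3 (None stands in for ni).
example3 = [
    {"Emp": "Sisyphus", "Dept": None, "Mgr": "Trump"},
    {"Emp": "Sisyphus", "Dept": None, "Mgr": "Gates"},
    {"Emp": "Sisyphus", "Dept": None, "Mgr": "Jobs"},
]

print(satisfies_cc(example3, {"Emp"}, 4))         # three rows share Emp
print(satisfies_cc(example3, {"Emp"}, 2))         # three rows exceed the bound 2
print(satisfies_cc(example3, {"Emp", "Mgr"}, 1))  # all (Emp, Mgr) pairs differ
```

Because rows are counted as occurrences, two copies of the same total row already violate a bound of 1, which matches the bag semantics used throughout the paper.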

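Example 4. Consider the set $\Sigma$ with NFS $\mathit{nfs}(T_s)$ over table schema $T$ from Example 2. Then the following are examples of inferences from $\Sigma$ and $\mathit{nfs}(T_s)$ by $S$. An application of the null pullback rule to $\mathit{Dept} \to \mathit{Mgr}$, $\mathit{card}(\mathit{Mgr}) \le 2$, and $\mathit{Mgr} \in T_s$ results in the CC $\mathit{card}(\mathit{Dept}) \le 2$. That is,

$$\dfrac{\mathit{Dept} \to \mathit{Mgr} \qquad \mathit{card}(\mathit{Mgr}) \le 2}{\mathit{card}(\mathit{Dept}) \le 2}.$$

We now outline an inference of $\mathit{card}(\mathit{Emp}, \mathit{Dept}) \le 1$ from $\Sigma$ and $\mathit{nfs}(T_s)$ by $S$. Applications of the reflexivity axiom result in $\mathit{Emp},\mathit{Dept} \to \mathit{Emp}$ and $\mathit{Emp},\mathit{Dept} \to \mathit{Dept}$. An application of the null transitivity rule to $\mathit{Emp},\mathit{Dept} \to \mathit{Dept}$ and $\mathit{Dept} \to \mathit{Mgr}$ as well as $\mathit{Dept} \in \{\mathit{Emp}, \mathit{Dept}, \mathit{Mgr}\}$ results in the FD $\mathit{Emp},\mathit{Dept} \to \mathit{Mgr}$. An application of the union rule to $\mathit{Emp},\mathit{Dept} \to \mathit{Emp}$ and $\mathit{Emp},\mathit{Dept} \to \mathit{Mgr}$ results in the FD $\mathit{Emp},\mathit{Dept} \to \mathit{Emp},\mathit{Mgr}$. Finally, an application of the null pullback rule to $\mathit{Emp},\mathit{Dept} \to \mathit{Emp},\mathit{Mgr}$, $\mathit{card}(\mathit{Emp}, \mathit{Mgr}) \le 1$, and $\{\mathit{Emp}, \mathit{Mgr}\} \subseteq \{\mathit{Emp}, \mathit{Dept}, \mathit{Mgr}\}$ results in $\mathit{card}(\mathit{Emp}, \mathit{Dept}) \le 1$. The tree

$$\dfrac{\dfrac{\mathit{Emp},\mathit{Dept} \to \mathit{Emp} \qquad \dfrac{\mathit{Emp},\mathit{Dept} \to \mathit{Dept} \quad \mathit{Dept} \to \mathit{Mgr}}{\mathit{Emp},\mathit{Dept} \to \mathit{Mgr}}}{\mathit{Emp},\mathit{Dept} \to \mathit{Emp},\mathit{Mgr}} \qquad \mathit{card}(\mathit{Emp}, \mathit{Mgr}) \le 1}{\mathit{card}(\mathit{Emp}, \mathit{Dept}) \le 1}$$

illustrates this inference.

Before we turn to the completeness argument, we want to emphasize that a set of FDs alone can never imply any cardinality constraint.

Proposition 1. Let $T$ be a table schema, $\mathit{nfs}(T_s)$ an NFS, and $\Sigma$ a set of FDs over $T$. Then for all cardinality constraints $\mathit{card}(X) \le b$ over $T$ we have $\Sigma \not\models_{T_s} \mathit{card}(X) \le b$.

Proof. Let $t$ denote the table over $T$ that consists of $b+1$ rows which have for every column header of $T$ the same non-null value, i.e., $t$ consists of $b+1$ duplicate total rows. Clearly, $t$ satisfies $\Sigma$ and $\mathit{nfs}(T_s)$. Since $t$ violates $\mathit{card}(X) \le b$ it follows that $\Sigma \not\models_{T_s} \mathit{card}(X) \le b$.

Corollary 1. Let $T$ be a table schema. Then the FD $X \to T$ over $T$ does not imply the cardinality constraint $\mathit{card}(X) \le 1$.

For the completeness of $S$ the following lemma is central.

Lemma 2. Let $\Sigma$ be a set of FDs and CCs, and $\mathit{nfs}(T_s)$ be an NFS over table schema $T$. Then the following hold:
1. $\Sigma \models_{T_s} X \to Y$ if and only if $\Sigma[\mathrm{FD}] \models_{T_s} X \to Y$, and
2. $\Sigma \models_{T_s} \mathit{card}(X) \le b$ if and only if there is some $\mathit{card}(Y) \le b' \in \Sigma$ such that $b' \le b$ and $Y \subseteq XT_s \cap X^*_{\Sigma[\mathrm{FD}],T_s}$.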

For the second part of Lemma 2 consider the special case where $\Sigma$ consists of FDs only. Then no cardinality constraint can be implied by $\Sigma$ in the presence of the NFS, consistent with Proposition 1. We now have the means to verify that $S$ is a finite axiomatization for the implication of FDs and CCs in the presence of an NFS. Note that $S$ is indeed finite, since the rules apply to any given table schema $T$, any given sets $X, Y, Z, T_s \subseteq T$ of column headers, and any given $b \in \mathbb{N}$. In particular, the weakening rule applies to every given $b \in \mathbb{N}$.

Theorem 1. The set $S$ is a finite axiomatization for the implication of FDs and CCs in the presence of an NFS.

Proof. The soundness of $S$ follows from Lemma 1 and the soundness of the rules in $M$, established in previous work [3]. Let $\Sigma \cup \{\varphi\}$ denote a set of FDs and CCs, and $\mathit{nfs}(T_s)$ denote an NFS over table schema $T$. For the completeness of $S$ we need to show that $\Sigma \models_{T_s} \varphi$ implies $\Sigma \vdash_S \varphi$. We distinguish between two cases. Firstly, let $\varphi$ denote the FD $X \to Y$. From $\Sigma \models_{T_s} X \to Y$ we conclude that $\Sigma[\mathrm{FD}] \models_{T_s} X \to Y$ holds by the first part of Lemma 2. The completeness of $M$ for the implication of FDs in the presence of an NFS shows that $\Sigma[\mathrm{FD}] \vdash_M X \to Y$ holds. Since the demotion rule is part of $S$ it follows that $\Sigma \vdash_S \sigma$ holds for every $\sigma \in \Sigma[\mathrm{FD}]$. From $M \subseteq S$ we therefore conclude that $\Sigma \vdash_S X \to Y$ holds indeed. Secondly, let $\varphi$ denote the CC $\mathit{card}(X) \le b$. From the second part of Lemma 2 it follows that $\Sigma[\mathrm{FD}] \models_{T_s} X \to Y$, and that $\mathit{card}(Y) \le b' \in \Sigma$ for some $Y \subseteq XT_s$ and some $b' \le b$. The first case of this completeness proof shows that $\Sigma \vdash_S X \to Y$. An application of the null pullback rule yields $\Sigma \vdash_S \mathit{card}(X) \le b'$. Finally, applications of the weakening rule result in $\Sigma \vdash_S \mathit{card}(X) \le b$.

4.2 Algorithmic Characterization

In many situations it is not necessary to compute the set of all constraints implied by a given set. Instead, the question is whether a given fixed candidate constraint is implied by the given set of constraints.
We will now investigate an algorithmic characterization of the implication problem for the combined class of functional dependencies and cardinality constraints in the presence of an NFS. Lemma 2 reduces the implication problem for the combined class of FDs and CCs in the presence of an NFS to the implication problem for the class of FDs in the presence of an NFS. Indeed, $\Sigma \models_{T_s} X \to Y$ if and only if $Y \subseteq X^*_{\Sigma[\mathrm{FD}],T_s}$, and $\Sigma \models_{T_s} \mathit{card}(X) \le b$ if and only if $Y \subseteq X^*_{\Sigma[\mathrm{FD}],T_s}$ for some $\mathit{card}(Y) \le b' \in \Sigma$ such that $b' \le b$ and $Y \subseteq XT_s$. Therefore, the implication problem under consideration has been reduced to the computation of the closure $X^*_{\Sigma[\mathrm{FD}],T_s}$ of a given set $X$ of column headers with respect to a given FD set $\Sigma[\mathrm{FD}]$. This, however, has been done in previous work [3]. For reasons of completeness, we re-state the algorithm here.

Algorithm 2 (NFSClosure($X$, $\Sigma$, $T_s$, $T$))
Input: set $X$ of column headers, FD set $\Sigma$, NFS $\mathit{nfs}(T_s)$ over table schema $T$
Output: closure $X^*_{\Sigma,T_s}$ of $X$ with respect to $\Sigma$ and $\mathit{nfs}(T_s)$
Method:
  (A0) CLOSURE := $X$;
  (A1) repeat
         OLDCLOSURE := CLOSURE;
         for all $U \to V \in \Sigma$ do
           if $U \subseteq$ CLOSURE $\cap\, XT_s$ then CLOSURE := CLOSURE $\cup\, V$; endif;
         enddo;
       until OLDCLOSURE = CLOSURE;
  (A2) return CLOSURE;

Theorem 3. The implication problem $\Sigma \models_{T_s} \varphi$ over table schema $T$ can be decided in time $O(|T| \cdot \|\Sigma\|)$.

Example 5. Consider the set $\Sigma$ with NFS $\mathit{nfs}(T_s)$ over table schema $T$ from Example 2. We have shown in Example 4 that $\Sigma \models_{T_s} \mathit{card}(\mathit{Emp}, \mathit{Dept}) \le 1$. Alternatively, we could confirm this fact by using the second part of Lemma 2 and Algorithm 2. Indeed, it is true that $\mathit{card}(\mathit{Emp}, \mathit{Mgr}) \le 1 \in \Sigma$ and $\{\mathit{Emp}, \mathit{Mgr}\}$ is a subset of the union of $\{\mathit{Emp}, \mathit{Dept}\}$ and $\{\mathit{Emp}, \mathit{Mgr}\}$, as well as a subset of the closure of $\{\mathit{Emp}, \mathit{Dept}\}$ under $\Sigma[\mathrm{FD}]$ and $\mathit{nfs}(T_s)$. In fact, $\{\mathit{Emp}, \mathit{Dept}\}^*_{\Sigma[\mathrm{FD}],T_s} = \{\mathit{Emp}, \mathit{Dept}, \mathit{Mgr}\}$.

5 Armstrong Tables

In this section we explore the concept of Armstrong databases for the combined class of FDs, CCs and NOT NULL constraints over partial bags. $\mathcal{C}$-Armstrong databases are sample data that perfectly represent the set $\Sigma$ of constraints from the class $\mathcal{C}$ currently perceived as meaningful. Indeed, they satisfy $\Sigma$ and violate every constraint in $\mathcal{C}$ not implied by $\Sigma$. As such, Armstrong databases are an effective means to consolidate and communicate the current perceptions of an application domain's semantics between various stakeholders of the database [11,21]. We will now extend recent results on Armstrong tables for the combined class of FDs and NOT NULL constraints over partial bags [11] by the class of cardinality constraints. Note that these results also extend early work on Armstrong relations for the class of FDs, pioneered by Demetrovics, Mannila, Räihä, Beeri, Dowd, Fagin and Statman [4,7,21].
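Algorithm 2 and the reduction of Lemma 2 translate almost directly into code. The following Python sketch is an illustration rather than the authors' implementation: FDs are assumed to be pairs of frozensets, CCs are pairs of a frozenset and an integer bound, and the helper names (`nfs_closure`, `fd_part`, `implies_fd`, `implies_cc`) are hypothetical. The constants encode Example 2.

```python
def nfs_closure(x, fds, ts):
    """Closure of X under an FD set and nfs(T_s), following Algorithm 2."""
    closure = set(x)
    guard = set(x) | set(ts)  # the header set XT_s
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # An FD fires only if its left-hand side lies in CLOSURE and XT_s.
            if lhs <= (closure & guard) and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def fd_part(fds, ccs, schema):
    """Sigma[FD]: the FDs of Sigma plus X -> T for every card(X) <= 1."""
    return list(fds) + [(x, frozenset(schema)) for x, b in ccs if b == 1]


def implies_fd(x, y, fds, ccs, schema, ts):
    """Lemma 2, part 1: X -> Y is implied iff Y lies in the closure of X."""
    return set(y) <= nfs_closure(x, fd_part(fds, ccs, schema), ts)


def implies_cc(x, b, fds, ccs, schema, ts):
    """Lemma 2, part 2: look for card(Y) <= b' in Sigma with b' <= b."""
    closure = nfs_closure(x, fd_part(fds, ccs, schema), ts)
    allowed = (set(x) | set(ts)) & closure
    return any(b2 <= b and y <= allowed for y, b2 in ccs)


# Example 2 of the paper.
T = frozenset({"Emp", "Dept", "Mgr"})
TS = frozenset({"Emp", "Mgr"})
FDS = [(frozenset({"Emp"}), frozenset({"Dept"})),
       (frozenset({"Dept"}), frozenset({"Mgr"}))]
CCS = [(frozenset({"Emp"}), 4), (frozenset({"Mgr"}), 2),
       (frozenset({"Emp", "Mgr"}), 1)]

print(implies_cc({"Emp", "Dept"}, 1, FDS, CCS, T, TS))  # the claim of Example 5
print(implies_fd({"Emp"}, {"Mgr"}, FDS, CCS, T, TS))    # the FD from Example 3
```

The loop re-scans the FD set until nothing changes, which mirrors the repeat-until structure of Algorithm 2; no attempt is made here to reach the running time stated in Theorem 3.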
5.1 Central Concepts

In a first step we will fix various notions required to establish results on the structural and computational properties of Armstrong tables. We begin with the concept most central to this section.

Definition 1. Let $T$ be a table schema, $\mathit{nfs}(T_s)$ an NFS, and $\Sigma$ a set of FDs and CCs over $T$. A table $t$ over $T$ is said to be Armstrong for $\Sigma$ and $\mathit{nfs}(T_s)$ if and only if for every FD and CC $\varphi$ over $T$: $t$ satisfies $\varphi$ if and only if $\Sigma \models_{T_s} \varphi$, and for every NFS $\mathit{nfs}(T'_s)$ over $T$: $t$ satisfies $\mathit{nfs}(T'_s)$ if and only if $T'_s \subseteq T_s$.

Example 6. Consider the set $\Sigma$ with NFS $\mathit{nfs}(T_s)$ over table schema $T$ from Example 2. Then the following table

Emp        Dept               Mgr
Sisyphus   ni                 Trump
Sisyphus   ni                 Gates
Sisyphus   ni                 Jobs
Sisyphus   ni                 Zuckerberg
Gödel      Computer Science   Hilbert
Church     Computer Science   Hilbert
Newton     Physics            Gauss
Leibniz    Mathematics        Gauss

is an Armstrong table for $\Sigma$ and $\mathit{nfs}(T_s)$.

For characterising the structure of Armstrong tables we need different notions of agreement between rows of a table. The different versions are motivated by the potential occurrence of null markers on the one hand, and the different classes of constraints we consider on the other hand. For functional dependencies it suffices to compare all pairs of distinct rows. Cardinality constraints, however, require us to compare any finite number of distinct rows, essentially up to the maximum bound that occurs in the given set of constraints.

Definition 2. Let $T$ be a table schema, $t$ a table over $T$, and $r_1, r_2$ two rows over $T$. The agree set of $r_1$ and $r_2$ is defined as $\mathit{ag}(r_1, r_2) = (X, Y)$ where $X = \{H \in T \mid r_1(H) = r_2(H) \wedge r_1(H) \neq \mathtt{ni}\}$ and $Y = \{H \in T \mid r_1(H) = r_2(H)\}$. The strong agree set of $r_1$ and $r_2$ is defined as $\mathit{ag}^s(r_1, r_2) = X$ where $\mathit{ag}(r_1, r_2) = (X, Y)$. The weak agree set of $r_1$ and $r_2$ is defined as $\mathit{ag}^w(r_1, r_2) = Y$ where $\mathit{ag}(r_1, r_2) = (X, Y)$. The agree set of $t$ is defined as $\mathit{ag}(t) = \{\mathit{ag}(r_1, r_2) \mid r_1, r_2 \in t \wedge r_1 \neq r_2\}$. The strong agree set of $t$ is defined as $\mathit{ag}^s(t) = \{X \mid (X, Y) \in \mathit{ag}(t)\}$. The weak agree set of $t$ is defined as $\mathit{ag}^w(t) = \{Y \mid (X, Y) \in \mathit{ag}(t)\}$. For $X \in \mathit{ag}^s(t)$ we define $w(X) = \bigcap \{Y \mid (X, Y) \in \mathit{ag}(t)\}$.
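For every positive integer $b > 1$ we define $\mathit{ag}^s_b(t) = \{\bigcap_{1 \le i < j \le b} \mathit{ag}^s(r_i, r_j) \mid r_1, \ldots, r_b \in t\,(\bigwedge_{1 \le i < j \le b} (r_i \neq r_j))\}$, $\mathit{ag}^s_1(t) = \{T\}$ and $\mathit{ag}^s_\infty(t) = \emptyset$.

Example 7. Let $t$ denote the table from Example 6 that is Armstrong for the set $\Sigma$ and the NFS $\mathit{nfs}(T_s)$ over table schema $T$ from Example 2. Let $r_1, r_2$ denote the first two rows of $t$, respectively. Then $\mathit{ag}(r_1, r_2) = (\{\mathit{Emp}\}, \{\mathit{Emp}, \mathit{Dept}\})$, $w(\mathit{Emp}) = \{\mathit{Emp}, \mathit{Dept}\}$ and $\mathit{ag}^s(t) = \mathit{ag}^s_2(t) = \{\emptyset, \{\mathit{Emp}\}, \{\mathit{Mgr}\}, \{\mathit{Dept}, \mathit{Mgr}\}\}$. Furthermore, $\mathit{ag}^s_3(t) = \mathit{ag}^s_4(t) = \{\emptyset, \{\mathit{Emp}\}\}$.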
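The agree sets of Definition 2 are straightforward to compute by brute force, which makes it possible to re-check Example 7 on the table of Example 6. The Python sketch below assumes that rows are dictionaries in a list, that `None` stands in for ni, and that "distinct rows" means distinct occurrences (positions) in the bag; the function names are hypothetical.

```python
from itertools import combinations


def agree(r1, r2, schema):
    """ag(r1, r2) = (strong, weak) for two rows; None stands in for ni."""
    weak = frozenset(h for h in schema if r1[h] == r2[h])
    strong = frozenset(h for h in weak if r1[h] is not None)
    return strong, weak


def strong_agree_sets(table, schema, b=2):
    """ag^s_b(t); b = 2 gives ag^s(t). Rows are told apart by position."""
    if b == 1:
        return {frozenset(schema)}
    result = set()
    for rows in combinations(table, b):
        pairwise = [agree(r1, r2, schema)[0]
                    for r1, r2 in combinations(rows, 2)]
        result.add(frozenset.intersection(*pairwise))
    return result


def w(x, table, schema):
    """Intersection of all weak agree sets whose strong agree set is x."""
    pairs = (agree(r1, r2, schema) for r1, r2 in combinations(table, 2))
    weak_sets = [weak for strong, weak in pairs if strong == x]
    return frozenset.intersection(*weak_sets)  # assumes x is in ag^s(t)


SCHEMA = ("Emp", "Dept", "Mgr")
ROWS = [
    ("Sisyphus", None, "Trump"), ("Sisyphus", None, "Gates"),
    ("Sisyphus", None, "Jobs"), ("Sisyphus", None, "Zuckerberg"),
    ("Gödel", "Computer Science", "Hilbert"),
    ("Church", "Computer Science", "Hilbert"),
    ("Newton", "Physics", "Gauss"), ("Leibniz", "Mathematics", "Gauss"),
]
TABLE = [dict(zip(SCHEMA, row)) for row in ROWS]  # the table of Example 6

print(strong_agree_sets(TABLE, SCHEMA, 2))
print(strong_agree_sets(TABLE, SCHEMA, 3))
```

Enumerating all $b$-element combinations of rows is exponential in $b$, so this is only meant for small examples such as the eight-row table above.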

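An Armstrong table must violate all the cardinality constraints not implied by the given set. It suffices, however, for any non-empty set $X$ to violate the cardinality constraint $\mathit{card}(X) \le b_X - 1$ where $b_X$ denotes the minimum positive integer for which $\mathit{card}(X) \le b_X$ is implied. Moreover, if there are two cardinality constraints $\mathit{card}(X) \le b_X$ and $\mathit{card}(Y) \le b_Y$ such that $b_X = b_Y$ and $Y \subseteq X$, then it suffices to violate $\mathit{card}(X) \le b_X - 1$. This motivates the following definitions.

Definition 3. Let $T$ be a table schema, $\mathit{nfs}(T_s)$ an NFS, and $\Sigma$ a set of FDs and CCs over $T$. For non-empty $X \subseteq T$ let

$$b_X = \begin{cases} \min\{b \in \mathbb{N} \mid \Sigma \models_{T_s} \mathit{card}(X) \le b\}, & \text{if } \{b \in \mathbb{N} \mid \Sigma \models_{T_s} \mathit{card}(X) \le b\} \neq \emptyset \\ \infty, & \text{else.} \end{cases}$$

The set $\mathit{dup}_{\Sigma,T_s}(T)$ of duplicate sets is defined as $\mathit{dup}_{\Sigma,T_s}(T) = \{X \subseteq T \mid \forall H \in T - X\,(b_{XH} < b_X)\}$.

Note that by Lemma 2 we have $b_X = \min\{b \mid \mathit{card}(Y) \le b \in \Sigma \wedge Y \subseteq XT_s \cap X^*_{\Sigma[\mathrm{FD}],T_s}\}$, if $\{b \in \mathbb{N} \mid \Sigma \models_{T_s} \mathit{card}(X) \le b\} \neq \emptyset$.

Example 8. Consider the set $\Sigma$ with NFS $\mathit{nfs}(T_s)$ over table schema $T$ from Example 2. Then we have $b_{\mathit{Emp}} = 4$, $b_{\mathit{Dept}} = b_{\mathit{Mgr}} = b_{\mathit{Dept},\mathit{Mgr}} = 2$, and $b_{\mathit{Emp},\mathit{Dept}} = b_{\mathit{Emp},\mathit{Mgr}} = b_{\mathit{Emp},\mathit{Dept},\mathit{Mgr}} = 1$. Therefore, $\mathit{dup}_{\Sigma,T_s}(T) = \{\{\mathit{Emp}\}, \{\mathit{Dept}, \mathit{Mgr}\}, \{\mathit{Emp}, \mathit{Dept}, \mathit{Mgr}\}\}$.

An Armstrong table must also violate all the functional dependencies not implied by the given set. However, it suffices for each column header $H$ to violate all FDs $X \to H$ where $X$ is maximal with the property that $X \to H$ is not implied. Furthermore, if $X$ is maximal for some $H$ in this sense, $X \in \mathit{dup}_{\Sigma,T_s}(T)$ and $X$ is not maximal for any $H' \in T - T_s$, then we can violate $\mathit{card}(X) \le b_X - 1$ such that for all $H' \in T_s - X$, $X \to H'$ is also violated. These arguments motivate the following definitions.

Definition 4. Let $\Sigma$ be a set of FDs and let $\mathit{nfs}(T_s)$ be an NFS over table schema $T$. For a column header $H \in T$ we define the maximal sets $\mathit{max}_{\Sigma,T_s}(H)$ of $H$ with respect to $\Sigma$ and $\mathit{nfs}(T_s)$ as follows: $\mathit{max}_{\Sigma,T_s}(H) := \{X \subseteq T \mid \Sigma \not\models_{T_s} X \to H \wedge \forall H' \in T - X\,(\Sigma \models_{T_s} XH' \to H)\}$. The maximal sets of $T$ with respect to $\Sigma$ and $\mathit{nfs}(T_s)$ are defined as $\mathit{max}_{\Sigma,T_s}(T) = \bigcup_{H \in T} \mathit{max}_{\Sigma,T_s}(H)$.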
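The bounds $b_X$ and the duplicate sets of Definition 3 can be computed by brute force over all non-empty header sets, using the characterization of $b_X$ via Lemma 2. The Python sketch below is only an illustration under assumptions: FDs are pairs of frozensets, CCs are pairs of a frozenset and a bound, `math.inf` represents $\infty$, and the names `bound` and `duplicate_sets` are hypothetical. The closure routine restates Algorithm 2 so that the block is self-contained.

```python
import math
from itertools import combinations


def nfs_closure(x, fds, ts):
    """Closure of X under an FD set and nfs(T_s), as in Algorithm 2."""
    closure, guard = set(x), set(x) | set(ts)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= (closure & guard) and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def bound(x, fds, ccs, schema, ts):
    """b_X via Lemma 2; math.inf if no CC on X is implied."""
    sigma_fd = list(fds) + [(y, frozenset(schema)) for y, b in ccs if b == 1]
    allowed = (set(x) | set(ts)) & nfs_closure(x, sigma_fd, ts)
    return min((b for y, b in ccs if y <= allowed), default=math.inf)


def duplicate_sets(fds, ccs, schema, ts):
    """Non-empty X such that adding any further header lowers the bound."""
    headers = sorted(schema)
    result = set()
    for k in range(1, len(headers) + 1):
        for combo in combinations(headers, k):
            x = frozenset(combo)
            b_x = bound(x, fds, ccs, schema, ts)
            if all(bound(x | {h}, fds, ccs, schema, ts) < b_x
                   for h in set(schema) - x):
                result.add(x)
    return result


# Example 2 of the paper.
T = frozenset({"Emp", "Dept", "Mgr"})
TS = frozenset({"Emp", "Mgr"})
FDS = [(frozenset({"Emp"}), frozenset({"Dept"})),
       (frozenset({"Dept"}), frozenset({"Mgr"}))]
CCS = [(frozenset({"Emp"}), 4), (frozenset({"Mgr"}), 2),
       (frozenset({"Emp", "Mgr"}), 1)]

print(bound({"Emp"}, FDS, CCS, T, TS))
print(duplicate_sets(FDS, CCS, T, TS))
```

The enumeration visits every non-empty subset of $T$, so it is exponential in the number of headers, in line with the remark in Section 5.3 that the computation of duplicate sets outlined there is exponential.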
If $\Sigma$ and $nfs(T_s)$ are clear from the context we may simply write $max(H)$ and $max(T)$, respectively. Finally,
\[ max^{red}_{\Sigma,T_s}(T) := max_{\Sigma,T_s}(T) - \{X \in dup_{\Sigma,T_s}(T) \mid \forall H \in T - T_s \, (X \notin max_{\Sigma,T_s}(H))\}. \]

Example 9. Consider the set $\Sigma$ with NFS $nfs(T_s)$ over table schema $T$ from Example 2. Recall from Example 8 that $dup_{\Sigma,T_s}(T) = \{\{Emp\}, \{Dept, Mgr\}, \{Emp, Dept, Mgr\}\}$. As maximal sets we compute $max_{\Sigma,T_s}(Emp) = \{\{Dept, Mgr\}\}$, $max_{\Sigma,T_s}(Dept) = \{\{Mgr\}\}$ and $max_{\Sigma,T_s}(Mgr) = \{\{Emp\}\}$. Therefore, $max^{red}_{\Sigma,T_s}(T) = \{\{Mgr\}\}$.
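Definitions 3 and 4 can be replayed on the numbers of Examples 8 and 9. The sketch below is a hedged illustration rather than the authors' procedure: it takes the bounds $b_X$ and the maximal sets as given inputs, enumerates only non-empty column sets, and assumes that Dept is the only column outside $T_s$ (an inference from the table in Example 11, where Dept is the only column carrying ni). All identifiers are hypothetical.

```python
from itertools import combinations


def fs(*cols):
    return frozenset(cols)


def nonempty_subsets(schema):
    cols = sorted(schema)
    for k in range(1, len(cols) + 1):
        for combo in combinations(cols, k):
            yield frozenset(combo)


def duplicate_sets(schema, b):
    """X is a duplicate set if every extra column strictly lowers the bound."""
    return {x for x in nonempty_subsets(schema)
            if all(b[x | {h}] < b[x] for h in schema - x)}


def reduced_maximal_sets(max_sets, dup, nullable):
    """max(T) minus the duplicate sets that are maximal for no nullable column."""
    all_max = set().union(*max_sets.values())
    dropped = {x for x in dup if all(x not in max_sets[h] for h in nullable)}
    return all_max - dropped


T = fs("Emp", "Dept", "Mgr")
# Bounds b_X as listed in Example 8.
B = {
    fs("Emp"): 4, fs("Dept"): 2, fs("Mgr"): 2, fs("Dept", "Mgr"): 2,
    fs("Emp", "Dept"): 1, fs("Emp", "Mgr"): 1, fs("Emp", "Dept", "Mgr"): 1,
}
# Maximal sets as listed in Example 9.
MAX = {"Emp": {fs("Dept", "Mgr")}, "Dept": {fs("Mgr")}, "Mgr": {fs("Emp")}}

DUP = duplicate_sets(T, B)
MAX_RED = reduced_maximal_sets(MAX, DUP, nullable={"Dept"})  # T - T_s assumed to be {Dept}
```

Under these assumptions `DUP` contains {Emp}, {Dept, Mgr} and {Emp, Dept, Mgr}, and `MAX_RED` contains only {Mgr}, in line with Examples 8 and 9.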

5.2 Characterization

We are now in a position to establish sufficient and necessary conditions for a given table to be Armstrong for a given set $\Sigma$ of FDs and CCs and an NFS $nfs(T_s)$. This generalises recent work from the special case where $\Sigma$ consists of FDs only [11]. In turn, that result had generalised a well-known result by Mannila, Räihä, Beeri, Dowd, Fagin and Statman for FDs over total relations [4,21]. In the following theorem, the first (third) condition ensures that all FDs (CCs) not implied by the given set are violated; and the second (fourth) condition ensures that all implied FDs (CCs) are satisfied. The final condition handles the NFS.

Theorem 4. Let $T$ be a table schema, $\Sigma$ a set of standard FDs and standard CCs, and $nfs(T_s)$ an NFS over $T$. Then for all tables $t$ over $T$, $t$ is an Armstrong table for $\Sigma$ and $nfs(T_s)$ if and only if all of the following conditions hold:
1. $\forall H \in T \, \forall X \in max_{\Sigma,T_s}(H) \, (X \in ag^s(t) \wedge H \notin w(X))$,
2. $\forall X \in ag^s(t) \, (X^{+}_{\Sigma,T_s} \subseteq w(X))$,
3. $\forall X \in dup_{\Sigma,T_s}(T) \, \exists Z \in ag^s_{b_X}(t) \, (X \subseteq Z)$,
4. $\forall card(X) \le b \in \Sigma \, \forall Z \in ag^s_{b+1}(t) \, (X \not\subseteq Z)$,
5. $total(t) = T_s$.

Example 10. The previous examples show that the table $t$ in Example 6 is Armstrong for the set $\Sigma$ and the NFS $nfs(T_s)$ over table schema $T$ from Example 2. Indeed, the conditions of Theorem 4 are all satisfied by $t$.

5.3 Computation

For the computation of an Armstrong table, Theorem 4 suggests that the maximal and duplicate sets need to be computed. The computation of the maximal sets with respect to a set of standard FDs and an NFS $nfs(T_s)$ over table schema $T$ has been studied in [11]. For the computation of duplicate sets we now outline an algorithm that is exponential in the size of $\Sigma$. While we leave optimizations of this algorithm for future work, we note that there are sets of FDs and there are sets of CCs, respectively, where every Armstrong table for this set is exponential in the size of $\Sigma$, cf. Theorem 7.
Let $\Sigma$ denote a set of standard FDs and standard CCs, $nfs(T_s)$ an NFS, and $X$ a set of column headers over table schema $T$. We first compute $b_X$. We start with $b_X := \infty$ and compute $X^{+}_{\Sigma[FD],T_s}$ using Algorithm 2. Then, for each $card(Y) \le b \in \Sigma$ such that $Y \subseteq XT_s \cap X^{+}_{\Sigma[FD],T_s}$ and $b < b_X$ we redefine $b_X := b$.

Next we compute the duplicate sets $dup_{\Sigma,T_s}(T)$. We start with $dup_{\Sigma,T_s}(T) = \{X \mid X \subseteq T\}$. Then for each $X \in dup_{\Sigma,T_s}(T)$ and each $H \in T - X$ such that $b_{XH} = b_X$ we redefine $dup_{\Sigma,T_s}(T) := dup_{\Sigma,T_s}(T) - \{X\}$. Therefore, the time to compute $dup_{\Sigma,T_s}(T)$ and $b_X$ for each $X \in dup_{\Sigma,T_s}(T)$ is in $O(2^{|T|} \cdot |T| \cdot |\Sigma|)$.

Algorithm 5 shows the computation of an Armstrong table. The first two steps consist of the computations of the duplicate sets $X$ and their associated $b_X$, and the computation of the maximal sets covered in previous work. In step (A4), the algorithm generates for each duplicate set $X$ a block of $b_X$ rows that satisfies

$card(X) \le b_X$, violates $card(X) \le b_X - 1$, and violates every FD $X \to H$ where $H \in T_s - X$. Note that in this case there cannot be any $H \in T_s - X$ such that $X \to H$ is implied by $\Sigma$ and $nfs(T_s)$. Otherwise, since $X$ is a duplicate set it would hold that $card(XH) \le b$ with $b < b_X$ is an implied cardinality constraint. Due to the soundness of the null-pullback rule, $card(X) \le b$ with $b < b_X$ would be implied, too. This is a contradiction. In step (A5), the algorithm computes for each maximal set $X$ the set $Z$ of column headers in $T - T_s$ for which $X$ is maximal, and produces two rows which strongly agree on $X$ and disagree on each column header in $Z$; unless $X$ is also a duplicate set and each column header for which $X$ is maximal is in $T_s$. Finally, step (A6) introduces null marker occurrences in every column that is not in $T_s$, unless such a column already features a null marker occurrence.

Algorithm 5 (Armstrong table computation)
Input: set $\Sigma$ of standard FDs and standard CCs, an NFS $nfs(T_s)$ over table schema $T$ such that $\forall H \in T \, \exists b \in \mathbb{N} \, (\Sigma \models_{T_s} card(H) \le b)$
Output: Armstrong table $t$ for $\Sigma$ and $nfs(T_s)$
Method: let $c_{H,1}, c_{H,2}, \ldots \in dom(H)$ be distinct
(A0) compute $dup_{\Sigma,T_s}(T)$ by the procedure outlined above;
(A1) compute $max_{\Sigma[FD],T_s}(H)$ for every $H \in T$ by Algorithm 8 in [11];
(A2) $t := \emptyset$;
(A3) $i := 1$;
(A4) for all $X \in dup_{\Sigma,T_s}(T)$ where $b_X > 1$ do
  $t := t \cup \{r_i, \ldots, r_{i+b_X-1}\}$ where for all $j = i, \ldots, i + b_X - 1$ and all $H \in T$
  $r_j(H) := \begin{cases} c_{H,i}, & \text{if } H \in X \\ c_{H,j}, & \text{if } H \in T_s - X \\ ni, & \text{else} \end{cases}$;
  $i := i + b_X$;
(A5) for all $X \in max^{red}_{\Sigma,T_s}(T)$ do
  $Z := \{H \in T - T_s \mid X \in max_{\Sigma,T_s}(H)\}$;
  $t := t \cup \{r_i, r_{i+1}\}$ where for all $H \in T$
  $r_i(H) := \begin{cases} c_{H,i}, & \text{if } H \in XZT_s \\ ni, & \text{else} \end{cases}$ and
  $r_{i+1}(H) := \begin{cases} c_{H,i}, & \text{if } H \in X \\ c_{H,i+1}, & \text{if } H \in Z(T_s - X) \\ ni, & \text{else} \end{cases}$;
  $i := i + 2$;
(A6) $total(t) := \{H \in T \mid \forall r \in t \, (r(H) \neq ni)\}$;
  if $total(t) \neq T_s$, then return $t := t \cup \{r_i\}$ where for all $H \in T$,
  $r_i(H) := \begin{cases} ni, & \text{if } H \in total(t) - T_s \\ c_{H,i}, & \text{else} \end{cases}$;
  else return $t$; endif;

Algorithm 5 works correctly.
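Steps (A2) to (A6) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the duplicate sets with their bounds, the maximal sets and $max^{red}$ are passed in as precomputed inputs (here the values of Examples 8 and 9), $T_s = \{Emp, Mgr\}$ is inferred from the table in Example 11, `None` stands for ni, and the strings `c_H,k` stand for the distinct domain values $c_{H,k}$. All identifiers are hypothetical.

```python
NI = None  # stand-in for the null marker "ni"


def armstrong_table(schema, null_free, dup_bounds, max_sets, max_red):
    """Steps (A2)-(A6); dup_bounds lists each duplicate set X with its b_X."""
    def c(h, k):
        return f"c_{h},{k}"

    table, i = [], 1
    for x, b_x in dup_bounds:  # (A4): one block of b_X rows per duplicate set
        if b_x <= 1:
            continue
        for j in range(i, i + b_x):
            table.append({
                h: c(h, i) if h in x else c(h, j) if h in null_free else NI
                for h in schema
            })
        i += b_x
    for x in max_red:  # (A5): two rows per remaining maximal set
        z = {h for h in schema if h not in null_free and x in max_sets[h]}
        first = {h: c(h, i) if (h in x or h in z or h in null_free) else NI
                 for h in schema}
        second = {h: c(h, i) if h in x
                  else c(h, i + 1) if (h in z or h in null_free) else NI
                  for h in schema}
        table += [first, second]
        i += 2
    # (A6): add one row with null markers if some nullable column is still total
    total = {h for h in schema if all(r[h] is not NI for r in table)}
    if total != set(null_free):
        table.append({h: NI if h in total - set(null_free) else c(h, i)
                      for h in schema})
    return table


def fs(*cols):
    return frozenset(cols)


T = ("Emp", "Dept", "Mgr")
TABLE = armstrong_table(
    schema=T,
    null_free={"Emp", "Mgr"},  # assumed T_s
    dup_bounds=[(fs("Emp"), 4), (fs("Dept", "Mgr"), 2), (fs("Emp", "Dept", "Mgr"), 1)],
    max_sets={"Emp": {fs("Dept", "Mgr")}, "Dept": {fs("Mgr")}, "Mgr": {fs("Emp")}},
    max_red=[fs("Mgr")],
)
ROWS = [tuple(r[h] for h in T) for r in TABLE]
```

With these inputs `ROWS` has eight entries that coincide row by row with the table shown in Example 11. The duplicate sets are passed as an ordered list so that the row numbering is deterministic.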

Theorem 6. For every input $(T, \Sigma, nfs(T_s))$, where $\Sigma$ is a set of standard FDs and standard CCs, and $nfs(T_s)$ is an NFS over table schema $T$ such that for all $H \in T$ there is some $b \in \mathbb{N}$ such that $\Sigma \models_{T_s} card(H) \le b$, Algorithm 5 computes an Armstrong table for $\Sigma$ and $nfs(T_s)$.

Corollary 2. Let $\Sigma$ be a set of standard FDs and CCs, and $nfs(T_s)$ an NFS over table schema $T$. Then there is a table over $T$ that is Armstrong for $\Sigma$ and $nfs(T_s)$ if and only if for all $H \in T$ there is some $b \in \mathbb{N}$ such that $\Sigma \models_{T_s} card(H) \le b$.

Proof. We show first that the condition is necessary for the existence of some Armstrong table. Assume, to the contrary, that there is some $H \in T$ such that for all $b \in \mathbb{N}$ we have $\Sigma \not\models_{T_s} card(H) \le b$. Then there is some $H \in T$ such that $b_H = \infty$. Note that $dup_{\Sigma,T_s}(T) \neq \emptyset$ since $T \in dup_{\Sigma,T_s}(T)$, and $ag^s_\infty(t) = \emptyset$. That is, the third condition of Theorem 4 is always violated. Hence, no Armstrong table over $T$ exists for $\Sigma$ and $nfs(T_s)$. The condition is also sufficient. Indeed, under the hypothesis that the condition holds, Algorithm 5 produces an Armstrong table for $\Sigma$ and $nfs(T_s)$, as verified by Theorem 6.

Example 11. Consider the set $\Sigma$ with NFS $nfs(T_s)$ over table schema $T$ from Example 2 as input to Algorithm 5. Then the algorithm generates the following Armstrong table
\[ \begin{array}{lll} Emp & Dept & Mgr \\ \hline c_{Emp,1} & ni & c_{Mgr,1} \\ c_{Emp,1} & ni & c_{Mgr,2} \\ c_{Emp,1} & ni & c_{Mgr,3} \\ c_{Emp,1} & ni & c_{Mgr,4} \\ c_{Emp,5} & c_{Dept,5} & c_{Mgr,5} \\ c_{Emp,6} & c_{Dept,5} & c_{Mgr,5} \\ c_{Emp,7} & c_{Dept,7} & c_{Mgr,7} \\ c_{Emp,8} & c_{Dept,8} & c_{Mgr,7} \end{array} \]
for $\Sigma$ and $nfs(T_s)$. Note that after suitable substitutions, this is the Armstrong table given in Example 6.

5.4 Complexity Considerations

Corollary 3. Let $\Sigma$ be a set of standard FDs and CCs, and $nfs(T_s)$ an NFS over table schema $T$. It can be decided in time $O(|T|^2 \cdot |\Sigma|)$ whether there is an Armstrong table for $\Sigma$ and $nfs(T_s)$.

Proof. For each $H \in T$ we need to check that there is some $card(X) \le b \in \Sigma$ such that $X \subseteq HT_s \cap H^{+}_{\Sigma[FD],T_s}$, by Lemma 2.
This condition can be verified in time $O(|T| \cdot |\Sigma|)$.

Now we will analyse how well Algorithm 5 does in terms of how well one could potentially do in general. We say that an Armstrong table $t$ for $\Sigma$ and $nfs(T_s)$

is minimum-sized if there is no Armstrong table $t'$ for $\Sigma$ and $nfs(T_s)$ such that $|t'| < |t|$. First of all, the problem of computing an Armstrong table for a given set $\Sigma$ of standard FDs and standard CCs and an NFS over some table schema is precisely exponential in the size of $\Sigma$. If $\Sigma$ consists of a set of standard FDs only, then this result is known, cf. [11, Proposition 2]. We recall what we mean by precisely exponential [4]. Firstly, it means that there is an algorithm for computing an Armstrong table, given a set $\Sigma$ of standard FDs and standard CCs and an NFS $nfs(T_s)$, where the running time of the algorithm is exponential in the size of $\Sigma$. Secondly, it means that there is a set $\Sigma$ of standard FDs and CCs and an NFS $nfs(T_s)$ for which the number of rows in each minimum-sized Armstrong table for $\Sigma$ and $nfs(T_s)$ is exponential in the size of $\Sigma$; thus, an exponential amount of time is required in this case simply to write down the table.

Theorem 7. The problem of computing an Armstrong table for a given set $\Sigma$ of standard CCs and an NFS $nfs(T_s)$ over table schema $T$ is precisely exponential in the size of $\Sigma$.

For the remainder of this paper we show that Algorithm 5 is quite conservative in its use of time and space, even though the problem of computing Armstrong tables is computationally hard. We will show that Algorithm 5 always computes an Armstrong table whose number of rows is at most quadratic in the number of rows of a minimum-sized Armstrong table and the cardinality of the given constraint set.

Let $\Sigma$ denote a set of standard FDs and standard CCs, and $nfs(T_s)$ an NFS over table schema $T$. We say that a set $s_X$ of rows over $T$ is $X$-agreeing if all rows in $s_X$ strongly agree on $X$ and $|s_X| = b_X$. It follows from Theorem 4 that for every Armstrong table $t$ over $T$ for $\Sigma$ and $nfs(T_s)$ and every duplicate set $X \in dup_{\Sigma,T_s}(T)$ there is an $X$-agreeing set $s_X \subseteq t$.

Lemma 3.
Let $\Sigma$ denote a set of standard FDs and standard CCs, and $nfs(T_s)$ an NFS over table schema $T$. Let $t$ be a table over $T$ satisfying $\Sigma$, and $X, Y \in dup_{\Sigma,T_s}(T)$ with $X \neq Y$ and $b_X = b_Y = b_{X \cap Y}$. Let further $s_X, s_Y$ be $X$- and $Y$-agreeing subsets of $t$, respectively. Then $s_X \cap s_Y = \emptyset$.

Proof. Assume $r \in s_X \cap s_Y$. Then $s_X, s_Y$ both strongly agree with $r$ on $X \cap Y$, so $s_X \cup s_Y$ strongly agrees on $X \cap Y$. Since $b_X = b_Y = b_{X \cap Y}$ and $t$ satisfies $\Sigma$ we must have $|s_X \cup s_Y| \le b_X = b_Y$, and hence $s_X = s_Y$. This in turn implies that $s_X = s_Y$ strongly agrees on $X \cup Y$, and using again that $t$ satisfies $\Sigma$ we get $b_{X \cup Y} \ge b_X = b_Y$. From $X \neq Y$ it follows that $X \cup Y$ is a proper superset of $X$ and/or $Y$. Together with $b_{X \cup Y} \ge b_X = b_Y$ this contradicts $X, Y \in dup_{\Sigma,T_s}(T)$.

Let $card(Y) \le b \in \Sigma$ and $X \to Y \in \Sigma[FD]^{+}_{T_s}$ with $Y \subseteq XT_s$. In particular, $card(X) \le b$ can be derived using the null-pullback rule. We say that $card(Y) \le b$ is a source of $card(X) \le b$. For a set $X \subseteq T$ we call $card(Y) \le b$ a source of $X$ if $card(Y) \le b$ is a source of $card(X) \le b_X$.
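The notions of a source and of the bound $b_X$ can be illustrated with a short Python sketch. It is only a sketch under assumptions: the attribute-set closure is passed in as a callable rather than implemented, the characterisation $Y \subseteq XT_s \cap X^{+}_{\Sigma[FD],T_s}$ is used as the test for being a source, and the sample constraint set is the first construction from the proof of Theorem 9 with $n = 2$, where $T_s = T$ and there are no FDs, so that the closure of $X$ is taken to be $X$ itself. All identifiers are hypothetical.

```python
import math


def fs(*cols):
    return frozenset(cols)


def candidates(x, ccs, null_free, closure):
    """CCs card(Y) <= b in Sigma with Y inside both X T_s and the closure of X."""
    scope = (x | null_free) & closure(x)
    return [(y, b) for (y, b) in ccs if y <= scope]


def bound(x, ccs, null_free, closure):
    """b_X: least bound among the candidates, infinity if there is none."""
    return min((b for _, b in candidates(x, ccs, null_free, closure)),
               default=math.inf)


def sources_of(x, ccs, null_free, closure):
    """The CCs of Sigma that are sources of the set X, i.e. of card(X) <= b_X."""
    b_x = bound(x, ccs, null_free, closure)
    return [(y, b) for (y, b) in candidates(x, ccs, null_free, closure) if b == b_x]


def identity(x):
    # Closure under an empty FD set (assumption of this example).
    return x


T = fs("H1", "H2", "H3", "H4")
CCS = [(fs("H1", "H2"), 1), (fs("H3", "H4"), 1)] + [(fs(h), 2) for h in sorted(T)]

B_13 = bound(fs("H1", "H3"), CCS, T, identity)
SRC_13 = sources_of(fs("H1", "H3"), CCS, T, identity)
B_123 = bound(fs("H1", "H2", "H3"), CCS, T, identity)
SRC_123 = sources_of(fs("H1", "H2", "H3"), CCS, T, identity)
```

In this instance the set $H_1H_3$ has bound 2 with the two sources $card(H_1) \le 2$ and $card(H_3) \le 2$, while $H_1H_2H_3$ has bound 1 with the single source $card(H_1H_2) \le 1$.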

Corollary 4. Let $\Sigma$ denote a set of standard FDs and standard CCs, and $nfs(T_s)$ an NFS over table schema $T$. Every cardinality constraint over $T$ of the form $card(X) \le b_X$ has a source in $\Sigma$.

Proof. By Lemma 2 there is some $card(Y) \le b \in \Sigma$ with $b \le b_X$ and $Y \subseteq XT_s \cap X^{+}_{\Sigma[FD],T_s}$. The latter condition is equivalent to $X \to Y \in \Sigma[FD]^{+}_{T_s}$ and $Y \subseteq XT_s$. From this and $card(Y) \le b \in \Sigma$ we can derive $card(X) \le b$ using the null-pullback rule, and by definition of $b_X$ we have $b \ge b_X$. This shows $b = b_X$, so $card(Y) \le b_X$ is a source of $X$ in $\Sigma$.

We denote by $dup_{\Sigma,T_s}(card(Y) \le b)$ the set of all duplicate sets for which $card(Y) \le b$ is a source:
\[ dup_{\Sigma,T_s}(card(Y) \le b) := \{X \in dup_{\Sigma,T_s}(T) \mid card(Y) \le b \text{ is a source of } X\}. \]

Lemma 4. Let $\Sigma$ denote a set of standard FDs and standard CCs, and $nfs(T_s)$ an NFS over table schema $T$. Let $t$ be an Armstrong table for $\Sigma$ and $nfs(T_s)$, and $card(Y) \le b \in \Sigma$. Then $|t| \ge |dup_{\Sigma,T_s}(card(Y) \le b)| \cdot b$.

Proof. (1) For each $X \in dup_{\Sigma,T_s}(card(Y) \le b)$ we have $Y \subseteq XT_s$ and $X \to Y \in \Sigma[FD]^{+}_{T_s}$. This implies $XY \subseteq XT_s$ and $X \to XY \in \Sigma[FD]^{+}_{T_s}$. By definition of $b_{XY}$ we have $\Sigma \models_{T_s} card(XY) \le b_{XY}$. Using null-pullback we can derive $card(X) \le b_{XY}$, so $b_X \le b_{XY}$. Since $X$ is a duplicate set, we must have $Y \subseteq X$.

(2) Table $t$ contains an $X$-agreeing set $s_X$ for every $X \in dup_{\Sigma,T_s}(T)$ by Theorem 4, so in particular for every $X \in dup_{\Sigma,T_s}(card(Y) \le b)$. For every pair of distinct duplicate sets $X_1, X_2 \in dup_{\Sigma,T_s}(card(Y) \le b)$ we have $Y \subseteq X_1 \cap X_2$ by (1), and hence $b_{X_1} = b_{X_2} = b_Y = b_{X_1 \cap X_2}$. Thus, $s_{X_1}$ and $s_{X_2}$ are disjoint by Lemma 3. This gives us $|dup_{\Sigma,T_s}(card(Y) \le b)|$ disjoint sets $s_X \subseteq t$, each of which contains $b$ tuples, and shows the bound on $|t|$.

Corollary 5. Let $\Sigma$ denote a set of standard FDs and standard CCs, and $nfs(T_s)$ an NFS over table schema $T$. Let $t$ be an Armstrong table for $\Sigma$ and $nfs(T_s)$ over $T$ and $D := dup_{\Sigma,T_s}(T)$. Then
\[ \sum_{X \in D} b_X \le |t| \cdot |\Sigma|. \]

Proof.
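By Corollary 4 every $X \in D$ has a source in $\Sigma$, so
\[ \sum_{X \in D} b_X \le \sum_{card(Y) \le b \in \Sigma} \; \sum_{X \in dup_{\Sigma,T_s}(card(Y) \le b)} b_X. \]
By Lemma 4, $|t| \ge \sum_{X \in dup_{\Sigma,T_s}(card(Y) \le b)} b_X$ for any $card(Y) \le b \in \Sigma$, and thus
\[ |t| \cdot |\Sigma| \ge \sum_{card(Y) \le b \in \Sigma} \; \sum_{X \in dup_{\Sigma,T_s}(card(Y) \le b)} b_X \ge \sum_{X \in D} b_X. \]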

Theorem 8. Let $\Sigma$ denote a set of standard FDs and standard CCs, and $nfs(T_s)$ an NFS over table schema $T$. Let $t$ be an Armstrong table for $\Sigma$ and $nfs(T_s)$ and $t_c$ the Armstrong table for $\Sigma$ and $nfs(T_s)$ constructed in Algorithm 5. Then $|t_c| \le |t| \cdot (|t| + |\Sigma|)$.

Proof. Denote by $t_{A4}$, $t_{A5}$ and $t_{A6}$ the subsets of $t_c$ constructed in steps (A4), (A5) and (A6) of Algorithm 5, respectively. By Corollary 5 we have
\[ |t_{A4}| = \sum_{X \in dup_{\Sigma,T_s}(T),\, b_X > 1} b_X \le |t| \cdot |\Sigma|. \]
Steps (A5) and (A6) together construct a sub-table of that computed by Algorithm 10 in [11], thus giving us the bound (Corollary 5 in [11]) $|t_{A5} \cup t_{A6}| \le |t|^2$. Combining these results yields the theorem.

Corollary 6. Algorithm 5 computes an Armstrong table for $\Sigma$ and $nfs(T_s)$ whose number of rows is at most quadratic in the number of rows of a minimum-sized Armstrong table for $\Sigma$ and $nfs(T_s)$ and the cardinality of $\Sigma$.

Finally, we show that, in general, there is no most concise way of representing the information inherent in a set of standard CCs and a null-free subschema. In fact, there are cases in which the size of a minimum-sized Armstrong table is exponential in the size of the constraint set, and there are other cases in which the size of an optimal cover of a constraint set is exponential in the size of a minimum-sized Armstrong table.

Theorem 9. Let $C$ denote the class of FDs and CCs. There is some table schema $T$, some set $\Sigma$ of CCs and some NFS $nfs(T_s)$ over $T$ such that $\Sigma$ has size $O(n)$, and the size of a minimum-sized $C$-Armstrong table for $\Sigma$ and $nfs(T_s)$ is $O(2^n)$. There is some table schema $T$, some set $\Sigma$ of CCs and some NFS $nfs(T_s)$ over $T$ such that there is a $C$-Armstrong table for $\Sigma$ and $nfs(T_s)$ where the number of rows is in $O(n)$, and the optimal cover of $\Sigma$ with respect to $nfs(T_s)$ has size $O(2^n)$.

Proof. Let $T = H_1 \cdots H_{2n}$, $T_s = T$ and let $\Sigma$ consist of the following standard CCs: for all $i = 1, \ldots, n$, $card(H_{2i-1} H_{2i}) \le 1$, and for all $i = 1, \ldots, 2n$, $card(H_i) \le 2$.
Then $dup_{\Sigma,T_s}(T)$ contains the $2^n$ sets $X \subseteq T$ where for each $i = 1, \ldots, n$ either $H_{2i-1} \in X$ or $H_{2i} \in X$. According to Theorem 4 every Armstrong table for $\Sigma$ and $nfs(T_s)$ contains a number of rows that is exponential in the size of $\Sigma$. A similar construction was used in [4] to show that the size of a minimum-sized Armstrong relation can be exponential in the size of a given FD set.

Let $T = H_1 H'_1 \cdots H_n H'_n$, $T_s = T$, and let $\Sigma$ consist of the following standard CCs: for all $i = 1, \ldots, n$, $card(H_i) \le 3$ and $card(H'_i) \le 3$, and for all $X = X_1 \cdots X_n$ where $X_i \in \{H_i, H'_i\}$, $card(X) \le 2$. Then $\Sigma$ is its own optimal cover,

i.e. there is no equivalent set $\Sigma'$ of standard FDs and standard CCs such that $|\Sigma'| < |\Sigma|$. The size $|\Sigma|$ is in $O(2^n)$. Furthermore, $dup_{\Sigma,T_s}(T)$ consists of the $n$ sets $T - H_i H'_i$ for $i = 1, \ldots, n$, and the set $T$, and $max_{\Sigma,T_s}(T)$ consists of the $2n$ sets $T - H_i$ and $T - H'_i$ for $i = 1, \ldots, n$. Thus, Algorithm 5 computes an Armstrong table for $\Sigma$ and $nfs(T_s)$ whose number of rows is in $O(n)$.

For these reasons we recommend the use of both representations. Indeed, the representation in form of constraint sets enables design teams to identify constraints they currently incorrectly perceive as semantically meaningful; and the representation in form of an Armstrong table enables design teams to identify constraints they currently incorrectly perceive as semantically meaningless.

6 Conclusion and Future Work

We have investigated the combined class of functional dependencies, cardinality constraints and NOT NULL constraints over partial bags. This framework applies to the structure of SQL tables. We have characterized the associated implication problem of this class axiomatically and algorithmically. Our results show how reasoning about this expressive class of constraints can be done effectively and efficiently. Moreover, we have established several structural and computational properties of Armstrong tables for this class of constraints. Our results show how Armstrong tables can be used effectively to consolidate the semantics of an application domain expressed by the class of constraints studied. In future work we would like to implement our algorithms within a design tool. Such a tool may also be used to conduct empirical studies on the usefulness of Armstrong tables for the acquisition of semantically meaningful constraints in the class studied, very much along the lines of [16]. It seems desirable to extend our results to even more expressive classes of constraints, e.g.
classes of multivalued and inclusion dependencies. Another challenging problem would be an extension to classes of cardinality constraints that also enforce lower bounds. Properties of Armstrong databases should also be studied in probabilistic and graph databases, and the concept of informative Armstrong databases should be investigated in non-relational models [6]. It would also be interesting to analyse interactions of cardinality constraints and functional dependencies under different interpretations of the null marker [12,18,23].

Acknowledgement. This research is supported by the Marsden fund council from Government funding, administered by the Royal Society of New Zealand.

References

1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995)
2. Armstrong, W.W.: Dependency structures of database relationships. Information Processing 74 (1974)
3. Atzeni, P., Morfuni, N.: Functional dependencies and constraints on null values in database relations. Information and Control 70(1), 1–31 (1986)
4. Beeri, C., Dowd, M., Fagin, R., Statman, R.: On the structure of Armstrong relations for functional dependencies. J. ACM 31(1) (1984)
5. Codd, E.F.: A relational model of data for large shared data banks. Commun. ACM 13(6) (1970)
6. De Marchi, F., Petit, J.-M.: Semantic sampling of existing databases through informative Armstrong databases. Inf. Syst. 32(3) (2007)
7. Demetrovics, J.: On the equivalence of candidate keys with Sperner systems. Acta Cybern. 4 (1980)
8. Diederich, J., Milton, J.: New methods and fast algorithms for database normalization. ACM Trans. Database Syst. 13(3) (1988)
9. Fagin, R.: Armstrong databases. Technical Report RJ3440(40926), IBM Research Laboratory, San Jose, California, USA (1982)
10. Hartmann, S.: On the implication problem for cardinality constraints and functional dependencies. Ann. Math. Art. Intell. 33 (2001)
11. Hartmann, S., Kirchberg, M., Link, S.: Design by example for SQL table definitions with functional dependencies. The VLDB Journal (2011), doi: / s
12. Hartmann, S., Leck, U., Link, S.: On Codd families of keys over incomplete relations. The Computer Journal 54(7) (2011)
13. Hartmann, S., Link, S.: Numerical constraints on XML data. Inf. Comput. 208(5) (2010)
14. Hartmann, S., Link, S.: When data dependencies over SQL tables meet the Logics of Paradox and S-3. In: PODS Conference (2010)
15. Imielinski, T., Lipski Jr., W.: Incomplete information in relational databases. J. ACM 31(4) (1984)
16. Langeveldt, W.-D., Link, S.: Empirical evidence for the usefulness of Armstrong relations in the acquisition of meaningful functional dependencies. Inf. Syst. 35(3) (2010)
17. Lenzerini, M., Nobili, P.: On the satisfiability of dependency constraints in entity-relationship schemata. Inf. Syst. 15(4) (1990)
18. Levene, M., Loizou, G.: Axiomatisation of functional dependencies in incomplete relations. Theor. Comput. Sci. 206(1-2) (1998)
19. Liddle, S., Embley, D., Woodfield, S.: Cardinality constraints in semantic data models. Data Knowl. Eng. 11 (1993)
20. Lien, E.: On the equivalence of database models. J. ACM 29(2) (1982)
21. Mannila, H., Räihä, K.-J.: Design by example: An application of Armstrong relations. J. Comput. Syst. Sci. 33(2) (1986)
22. Sali, A., Schewe, K.-D.: Keys and Armstrong databases in trees with restructuring. Acta Cybern. 18(3) (2008)
23. Thalheim, B.: On semantic issues connected with keys in relational databases permitting null values. Elektronische Informationsverarbeitung und Kybernetik 25(1-2) (1989)
24. Thalheim, B.: Dependencies in relational databases. Teubner (1991)
25. Thalheim, B.: Fundamentals of Cardinality Constraints. In: Pernul, G., Tjoa, A.M. (eds.) ER. LNCS, vol. 645. Springer, Heidelberg (1992)
26. Thalheim, B.: Entity-Relationship modeling. Springer, Heidelberg (2000)

Design by Example for SQL Tables with Functional Dependencies

VLDB Journal manuscript No. (will be inserted by the editor) Design by Example for SQL Tables with Functional Dependencies Sven Hartmann Markus Kirchberg Sebastian Link Received: date / Accepted: date