Functional Dependencies

Chapter 7 Functional Dependencies

7.1 Introduction
7.2 Proofs and Functional Dependencies
7.3 Keys and Functional Dependencies
7.4 Covers
7.5 Tableaux
7.6 Exercises
7.7 Bibliographical Comments

7.1 Introduction

This chapter is centered around functional dependencies, the first class of integrity constraints to be introduced and the most important one. The central issue examined is the possibility of effectively constructing the set of logical consequences of a set of functional dependencies. We need to be aware of every nontrivial functional dependency that follows from the set of dependencies identified in the design process in order to guarantee minimal data redundancy in the tables of the database and good behavior of these tables with respect to updates.

7.2 Proofs and Functional Dependencies

Whenever we have a set F of functional dependencies, we can ask the question: what other functional dependencies necessarily follow from F? In other words, what other functional dependencies have the property that any table τ that satisfies the functional dependencies of F also satisfies these other functional dependencies?

To make this more precise, let H be a set of attributes. Recall that FD(H) denotes the set of all functional dependencies that can be written using the attributes of H; i.e., FD(H) = {X → Y | X, Y ⊆ H}. Let F ⊆ FD(H) be a set of functional dependencies. In Section 6.2.2, we introduced the semantic notion ("logical consequence") that corresponds to this question. In the current section, we explore a way of determining syntactically which other functional dependencies are satisfied by every table of the schema S = (H, F). So, we examine methods for obtaining logical consequences of a set of functional dependencies. These methods are known as inference rules.

The first author to consider this topic was W. W. Armstrong [Arm74]. Although equivalent to the ones we introduce below, his rules differ from ours. Nevertheless, it is common practice to refer to such collections of rules as Armstrong rules. After introducing these rules, we show in Section 7.2.3 that they are correct ("sound") and that they allow us to find all functional dependencies that are logical consequences of F ("complete").

We denote functional dependencies using φ (the Greek letter "phi"), with or without subscripts.

Definition 7.2.1 An n-ary inference rule is a relation R ⊆ (FD(H))^n × FD(H). If R is an n-ary rule, then we write

    φ_1, ..., φ_n
    ------------- R
          φ

to mean ((φ_1, ..., φ_n), φ) ∈ R. We refer to the pair ((φ_1, ..., φ_n), φ) as an instance of the rule R. The functional dependencies φ_1, ..., φ_n are the hypotheses or premises. The functional dependency φ is the conclusion of this instance of the rule R, and we say that φ is obtained by applying rule R to φ_1, ..., φ_n.

Following established practice in formal logic, we use the phrase "hypotheses of a rule of inference" rather than "hypotheses of an instance of a rule of inference", and similarly for the terms "premises" and "conclusion". To be correct, any inference rule R must lead from true hypotheses to a true conclusion.
Thus, for a correct rule R, ((φ_1, ..., φ_n), φ) ∈ R means that from the fact that a table τ satisfies the functional dependencies φ_1, ..., φ_n we may conclude that τ satisfies the functional dependency φ.

Example 7.2.2 Suppose that a table τ = (T, H, ρ) satisfies the functional dependencies X → Y and Y → Z. We claim that it also satisfies the functional dependency X → Z. Indeed, let u, v ∈ ρ be two tuples of τ such that u[X] = v[X]. Since τ satisfies X → Y we have u[Y] = v[Y]; since τ also satisfies Y → Z, we infer that u[Z] = v[Z], which allows us to conclude that τ satisfies the functional dependency X → Z. This suggests the introduction of the transitivity rule ((X → Y, Y → Z), X → Z) for every X, Y, Z.

Definition 7.2.3 Let U be a set of attributes. The Armstrong rules of inference are:

    ------- R_incl, if Y ⊆ X     (Inclusion Rule)
     X → Y

      X → Y
    --------- R_aug              (Augmentation Rule)
     XZ → YZ

    X → Y, Y → Z
    ------------- R_trans        (Transitivity Rule)
        X → Z

for every X, Y, Z ⊆ U.

Although the formal proof of the soundness of these rules is deferred to Section 7.2.3, it may help to note the following. The inclusion rule is a formal statement of the fact that for any table τ = (T, H, ρ) such that Y ⊆ X ⊆ H, τ satisfies the trivial functional dependency X → Y (see Theorem 6.2.23). The augmentation rule captures the fact that every table τ that satisfies a functional dependency X → Y also satisfies the functional dependency XZ → YZ for any set of attributes Z ⊆ H, as the reader can easily verify.

Note that we do not distinguish between functional dependencies like U → VW and U → WV, since VW = WV = V ∪ W. Also, we frequently use the fact that YY = Y, which is the idempotency of set union written in the common database notation.

Using Armstrong rules we can formulate the notion of proof for a functional dependency.

Definition 7.2.4 Let F be a set of functional dependencies. A sequence (φ_1, ..., φ_n) of functional dependencies is an F-proof if one of the following is true for each i, 1 ≤ i ≤ n:
(i) φ_i ∈ F, or
(ii) there exist j_1, ..., j_m, each less than i, such that ((φ_{j_1}, ..., φ_{j_m}), φ_i) is an instance of an Armstrong rule R.
In the first case, we say that φ_i is an initial functional dependency; in the second case, we say that φ_{j_1}, ..., φ_{j_m} are used in the application of rule R. The length of the proof (φ_1, ..., φ_n) is n. An F-proof of the functional dependency φ is a proof whose last entry is φ.
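Definition 7.2.4 is mechanical enough to check by program. The following Python sketch is only an illustration under stated assumptions: attributes are single letters, a dependency is a pair of frozensets, and each step carries an explicit justification tuple (including the augmenting set Z). The names `fd` and `is_f_proof` and the justification format are hypothetical, not part of the text. The sample proof is the six-step derivation of AD → E from Example 7.2.7.

```python
from typing import FrozenSet, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (left member, right member)


def fd(lhs: str, rhs: str) -> FD:
    """Build X -> Y from two strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def is_f_proof(F: Set[FD], steps: List[tuple]) -> bool:
    """Check that `steps` is an F-proof in the sense of Definition 7.2.4.

    Each step is (dependency, justification). A justification is one of
    ("initial",), ("incl",), ("aug", j, Z) or ("trans", j, k), where j and k
    are 1-based indices of earlier steps and Z is the augmenting attribute set.
    """
    for i, ((x, y), just) in enumerate(steps, start=1):
        kind, refs = just[0], just[1:]
        # Every referenced step must come strictly earlier in the sequence.
        if any(not (1 <= r < i) for r in refs if isinstance(r, int)):
            return False
        if kind == "initial":
            ok = (x, y) in F
        elif kind == "incl":
            ok = y <= x  # inclusion rule: Y is a subset of X
        elif kind == "aug":
            j, z = refs
            (px, py), _ = steps[j - 1]
            ok = (x == px | frozenset(z)) and (y == py | frozenset(z))
        elif kind == "trans":
            j, k = refs
            (jx, jy), _ = steps[j - 1]
            (kx, ky), _ = steps[k - 1]
            ok = (jy == kx) and (x == jx) and (y == ky)
        else:
            ok = False
        if not ok:
            return False
    return True


# The derivation of AD -> E from F = {A -> C, CD -> AE, BE -> A}.
F = {fd("A", "C"), fd("CD", "AE"), fd("BE", "A")}
proof = [
    (fd("A", "C"), ("initial",)),
    (fd("AD", "CD"), ("aug", 1, "D")),
    (fd("CD", "AE"), ("initial",)),
    (fd("AD", "AE"), ("trans", 2, 3)),
    (fd("AE", "E"), ("incl",)),
    (fd("AD", "E"), ("trans", 4, 5)),
]
print(is_f_proof(F, proof))
```

Requiring Z to be stated explicitly keeps the augmentation check to two set equalities; a checker that searches for a suitable Z is possible but is not attempted here.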

If there exists an F-proof of a functional dependency φ, we write F ⊢ φ and we say that φ is provable from F.

Definition 7.2.5 An F-proof (φ_1, ..., φ_n) is nonredundant if it satisfies the following conditions:
1. Every step φ_j (where 1 ≤ j ≤ n - 1) is used in the application of a rule.
2. No functional dependency occurs more than once in the proof.

Theorem 7.2.6 For every F-proof of a functional dependency φ, there exists a nonredundant F-proof of φ.

Proof. The argument by strong induction on the length of proofs is straightforward, and we leave it to the reader.

Theorem 7.2.6 shows that, whenever needed, we can assume that if X → Y is provable from F, the F-proof of X → Y is nonredundant.

Example 7.2.7 Let F = {A → C, CD → AE, BE → A}. We have the following proof for F ⊢ AD → E:
1. A → C      initial functional dependency
2. AD → CD    R_aug and (1)
3. CD → AE    initial functional dependency
4. AD → AE    R_trans and (2), (3)
5. AE → E     R_incl
6. AD → E     R_trans and (4), (5)
Thus, AD → E is provable from F.

7.2.1 Derived Inference Rules

The Armstrong rules we introduced are quite spartan; for providing actual proofs, it helps to have additional rules. The ones we introduce below may be thought of as proof macros. They are useful tools for simplifying the presentation of proofs of functional dependencies, but any use of one of these derived rules could be replaced by a suitable series of steps to make an F-proof that does not rely on the derived rule.

Definition 7.2.8 An n-ary derived rule of inference is a relation R ⊆ (FD(H))^n × FD(H) such that if ((φ_1, ..., φ_n), φ) ∈ R we have {φ_1, ..., φ_n} ⊢ φ.

Example 7.2.9 The additivity rule R_add is defined by

    X → Y, X → Y'
    -------------- R_add
       X → YY'

for all subsets X, Y, Y' of the set of attributes H. Indeed, we have the proof:
1. X → Y      initial functional dependency,
2. X → Y'     initial functional dependency,
3. X → XY     applying R_aug to (1),
4. XY → YY'   applying R_aug to (2),
5. X → YY'    applying R_trans to (3) and (4).
Note that in step (3) of the proof we augment both sides of the functional dependency X → Y by X and then use the fact that XX = X. In step (4) we augment both sides of X → Y' by Y.

Example 7.2.10 The projectivity rule R_proj is given by

    X → YZ
    ------- R_proj
    X → Y

for all subsets X, Y, Z of H. To verify this derived rule consider the proof:
1. X → YZ     initial functional dependency,
2. YZ → Y     applying R_incl,
3. X → Y      applying R_trans to (1) and (2).

The usefulness of derived rules in presenting proofs for functional dependencies can be seen in the following example.

Example 7.2.11 Consider the following proof of X → WYZ from the hypotheses X → YZ and Z → W:
1. X → YZ      initial functional dependency,
2. Z → W       initial functional dependency,
3. YZ → Z      applying R_incl,
4. X → Z       applying R_trans to (1) and (3),
5. X → W       applying R_trans to (4) and (2),
6. X → XYZ     applying R_aug to (1),
7. XYZ → WYZ   applying R_aug to (5),
8. X → WYZ     applying R_trans to (6) and (7).
Note that step (4) is obtained from steps (1) and (3) exactly as in Example 7.2.10. Therefore, we can replace this derivation with its shorter variant:
1. X → YZ      initial functional dependency,
2. Z → W       initial functional dependency,
3. X → Z       applying R_proj to (1),
4. X → W       applying R_trans to (3) and (2),
5. X → XYZ     applying R_aug to (1),
6. XYZ → WYZ   applying R_aug to (4),
7. X → WYZ     applying R_trans to (5) and (6).

Further, notice that steps (5), (6) and (7) represent the final part of the proof of the additivity rule. This allows us to generate the still shorter proof:
1. X → YZ     initial functional dependency,
2. Z → W      initial functional dependency,
3. X → Z      applying R_proj to (1),
4. X → W      applying R_trans to (3) and (2),
5. X → WYZ    applying R_add to (1) and (4).
Note that the argument presented in this example introduces a new derived rule:

    X → YZ, Z → W
    -------------- R_ampl
       X → WYZ

We will refer to this rule as the amplification rule, and we will denote it by R_ampl. We will use derived rules from now on in the same way as the basic rules R_incl, R_aug and R_trans.

7.2.2 The Closure of a Set of Attributes

The notion of closure of a set of attributes under a set of functional dependencies F provides us with a syntactic method for deciding whether a functional dependency X → Y is provable from F; that is, whether F ⊢ X → Y.

Let H be a finite set of attributes, and let F be a set of functional dependencies, F ⊆ FD(H). Starting from H, F and X, we compute a set cl_{H,F}(X) such that F ⊢ X → Y if and only if Y ⊆ cl_{H,F}(X). As Corollary 7.2.21 below shows, the notion of provability of a functional dependency (F ⊢ X → Y) is equivalent to the semantic notion of logical consequence (F ⊨ X → Y). Hence, the notion of closure provides us with a syntactic device for deciding whether a functional dependency φ is a logical consequence of a set of functional dependencies F. This is very useful in the design and analysis of relational databases.

Definition 7.2.12 Let H be a finite set of attributes, and let X be a subset of H. If F ⊆ FD(H), we denote by D_{H,F}(X) the collection of all sets of attributes Y such that Y ⊆ H and F ⊢ X → Y.

Theorem 7.2.13 Let H be a finite set of attributes, and let F be a set of functional dependencies on H. For every subset X of H, the collection D_{H,F}(X) contains a unique largest set.

Proof. Note that X ∈ D_{H,F}(X), so D_{H,F}(X) is always nonempty.
Suppose that D_{H,F}(X) = {Y_0, Y_1, ..., Y_{m-1}} with Y_0 = X and m ≥ 1. Since F ⊢ X → Y_i for 0 ≤ i ≤ m - 1, by applying the additivity rule we obtain F ⊢ X → Y_0 ... Y_{m-1}, so W = Y_0 ∪ ... ∪ Y_{m-1} ∈ D_{H,F}(X). Since every Y ∈ D_{H,F}(X) is included in W, it follows that W is the largest set of D_{H,F}(X).

The previous theorem justifies the next definition.

Definition 7.2.14 Let H be a finite set of attributes, and let F be a set of functional dependencies on H. If X is a subset of H, the closure of X under the set F of functional dependencies is the largest set of D_{H,F}(X). We denote this set by cl_{H,F}(X). If the set H is understood from the context, we may write cl_F(X) instead of cl_{H,F}(X). (1)

(1) We prefer this notation for the closure of a set of attributes under a set F of functional dependencies to the more popular notations X^+_F or X^+, because it is clearly distinct from F^+, the set of logical consequences of F, and avoids confusing the reader.

Corollary 7.2.15 Let H be a finite set of attributes, and let F be a set of functional dependencies on H. For every subset X of H we have F ⊢ X → cl_F(X).

Proof. This statement follows immediately from Theorem 7.2.13.

Theorem 7.2.16 Let H be a finite set of attributes, and let F be a set of functional dependencies on H. For every subset X of H we have F ⊢ X → Y if and only if Y ⊆ cl_F(X).

Proof. If Y ⊆ cl_F(X) then, by Corollary 7.2.15, F ⊢ X → cl_F(X). An application of the projectivity rule yields F ⊢ X → Y. Conversely, if F ⊢ X → Y, the definition of cl_{H,F}(X) implies Y ⊆ cl_{H,F}(X).

Theorem 7.2.17 Let F be a set of functional dependencies on the set of attributes H. We have
1. X ⊆ cl_{H,F}(X),
2. X_1 ⊆ X_2 implies cl_{H,F}(X_1) ⊆ cl_{H,F}(X_2),
3. cl_{H,F}(cl_{H,F}(X)) = cl_{H,F}(X),
for every X, X_1, X_2 ⊆ H.

Proof. From the proof of Theorem 7.2.13, the first inclusion follows immediately. Next, observe that if X_1 ⊆ X_2, then we have F ⊢ X_2 → X_1. By Corollary 7.2.15, we have F ⊢ X_1 → cl_{H,F}(X_1). Therefore, by the transitivity rule, we obtain F ⊢ X_2 → cl_{H,F}(X_1). This implies cl_{H,F}(X_1) ⊆ cl_{H,F}(X_2). Finally, note that by the first property we have cl_{H,F}(X) ⊆ cl_{H,F}(cl_{H,F}(X)). To prove the reverse inclusion, note that F ⊢ X → cl_{H,F}(X) and F ⊢ cl_{H,F}(X) → cl_{H,F}(cl_{H,F}(X)), by Corollary 7.2.15. An application of the transitivity rule gives F ⊢ X → cl_{H,F}(cl_{H,F}(X)), and this implies cl_{H,F}(cl_{H,F}(X)) ⊆ cl_{H,F}(X).

7.2.3 Soundness and Completeness

In this section we show the equivalence of ⊨ and ⊢. Thus, we prove that {φ | F ⊨ φ} = {φ | F ⊢ φ} for every set of functional dependencies F. In other words, we show that the functional dependencies that are logical consequences of F are precisely those that are provable from F. We do this by proving that the existence of an F-proof of a functional dependency X → Y guarantees that X → Y is a logical consequence of F (the soundness of Armstrong rules) and that every functional dependency that is a logical consequence of F has an F-proof (the completeness of Armstrong rules). Soundness means that using the Armstrong rules we can generate only logical consequences, and completeness means that we can generate proofs for all such logical consequences.

Theorem 7.2.18 (Soundness Theorem) If F ⊢ X → Y, then F ⊨ X → Y.

Proof. The argument is by induction on the length n of the proof of X → Y from F. If n = 1, we have either X → Y ∈ F or Y ⊆ X. In either case, it is clear that F ⊨ X → Y. Suppose that the statement holds for each proof of length less than n and that (φ_1, ..., φ_n) is an F-proof of X → Y. Then, φ_n = X → Y must fall into one of the following cases:
1. If X → Y belongs to F, then, as in the base case, F ⊨ X → Y.
2. If φ_n = X → Y is obtained from two predecessors φ_j = X → W and φ_i = W → Y (where i, j < n) by applying the transitivity rule, then, by the inductive hypothesis, F ⊨ X → W and F ⊨ W → Y. Let τ = (T, H, ρ) ∈ SAT(F), and let u, v ∈ ρ be two tuples of τ such that u[X] = v[X]. Since F ⊨ X → W, we have u[W] = v[W]. In turn, since F ⊨ W → Y we obtain u[Y] = v[Y], so τ satisfies X → Y. Thus, F ⊨ X → Y.
3. If X → Y is obtained from a previous functional dependency X' → Y' by applying the augmentation rule, then there exists a set of attributes Z such that X = X'Z and Y = Y'Z.
By the inductive hypothesis, F ⊨ X' → Y'. Now, if u, v ∈ ρ and u[X'Z] = v[X'Z] we have u[X'] = v[X'] and u[Z] = v[Z]. The first equality implies u[Y'] = v[Y'] because F ⊨ X' → Y', so u[Y] = u[Y'Z] = v[Y'Z] = v[Y]. This shows that F ⊨ X → Y.

4. If X → Y is obtained by applying R_incl, then obviously F ⊨ X → Y.

To prove that F ⊨ X → Y implies F ⊢ X → Y we need a preliminary result.

Lemma 7.2.19 Let H be a finite set of attributes, and let F be a set of functional dependencies, F ⊆ FD(H). For every nonempty set of attributes X, X ⊆ H, there exists a table τ_{H,F,X} = (T_{H,F,X}, H, ρ) such that ρ consists of two tuples that coincide on X, and τ_{H,F,X} satisfies all functional dependencies of F.

Proof. Let H = A_1 ... A_n. Recall that |Dom(A_i)| ≥ 2, and let a_i, b_i be two distinct values in Dom(A_i) for 1 ≤ i ≤ n. Define the tuple u by u[A_i] = a_i for 1 ≤ i ≤ n and the tuple v by

    v[A_i] = a_i   if A_i ∈ cl_F(X),
    v[A_i] = b_i   otherwise.

Without loss of generality assume that cl_F(X) = A_1 ... A_k. We prove that the table τ_{H,F,X} given by

    T_{H,F,X}
            cl_F(X)          H - cl_F(X)
         A_1  ...  A_k    A_{k+1}  ...  A_n
    u    a_1  ...  a_k    a_{k+1}  ...  a_n
    v    a_1  ...  a_k    b_{k+1}  ...  b_n

satisfies all functional dependencies of F. Suppose that Y → Z is a functional dependency of F that τ_{H,F,X} violates. Then, we have u[Y] = v[Y] and u[Z] ≠ v[Z]. By the construction of τ_{H,F,X}, this implies

    Y ⊆ cl_F(X)      (7.1)
    Z ⊄ cl_F(X)      (7.2)

By Theorem 7.2.17, inclusion (7.1) implies cl_F(Y) ⊆ cl_F(cl_F(X)), and thus, by part 3 of the same theorem, cl_F(Y) ⊆ cl_F(X). Now, since Y → Z ∈ F, we have Z ⊆ cl_F(Y) ⊆ cl_F(X), which contradicts (7.2). Thus τ_{H,F,X} cannot violate any functional dependency of F.

We refer to τ_{H,F,X} as the Armstrong table on X.

Theorem 7.2.20 (Completeness Theorem) Let H be a finite set of attributes, and let F be a set of functional dependencies, F ⊆ FD(H). If F ⊨ X → Y, then F ⊢ X → Y.
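The construction of Lemma 7.2.19 can be tried on a small schema. The Python sketch below is illustrative only: it assumes single-letter attributes, represents a dependency as a pair of strings, and uses the symbolic values "a_A" and "b_A" for the two distinct domain values of attribute A. The helper names `closure`, `armstrong_table` and `satisfies` are hypothetical.

```python
def closure(attrs, fds):
    """Attributes derivable from `attrs` under `fds` (a list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def armstrong_table(H, fds, X):
    """Return the two tuples u, v of Lemma 7.2.19 as dictionaries.

    u takes the value a_A on every attribute A; v agrees with u on the
    closure of X and takes the value b_A everywhere else.
    """
    cl = closure(X, fds)
    u = {A: "a_" + A for A in H}
    v = {A: ("a_" if A in cl else "b_") + A for A in H}
    return u, v


def satisfies(table, lhs, rhs):
    """True if any two tuples that agree on lhs also agree on rhs."""
    return all(
        any(s[A] != t[A] for A in lhs) or all(s[B] == t[B] for B in rhs)
        for s in table
        for t in table
    )


H = "ABCDE"
F = [("AB", "C"), ("CD", "E"), ("AE", "B")]
u, v = armstrong_table(H, F, "AE")
table = [u, v]

# Every dependency of F holds in the two-tuple table ...
print(all(satisfies(table, lhs, rhs) for lhs, rhs in F))
# ... while AE -> D, whose right member is outside the closure of AE, fails.
print(satisfies(table, "AE", "D"))
```

The table satisfies a dependency AE → W exactly when W lies inside the closure of AE, which is the property the completeness argument relies on.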

Proof. Suppose that X → W is a logical consequence of F, but X → W is not provable from F. Then, W ⊄ cl_F(X). Let τ_{H,F,X} be the Armstrong table on X. By Lemma 7.2.19, τ_{H,F,X} satisfies all functional dependencies of F, and therefore, it satisfies X → W. Since u[X] = v[X] and u[W] ≠ v[W], we have a contradiction. Therefore, X → W must be provable from F.

Corollary 7.2.21 Let H be a finite set of attributes, and let F be a set of functional dependencies, F ⊆ FD(H). F ⊨ X → Y if and only if F ⊢ X → Y.

Proof. This follows immediately from Theorems 7.2.18 and 7.2.20.

We present an application of the notions discussed in this section that is useful in decomposing database schemas.

Theorem 7.2.22 Let S = (H, F) be a table schema, and let U, V ⊆ H be two sets of attributes such that U ∪ V = H. Then, ρ = ρ[U] ⋈ ρ[V] for every table τ = (T, H, ρ) of the schema S if and only if at least one of the functional dependencies U ∩ V → U or U ∩ V → V belongs to F^+.

Proof. Suppose that we have ρ = ρ[U] ⋈ ρ[V] for every table τ = (T, H, ρ) of the table schema S and that neither U ∩ V → U nor U ∩ V → V belongs to F^+. Choose τ to be the Armstrong table τ_{H,F,U∩V}. Our assumption implies that U ⊄ cl_F(U ∩ V) and V ⊄ cl_F(U ∩ V). Therefore, τ_{H,F,U∩V} violates both U ∩ V → U and U ∩ V → V. This means that τ_{H,F,U∩V} has the form:

    T_{H,F,U∩V}
    U - cl_F(U ∩ V)      cl_F(U ∩ V)        V - cl_F(U ∩ V)
    A_1  ...  A_p     A_{p+1}  ...  A_q    A_{q+1}  ...  A_n
    a_1  ...  a_p     a_{p+1}  ...  a_q    a_{q+1}  ...  a_n
    b_1  ...  b_p     a_{p+1}  ...  a_q    b_{q+1}  ...  b_n

Accordingly, we have the projections:

    T_{H,F,U∩V}[U]
    A_1  ...  A_p     A_{p+1}  ...  A_q
    a_1  ...  a_p     a_{p+1}  ...  a_q
    b_1  ...  b_p     a_{p+1}  ...  a_q

and

    T_{H,F,U∩V}[V]
    A_{p+1}  ...  A_q    A_{q+1}  ...  A_n
    a_{p+1}  ...  a_q    a_{q+1}  ...  a_n
    a_{p+1}  ...  a_q    b_{q+1}  ...  b_n

The join T_{H,F,U∩V}[U] ⋈ T_{H,F,U∩V}[V] is

    A_1  ...  A_p     A_{p+1}  ...  A_q    A_{q+1}  ...  A_n
    a_1  ...  a_p     a_{p+1}  ...  a_q    a_{q+1}  ...  a_n
    a_1  ...  a_p     a_{p+1}  ...  a_q    b_{q+1}  ...  b_n
    b_1  ...  b_p     a_{p+1}  ...  a_q    b_{q+1}  ...  b_n
    b_1  ...  b_p     a_{p+1}  ...  a_q    a_{q+1}  ...  a_n

which contains four tuples while ρ contains only two, and so ρ ≠ ρ[U] ⋈ ρ[V].

Conversely, assume that one of U ∩ V → U or U ∩ V → V belongs to F^+, say U ∩ V → U. Let τ = (T, H, ρ) be a table of the schema S; since τ satisfies all functional dependencies of F, it also satisfies U ∩ V → U. If r ∈ ρ[U] ⋈ ρ[V], then there exist r' ∈ ρ[U] and r'' ∈ ρ[V] such that r' and r'' are joinable and r' ⋈ r'' = r. In turn, this implies the existence of the tuples s', s'' ∈ ρ such that r' = s'[U] and r'' = s''[V]. The joinability of r' and r'' implies

    s'[U ∩ V] = r'[U ∩ V] = r''[U ∩ V] = s''[U ∩ V]

and, since ρ satisfies the functional dependency U ∩ V → U, we also obtain s'[U] = s''[U]. Since r = r' ⋈ r'' we have r[U] = r' and r[V] = r''. We claim that r = s''. Indeed, we have r[U] = r' = s'[U] = s''[U] and r[V] = r'' = s''[V]. Since U ∪ V = H, r and s'' coincide on all attributes of H, so r = s'' ∈ ρ. This proves that ρ[U] ⋈ ρ[V] ⊆ ρ. Since the inclusion ρ ⊆ ρ[U] ⋈ ρ[V] always holds, ρ[U] ⋈ ρ[V] = ρ.

Corollary 7.2.23 If S = (H, F) and X → Y ∈ F^+, then for every table τ = (T, H, ρ) of this schema, we have ρ = ρ[XY] ⋈ ρ[XZ], where Z = H - XY.

7.2.4 Closure Computation

Being able to calculate cl_F(X) is helpful for computing F^+; this is essential for determining whether relational schemas satisfy certain conditions known as normal forms (see Section 8.2). Let H be a set of attributes, F be a set of functional dependencies, F ⊆ FD(H), and X be a subset of H. The following algorithm computes the closure cl_F(X).

Algorithm 7.2.24 Algorithm for Computing cl_F(X)
Input: A finite set H of attributes, a set F of functional dependencies over H, and a subset X of H.
Output: The closure cl_F(X) of the set X.
Method: Construct an increasing sequence CS_F(X) of subsets of H:

    X_0 ⊆ X_1 ⊆ ... ⊆ X_k ⊆ ...

defined by

    Stage 0:      X_0 = X
    Stage k + 1:  X_{k+1} = X_k ∪ ⋃ {Z | Y → Z ∈ F and Y ⊆ X_k}

If X_{k+1} = X_k, then stop; we have cl_F(X) = X_k. Otherwise, continue with the next value of k.

We refer to CS_F(X) as the F-closure sequence of X. Let X, X' be two subsets of H with CS_F(X) = (X_0, ..., X_n) and CS_F(X') = (X'_0, ..., X'_m). We write CS_F(X) ⊑ CS_F(X') if for every i, 1 ≤ i ≤ n, there exists j_i such that 1 ≤ j_i ≤ m and X_i ⊆ X'_{j_i}. Note that X ⊆ X' implies CS_F(X) ⊑ CS_F(X'). Also, CS_F(X_i) is a suffix of the sequence CS_F(X) for every X_i in CS_F(X). Therefore, if CS_F(X) = (X_0, ..., X_k), then CS_F(X_k) = (X_k).

Proof of Correctness: Note that the algorithm does indeed terminate, i.e., X_n = X_{n+1} for some n ∈ N, because the members of the increasing sequence are all subsets of the finite set H. To prove that the algorithm correctly computes cl_F(X), suppose that there exists a proof of F ⊢ X → Y of length n. We prove, by strong induction on n ≥ 1, that Y ⊆ X_k, where CS_F(X) = (X_0, ..., X_k). If n = 1, then either Y ⊆ X, in which case Y ⊆ X = X_0 ⊆ X_k, or X → Y ∈ F, in which case Y ⊆ X_1 ⊆ X_k; so the basis case is true. Suppose that this holds for proofs of length less than n, and let φ_1, ..., φ_n be a proof of length n, where φ_n = X → Y. If φ_n ∈ F, we argue as in the basis case. Otherwise, we consider three cases:
1. If φ_n was produced by the inclusion rule, we have Y ⊆ X = X_0 ⊆ X_k.
2. Suppose that φ_n was generated from φ_p (where p < n) by applying the augmentation rule. In this case, φ_p = U → V, and X = UZ, Y = VZ for some subset Z of H. By the inductive hypothesis, V ⊆ U_h, where CS_F(U) = (U_0, U_1, ..., U_h). Since CS_F(U) ⊑ CS_F(X), we have U_h ⊆ X_k, so V ⊆ X_k; thus, Y = VZ ⊆ X_k because Z ⊆ X ⊆ X_k.
3. If φ_n was obtained from φ_p, φ_q by transitivity, there exists a subset S of H such that φ_p = X → S and φ_q = S → Y. By the inductive hypothesis, S ⊆ X_k and Y ⊆ S_m, where CS_F(X) = (X_0, ..., X_k) and CS_F(S) = (S_0, ..., S_m). Since CS_F(S) ⊑ CS_F(X_k), and since CS_F(X_k) = (X_k), we have S_m ⊆ X_k. In turn, this implies Y ⊆ X_k.
This proves that Y ⊆ X_k for every Y such that F ⊢ X → Y, so cl_F(X) ⊆ X_k. The reverse inclusion can be obtained by showing, by induction on i, that F ⊢ X → X_i for every X_i in CS_F(X). This shows that X_i ⊆ cl_F(X) for every X_i. In particular, X_k ⊆ cl_F(X).
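The stages of Algorithm 7.2.24 translate almost directly into code. The following Python sketch is a minimal illustration, assuming single-letter attributes and dependencies given as (lhs, rhs) pairs of strings; the function names `closure_sequence`, `closure` and `is_provable` are hypothetical. The last function applies Theorem 7.2.16 to decide provability.

```python
def closure_sequence(X, fds):
    """Return the F-closure sequence (X_0, ..., X_k) as a list of frozensets.

    Stage k + 1 adds the right member Z of every dependency Y -> Z whose
    left member Y is contained in X_k; the sequence stops at the first k
    for which X_{k+1} = X_k.
    """
    current = frozenset(X)
    sequence = [current]
    while True:
        nxt = set(current)
        for lhs, rhs in fds:
            if set(lhs) <= current:
                nxt |= set(rhs)
        nxt = frozenset(nxt)
        if nxt == current:
            return sequence
        sequence.append(nxt)
        current = nxt


def closure(X, fds):
    """The closure of X under fds: the last member of the closure sequence."""
    return closure_sequence(X, fds)[-1]


def is_provable(fds, lhs, rhs):
    """Decide F |- lhs -> rhs by testing rhs against the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


F = [("AB", "C"), ("CD", "E"), ("AE", "B")]
for stage, attrs in enumerate(closure_sequence("AE", F)):
    print(stage, "".join(sorted(attrs)))
```

Each pass scans all of F, so this sketch runs in time proportional to |F| times the number of stages; faster linear-time variants exist but are not needed for small schemas.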

Example 7.2.25 Let H = ABCDE, and let F be the set of functional dependencies F = {AB → C, CD → E, AE → B}. Suppose that we wish to compute cl_F(AE). We build the sequence

    X_0 = AE
    X_1 = AEB
    X_2 = AEBC
    X_3 = AEBC

The algorithm stops when we detect that X_2 = X_3. So, the closure of AE is AEBC. A similar computation shows that the closure of AD is AD and the closure of AED is ABCDE.

7.3 Keys and Functional Dependencies

In Definition 2.1.12, we introduced a key of a table τ = (T, H, ρ) as a set of attributes K ⊆ H that satisfies two conditions:
1. If u[K] = v[K], then u = v for all tuples u, v ∈ ρ (unique identification property).
2. There is no proper subset L of K that has the unique identification property (minimality property).
The first condition requires the table τ to satisfy the functional dependency K → H; the second requires K to contain no proper subset L such that τ would satisfy L → H. Now, we formulate this notion in the context of table schemas.

Definition 7.3.1 Let S = (H, F) be a table schema with functional dependencies. A key of the schema S is a set K that satisfies the following conditions:
1. K → H ∈ F^+ (unique identification property).
2. There is no proper subset L of K such that L → H ∈ F^+ (minimality property).

Using Theorem 7.2.16, we obtain the following, which can serve as an alternate characterization for keys.

Theorem 7.3.2 A set of attributes K is a key for a table schema with functional dependencies S = (H, F) if and only if cl_F(K) = H, and for every attribute A of K, cl_F(K - {A}) ≠ H.

Proof. The argument is straightforward and is left to the reader.

Example 7.3.3 Let S = (ABCDE, F) be a table schema with functional dependencies, where F = {AB → C, D → C, AE → BD}. We show how to determine the keys of this schema using F-closure sequences. Note that there is no functional dependency in F that has either A or E in its right member. Assume that X is a key of this schema; then, A ∈ X. If it were not, no set X_k in CS_F(X) would contain A. Similarly, E must be in X. Therefore, any key of this schema must contain A and E. The F-closure sequence of AE is:

    X_0 = AE
    X_1 = AEBD
    X_2 = AEBDC
    X_3 = AEBDC

The first condition of Theorem 7.3.2 is clearly satisfied. To verify the second condition, note that cl_F(A) = A and cl_F(E) = E. Therefore, AE is a key. Moreover, since every key must contain AE, it follows that AE is the only key of this schema.

In general a table schema can have more than one key; in fact, it is possible to find table schemas that have a number of keys that is exponential in the number of attributes.

Example 7.3.4 Consider the table schema S = (A_1 ... A_n B_1 ... B_n, F), where

    F = {A_1 → B_1, ..., A_n → B_n, B_1 → A_1, ..., B_n → A_n}.

Note that each set K of n attributes, K = C_1 ... C_n, where C_i ∈ {A_i, B_i} for 1 ≤ i ≤ n, is a key for S. Since there are 2^n such sets, the number of keys of this schema grows exponentially with the number of attributes.

Definition 7.3.5 Each attribute A of a key of a table schema with functional dependencies S = (H, F) is referred to as a prime attribute.

The notion of prime attribute is important in defining normal forms of table schemas.

Example 7.3.6 The prime attributes of the schema considered in Example 7.3.3 are A and E, since AE is the single key of this schema. On the other hand, each attribute of the schema considered in Example 7.3.4 is prime.
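Theorem 7.3.2 gives a direct test for keys, which the following Python sketch applies to every subset of H. It is a brute-force illustration only (exponential in the number of attributes), and it assumes single-letter attributes and dependencies given as (lhs, rhs) pairs of strings. The names `closure`, `is_key`, `keys` and `prime_attributes` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes derivable from `attrs` under `fds` (a list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_key(K, H, fds):
    """Test of Theorem 7.3.2: cl(K) = H and cl(K - {A}) != H for every A in K."""
    K, H = set(K), set(H)
    if closure(K, fds) != H:
        return False
    return all(closure(K - {A}, fds) != H for A in K)


def keys(H, fds):
    """All keys of the schema (H, fds), found by testing every subset of H."""
    attrs = sorted(set(H))
    return [
        frozenset(cand)
        for size in range(len(attrs) + 1)
        for cand in combinations(attrs, size)
        if is_key(cand, H, fds)
    ]


def prime_attributes(H, fds):
    """Union of all keys (Definition 7.3.5)."""
    return set().union(*keys(H, fds))


# The schema of Example 7.3.3.
F = [("AB", "C"), ("D", "C"), ("AE", "BD")]
print([sorted(k) for k in keys("ABCDE", F)])

# Example 7.3.4 with n = 2, writing P, Q for A_1, A_2 and R, S for B_1, B_2.
G = [("P", "R"), ("R", "P"), ("Q", "S"), ("S", "Q")]
print(len(keys("PQRS", G)))
```

For larger schemas, the observation made in Example 7.3.3 (attributes that occur in no right member must belong to every key) prunes the search considerably; the sketch leaves that optimization out for clarity.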

Example 7.3.7 Consider the schema $S = (\mathit{stno}\ \mathit{cno}\ \mathit{empno}\ \mathit{sem}\ \mathit{year}\ \mathit{grade}, F)$, where the set $F$ consists of the functional dependencies

\[ \mathit{cno}\ \mathit{sem}\ \mathit{year} \to \mathit{empno}, \qquad \mathit{stno}\ \mathit{cno}\ \mathit{sem}\ \mathit{year} \to \mathit{grade}. \]

The table GRADES of the college database belongs to $\mathrm{SAT}(S)$. It is easy to see that the single key of this schema is $\mathit{stno}\ \mathit{cno}\ \mathit{sem}\ \mathit{year}$. So, the prime attributes of $S$ are $\mathit{stno}$, $\mathit{cno}$, $\mathit{sem}$, $\mathit{year}$.

7.4 Covers

Restricting and standardizing functional dependencies makes them easier to manipulate and compare.

Definition 7.4.1 Let $F, G$ be two sets of functional dependencies, $F, G \subseteq \mathrm{FD}(H)$. $F$ and $G$ are equivalent if $F^+ = G^+$. In this case, we call $F$ a cover for $G$, and $G$ a cover for $F$. (Footnote 2: The choice of the term cover is regrettable because the usual English semantics of this word implies an asymmetry. Nevertheless, we use it here to adhere to standard terminology.)

If $F, G$ are equivalent sets of functional dependencies we write $F \equiv G$.

Theorem 7.4.2 Let $F, G$ be two sets of functional dependencies, $F, G \subseteq \mathrm{FD}(H)$. The following three statements are equivalent:

(i) $F \subseteq G^+$;
(ii) $F^+ \subseteq G^+$;
(iii) $\mathrm{cl}_F(X) \subseteq \mathrm{cl}_G(X)$ for every subset $X$ of $H$.

Proof. (i) implies (ii). Assume $F \subseteq G^+$. The first part of Theorem 6.2.21 gives $F^+ \subseteq (G^+)^+$. The second part of that theorem gives $(G^+)^+ = G^+$, whence $F^+ \subseteq G^+$.

(ii) implies (iii). Suppose that (ii) holds. Since $X \to \mathrm{cl}_F(X) \in F^+$, we have $X \to \mathrm{cl}_F(X) \in G^+$, so $\mathrm{cl}_F(X) \subseteq \mathrm{cl}_G(X)$ by the maximality of $\mathrm{cl}_G(X)$.

(iii) implies (i). If (iii) holds and $X \to Y \in F$, from $Y \subseteq \mathrm{cl}_F(X) \subseteq \mathrm{cl}_G(X)$ it follows that $X \to Y \in G^+$. Therefore, (i) holds.

The next corollary gives us a useful instrument for proving equivalence of sets of functional dependencies.

Corollary 7.4.3 Let $F, G$ be two sets of functional dependencies, $F, G \subseteq \mathrm{FD}(H)$. The following three statements are equivalent:

1. $F \subseteq G^+$ and $G \subseteq F^+$;
2. $F, G$ are equivalent sets of functional dependencies;
3. $\mathrm{cl}_F(X) = \mathrm{cl}_G(X)$ for every subset $X$ of $H$.

Proof. The corollary is an immediate consequence of Theorem 7.4.2.

Definition 7.4.4 A unit functional dependency is a functional dependency whose right member consists of a single attribute.

Unit functional dependencies in $\mathrm{FD}(H)$ are, of course, of the form $X \to A$, where $X$ is a subset of $H$ and $A$ is a member of $H$.

Theorem 7.4.5 For every set $F$ of functional dependencies, $F \subseteq \mathrm{FD}(H)$, there exists an equivalent set $G \subseteq \mathrm{FD}(H)$ such that all dependencies of $G$ are unit functional dependencies.

Proof. Define $G$ as

\[ G = \{X \to A \mid X \to Y \in F \text{ and } A \in Y\}. \]

The projectivity rule implies that $X \to A \in F^+$ for every $X \to A \in G$. On the other hand, if $X \to Y \in F$ and $Y = A_1 \cdots A_m$, then $X \to A_1, \ldots, X \to A_m \in G$, and the additivity rule implies that $X \to Y \in G^+$. Therefore, Corollary 7.4.3 implies the equivalence of $F$ and $G$.

Definition 7.4.6 A set $F$ of functional dependencies is nonredundant if there is no proper subset $G$ of $F$ such that $G \equiv F$. Otherwise, $F$ is a redundant set of functional dependencies.

Clearly, a set $F$ is nonredundant if for every $X \to Y \in F$, $(F - \{X \to Y\})^+ \subset F^+$. Also, any subset of a nonredundant set of functional dependencies is nonredundant.

Given a set $F$ of functional dependencies, it is possible that more than one nonredundant cover for $F$ can be found. For instance, the set of unit functional dependencies

\[ F = \{A \to B,\ B \to A,\ B \to C,\ C \to B,\ A \to C,\ C \to A\} \]

is clearly redundant. However,

\[ F_1 = \{A \to B,\ B \to A,\ B \to C,\ C \to B\}, \]
\[ F_2 = \{B \to C,\ C \to B,\ A \to C,\ C \to A\}, \]
\[ F_3 = \{A \to B,\ B \to A,\ A \to C,\ C \to A\} \]

are each nonredundant and equivalent to $F$.

Algorithm 7.4.7 (Computation of a Nonredundant Cover)
Input: A finite set of attributes $H$ and a set $F$ of functional dependencies, $F \subseteq \mathrm{FD}(H)$.
Output: A nonredundant cover $F'$ of $F$.

Method: Let $\varphi_1, \ldots, \varphi_n$ be a sequence that consists of all functional dependencies of $F$ without any repetitions. Construct a sequence of sets of functional dependencies $F_0, F_1, \ldots, F_n$, where $F_0 = F$ and

\[ F_{i+1} = \begin{cases} F_i - \{\varphi_{i+1}\} & \text{if } F_i - \{\varphi_{i+1}\} \equiv F_i, \\ F_i & \text{otherwise,} \end{cases} \]

for $0 \le i < n$. Output the set $F' = F_n$.

Proof of Correctness: It is immediate that the set $F_n$ is nonredundant and equivalent to $F$.

The nonredundant set of functional dependencies obtained in Algorithm 7.4.7 depends on the order in which we consider the functional dependencies. This is not surprising in view of the remark that precedes the algorithm.

Observe that even if $F$ is a nonredundant set of functional dependencies, the set $G$ of unit functional dependencies constructed in Theorem 7.4.5 may be redundant. For instance, starting from the nonredundant set $F = \{A \to BC,\ C \to B\}$, the constructed set $G = \{A \to B,\ A \to C,\ C \to B\}$ is redundant because $G \equiv \{A \to C,\ C \to B\}$.

For reasons that are made apparent in Section 8.2, it is desirable to have table schemas containing functional dependencies with the property that the smallest possible set of attributes determines the largest possible number of remaining attributes. Among other benefits, this helps reduce storage requirements. Thus, we seek to minimize the size of $X$ in any functional dependency $X \to Y$. The next definition formalizes this requirement.

Definition 7.4.8 Let $F$ be a set of functional dependencies, and let $X \to Y$ be a functional dependency in $F$. $X \to Y$ is $F$-reduced if there exists no proper subset $X'$ of $X$ such that $(F - \{X \to Y\}) \cup \{X' \to Y\} \equiv F$. The set $F$ is reduced if it consists only of $F$-reduced functional dependencies.

Lemma 7.4.9 Let $F$ be a set of functional dependencies, $F \subseteq \mathrm{FD}(H)$, and let $X \to Y \in F$. If $X' \subset X$, then $F^+ \subseteq ((F - \{X \to Y\}) \cup \{X' \to Y\})^+$.

Proof. Let $F' = (F - \{X \to Y\}) \cup \{X' \to Y\}$. Observe that the definition of $F'$ implies that for every set of attributes $W$ we have

\[ \bigcup \{V \mid U \to V \in F,\ U \subseteq W\} \subseteq \bigcup \{V \mid U \to V \in F',\ U \subseteq W\}. \tag{7.3} \]

To show that $F^+ \subseteq (F')^+$ it suffices to show that $\mathrm{cl}_F(U) \subseteq \mathrm{cl}_{F'}(U)$ for every set $U \subseteq H$. Let $\mathrm{CS}_F(U) = (U_0, U_1, \ldots, U_n)$, and let $\mathrm{CS}_{F'}(U) = (U'_0, U'_1, \ldots, U'_m)$. To prove that every set of $\mathrm{CS}_F(U)$ is included in the last set $U'_m$ of $\mathrm{CS}_{F'}(U)$, consider a set $U_i$ from $\mathrm{CS}_F(U)$. We show, by induction on $i$, that $U_i \subseteq U'_m$. For $i = 0$ this statement is immediate because $U_0 = U'_0 \subseteq U'_m$. Therefore, assume that $U_i \subseteq U'_m$. We have

\[ U_{i+1} = U_i \cup \bigcup \{V \mid U \to V \in F \text{ and } U \subseteq U_i\} \subseteq U'_m \cup \bigcup \{V \mid U \to V \in F' \text{ and } U \subseteq U'_m\} = U'_{m+1} = U'_m, \]

in view of inclusion (7.3). Since every set of $\mathrm{CS}_F(U)$ is included in $U'_m = \mathrm{cl}_{F'}(U)$, it follows that $\mathrm{cl}_F(U) \subseteq \mathrm{cl}_{F'}(U)$.

Theorem 7.4.10 For every finite set $F$ of functional dependencies there exists an equivalent, reduced, finite set $F'$ of functional dependencies.

Proof. The argument is constructive. For each functional dependency $X \to Y$ of $F$ and each attribute $A \in X$, determine if $Y \subseteq \mathrm{cl}_F(X - A)$; if this is the case, replace $X \to Y$ in $F$ by $(X - A) \to Y$. We claim that $F$ is equivalent to $(F - \{X \to Y\}) \cup \{(X - A) \to Y\}$. Note that $F \subseteq ((F - \{X \to Y\}) \cup \{(X - A) \to Y\})^+$ by Lemma 7.4.9. On the other hand, $(F - \{X \to Y\}) \cup \{(X - A) \to Y\} \subseteq F^+$ because $Y \subseteq \mathrm{cl}_F(X - A)$, so $F$ and $(F - \{X \to Y\}) \cup \{(X - A) \to Y\}$ are equivalent by Corollary 7.4.3. Since $F$ is finite, the procedure can be applied only a finite number of times. At the end, the remaining set of functional dependencies consists of $F$-reduced functional dependencies.

Example 7.4.11 Let $H = ABC$, and let $F = \{AB \to C,\ A \to B\} \subseteq \mathrm{FD}(H)$. It is easy to verify the following equalities: $\mathrm{cl}_F(A) = ABC$, $\mathrm{cl}_F(B) = B$, $\mathrm{cl}_F(C) = C$, $\mathrm{cl}_F(AB) = \mathrm{cl}_F(AC) = ABC$, and $\mathrm{cl}_F(BC) = BC$. If we drop $A$ from $AB \to C$, we note that we cannot infer $B \to C$ from $F$ because $\mathrm{cl}_F(B) = B$. On the other hand, if we drop $B$ from $AB \to C$, note that we can infer $A \to C$ from $F$ since $\mathrm{cl}_F(A) = ABC$. Therefore, $\{A \to C,\ A \to B\}$ is an equivalent, reduced set of functional dependencies.

Lemma 7.4.12 If $F$ is a reduced set of functional dependencies, $F \subseteq \mathrm{FD}(H)$, and $F'$ is a nonredundant set obtained from $F$ by applying Algorithm 7.4.7, then $F'$ is a reduced set of functional dependencies.

Proof. The argument is straightforward and is left to the reader.
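The constructions of Theorem 7.4.10 and Algorithm 7.4.7 can both be phrased in terms of closures. The Python sketch below is a hedged illustration rather than the authors' implementation: it assumes single-character attributes, encodes a functional dependency as a pair of strings, and uses the hypothetical helper names closure, reduce_left and nonredundant. It relies on the observation that $F_i - \{\varphi\} \equiv F_i$ holds exactly when the right member of $\varphi$ lies in the closure of its left member under $F_i - \{\varphi\}$.

```python
def closure(attrs, fds):
    """Return cl_F(attrs) for FDs given as (left member, right member) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def reduce_left(fds):
    """Construction of Theorem 7.4.10: drop attribute A from X -> Y
    whenever Y is included in cl_F(X - A)."""
    fds = list(fds)
    for i, (lhs, rhs) in enumerate(fds):
        for a in lhs:                      # iterates over the original left member
            smaller = lhs.replace(a, "")
            if set(rhs) <= closure(smaller, fds):
                lhs = smaller
                fds[i] = (lhs, rhs)        # the modified set stays equivalent to F
    return fds

def nonredundant(fds):
    """Algorithm 7.4.7: scan the FDs in the given order and discard each one
    that is implied by the dependencies still present."""
    current = list(fds)
    for fd in fds:
        rest = [g for g in current if g != fd]
        if set(fd[1]) <= closure(fd[0], rest):
            current = rest
    return current

# Example 7.4.11
print(reduce_left([("AB", "C"), ("A", "B")]))

# The redundant set F discussed before Algorithm 7.4.7
F = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"), ("A", "C"), ("C", "A")]
print(nonredundant(F))
```

With the scan order used here the second call returns the set called $F_2$ in the text; a different ordering of the input list can produce $F_1$ or $F_3$ instead, which illustrates the order dependence noted after the algorithm.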

Definition 7.4.13 Let $F$ be a set of functional dependencies, $F \subseteq \mathrm{FD}(H)$. A canonical form of $F$ is a nonredundant and reduced set $G$ of unit functional dependencies that is equivalent to $F$.

Theorem 7.4.14 For every finite set $F$ of functional dependencies there exists a canonical form of $F$.

Proof. Starting from $F$, construct an equivalent set $F_1$ of functional dependencies of the form $X \to A$ as in Theorem 7.4.5. Next, from $F_1$ construct an equivalent set $F_2$ that is reduced and consists of unit functional dependencies. Finally, from $F_2$ construct an equivalent nonredundant set $F_3$ by applying Algorithm 7.4.7. Lemma 7.4.12 implies that $F_3$ is reduced.

Example 7.4.15 Let $H = ABCDE$ be a set of attributes, and let $F$ be the set of functional dependencies given by

\[ F = \{A \to BCD,\ AB \to DE,\ BE \to AC\}. \]

The set $F_1$ is

\[ F_1 = \{A \to B,\ A \to C,\ A \to D,\ AB \to D,\ AB \to E,\ BE \to A,\ BE \to C\}. \]

To build the reduced set $F_2$ we need to examine the functional dependencies in $F_1$ that have more than one attribute in their left members: $AB \to D$, $AB \to E$, $BE \to A$, $BE \to C$. Note that $\mathrm{cl}_{F_1}(A) = ABCDE$. Therefore, we can eliminate $B$ in the left member of $AB \to D$. The resulting functional dependency, $A \to D$, is already in $F_1$. Since $\mathrm{cl}_{F_1}(B) = B$, note that $A$ cannot be removed from $AB \to D$. Starting from $AB \to E$ we obtain $A \to E$. Since $\mathrm{cl}_{F_1}(B) = B$ and $\mathrm{cl}_{F_1}(E) = E$, neither $BE \to A$ nor $BE \to C$ can be reduced. Thus, replacing each dependency by its reduced version as in Theorem 7.4.10,

\[ F_2 = \{A \to B,\ A \to C,\ A \to D,\ A \to E,\ BE \to A,\ BE \to C\}. \]

Applying Algorithm 7.4.7, we obtain the set of unit functional dependencies

\[ F_3 = \{A \to B,\ A \to C,\ A \to D,\ A \to E,\ BE \to A\} \]

that is a canonical form of $F$.

The following theorem plays an essential role in synthesizing database schemas that satisfy certain normal forms. We use it in Section 8.3.

Theorem 7.4.16 Let $S = (H, F)$ be a schema with functional dependencies, and let $K$ be a key for $S$. If $G = \{X_i \to A_i \mid 1 \le i \le n\}$ is a canonical form for $F$, then

1. No set $X_i A_i$ is included in $K$;
2. $K \cup \{A_i \mid 1 \le i \le n\} = H$;
3. $\mathbf{H} = (K, X_1 A_1, \ldots, X_n A_n)$ is a lossless decomposition of every table of $S$.

Proof. To prove the first part of the theorem, observe that if $X_i A_i$ were a subset of $K$, then $K - A_i$ would also be a key, thereby contradicting the minimality of $K$.

For the second part of the theorem, note that $\mathrm{cl}_G(K) = \mathrm{cl}_F(K) = H$ because $F, G$ are equivalent sets of functional dependencies and $K$ is a key for $F$. Let $\mathrm{CS}_G(K) = (K_0, \ldots, K_l, \ldots, K_m)$ be the $G$-closure sequence of $K$, where $K_m = H$. For each $A \in H$, define the number $p_A$ by

\[ p_A = \min\{l \mid 0 \le l \le m \text{ and } A \in K_l\}. \]

Note that $p_A$ exists because $\mathrm{cl}_G(K) = H$. If $p_A = 0$, then $A \in K$. Otherwise, $A \in K_{p_A} - K_{p_A - 1}$, which means that there exists a functional dependency $X_i \to A_i \in G$ such that $X_i \subseteq K_{p_A - 1}$ and $A_i = A \in K_{p_A}$. So, in any case, we have $A \in K \cup \{A_i \mid 1 \le i \le n\}$.

To prove the last part of the theorem, consider a table $\tau = (T, H, \rho)$ of the schema $S$. Let $t, t_1, \ldots, t_n$ be $n + 1$ joinable tuples such that $t_i \in \rho[X_i A_i]$ for $1 \le i \le n$ and $t \in \rho[K]$. Then, $\rho$ contains the tuples $s, s_1, \ldots, s_n$ such that $s_i[X_i A_i] = t_i$ for $1 \le i \le n$ and $s[K] = t$. We assume that the attributes $A_1, \ldots, A_n$ are listed such that $p_{A_i} < p_{A_j}$ implies $i < j$. Let $L_0 = K$ and $L_i = K A_1 \cdots A_i$ for $1 \le i \le n$, where $L_n = H$. We have $X_i \subseteq L_{i-1}$ for $1 \le i \le n$.

We prove by induction on $i$, $1 \le i \le n$, that $(t \bowtie t_1 \bowtie \cdots \bowtie t_i)[L_i] = s[L_i]$. For $i = 1$, the joinability of $t$ and $t_1$ implies that $t[X_1] = t_1[X_1]$, so $s[X_1] = s_1[X_1]$, which gives $s[A_1] = s_1[A_1]$. Therefore, $(t \bowtie t_1)[L_1] = s[L_1]$.

Suppose that $(t \bowtie t_1 \bowtie \cdots \bowtie t_i)[L_i] = s[L_i]$. We claim that $(t \bowtie t_1 \bowtie \cdots \bowtie t_i \bowtie t_{i+1})[L_{i+1}] = s[L_{i+1}]$. Note that $X_{i+1} \subseteq L_i$. The tuple $t_{i+1}$ is joinable with $(t \bowtie t_1 \bowtie \cdots \bowtie t_i)$; this implies $(t \bowtie t_1 \bowtie \cdots \bowtie t_i)[X_{i+1}] = t_{i+1}[X_{i+1}]$, so $s_{i+1}[A_{i+1}] = s[A_{i+1}]$. This gives the desired conclusion. For $i = n$ we obtain $t \bowtie t_1 \bowtie \cdots \bowtie t_n = s$, which proves that $\mathbf{H}$ is a lossless decomposition.

Example 7.4.17 Consider the schema $S = (A_1 \cdots A_6, F)$, where $A_1 A_2$ is a key for $F$. Let $G$ be a canonical form for $F$:

\[ G = \{A_1 \to A_3,\ A_2 \to A_4,\ A_1 A_4 \to A_5,\ A_2 A_3 \to A_6\}. \]

For any table of the schema $S$ we have the lossless decomposition

\[ \mathbf{H} = (A_1 A_2,\ A_1 A_3,\ A_2 A_4,\ A_1 A_4 A_5,\ A_2 A_3 A_6). \]
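The three steps in the proof of Theorem 7.4.14, followed by the decomposition of Theorem 7.4.16, can be sketched in Python as follows. This is a minimal illustration under assumptions that are not part of the text: attributes are arbitrary strings, a functional dependency is a pair of frozensets, and fd, closure, canonical_form and decomposition are hypothetical helper names. Because Algorithm 7.4.7 depends on the order in which dependencies are scanned, the canonical form returned for a given input is only one of possibly several.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names (helper for this sketch)."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Return the closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def dedupe(fds):
    """Remove repetitions while keeping the first occurrence of each FD."""
    seen = []
    for g in fds:
        if g not in seen:
            seen.append(g)
    return seen

def canonical_form(fds):
    # Step 1 (Theorem 7.4.5): one attribute per right member.
    unit = dedupe([(lhs, frozenset([a])) for lhs, rhs in fds for a in sorted(rhs)])
    # Step 2 (Theorem 7.4.10): remove superfluous attributes from left members.
    for i, (lhs, rhs) in enumerate(unit):
        for a in sorted(lhs):
            if rhs <= closure(lhs - {a}, unit):
                lhs = lhs - {a}
                unit[i] = (lhs, rhs)
    reduced = dedupe(unit)
    # Step 3 (Algorithm 7.4.7): discard dependencies implied by the others.
    result = list(reduced)
    for g in reduced:
        rest = [h for h in result if h != g]
        if g[1] <= closure(g[0], rest):
            result = rest
    return result

def decomposition(key, canon):
    """Sequence (K, X_1 A_1, ..., X_n A_n) of Theorem 7.4.16."""
    return [frozenset(key)] + [lhs | rhs for lhs, rhs in canon]

# Example 7.4.17: G is already in canonical form, so it is returned unchanged.
G = [fd("A1", "A3"), fd("A2", "A4"), fd("A1 A4", "A5"), fd("A2 A3", "A6")]
canon = canonical_form(G)
for part in decomposition({"A1", "A2"}, canon):
    print(sorted(part))
```

The printed parts correspond to the decomposition listed in Example 7.4.17; their union is the full heading $A_1 \cdots A_6$, as the second part of Theorem 7.4.16 requires.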

Example 7.4.18 Let $S = (H, F)$ be the table schema introduced in Example 7.3.7. Let $K = \mathit{stno}\ \mathit{cno}\ \mathit{sem}\ \mathit{year}$. The set $F$ that consists of the functional dependencies

\[ \mathit{cno}\ \mathit{sem}\ \mathit{year} \to \mathit{empno}, \qquad \mathit{stno}\ \mathit{cno}\ \mathit{sem}\ \mathit{year} \to \mathit{grade} \]

is already in canonical form. Therefore, every table $\tau$ of $S$ has the lossless decomposition $\mathbf{H} = (H_1, H_2, H_3)$, where

\[ \begin{aligned} H_1 &= \mathit{stno}\ \mathit{cno}\ \mathit{sem}\ \mathit{year}, \\ H_2 &= \mathit{cno}\ \mathit{sem}\ \mathit{year}\ \mathit{empno}, \\ H_3 &= \mathit{stno}\ \mathit{cno}\ \mathit{sem}\ \mathit{year}\ \mathit{grade}. \end{aligned} \]

Further, since $H_1 \subseteq H_3$, we can drop $H_1$ from this decomposition. Thus, $\mathbf{H}' = (H_2, H_3)$ is also a lossless decomposition of any table $\tau$ of $S$.

In concluding this section, we stress that its results are independent of any specific table of a schema. In other words, they are applicable to all tables of a schema. Over time, tables change but schema properties remain constant throughout.

7.5 Tableaux

The notion of tableau that we introduce in this section enables us to study properties of functional and multivalued dependencies in a more efficient manner.

Let $U$ be a set of relational attributes. For every attribute $A \in U$, consider a symbol $d_A$ called the distinguished symbol of the attribute $A$ and a set $V_A = \{n^A_0, n^A_1, \ldots\}$ of nondistinguished symbols. We refer to the set $D_A = \{d_A\} \cup V_A$ as the pseudodomain of the attribute $A$. We assume that if $A \ne A'$, then $D_A \cap D_{A'} = \emptyset$. The set $D_A$ is equipped with an order relation whose diagram is given in Figure 7.1:

\[ d_A < n^A_0 < n^A_1 < \cdots. \]

The notion of tableau is very similar to the notion of table. The major difference between tables and tableaux is that the values that occur in tableaux belong to the pseudodomains of the attributes rather than to their domains.

Definition 7.5.1 A tableau is a triple $\theta = (T, H, \sigma)$, where $T$ is a symbol called the tableau name, $H = A_1 \cdots A_n$ is a set of relational attributes called the heading of $\theta$ and denoted by $\mathrm{heading}(\theta)$, and $\sigma$ is a relation, $\sigma \subseteq D_{A_1} \times \cdots \times D_{A_n}$, called the extension of $\theta$.

[Figure 7.1: Partial Order on the Set $D_A$; the diagram is a chain with $d_A$ at the bottom, followed by $n^A_0, n^A_1, \ldots$]

Note that no symbol, distinguished or nondistinguished, may occur in more than one column of a tableau. The set of symbols that occur in a tableau $\theta$ is denoted by $\mathrm{VAR}(\theta)$.

Example 7.5.2 The triple $\theta = (T, ABCD, \sigma)$ given by

\[ \begin{array}{c|cccc} T & A & B & C & D \\ \hline & d_A & d_B & d_C & n^D_0 \\ & n^A_1 & d_B & d_C & d_D \\ & d_A & n^B_2 & n^C_3 & d_D \end{array} \]

is a tableau.

Definition 7.5.3 A valuation is a mapping $v : D_U \to \bigcup \{\mathrm{Dom}(A) \mid A \in U\}$ such that $s \in D_A$ implies $v(s) \in \mathrm{Dom}(A)$, for every symbol $s \in D_A$ and every $A \in U$.

We assume that valuations are extended from symbols to rows componentwise and, then, to the relations of tableaux, row by row, as shown in the next example.

Example 7.5.4 Let $v$ be a valuation such that

\[ \begin{array}{ll} v(d_A) = a_0 & v(n^D_0) = d_1 \\ v(d_B) = b_1 & v(n^A_1) = a_1 \\ v(d_C) = c_0 & v(n^B_2) = b_2 \\ v(d_D) = d_2 & v(n^C_3) = c_1 \end{array} \]

The image of the tableau $\theta$ defined in Example 7.5.2 under the valuation $v$ is the table:

\[ \begin{array}{c|cccc} T & A & B & C & D \\ \hline & a_0 & b_1 & c_0 & d_1 \\ & a_1 & b_1 & c_0 & d_2 \\ & a_0 & b_2 & c_1 & d_2 \end{array} \]

We denote the table that results from the application of the valuation $v$ to the tableau $\theta = (T, H, \sigma)$ by $v(\theta)$, where $v(\theta) = (T, H, v(\sigma))$.

Every tableau $\theta = (T, H, \sigma)$ that has a distinguished symbol in every column generates a function $\Phi_\theta$ that transforms a table in $\mathcal{T}(H)$ into another table in $\mathcal{T}(H)$ using the following definitions.

Definition 7.5.5 Let $\theta = (T, H, \sigma)$ be a tableau that has a distinguished symbol in every column. Assume that $H = A_1 \cdots A_n$. A valuation $v : D_U \to \bigcup \{\mathrm{Dom}(A) \mid A \in U\}$ is based on a tuple $(a_1, \ldots, a_n) \in \mathrm{tupl}(H)$ if $v(d_{A_i}) = a_i$ for $1 \le i \le n$.

Since a valuation based on $(a_1, \ldots, a_n)$ is constrained only by the values assigned to the distinguished symbols, many quite different valuations may be based on $(a_1, \ldots, a_n)$.

Definition 7.5.6 Let $\theta = (T, H, \sigma)$ be a tableau, and let $\tau = (T, H, \rho)$ be a table. The relation $\rho_\theta \in \mathrm{rel}(H)$, given by

\[ \rho_\theta = \{(a_1, \ldots, a_n) \mid \text{there exists } v \text{ that is based on } (a_1, \ldots, a_n) \text{ such that } v(\sigma) \subseteq \rho\}, \]

defines the mapping $\Phi_\theta : \mathcal{T}(H) \to \mathcal{T}(H)$ given by $\Phi_\theta(\tau) = (T_\theta, H, \rho_\theta)$. Here $T_\theta$ is simply a symbol used to name the new table.

Note that $\Phi_\theta(\tau)$ is always defined, since, for every table $\tau$, there exist only a finite number of tuples $(a_1, \ldots, a_n)$ such that $v(\sigma) \subseteq \rho$ for some valuation that is based on $(a_1, \ldots, a_n)$. Note also that $\rho_\theta$ is empty only if $\rho = \emptyset$.

Example 7.5.7 Consider the table $\tau = (T, ABCD, \rho)$ given by

\[ \begin{array}{c|cccc} T & A & B & C & D \\ \hline & a_1 & b_2 & c_1 & d_1 \\ & a_1 & b_1 & c_0 & d_0 \\ & a_1 & b_2 & c_0 & d_1 \\ & a_2 & b_2 & c_1 & d_0 \\ & a_2 & b_2 & c_0 & d_1 \\ & a_2 & b_1 & c_1 & d_1 \end{array} \]

A valuation $v$ can map $d_A$ to either $a_1$ or $a_2$; similarly, $d_B$ can be mapped to $b_1$ or $b_2$, etc. Therefore, there are at most 16 rows $(v(d_A), v(d_B), v(d_C), v(d_D))$ on which a valuation can be based. If $\theta$ is the tableau defined in Example 7.5.2, the reader can easily verify that the table $\Phi_\theta(\tau) = (T_\theta, ABCD, \rho_\theta)$ is:

\[ \begin{array}{c|cccc} T_\theta & A & B & C & D \\ \hline & a_1 & b_2 & c_1 & d_1 \\ & a_1 & b_1 & c_0 & d_0 \\ & a_1 & b_2 & c_0 & d_1 \\ & a_2 & b_2 & c_1 & d_0 \\ & a_2 & b_2 & c_0 & d_1 \\ & a_2 & b_1 & c_1 & d_1 \\ & a_1 & b_2 & c_1 & d_0 \\ & a_2 & b_2 & c_1 & d_1 \end{array} \]

Clearly, every row of $\tau$ generates a family of valuations based on that row such that the image of the tableau $\theta$ under any of these valuations is included in $\tau$. Therefore, $\rho \subseteq \rho_\theta$.

7.5.1 Project-Join Mappings and Tableaux

Tableaux provide an alternate way of studying properties of project-join mappings that allows us to determine easily whether tables of certain schemas have information lossless decompositions.

Definition 7.5.8 Let $\mathbf{H} = (H_1, \ldots, H_k)$ be a sequence of subsets of $H$ such that $H = \bigcup \{H_i \mid 1 \le i \le k\}$. A tableau that describes the sequence $\mathbf{H}$ is a tableau $\theta_{\mathbf{H}} = (T, H, \sigma_{\mathbf{H}})$, where the relation $\sigma_{\mathbf{H}} = \{t_1, \ldots, t_k\}$, and $t_i$ is given by

\[ t_i[A_j] = \begin{cases} d_{A_j} & \text{if } A_j \in H_i, \\ \text{a nondistinguished symbol} & \text{otherwise,} \end{cases} \]

for $1 \le j \le n$.

Example 7.5.9 Let $H = ABCD$, and let $\mathbf{H} = (AB, BC, ACD)$ be a decomposition. A tableau $\theta_{\mathbf{H}}$ is given by

\[ \begin{array}{c|cccc} T & A & B & C & D \\ \hline & d_A & d_B & n^C_0 & n^D_0 \\ & n^A_0 & d_B & d_C & n^D_1 \\ & d_A & n^B_0 & d_C & d_D \end{array} \]

Theorem 7.5.10 Let $\mathbf{H} = (H_1, \ldots, H_k)$ be a sequence of sets of attributes. The project-join mapping $\mathrm{pj}_{\mathbf{H}}$ equals $\Phi_{\theta_{\mathbf{H}}}$, where $\theta_{\mathbf{H}} = (T, H, \sigma_{\mathbf{H}})$ is the tableau of the sequence $\mathbf{H}$ and $H = \bigcup \{H_i \mid 1 \le i \le k\}$.

Proof. We must prove that $\mathrm{pj}_{\mathbf{H}}(\rho) = \Phi_{\theta_{\mathbf{H}}}(\rho)$ for every $\tau = (T, H, \rho) \in \mathcal{T}(H)$, where $H = \bigcup \{H_i \mid 1 \le i \le k\} = A_1 \cdots A_n$.

Let $t = (a_1, \ldots, a_n) \in \mathrm{pj}_{\mathbf{H}}(\rho)$. There exist $k$ tuples $t_1, \ldots, t_k$ such that $t_l \in \rho[H_l]$ and $t[H_l] = t_l$ for $1 \le l \le k$. In turn, this implies that there exist $u_1, \ldots, u_k \in \rho$ such that $t[H_l] = u_l[H_l]$ for $1 \le l \le k$. Suppose that $\sigma_{\mathbf{H}}$, the set of rows of $\theta_{\mathbf{H}}$, consists of $w_1, \ldots, w_k$, where $w_l$ represents the set $H_l$ for $1 \le l \le k$. Consider a valuation $v$ such that $v(d_{A_i}) = t[A_i]$ for $1 \le i \le n$, and $v(n^{A_q}) = u_p[A_q]$ if the nondistinguished symbol $n^{A_q}$ occurs in the $p$-th row under the attribute $A_q$ in the tableau $\theta_{\mathbf{H}}$. The image of the row $w_l$ under the valuation $v$ is the tuple $u_l$ of $\rho$. Indeed, consider the component $w_l[A_q]$ of the row $w_l$ of $\theta_{\mathbf{H}}$. If $w_l[A_q]$ is the distinguished symbol $d_{A_q}$, then $A_q$ belongs to $H_l$, and $v(w_l[A_q]) = t[A_q] = t_l[A_q] = u_l[A_q]$. On the other hand, if $w_l[A_q]$ is a nondistinguished symbol, then $v(w_l[A_q]) = u_l[A_q]$, so in any case, $v(w_l) = u_l$. Therefore, $v(\sigma_{\mathbf{H}}) \subseteq \rho$, so $(a_1, \ldots, a_n) \in \Phi_{\theta_{\mathbf{H}}}(\rho)$.

Conversely, let $t = (a_1, \ldots, a_n) \in \Phi_{\theta_{\mathbf{H}}}(\rho)$. There exists a valuation $v$ such that $v(d_{A_q}) = a_q$ for $1 \le q \le n$, and $v(\sigma_{\mathbf{H}}) \subseteq \rho$. Let $u_l \in \rho$ be the image of the row $w_l$ of $\sigma_{\mathbf{H}}$ under $v$. Observe that $w_l$ contains distinguished symbols for all attributes $A_q \in H_l$, so $u_l[A_q] = a_q$ for every attribute $A_q \in H_l$. Therefore, we have $t[H_l] = u_l[H_l]$ for $1 \le l \le k$, which implies that $t \in \mathrm{pj}_{\mathbf{H}}(\rho)$.

Theorem 7.5.11 Let $H = A_1 \cdots A_n$ be a finite set of attributes, and let $\mathbf{H} = (H_1, \ldots, H_k)$ be a sequence of subsets of $H$ such that $\bigcup \{H_i \mid 1 \le i \le k\} = H$. The following three statements are equivalent:

(i) the set $H$ occurs in the sequence $\mathbf{H}$,

(ii) $\mathrm{pj}_{\mathbf{H}}(\rho) = \rho$ for every relation $\rho \in \mathrm{rel}(H)$, and
(iii) the tableau $\theta_{\mathbf{H}}$ contains a row of distinguished symbols.

Proof. (i) implies (ii). If $H$ occurs in $\mathbf{H}$, then for any subset $H_i$ of $H$ that occurs in $\mathbf{H}$ we have $\rho[H_i] \bowtie \rho[H] = \rho[H_i] \bowtie \rho \subseteq \rho$. Therefore, using the idempotence, commutativity, and associativity of join, we obtain

\[ \mathrm{pj}_{\mathbf{H}}(\rho) = (\rho[H_1] \bowtie \rho) \bowtie \cdots \bowtie (\rho[H_k] \bowtie \rho) \subseteq \rho \bowtie \cdots \bowtie \rho = \rho. \]

The reverse inclusion, $\rho \subseteq \mathrm{pj}_{\mathbf{H}}(\rho)$, holds by Theorem 6.3.7. Consequently, $\mathrm{pj}_{\mathbf{H}}(\rho) = \rho$.

(ii) implies (iii). Suppose that $\mathrm{pj}_{\mathbf{H}}(\rho) = \rho$ for every relation $\rho \in \mathrm{rel}(H)$. Note that the satisfaction of the equality $\mathrm{pj}_{\mathbf{H}}(\rho) = \rho$ does not depend on the actual domains of the attributes in $H$. Therefore, $\mathrm{pj}_{\mathbf{H}}(\sigma_{\mathbf{H}}) = \sigma_{\mathbf{H}}$. Let $r_0$ be a row on $A_1, \ldots, A_n$ defined by $r_0[A_i] = d_{A_i}$ for $1 \le i \le n$. If $\sigma_{\mathbf{H}} = \{r_1, \ldots, r_k\}$, note that $r_0[H_i \cap H_j] = r_i[H_i \cap H_j] = r_j[H_i \cap H_j]$ for every $i \ne j$, $1 \le i, j \le k$, because all these projections consist of distinguished symbols. So, the projections $r_1[H_1], \ldots, r_k[H_k]$ are joinable and their join is $r_0$. Thus $r_0 \in \mathrm{pj}_{\mathbf{H}}(\sigma_{\mathbf{H}})$, so $r_0 \in \sigma_{\mathbf{H}}$.

(iii) implies (i). This implication is immediate in view of the definition of $\theta_{\mathbf{H}}$.

7.5.2 Tableaux and Functional Dependencies

In this section we show that tableaux provide an alternative to inference rules for finding the logical consequences of a set of functional dependencies.

Since tableaux are tables over attributes whose domains have been replaced by pseudodomains, constraints may be applied to tableaux just as they are applied to tables. We denote by $\mathrm{TX}(H)$ the set of tableaux whose heading is $H$. If $S = (H, \Gamma)$ is a table schema, we denote by $\mathrm{SATX}(S)$ (or by $\mathrm{SATX}(H, \Gamma)$) the set of all tableaux that have the heading $H$ and satisfy all constraints of $\Gamma$.

Recall that Theorem 7.4.5 states that for every set of functional dependencies there exists an equivalent set of functional dependencies that have exactly one attribute in their right member. For the remainder of this section we use only sets of functional dependencies in which each right member consists of one attribute.

Definition 7.5.12 Let $\theta = (T, H, \sigma)$ be a tableau, and let $X \to A$ be a functional dependency such that $X \subseteq H$ and $A \in H$. A violation of $X \to A$ by $\theta$ is a 4-tuple $(X, A, u, v)$, where $u, v$ are rows of $\theta$ such that $u[X] = v[X]$ and $u[A] \ne v[A]$.

The tableau obtained from $\theta$ by reducing the violation $(X, A, u, v)$ of $X \to A$ is the tableau $\theta'$ obtained from $\theta$ by replacing every occurrence of the larger of the symbols $u[A], v[A]$ in the $A$-column of $\theta$ by the smaller one.

If $\theta'$ is obtained from $\theta$ through the reduction of a violation of a functional dependency from $F$ we write $\theta \Rightarrow_F \theta'$. Note that if $\theta'$ is obtained from $\theta$ by reducing a violation of a functional dependency, the number of distinct symbols of $\theta'$ is strictly smaller than the similar number for $\theta$. Also, the number of rows of $\theta'$ is less than or equal to the number of rows of $\theta$.

If $\theta_0, \theta_1, \ldots, \theta_q$ is a sequence of tableaux such that $\theta_i \Rightarrow_F \theta_{i+1}$ for $0 \le i \le q - 1$, then we write $\theta_0 \Rightarrow^q_F \theta_q$. If $\theta \Rightarrow^0_F \theta'$, we have $\theta = \theta'$. Also, we write $\theta \Rightarrow^*_F \theta'$ if there exists $q \ge 0$ such that $\theta \Rightarrow^q_F \theta'$.

Example 7.5.13 Let $\theta = (T, ABCD, \rho)$ be the tableau given in Figure 7.2, and let $F = \{A \to B,\ BC \to D\}$. Note that $\theta$ contains no violation of $BC \to D$ and that the first two rows of the tableau violate the functional dependency $A \to B$. If we reduce this violation of $A \to B$, the resulting tableau $\theta_1 = (T_1, ABCD, \rho_1)$ is shown in Figure 7.3. The substitution of $n^B_1$ by $n^B_0$ affects not only the second, but also the fourth row.

The tableau $\theta_1$ violates $BC \to D$. By reducing the violation involving the first two rows we obtain the tableau $\theta_2 = (T_2, ABCD, \rho_2)$ given in Figure 7.4. A new reduction of a violation of the same functional dependency, this time involving the last two rows, gives the tableau shown in Figure 7.5.

\[ \begin{array}{c|cccc} T & A & B & C & D \\ \hline & d_A & n^B_0 & n^C_0 & n^D_0 \\ & d_A & n^B_1 & n^C_0 & n^D_1 \\ & n^A_1 & n^B_0 & n^C_1 & n^D_2 \\ & n^A_2 & n^B_1 & n^C_1 & n^D_3 \end{array} \]

Figure 7.2: The Tableau $\theta = (T, ABCD, \rho)$

Definition 7.5.14 A containment mapping between the tableaux $\theta$ and $\theta'$ is a mapping $f : D_U \to D_U$ such that every row of $\theta$ is mapped into a row of $\theta'$, and $f(s) \le s$ for every $s \in D_U$.

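\[ \begin{array}{c|cccc} T_1 & A & B & C & D \\ \hline & d_A & n^B_0 & n^C_0 & n^D_0 \\ & d_A & n^B_0 & n^C_0 & n^D_1 \\ & n^A_1 & n^B_0 & n^C_1 & n^D_2 \\ & n^A_2 & n^B_0 & n^C_1 & n^D_3 \end{array} \]

Figure 7.3: The Tableau $\theta_1 = (T_1, ABCD, \rho_1)$

\[ \begin{array}{c|cccc} T_2 & A & B & C & D \\ \hline & d_A & n^B_0 & n^C_0 & n^D_0 \\ & n^A_1 & n^B_0 & n^C_1 & n^D_2 \\ & n^A_2 & n^B_0 & n^C_1 & n^D_3 \end{array} \]

Figure 7.4: The Tableau $\theta_2 = (T_2, ABCD, \rho_2)$

\[ \begin{array}{c|cccc} T_3 & A & B & C & D \\ \hline & d_A & n^B_0 & n^C_0 & n^D_0 \\ & n^A_1 & n^B_0 & n^C_1 & n^D_2 \\ & n^A_2 & n^B_0 & n^C_1 & n^D_2 \end{array} \]

Figure 7.5: The Tableau $\theta_3 = (T_3, ABCD, \rho_3)$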

Containment mappings are extended to rows componentwise, and then, to sets of rows, elementwise. If $\theta, \theta', \theta''$ are tableaux in $\mathrm{TX}(H)$ and $f, g$ are containment mappings between $\theta, \theta'$ and $\theta', \theta''$, respectively, then it is easy to verify that $gf$ is a containment mapping between $\theta$ and $\theta''$ (cf. Exercise 25).

Note that if $\theta'$ is obtained from $\theta$ by reducing a violation of a functional dependency, then there exists a containment mapping $f$ from $\theta$ to $\theta'$ such that for every row $t'$ of $\theta'$ we have $t' = f(t)$ for some row $t$ of $\theta$.

We discuss an algorithm whose input is a table schema with functional dependencies $S = (H, F)$ and a tableau $\theta$ and whose output is a tableau $\theta_F$ that satisfies all functional dependencies of $F$ such that a containment mapping exists from $\theta$ to $\theta_F$. The action of the algorithm consists of chasing violations of functional dependencies of $F$ and successively reducing these violations. The algorithm is named the Chase Algorithm for Functional Dependencies.

This algorithm is extremely important because, among other things, it can be used to determine whether a functional dependency $\varphi$ is a logical consequence of a set $F$ of functional dependencies without using inference rules or closures. Briefly, a tableau based on $\varphi$ is created and the functional dependencies of $F$ are chased on the tableau; the form of the resulting tableau determines whether or not $F \models \varphi$. This is presented in detail in Theorem 7.5.21. The algorithm can also be used to ascertain whether the tables of $\mathrm{SAT}(H, F)$ have $\mathbf{H}$ as a lossless decomposition, by chasing the functional dependencies of $F$ on the tableau $\theta_{\mathbf{H}}$ and by examining the resultant tableau $(\theta_{\mathbf{H}})_F$ (see Theorem 7.5.20). There is a rich literature on other uses of the Chase Algorithm.

Algorithm 7.5.15 The Chase Algorithm for Functional Dependencies
Input: A table schema with functional dependencies $S = (H, F)$ and a tableau $\theta$.
Output: A tableau $\theta_F$ that satisfies all functional dependencies of $F$ such that a containment mapping exists from $\theta$ to $\theta_F$.
Method: Construct a sequence of tableaux $\theta_0, \ldots, \theta_i, \theta_{i+1}, \ldots$ defined by:
Stage 0: $\theta_0 := \theta$.
Stage $i + 1$: $\theta_{i+1}$ is obtained from $\theta_i$ by reducing a violation of a functional dependency from $F$ if such a violation exists in $\theta_i$; otherwise, that is, if no violation exists, stop, and let $\theta_F = \theta_i$.
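The stages of Algorithm 7.5.15 can be sketched in Python as follows. The encoding is an assumption made for this illustration only: since no symbol occurs in more than one column, each column uses integers, with 0 standing for the distinguished symbol $d_A$ and $k + 1$ standing for $n^A_k$, so that the integer order agrees with the order of the pseudodomain. A tableau extension is a set of rows (tuples), functional dependencies are unit dependencies given as (left member, attribute) pairs over single-character attributes, and find_violation and chase are hypothetical helper names.

```python
from itertools import combinations

def find_violation(heading, rows, fds):
    """Return (A, u, v) for some violation (X, A, u, v) of an FD X -> A, or None."""
    col = {a: i for i, a in enumerate(heading)}
    for lhs, a in fds:
        for u, v in combinations(sorted(rows), 2):
            same_x = all(u[col[x]] == v[col[x]] for x in lhs)
            if same_x and u[col[a]] != v[col[a]]:
                return a, u, v
    return None

def chase(heading, rows, fds):
    """Reduce violations until none is left (Stage 0, Stage 1, ...).
    Each reduction removes one symbol, so the loop terminates."""
    col = {a: i for i, a in enumerate(heading)}
    rows = set(rows)
    while True:
        found = find_violation(heading, rows, fds)
        if found is None:
            return rows
        a, u, v = found
        j = col[a]
        small, large = sorted((u[j], v[j]))
        # Replace every occurrence of the larger symbol in column A by the smaller one;
        # rows that become equal merge because the extension is a set.
        rows = {r[:j] + (small,) + r[j + 1:] if r[j] == large else r for r in rows}

# Tableau of Figure 7.2 and the dependencies of Example 7.5.13.
theta = {(0, 1, 1, 1), (0, 2, 1, 2), (2, 1, 2, 3), (3, 2, 2, 4)}
F = [("A", "B"), ("BC", "D")]
for row in sorted(chase("ABCD", theta, F)):
    print(row)
```

Under this encoding the three rows printed correspond to the tableau of Figure 7.5: the first row is $(d_A, n^B_0, n^C_0, n^D_0)$ and the other two rows agree on the symbols $n^B_0$, $n^C_1$ and $n^D_2$.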