
A Normal Form for Precisely Characterizing Redundancy in Nested Relations

WAI YIN MOK, YIU-KAI NG, and DAVID W. EMBLEY
Brigham Young University

We give a straightforward definition for redundancy in individual nested relations and define a new normal form that precisely characterizes redundancy for nested relations. We base our definition of redundancy on an arbitrary set of functional and multivalued dependencies, and show that our definition of nested normal form generalizes standard relational normalization theory. In addition, we give a condition that can prevent an unwanted structural anomaly in nested relations, namely, embedded nested relations with at most one tuple. Like other normal forms, our nested normal form can serve as a guide for database design.

Categories and Subject Descriptors: H.2.1 [Database Management]: Logical Design - data models; normal forms
General Terms: Design, Theory
Additional Key Words and Phrases: Database design, data redundancy, functional and multivalued dependencies, nested normal form, nested relations, normalization theory, scheme trees

1. INTRODUCTION

Although normalization theory for flat relations has a long research history, its extension to nested relations is much more recent. Partition Normal Form (PNF) [Roth et al.], which guarantees expected properties for nesting and unnesting and for keys of nested relations, has been well accepted. Indeed, nested relations are sometimes defined such that only PNF relations are allowed (1), and for Abiteboul and Bidoit [1986], the definition predates PNF. A normal form for nested relation schemes that detects potential redundancy and the possible update anomalies that accompany redundancy, however, has not been widely accepted, even though some have been proposed [Ozsoyoglu and Yuan 1987, 1989; Roth and Korth 1987].
Although these earlier proposals provided guidance for the design of nested relation schemes, they did not succeed in precisely characterizing potential redundancy. In this article we propose a new normal form for individual nested relation schemes that completely characterizes redundancy with respect to any given set of functional dependencies (FDs) and multivalued dependencies (MVDs). The result we present is a generalization of standard relational normalization theory.

(1) See Abiteboul and Bidoit [1986], Ozsoyoglu and Yuan [1987, 1989], and Roth and Korth [1987].
Much of the work on this paper was done while W. Y. Mok was at Hong Kong Polytechnic.
Authors' address: Department of Computer Science, Brigham Young University, Provo, UT.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
© 1996 ACM /96/0300-0077 $03.50
ACM Transactions on Database Systems, Vol. 21, No. 1, March 1996.

We proceed as follows. In Section 2, we provide our basic definitions for nested relations. Like Abiteboul and Bidoit [1986], Ozsoyoglu and Yuan [1987, 1989], and Roth and Korth [1987], we define our nested relations to be in PNF. In Section 2 we also give carefully specified redundancy definitions. As illustrations for our redundancy definitions, we give examples, which we use later to show that none of the earlier definitions fully detects redundancy. In Section 3, we present our definition, which we call NNF (Nested Normal Form). As we illustrate our definition, we also compare it to earlier definitions and show that ours can provide greater flexibility in how attributes may be clustered in nested relation schemes. In Section 4, we present a theorem guaranteeing that NNF detects potential redundancy. In Section 5, we investigate the converse of this theorem. We show that a nested relation scheme that is not consistent with the given set of MVDs and FDs, as we define consistency, is automatically not in our normal form. In addition, we are able to show that if a nested relation scheme is consistent with the given set of MVDs and FDs and there is no potential redundancy, then the nested relation scheme satisfies our definition of NNF. In Section 6, we show that our definition of NNF is a generalization of standard relational normalization theory. In particular, we show that Fourth Normal Form (4NF), as defined by Fagin [1977], is a special case of NNF, and that Boyce-Codd Normal Form (BCNF) is also a special case when we limit the dependencies to FDs.
Thus, like other normal forms, our definition of NNF can provide a guide to database design. It also has the drawbacks of these other normal forms, and, in this sense, is not a panacea for database design. We therefore comment on what our definition does and does not provide for the designer. In Section 7, we present a condition that can prevent an unwanted structural characteristic of nested relations, which we call singleton buckets because a nested relation represented by a singleton bucket allows at most one tuple. We then prove that this condition does indeed prevent singleton buckets. Although this condition has nothing to do with redundancy, it is in harmony with earlier definitions [Ozsoyoglu and Yuan 1989; Roth and Korth 1987] that also disallow singleton buckets. In Section 8, we present our conclusions.

2. BASIC DEFINITIONS AND PROPERTIES

2.1. Nested Relations

A nested relation allows each tuple component to be either atomic or another nested relation, which may itself be nested several levels deep. As in Abiteboul and Bidoit [1986], Ozsoyoglu and Yuan [1987, 1989], and Roth and Korth [1987], we are only interested in nested relations that are in PNF.

Thus, in a nested relation, there can never be distinct tuples that agree on the atomic attributes of either the nested relation itself or of any nested relation embedded within it [Atzeni and De Antonellis 1993].

Definition. Let U be a set of attributes. A nested relation scheme is recursively defined as follows:
(1) If X is a nonempty subset of U, then X is a nested relation scheme over the set of attributes X.
(2) If X, X1, ..., Xn are pairwise disjoint, nonempty subsets of U, and R1, ..., Rn are nested relation schemes over X1, ..., Xn, respectively, then X(R1)* ... (Rn)* is a nested relation scheme over XX1 ... Xn.

Definition. Let R be a nested relation scheme over a nonempty set of attributes Z. Let the domain of an attribute A ∈ Z be denoted by dom(A). A nested relation over R is recursively defined as follows:
(1) If R has the form X where X is a set of attributes {A1, ..., An}, n ≥ 1, then r is a nested relation over R if r is a (possibly empty) set of functions {t1, ..., tm} where each function ti, 1 ≤ i ≤ m, maps Aj to an element in dom(Aj), 1 ≤ j ≤ n.
(2) If R has the form X(R1)* ... (Rm)*, m ≥ 1, where X is a set of attributes {A1, ..., An}, n ≥ 1, then r is a nested relation over R if
(a) r is a (possibly empty) set of functions {t1, ..., tp} where each function ti, 1 ≤ i ≤ p, maps Aj to an element in dom(Aj), 1 ≤ j ≤ n, and maps Rk to a nested relation over Rk, 1 ≤ k ≤ m, and
(b) ti ∈ r and tj ∈ r and ti(X) = tj(X) implies ti = tj, 1 ≤ i, j ≤ p.
Each function of a nested relation r over nested relation scheme R is a nested tuple of r.

Example. Figure 1 shows a nested relation. Its scheme is Dept Chair (Prof (Hobby)* (Matriculation (Student (Interest)*)*)*)*, and it contains two nested tuples. As in Abiteboul and Bidoit [1986], we draw a bucket for each embedded nested relation. Each bucket also contains nested tuples of its own.
For example, (Young, {Chess, Soccer}) and (Barker, {Skiing}) are nested tuples in the first bucket under the embedded nested relation scheme Student (Interest)*. Notice that, as required, PNF is satisfied. Thus the values for the atomic attributes Dept Chair differ, and in each bucket the atomic values differ.

[Fig. 1. Nested relation over the scheme Dept Chair (Prof (Hobby)* (Matriculation (Student (Interest)*)*)*)*.]

Definition. Let R be a nested relation scheme. Let r be a nested relation on R. The total unnesting of r is recursively defined as follows:
(1) If R has the form X, where X is a set of attributes, then r is the total unnesting of r.
(2) If R has the form X(R1)* ... (Rn)*, where Xi is the set of attributes in Ri, 1 ≤ i ≤ n, then the total unnesting of r = {t | there exists a nested tuple u ∈ r such that t(X) = u(X) and t(Xi) is a tuple in the total unnesting of u(Ri), 1 ≤ i ≤ n}.

Definition. Let R be a nested relation scheme. Let r be a nested relation on R. Let t be a nested tuple of r. The total unnesting of t is defined as the total unnesting of q, where q is a nested relation containing the single nested tuple t.

Example. Figure 2 shows the total unnesting of the nested relation in Figure 1. Observe that the first two tuples in the total unnesting contain the total unnesting of the nested tuple (Young, {Chess, Soccer}).

2.2. Scheme Trees

We can graphically represent a nested relation scheme by a tree, called a scheme tree. A scheme tree captures the logical structure of a nested relation scheme and explicitly represents a set of MVDs. Scheme trees have been used for earlier normal form definitions for nested relations [Ozsoyoglu and Yuan 1987, 1989; Roth and Korth 1987]. We use them here for the same purpose.

Definition. A scheme tree T corresponding to a nested relation scheme R is recursively defined as follows:
(1) If R has the form X, then T is a single node scheme tree whose root node is the set of attributes X.
(2) If R has the form X(R1)* ... (Rn)*, then the root node of T is the set of attributes X, and a child of the root of T is the root of the scheme tree Ti, where Ti is the corresponding scheme tree for the nested relation scheme Ri, 1 ≤ i ≤ n.
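The recursive definition of total unnesting can be sketched in Python. This is an illustration only, not the authors' implementation: the encoding of a scheme as a pair (atomic attributes, child schemes), the use of child positions as dictionary keys for buckets, and the names `total_unnesting` and `interests` are all assumptions of this sketch. The data is a transcription of Figure 1 as far as it can be read together with Figure 2.

```python
from itertools import product

# Assumed encoding: a scheme is (atomic_attrs, [child_schemes]); a nested tuple
# is a dict mapping each atomic attribute to a value and each child position
# (0, 1, ...) to a list of nested tuples (the "bucket").

def total_unnesting(scheme, relation):
    """Return the total unnesting of `relation` as a list of flat dicts."""
    attrs, children = scheme
    flat = []
    for u in relation:
        base = {a: u[a] for a in attrs}
        # Unnest every bucket of u, then combine one flat tuple from each bucket.
        parts = [total_unnesting(child, u[i]) for i, child in enumerate(children)]
        for combo in product(*parts):
            t = dict(base)
            for part in combo:
                t.update(part)
            flat.append(t)
    return flat

STUDENT = (("Student",), [(("Interest",), [])])
MATRIC = (("Matriculation",), [STUDENT])
PROF = (("Prof",), [(("Hobby",), []), MATRIC])
DEPT = (("Dept", "Chair"), [PROF])

def interests(*names):
    return [{"Interest": n} for n in names]

FIGURE_1 = [
    {"Dept": "CS", "Chair": "Turing", 0: [
        {"Prof": "Jane", 0: [{"Hobby": "Skiing"}], 1: [
            {"Matriculation": "Ph.D.", 0: [
                {"Student": "Young", 0: interests("Chess", "Soccer")},
                {"Student": "Barker", 0: interests("Skiing")}]},
            {"Matriculation": "M.S.", 0: [
                {"Student": "Adams", 0: interests("Skiing")}]}]},
        {"Prof": "Pat", 0: [{"Hobby": "Hiking"}], 1: [
            {"Matriculation": "Ph.D.", 0: [
                {"Student": "Lee", 0: interests("Travel")}]}]}]},
    {"Dept": "Math", "Chair": "Polya", 0: [
        {"Prof": "Steve", 0: [{"Hobby": "Dance"}, {"Hobby": "Hiking"}], 1: [
            {"Matriculation": "M.S.", 0: [
                {"Student": "Carter", 0: interests("Travel", "Skiing")}]}]}]},
]

rows = total_unnesting(DEPT, FIGURE_1)
```

Under this encoding an empty bucket contributes no flat tuples, which matches clause (2) of the definition: a tuple t exists only if every u(Ri) has at least one tuple in its total unnesting.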

Dept  Chair   Prof   Hobby   Matriculation  Student  Interest
CS    Turing  Jane   Skiing  Ph.D.          Young    Chess
CS    Turing  Jane   Skiing  Ph.D.          Young    Soccer
CS    Turing  Jane   Skiing  Ph.D.          Barker   Skiing
CS    Turing  Jane   Skiing  M.S.           Adams    Skiing
CS    Turing  Pat    Hiking  Ph.D.          Lee      Travel
Math  Polya   Steve  Dance   M.S.           Carter   Travel
Math  Polya   Steve  Dance   M.S.           Carter   Skiing
Math  Polya   Steve  Hiking  M.S.           Carter   Travel
Math  Polya   Steve  Hiking  M.S.           Carter   Skiing

Fig. 2. Total unnesting of nested relation in Fig. 1.

The one-to-one correspondence between a scheme tree and a nested relation scheme along with the definition of a nested relation scheme impose several properties on a scheme tree. Let T be a scheme tree. We denote the set of attributes in T by Aset(T). Observe that the atomic attributes of a nested relation scheme, at any level of nesting, constitute a node in a scheme tree. Observe further that because the definition of a nested relation scheme requires nonempty sets of attributes, every node in T consists of a nonempty set of attributes. Furthermore, because the sets of attributes corresponding to nodes in T are pairwise disjoint and include all the attributes of T, the nodes in T are pairwise disjoint, and their union is Aset(T).

Let N be a node in T. Notationally, Ancestor(N) denotes the union of attributes in all ancestors of N, including N. Similarly, Descendant(N) denotes the union of attributes in all descendants of N, including N. In a scheme tree T each edge (V, W), where V is the parent of W, denotes an MVD Ancestor(V) →→ Descendant(W). Notationally, we use MVD(T) to denote the union of all the MVDs represented by the edges in T. By construction, each MVD in MVD(T) is satisfied in the total unnesting of any nested relation for T. Because FDs are also of interest, we use FD(T) to denote any set of FDs equivalent to all FDs X → Y implied by a given set of FDs and MVDs over a set of attributes U such that Aset(T) ⊆ U and XY ⊆ Aset(T).
(Note that because a set of FDs F together with a set of MVDs M can imply FDs not implied by F alone, FD(T), in general, is not equivalent to the set of FDs in F whose left-hand side is a subset of Aset(T) and whose right-hand side is restricted to Aset(T).)

Example. Figure 3 shows the scheme tree T for the scheme of the nested relation in Figure 1. Figure 3 also gives the set of attributes in Aset(T) and the set of dependencies MVD(T). Observe that each of the MVDs in MVD(T) is satisfied in the unnested relation in Figure 2.

2.3. Data Redundancy

Data redundancy is a concern in database design. Redundant data can lead to higher storage and access cost. It can lead to update anomalies, forcing multiple copies of the same data value to be updated when one copy changes, and it can lead to data inconsistency if all copies do not agree.

T =      Dept Chair
             |
            Prof
           /    \
       Hobby    Matriculation
                     |
                  Student
                     |
                  Interest

Aset(T) = Dept Chair Prof Hobby Matriculation Student Interest

MVD(T) = {Dept Chair →→ Prof Hobby Matriculation Student Interest,
          Dept Chair Prof →→ Hobby,
          Dept Chair Prof →→ Matriculation Student Interest,
          Dept Chair Prof Matriculation →→ Student Interest,
          Dept Chair Prof Matriculation Student →→ Interest}

Fig. 3. Scheme tree T, Aset(T), and MVD(T) for nested relation scheme in Fig. 1.

Except in rare cases, such as Vincent and Srinivasan [1992], papers and textbooks on normalization fail to provide rigorous definitions for redundancy and thus also fail to prove that normalization removes redundancy as expected. Offered instead are motivating examples to illustrate redundancy removal. Thus in the vast body of research literature on normalization, we have mostly only rigorous syntactic justifications for normalization; what we are missing are rigorous semantic justifications. Besides only providing for syntactic characterizations, a danger of not treating redundancy formally is that the examples may be misleading. Indeed, as we show in the following, the definition for 4NF found in most textbooks does not detect potential redundancy for all cases even though some readers of these books are led to believe that it does.

In creating definitions for redundancy, we should try to find simple and intuitive characterizations, but creating a simple and intuitive definition for redundancy is more difficult than one might at first think. Any definition will involve a sophisticated statement, and there are many possible approaches one might use. Our notion of redundancy is based on the idea that an atomic value v in a nested or flat relation is redundant if we can erase v, and then from what remains and from a single FD or MVD that holds, determine what v must have been.

U = {Dept, Chair, Prof, Hobby, Hobby-Equipment, Matriculation, Student, Interest}
F = {Student → Matriculation, Student → Prof, Prof → Dept, Dept → Chair}
M = {Student →→ Interest, Prof →→ Hobby Hobby-Equipment, Hobby →→ Hobby-Equipment}

Fig. 4. Some given constraints over a set of attributes.

The way we define "holds" is important. Here, we adapt Fagin's [1977] definition, and we explain it thoroughly before proceeding with our definition of redundancy.

Definition. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that Aset(T) ⊆ U. An MVD X →→ Y holds for T with respect to M and F if X ⊆ Aset(T) and there exists a set of attributes Z ⊆ U such that Y = Z ∩ Aset(T) and M ∪ F implies X →→ Z on U. An FD X → Y holds for T with respect to M and F if XY ⊆ Aset(T) and M ∪ F implies X → Y on U.

This definition is motivated by the following lemma, which is Theorem 5 in Fagin [1977].

LEMMA. Let U be a set of attributes and R ⊆ U. Let M be a set of MVDs over U and F be a set of FDs over U. Let X ⊆ R, Z ⊆ U, and Y = Z ∩ R. If M ∪ F implies X →→ Z on U, then M ∪ F implies X →→ Y on R.

PROOF. Fagin [1977]. □

Example 2.3.1. Figure 4 shows a given set of attributes U and a given set of FDs F over U and a given set of MVDs M over U. All the FDs in F in Figure 4 hold for the scheme tree T in Figure 3, as do all the FDs implied by M ∪ F. Not all the MVDs in M hold for T. In particular, neither Hobby →→ Hobby-Equipment nor Prof →→ Hobby Hobby-Equipment holds for T. Because Hobby Hobby-Equipment ∩ Aset(T) = Hobby, however, Prof →→ Hobby does hold for T. Although Prof →→ Hobby holds for T, observe that it is not implied by M ∪ F on U.

As illustrated in Example 2.3.1, certain MVDs hold for a relation scheme even when they are not implied by a given set of FDs and MVDs. It is those that hold that are of interest to us. We now return to our task of defining redundancy.
Because our definition depends on the validity of a nested relation, however, we must first define what it means for a relation to be valid for a given set of FDs and MVDs.

Definition. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that Aset(T) ⊆ U. Let r be a nested relation on T. Nested relation r is valid with respect to M ∪ F if in the total unnesting of r, every FD and every MVD that holds for T with respect to M and F is satisfied.

We now define redundancy. The definition has two parts: FD redundancy and MVD redundancy.

Definition. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that Aset(T) ⊆ U. Let XY ⊆ Aset(T), and let d be an FD X → Y or an MVD X →→ Y that holds for T with respect to M and F and has an attribute A such that A ∈ Y and A ∉ X. Let r be a nonempty nested relation on T that is valid with respect to M ∪ F. Let S be a subtree of T that contains A as an atomic attribute, and let s1, ..., sn be the nested relations over S in r. Let u1 and u2 be distinct nested tuples of si and sj, respectively, 1 ≤ i, j ≤ n, such that u1(A) = v, u2(A) = v′, and v = v′. (Note that i = j is possible so that si and sj may either be the same nested relation under S or may be in different nested relations under S.) Let t1 and t2 be distinct tuples in the total unnesting of r such that t1(Aset(S)) and t2(Aset(S)) are tuples in the total unnesting of u1 and u2, respectively.
FD redundancy, when d is X → Y: If t1(X) = t2(X), then atomic value v is a redundant atomic value in r caused by X → Y.
MVD redundancy, when d is X →→ Y: If t1(X) = t2(X), t1(Y) = t2(Y), and t1(Z) ≠ t2(Z) where Z = Aset(T) − (XY), then atomic value v is a redundant atomic value in r caused by X →→ Y.

Example 2.3.2. Let Student → Dept and consider the nested relation and its total unnesting in Figure 5. Both appearances of Math are redundant in both the nested and unnested relation. We can see this formally as follows. Let t1 be the third tuple and t2 be the last tuple in the unnested relation. Now we have t1(Student) = t2(Student), and thus Math in the third tuple of the unnested relation is redundant.
Because Math in the third tuple of the unnested relation comes from the first nested tuple in the nested relation, Math in the first nested tuple of the nested relation is redundant. By reversing t1 and t2, we can see formally that Math in the second nested tuple of the nested relation and in the last tuple of the unnested relation are also redundant.

It is possible for a value not to be redundant in a nested relation and yet be redundant in the total unnesting of the relation. Indeed, this is often the reason we create nested relations: to remove redundant values.

Example 2.3.3. Suppose Student →→ Interest and we allow students to have multiple majors. Now consider the nested relation and its total unnesting in Figure 6. Observe that Skiing is redundant in the unnested relation. However, in the nested relation, Skiing is not redundant because it appears only once.
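The two redundancy tests can be tried out on the unnested relations of these examples. The sketch below is a simplification under stated assumptions: it works on flat relations only, where every tuple is its own nested tuple, so it does not model the requirement that u1 and u2 be distinct nested tuples of a nested relation. The function names and the row data (read from the total unnestings shown in Figures 5 and 6) are assumptions of the illustration.

```python
from itertools import permutations

def proj(t, attrs):
    """Project tuple t (a dict) onto a list of attributes."""
    return tuple(t[a] for a in attrs)

def fd_redundant(rows, X, A):
    """Indices of tuples whose A-value is redundant under an FD X -> Y with A in Y."""
    hits = set()
    for (i, t1), (j, t2) in permutations(enumerate(rows), 2):
        if t1 != t2 and proj(t1, X) == proj(t2, X) and t1[A] == t2[A]:
            hits.add(i)
    return sorted(hits)

def mvd_redundant(rows, X, Y, A):
    """Indices of tuples whose A-value is redundant under an MVD X ->> Y with A in Y."""
    Z = [a for a in rows[0] if a not in X and a not in Y]  # Z = Aset(T) - XY
    hits = set()
    for (i, t1), (j, t2) in permutations(enumerate(rows), 2):
        if (proj(t1, X) == proj(t2, X) and proj(t1, Y) == proj(t2, Y)
                and proj(t1, Z) != proj(t2, Z) and t1[A] == t2[A]):
            hits.add(i)
    return sorted(hits)

ATTRS = ("Interest", "Dept", "Student")

def relation(data):
    return [dict(zip(ATTRS, row)) for row in data]

# Unnested relation of the Student -> Dept example.
fig5 = relation([("Skiing", "CS", "Barker"), ("Skiing", "CS", "Adams"),
                 ("Skiing", "Math", "Carter"), ("Travel", "CS", "Lee"),
                 ("Travel", "Math", "Carter")])

# Unnested relation of the Student ->> Interest example.
fig6 = relation([("Skiing", "CS", "Barker"), ("Skiing", "CS", "Adams"),
                 ("Skiing", "Math", "Barker"), ("Travel", "Math", "Carter")])

print(fd_redundant(fig5, ["Student"], "Dept"))                   # third and last tuples
print(mvd_redundant(fig6, ["Student"], ["Interest"], "Interest"))
```

Indices are zero-based, so the first call reports positions 2 and 4, the third and last tuples named in the Student → Dept example.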

Student → Dept
Scheme: Interest (Dept (Student)*)*

Interest  Dept  Student
Skiing    CS    Barker
Skiing    CS    Adams
Skiing    Math  Carter
Travel    CS    Lee
Travel    Math  Carter

Fig. 5. Redundancy caused by an FD (total unnesting shown).

Student →→ Interest
Scheme: Interest (Dept (Student)*)*

Interest  Dept  Student
Skiing    CS    Barker
Skiing    CS    Adams
Skiing    Math  Barker
Travel    Math  Carter

Fig. 6. Elimination of redundancy by nesting (total unnesting shown).

Example 2.3.4. Let Prof →→ Dept Student and Prof →→ Hobby Hobby-Equipment and consider the flat relation in Figure 7. Because the scheme of the relation in Figure 7 is Prof Student Hobby and Prof →→ Dept Student and Prof →→ Hobby Hobby-Equipment, by the preceding lemma Prof →→ Student and Prof →→ Hobby hold for the scheme Prof Student Hobby. But now all the data values under Student and Hobby are redundant, as can be seen formally by appropriately picking two distinct tuples and choosing which attribute and value to consider. For example, let t1 be the first tuple and t2 be the second tuple; then Young in t1 is redundant because t1(Prof) = t2(Prof), t1(Student) = t2(Student), and t1(Hobby) ≠ t2(Hobby).

As an aside, we observe here that by the common definition of 4NF found in most textbooks (e.g., [Korth and Silberschatz 1991; Maier 1983]) the relation scheme in Figure 7 is in 4NF. This is because no nontrivial MVD, given or implied, applies to the scheme, where "applies" means that the set of attributes that constitutes the MVD is a subset of the scheme. In particular for Example 2.3.4, neither Prof →→ Student nor Prof →→ Hobby is implied by or is in the given set of MVDs {Prof →→ Dept Student, Prof →→ Hobby Hobby-Equipment}. According to the original definition given by Fagin [1977], however, the relation scheme in Figure 7 is not in 4NF. Fagin's definition not only considers all nontrivial MVDs that are given or implied (without regard to the scheme under consideration), but also the MVDs that hold when the scheme is considered.

Example 2.3.5. To show an example of a nested relation (with embedded relations) that has redundancy caused by an MVD, we present one more example of redundancy. Let U = {Prof, Article-Title, Publication-Location} and let Prof →→ Article-Title and Article-Title →→ Prof be the MVDs. (Note that Example 2 in Beeri and Kifer [1986] has exactly the same characteristics.) Consider the nested relation and its total unnesting in Figure 8.
Based on the MVD Article-Title →→ Publication-Location, which holds for the nested relation scheme in Figure 8, all the values under Publication-Location in the nested relation are redundant. We can see formally, for example, that the last Hong Kong value under (Publication-Location)* is redundant by letting t1 be the last tuple and t2 be the 4th tuple in the unnested relation. Thus t1(Article-Title) = t2(Article-Title), t1(Publication-Location) = t2(Publication-Location), and t1(Prof) ≠ t2(Prof).

The preceding definition of redundancy tells us what it means for an individual atomic value to be redundant in a nested relation for an FD or MVD that holds. Our next definition ties together the notion of a redundant data value in a nested relation and the notion of a nested relation scheme that permits valid nested relations that contain redundancy. It is this definition that allows us to later show that our normal form definition detects redundancy.

Definition. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that Aset(T) ⊆ U. T is said to have potential redundancy with respect to M ∪ F if some valid nested relation for T contains a redundant atomic value caused by either an FD or an MVD that holds for T with respect to M and F.
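The claims about which MVDs are satisfied by the unnested relation of the Prof, Article-Title, Publication-Location example can be checked mechanically. The following is a minimal sketch of the standard tuple-swapping test for MVD satisfaction on a flat relation; the function name `satisfies_mvd` and the row data (read from the total unnesting in Figure 8) are assumptions of this illustration, and the sketch assumes X and Y are disjoint.

```python
def satisfies_mvd(rows, attrs, X, Y):
    """True if the flat relation `rows` (tuples over `attrs`) satisfies X ->> Y."""
    idx = {a: k for k, a in enumerate(attrs)}

    def pick(t, names):
        return tuple(t[idx[a]] for a in names)

    Z = [a for a in attrs if a not in X and a not in Y]
    present = {(pick(t, X), pick(t, Y), pick(t, Z)) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if pick(t1, X) == pick(t2, X):
                # The swapped tuple takes X and Y from t1 and the rest from t2.
                if (pick(t1, X), pick(t1, Y), pick(t2, Z)) not in present:
                    return False
    return True

ATTRS = ("Prof", "Article-Title", "Publication-Location")
FIG8 = [
    ("Steve", "Programming in C++", "USA"),
    ("Steve", "Programming in Ada", "USA"),
    ("Steve", "Programming in C++", "Hong Kong"),
    ("Steve", "Programming in Ada", "Hong Kong"),
    ("Pat", "Programming in Ada", "USA"),
    ("Pat", "Programming in Ada", "Hong Kong"),
]

for X, Y in [(["Prof"], ["Article-Title"]),
             (["Article-Title"], ["Prof"]),
             (["Article-Title"], ["Publication-Location"]),
             (["Publication-Location"], ["Prof"])]:
    print(X, "->>", Y, satisfies_mvd(FIG8, ATTRS, X, Y))
```

On this data the two given MVDs and Article-Title →→ Publication-Location are satisfied, while Publication-Location →→ Prof is not, because no tuple pairs Pat with Programming in C++.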

Prof →→ Dept Student
Prof →→ Hobby Hobby-Equipment

Prof  Student  Hobby
Jane  Young    Reading
Jane  Young    Skiing
Jane  Barker   Reading
Jane  Barker   Skiing

Fig. 7. A flat relation with redundancy.

Prof →→ Article-Title
Article-Title →→ Prof
Scheme: Prof (Article-Title)* (Publication-Location)*

Prof   Article-Title       Publication-Location
Steve  Programming in C++  USA
Steve  Programming in Ada  USA
Steve  Programming in C++  Hong Kong
Steve  Programming in Ada  Hong Kong
Pat    Programming in Ada  USA
Pat    Programming in Ada  Hong Kong

Fig. 8. Redundancy caused by an MVD (total unnesting shown).

Example 2.3.6. Because the nested relations in Figures 5, 7, and 8 all have redundancy, the nested relation schemes in Figures 5, 7, and 8 are all said to have potential redundancy with respect to the FDs and MVDs given for the examples.

3. NESTED NORMAL FORM

We motivate the need for a new normal-form definition for nested relations by making certain observations about the examples we have presented. First, if we are given the FDs and MVDs in Figure 4, none of the earlier normal-form definitions [Ozsoyoglu and Yuan 1987, 1989; Roth and Korth 1987] allows the scheme of the nested relation in Figure 1, which is also equivalent to the scheme tree T in Figure 3. They therefore do not allow the nested relation in Figure 1 even though it is a good clustering for this application and has no redundancy. For a scheme tree T to satisfy the earlier normal-form definitions, T must satisfy four conditions. It turns out that T in Figure 3 does not satisfy the fourth condition for any of these previous definitions. One requirement of the fourth condition insists that the root of a scheme tree be the left-hand side of a reduced nontrivial MVD, but all (implied) MVDs with Dept Chair as the left-hand side are trivial. In fact, this is not the only violation. In particular, Matriculation cannot be an inner node of T in Figure 3. For the definitions in Ozsoyoglu and Yuan [1987, 1989], there are even partial MVDs in T because of the edges (Prof, Hobby) and (Student, Interest). Because there are unnecessary conditions in these previous normal form definitions, they all restrict attribute clustering and design flexibility, as these examples show. In fact, these conditions can lead to unnecessary decompositions of schemes.

Second, all the earlier definitions [Ozsoyoglu and Yuan 1987, 1989; Roth and Korth 1987] allow the scheme of the nested relation in Figure 8, but as pointed out in Example 2.3.5, the nested relation has redundancy. We can see that the earlier definitions allow the scheme of the nested relation in Figure 8 as follows. Let T be the scheme tree for the scheme of the nested relation in Figure 8, and assume we are given the set of MVDs, M = {Prof →→ Article-Title, Article-Title →→ Prof}, and the empty set of FDs. When there are no FDs, all three earlier definitions are equivalent. Now observe that MVD(T) = {Prof →→ Article-Title, Prof →→ Publication-Location}. Because M implies MVD(T), the first condition of their definitions is satisfied.
Because M does not imply an MVD with a left-hand side that is a proper subset of Prof, T has no partial MVDs, and thus their second condition is satisfied. Because Article-Title →→ Prof, T has no transitive MVDs, and thus their third condition is satisfied. Each node in the scheme tree for Figure 8 is a single attribute; therefore there can be no decomposition of nodes, and thus their fourth condition is satisfied.

We now give our definition for Nested Normal Form.

Definition 3.1. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that Aset(T) ⊆ U. T is in Nested Normal Form (NNF) with respect to M ∪ F if the following conditions are satisfied.
(1) If D is the set of MVDs and FDs that hold for T with respect to M ∪ F, then D is equivalent to MVD(T) ∪ FD(T) on Aset(T).
(2) For each nontrivial FD X → A that holds for T with respect to M ∪ F, X → Ancestor(NA) also holds with respect to M ∪ F, where NA is the node in T that contains A.

Example 3.1. Suppose we are given U, F, and M as in Figure 4. Then the scheme tree T in Figure 3 is in NNF. We can see this from our definition as follows. We have observed in Example 2.3.1 that Hobby →→ Hobby-Equipment does not hold for Aset(T). The set of MVDs and FDs that do hold for T is equivalent to F ∪ {Student →→ Interest, Prof →→ Hobby} considered on Aset(T). This set is thus equivalent to D in the NNF definition. MVD(T) is the set of MVDs in Figure 3, and we can let F in Figure 4 be FD(T). We can convince ourselves that Condition 1 is satisfied by applying a few standard MVD and FD derivation rules. For example, we can derive Prof →→ Hobby from MVD(T) and FD(T) by using the FDs in FD(T) to obtain Prof → Dept Chair Prof, converting this derived FD into an MVD, and then applying transitivity with Dept Chair Prof →→ Hobby in MVD(T) to obtain Prof →→ Hobby. To convince ourselves that Condition 2 is satisfied, we consider Student → Matriculation, which holds for T, and observe that Ancestor(Matriculation) = Matriculation Prof Dept Chair and that Student → Matriculation Prof Dept Chair is implied. Hence Student → Ancestor(Matriculation). We similarly check each nontrivial FD in FD(T), which is sufficient to ensure that Condition 2 is satisfied.

Example 3.1 not only illustrates NNF in a nontrivial case, but also shows that our definition accepts the nested relation scheme in Figure 1, which we consider to be good, but which is rejected by all the earlier definitions as previously explained. We now continue by giving two more examples, one that violates Condition 1 of NNF and one that violates Condition 2. Our example that violates Condition 1 also shows that NNF detects the problem of the nested relation scheme in Figure 8. It therefore recognizes a scheme that allows redundancy but is not detected by the earlier definitions.

Example 3.2. Let U = {Prof, Article-Title, Publication-Location}, M = {Prof →→ Article-Title, Article-Title →→ Prof}, and F = ∅. As in Figure 8, let T be Prof (Article-Title)* (Publication-Location)*. T does not satisfy Condition 1 because MVD(T) ∪ FD(T), which is {Prof →→ Article-Title, Prof →→ Publication-Location}, is not equivalent to the set of FDs and MVDs that hold for T.
For example, we cannot derive Article-Title →→ Prof from {Prof →→ Article-Title, Prof →→ Publication-Location}. Thus Condition 1 is not satisfied.

Example 3.3. Let U = {Interest, Dept, Student}, M = ∅, and F = {Student → Dept}. As in Figure 5, let T = Interest (Dept (Student)*)*. T does not satisfy Condition 2 because Student → Dept is a nontrivial FD that holds for T and Ancestor(Dept) = Dept Interest, but Student → Dept Interest does not hold.

4. NNF AND POTENTIAL REDUNDANCY

In this section we prove one of our main results. In particular, we prove that a nested relation whose scheme is in NNF for a given set of FDs and MVDs cannot have redundancy with respect to the given FDs and MVDs. Many of the lemmas here depend on a set of FD and MVD derivation rules. We use the following rules, where X, Y, Z, V, and W are all subsets of a set of attributes R:

FD derivation rules:
F1: (reflexivity) Y ⊆ X implies X → Y.
F2: (augmentation) X → Y and V ⊆ W imply XW → YV.
F3: (transitivity) X → Y and Y → Z imply X → Z.

MVD derivation rules:
M1: (reflexivity) $Y \subseteq X$ implies $X \twoheadrightarrow Y$.
M2: (augmentation) $X \twoheadrightarrow Y$ and $Z' \subseteq Z$ imply $XZ \twoheadrightarrow YZ'$.
M3: (transitivity) $X \twoheadrightarrow Y$ and $Y \twoheadrightarrow Z$ imply $X \twoheadrightarrow Z - Y$.
M4: (complementation) $X \twoheadrightarrow Y$ implies $X \twoheadrightarrow R - (XY)$.
M5: (trivial complementation) $X \twoheadrightarrow R - X$.

Combined FD and MVD derivation rules:
C1: (replication) $X \to Y$ implies $X \twoheadrightarrow Y$.
C2: (coalescence) $X \twoheadrightarrow Y$, $Z \to W$, $W \subseteq Y$, and $Y \cap Z = \emptyset$ imply $X \to W$.

These FD and MVD derivation rules are sound and complete [Beeri et al. 1977], but not minimal. Indeed, part of what we show is that M1 (reflexivity) is not needed: the derivation rules remain sound and complete without it. The more common choice, of course, is to retain M1 and omit M5. For our proofs about scheme trees, however, it is often required that our MVDs stretch from root to leaf. We therefore use the alternative choice for trivial MVDs. Because this choice is not common, we prove in Lemma 4.1 that it is possible. In addition, we add a corollary that specializes the lemma to the case in which only MVDs are given.

LEMMA 4.1. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let $Z \twoheadrightarrow W$ be an MVD implied by $M \cup F$ on U. There exists an $(M \cup F)$-based derivation sequence for $Z \twoheadrightarrow W$ on U that uses only F1–F3, M2–M5, and C1–C2.

PROOF. Because $Z \twoheadrightarrow W$ is implied by $M \cup F$ on U and the derivation rules F1–F3, M1–M5, C1, and C2 are sound and complete, there exists a derivation sequence S for $Z \twoheadrightarrow W$ on U using these rules. If S does not include M1, we are done. Otherwise, we replace each usage of M1 of the form "$X \twoheadrightarrow Y$, by M1 (reflexivity)", where $Y \subseteq X$, by the following sequence of derivation rules:
$X \to Y$, by F1 (reflexivity) because $Y \subseteq X$.
$X \twoheadrightarrow Y$, by C1 (replication). □

COROLLARY. If $F = \emptyset$, there exists an M-based derivation sequence for $Z \twoheadrightarrow W$ that uses only the MVD rules M2–M5.

PROOF. Because M1–M4 are sound and complete when no FDs are given, there exists a derivation sequence S for $Z \twoheadrightarrow W$ that uses only M1–M4.
If S does not include M1, we are done. Otherwise, we replace each usage of M1, which derives $X \twoheadrightarrow Y$ for some $Y \subseteq X$, by the following sequence of derivation rules:
$X \twoheadrightarrow R - X$, by M5 (trivial complementation).
$X \twoheadrightarrow R - (X(R - X))$, by M4 (complementation).
$X \twoheadrightarrow \emptyset$, because $R - (X(R - X)) = \emptyset$.
$XY \twoheadrightarrow Y$, by M2 (augmentation).
$X \twoheadrightarrow Y$, because $Y \subseteq X$ and hence $XY = X$. □
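The derivation rules above are statements about every relation, so individual instances can be spot-checked on small flat relations. The following Python sketch is illustrative only: the function names `satisfies_fd` and `satisfies_mvd`, the dict-per-tuple encoding, and the sample relations `r` and `s` are assumptions made here and do not come from the paper. It tests FD and MVD satisfaction using the usual tuple-swapping definition of an MVD and then exercises instances of M4 and M5.

```python
from itertools import product

def proj(t, attrs):
    """Project tuple t (a dict) onto attrs as a hashable, order-free value."""
    return frozenset((a, t[a]) for a in attrs)

def satisfies_fd(rel, X, Y):
    """True if tuples of rel that agree on X also agree on Y."""
    seen = {}
    for t in rel:
        if seen.setdefault(proj(t, X), proj(t, Y)) != proj(t, Y):
            return False
    return True

def satisfies_mvd(rel, R, X, Y):
    """True if rel over attribute set R satisfies the MVD X ->> Y."""
    X, Y = set(X), set(Y)
    Z = set(R) - X - Y
    rows = {proj(t, R) for t in rel}
    for t1, t2 in product(rel, repeat=2):
        if proj(t1, X) == proj(t2, X):
            # The MVD demands a tuple with t1's XY-values and t2's Z-values.
            swapped = {a: (t2[a] if a in Z else t1[a]) for a in R}
            if proj(swapped, R) not in rows:
                return False
    return True

R = {"A", "B", "C"}
# For A = 1, B and C vary independently, so A ->> B holds but A -> B does not.
r = [{"A": 1, "B": b, "C": c} for b in (1, 2) for c in (1, 2)]
# Keeping only two of the four combinations breaks the independence.
s = [{"A": 1, "B": 1, "C": 1}, {"A": 1, "B": 2, "C": 2}]

print(satisfies_mvd(r, R, {"A"}, {"B"}))            # A ->> B on r
print(satisfies_mvd(r, R, {"A"}, R - {"A", "B"}))   # M4 instance: A ->> R - (AB)
print(satisfies_mvd(s, R, {"A"}, R - {"A"}))        # M5 instance: A ->> R - A
print(satisfies_mvd(s, R, {"A"}, {"B"}))            # A ->> B on s
```

A check of this kind can only refute a rule on a particular relation; it does not replace the soundness and completeness results cited from Beeri et al. [1977].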

Lemma 4.2 guarantees that if an attribute of a node N in a scheme tree is in the right-hand side, but not the left-hand side, of a nontrivial MVD, and if the MVD is implied by the MVDs of the scheme tree, then all the attributes in N are included in the MVD.

LEMMA 4.2. Let U be a set of attributes. Let T be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. Let $XY \subseteq \mathrm{Aset}(T)$. Let $X \twoheadrightarrow Y$ be an MVD in $\mathrm{MVD}(T)^+$ on Aset(T) such that $A \in Y$ and $A \notin X$. Let $A$ be in node $N$ of T. Then $N \subseteq XY$.

PROOF. Because $X \twoheadrightarrow Y$ is an MVD in $\mathrm{MVD}(T)^+$ on Aset(T) and MVD(T) consists only of MVDs, there exists an MVD(T)-based derivation sequence S for $X \twoheadrightarrow Y$ on Aset(T) that, by the Corollary to Lemma 4.1, uses only the MVD rules M2–M5. We show by induction on the number $n$ of MVDs in S that for every MVD $X \twoheadrightarrow Y$ in S, if $A$ is an attribute in node $N$ of T such that $A \in Y$ and $A \notin X$, then $N \subseteq XY$. Thus, because $X \twoheadrightarrow Y$ is in S, $N \subseteq XY$.

Basis: Suppose $n = 1$. Because only rules M2–M5 are used and M2–M4 require antecedents, $X \twoheadrightarrow Y$ is either given or introduced by M5 (trivial complementation). If $X \twoheadrightarrow Y$ is given, then $X \twoheadrightarrow Y \in \mathrm{MVD}(T)$, and thus $Y \supseteq N$. If $X \twoheadrightarrow Y$ is introduced by M5 (trivial complementation), $XY = \mathrm{Aset}(T)$, and thus $N \subseteq XY$.

Induction: $X \twoheadrightarrow Y$ can be introduced by any of the MVD derivation rules M2–M5 or as a given MVD in MVD(T), and therefore we have five cases to consider. Because the cases for given MVDs and M5 (trivial complementation) have no antecedents, they are the same as in the basis. Therefore, we can reduce the cases to three.

(1) $X \twoheadrightarrow Y$ is introduced by M2 (augmentation). Let $V \twoheadrightarrow W$ be the antecedent MVD and let $Z' \subseteq Z$ such that $X = VZ$ and $Y = WZ'$. If $A \in Y - X$, then because $Z' \subseteq Z$, $A \in W$ and $A \notin V$. By the induction hypothesis, $N \subseteq VW$ and thus $N \subseteq XY$.

(2) $X \twoheadrightarrow Y$ is introduced by M3 (transitivity). Let $V \twoheadrightarrow W$ and $W \twoheadrightarrow Z$ be the antecedent MVDs. Thus $X = V$ and $Y = Z - W$. Now assume there exists an attribute $A$ in node $N$ of T such that $A \in Y$ and $A \notin X$, but $N \not\subseteq XY$.
Then there exists an attribute $B$ such that $B \in N$ and $B \notin XY$. Because $B \notin XY$, $X = V$, and $Y = Z - W$, we have $B \notin V$ and either $B \in W$ or $B \in \mathrm{Aset}(T) - (VWZ)$. Suppose $B \in W$. Then because $B \in W$ and $B \notin V$, by the induction hypothesis, $N \subseteq VW$. $A \in N$, therefore $A \in VW$. But because $A \notin X$ and $X = V$, $A \notin V$; and because $A \in Y$ and $Y = Z - W$, $A \notin W$. Thus $A \notin VW$, a contradiction. We therefore suppose that $B \in \mathrm{Aset}(T) - (VWZ)$. But now we have $A \in Y$ and $Y = Z - W$, and therefore $A \in Z$, $A \notin W$, and $A \in N$. Therefore, by the induction hypothesis, we have $N \subseteq WZ$. Because $B \in N$, $B \in WZ$. However, $B \in \mathrm{Aset}(T) - (VWZ)$ and thus $B \notin WZ$, a contradiction.

(3) $X \twoheadrightarrow Y$ is introduced by M4 (complementation). Let $V \twoheadrightarrow W$ be the antecedent MVD. Thus $X = V$ and $Y = \mathrm{Aset}(T) - (VW)$. Now assume there exists an attribute $A$ in node $N$ of T such that $A \in Y$ and $A \notin X$, but $N \not\subseteq XY$. Then there exists an attribute $B$ such that $B \in N$ and $B \notin XY$. Hence $B \notin V(\mathrm{Aset}(T) - (VW))$ and thus $B \in W$ and $B \notin V$. Because $B \in N$, by the induction hypothesis, $N \subseteq VW$. $A \in N$, therefore $A \in VW$. However, because $A \in Y$ and $Y = \mathrm{Aset}(T) - (VW)$, $A \notin VW$, a contradiction. □

Lemma 4.3 extends Lemma 4.2 to guarantee not only that a node is included, but also that all ancestors and descendants of the node are included. That is, Lemma 4.3 guarantees that if an attribute of a node in a scheme tree is in the right-hand side of a nontrivial MVD (but not in the left-hand side), and if the MVD is implied by the MVDs of the scheme tree, then both the ancestors and the descendants of the node are included in the MVD.

LEMMA 4.3. Let U be a set of attributes. Let T be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. Let $XY \subseteq \mathrm{Aset}(T)$. Let $X \twoheadrightarrow Y$ be a nontrivial MVD in $\mathrm{MVD}(T)^+$ on Aset(T). Let $A$ be an attribute such that $A \in Y$ and $A \notin X$, and let $A$ be in node $N$ of T. Then both $\mathrm{Ancestor}(N) \subseteq XY$ and $\mathrm{Descendant}(N) \subseteq XY$.

PROOF. As in Lemma 4.2, we show by induction on the number $n$ of MVDs in the MVD(T)-based derivation sequence S for $X \twoheadrightarrow Y$ on Aset(T) that for every MVD $X \twoheadrightarrow Y$ in S, if $A$ is an attribute such that $A \in Y$ and $A \notin X$, and if $A$ is in node $N$ of T, then both $\mathrm{Ancestor}(N) \subseteq XY$ and $\mathrm{Descendant}(N) \subseteq XY$.

Basis: Suppose $n = 1$. Because S has no MVD introduced by M1 (reflexivity) and there is only one MVD $X \twoheadrightarrow Y$ in S, $X \twoheadrightarrow Y$ is given or is introduced by M5 (trivial complementation). If $X \twoheadrightarrow Y$ is given, then $X \twoheadrightarrow Y \in \mathrm{MVD}(T)$. As argued in Lemma 4.2, $Y \supseteq N$, and since $X \twoheadrightarrow Y \in \mathrm{MVD}(T)$, both $\mathrm{Ancestor}(N) \subseteq XY$ and $\mathrm{Descendant}(N) \subseteq XY$. If $X \twoheadrightarrow Y$ is introduced by M5 (trivial complementation), $XY = \mathrm{Aset}(T)$. Hence every node is a subset of $XY$, and thus both $\mathrm{Ancestor}(N) \subseteq XY$ and $\mathrm{Descendant}(N) \subseteq XY$.

Induction: As in Lemma 4.2, we have only three cases to consider.

(1) $X \twoheadrightarrow Y$ is introduced by M2 (augmentation). The argument is similar to the proof of Case 1 in Lemma 4.2.
(2) $X \twoheadrightarrow Y$ is introduced by M3 (transitivity). Let $V \twoheadrightarrow W$ and $W \twoheadrightarrow Z$ be the antecedent MVDs. Thus $X = V$ and $Y = Z - W$. Let $A$ be an attribute in node $N$ of T such that $A \in Y$ and $A \notin X$. Hence, by Lemma 4.2, $N \subseteq XY$. We claim that $\mathrm{Ancestor}(N) \subseteq XY$. If not, then there exists an attribute $B \in \mathrm{Ancestor}(N)$ such that $B \notin XY$. Because $B \notin XY$, $X = V$, and $Y = Z - W$, $B \notin V(Z - W)$. Thus $B \notin V$ and either $B \in W$ or $B \in \mathrm{Aset}(T) - (VWZ)$. We first assume that $B \in W$. Because $B \notin V$ and $B \in W$, by Lemma 4.2, $B$ is in a node $N'$ such that $N' \subseteq VW$. $B \in \mathrm{Ancestor}(N)$ and $B \in N'$, thus $N \subseteq \mathrm{Descendant}(N')$ and $A \in \mathrm{Descendant}(N')$. By the induction hypothesis, $\mathrm{Descendant}(N') \subseteq VW$, and therefore $A \in VW$. But because $A \in Y$ and $Y = Z - W$, $A \notin W$; and because $A \notin X$ and $X = V$, $A \notin V$. Thus $A \notin VW$, a contradiction. Thus we assume that $B \in \mathrm{Aset}(T) - (VWZ)$. Because $A \in Y$ and $Y = Z - W$, $A \in Z$ and $A \notin W$. Thus by the induction hypothesis, $\mathrm{Ancestor}(N) \subseteq WZ$. $B \in \mathrm{Ancestor}(N)$ and $\mathrm{Ancestor}(N) \subseteq WZ$, and therefore $B \in WZ$. Thus $B \notin \mathrm{Aset}(T) - (VWZ)$, a contradiction. By an identical argument with Descendant replacing Ancestor and vice versa, $\mathrm{Descendant}(N) \subseteq XY$.

(3) $X \twoheadrightarrow Y$ is introduced by M4 (complementation). The argument is similar to the proof of Case 3 in Lemma 4.2. □

Lemma 4.4 tells us that the set D of dependencies that holds for a scheme tree T is the closure of itself on Aset(T).

LEMMA 4.4. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. Let D be the set of dependencies in $(M \cup F)^+$ that hold for T. Then $D^+ = D$ on Aset(T).

PROOF. The strategy for proving this result is to show that none of the inference rules can add a new dependency that is not already in D. All the cases except for C2 are straightforward; therefore we just prove that C2 cannot add a new FD. Let $X \twoheadrightarrow Y$ be an MVD in D, and let $Z \to W$ be an FD in D such that $W \subseteq Y$ and $Y \cap Z = \emptyset$. Thus $X \twoheadrightarrow Y$ and $Z \to W$ hold for T. Because $X \twoheadrightarrow Y$ holds for T, $XY \subseteq \mathrm{Aset}(T)$ and there exists an MVD $X \twoheadrightarrow Y'$ in $(M \cup F)^+$ such that $Y = Y' \cap \mathrm{Aset}(T)$. $Z \to W$ holds for T, and therefore $Z \to W$ is in $(M \cup F)^+$ and $ZW \subseteq \mathrm{Aset}(T)$. Because $W \subseteq Y$ and $Y \subseteq Y'$, $W \subseteq Y'$. $Y = Y' \cap \mathrm{Aset}(T)$, $Z \subseteq \mathrm{Aset}(T)$, and $Y \cap Z = \emptyset$, therefore $Y' \cap Z = \emptyset$. Hence $X \to W$ is in $(M \cup F)^+$. Because $XW \subseteq \mathrm{Aset}(T)$, $X \to W$ is already in D. □

Lemma 4.5 provides an interesting result: the given FDs can be disregarded if we are only interested in certain implied MVDs.
In particular, if MVD(T) and FD(T) imply an MVD $X \twoheadrightarrow Y$, then if we close the left-hand side of the MVD under MVD(T) and FD(T) on Aset(T) to obtain $X^+$, MVD(T) alone is sufficient to imply $X^+ \twoheadrightarrow Y$. The converse also holds, and although we do not need the converse for Theorem 4.1, we provide it here because we need it later for a lemma required for Theorem 5.2.

LEMMA 4.5. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. Let $XY \subseteq \mathrm{Aset}(T)$. If $\mathrm{MVD}(T) \cup \mathrm{FD}(T)$ implies $X \twoheadrightarrow Y$ on Aset(T), then MVD(T) implies $X^+_{\mathrm{MVD}(T) \cup \mathrm{FD}(T)} \twoheadrightarrow Y$ on Aset(T), and conversely.

PROOF. The result can be proved by using Theorem 1 in Beeri and Kifer [1986]. □
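Lemmas 4.2 through 4.5 are phrased in terms of Ancestor(N), Descendant(N), and MVD(T). As a reading aid, the following Python sketch encodes these notions for small scheme trees. Everything named here (`Node`, `ancestor`, `descendant`, `mvd_of_tree`, `fd_closure`, `violates_condition_2`) is hypothetical. The reading of MVD(T) as one MVD $\mathrm{Ancestor}(V) \twoheadrightarrow \mathrm{Descendant}(W)$ per edge $(V, W)$ follows the way MVD(T) is used in the proof of Lemma 5.1 and agrees with the sets listed in Examples 3.2 and 5.1. The Condition 2 check inspects only the given FDs and uses a plain FD attribute closure, which is adequate for Example 3.3 only because M is empty there and Aset(T) = U; with MVDs present it would not be a complete test.

```python
class Node:
    """Scheme-tree node: a set of atomic attributes plus child nodes."""
    def __init__(self, attrs, children=()):
        self.attrs = frozenset(attrs)
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def nodes(root):
    """Yield every node of the tree rooted at root."""
    yield root
    for child in root.children:
        yield from nodes(child)

def ancestor(n):
    """Attributes of n and of every node above it (n itself included)."""
    out = set()
    while n is not None:
        out |= n.attrs
        n = n.parent
    return frozenset(out)

def descendant(n):
    """Attributes of n and of every node below it (n itself included)."""
    return frozenset().union(*(m.attrs for m in nodes(n)))

def mvd_of_tree(root):
    """MVD(T) as a set of (lhs, rhs) pairs, one per edge (V, W)."""
    return {(ancestor(v), descendant(w))
            for v in nodes(root) for w in v.children}

def fd_closure(X, fds):
    """Attribute closure of X under FDs only (assumes no MVDs are given)."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def violates_condition_2(root, fds):
    """List (X, A) for given FDs X -> A where X does not determine Ancestor(N_A)."""
    bad = []
    for lhs, rhs in fds:
        for a in set(rhs) - set(lhs):
            n_a = next(n for n in nodes(root) if a in n.attrs)
            if not ancestor(n_a) <= fd_closure(lhs, fds):
                bad.append((frozenset(lhs), a))
    return bad

# Example 3.2: T = Prof (Article-Title)* (Publication-Location)*
t32 = Node({"Prof"}, [Node({"Article-Title"}), Node({"Publication-Location"})])
# Example 3.3: T = Interest (Dept (Student)*)*, F = {Student -> Dept}, M empty
t33 = Node({"Interest"}, [Node({"Dept"}, [Node({"Student"})])])
f33 = [({"Student"}, {"Dept"})]

print(mvd_of_tree(t32))
print(violates_condition_2(t33, f33))
```

For the tree of Example 3.3, the check reports the FD with left-hand side Student and right-hand attribute Dept, matching the observation there that Student does not determine Ancestor(Dept) = Dept Interest.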

Lemma 4.6 begins to address the redundancy issue in nested relations directly. We use it twice in Theorem 4.1, and thus we state it separately as a lemma. Before stating and proving Lemma 4.6, we need a definition for a path in a scheme tree.

Definition 4.1. A path of a scheme tree T is a sequence of nodes $N_1, \ldots, N_n$ where $N_1$ is the root of T, $N_n$ is a leaf node of T, and $N_i$ is the parent of $N_{i+1}$, $1 \le i \le n - 1$.

LEMMA 4.6. Let T be a scheme tree. Let r be a nested relation on T. Let $A$ be an attribute in node $N_A$ of T. If $t_1$ and $t_2$ are distinct tuples in the total unnesting of r such that $t_1(\mathrm{Ancestor}(N_A)) = t_2(\mathrm{Ancestor}(N_A))$, then $t_1(A)$ and $t_2(A)$ are in the same nested tuple of a single nested relation under the nested relation scheme whose set of atomic attributes is $N_A$.

PROOF. Because we only allow PNF nested relations, this result can be proved easily by using Definition [...]. □

THEOREM 4.1. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. T has no potential redundancy with respect to $M \cup F$ if T is in NNF with respect to $M \cup F$.

PROOF. Assume not. Then T has potential redundancy with respect to $M \cup F$. Thus by Definition 2.3.4, there exists a redundant atomic value $v$ in a valid nested relation r on T caused by a dependency from $X$ to $Y$, which is either the FD $X \to Y$ or the MVD $X \twoheadrightarrow Y$, that holds for T with respect to M and F. In either case, by Definition 2.3.3, we have the following: $XY \subseteq \mathrm{Aset}(T)$; there exists an attribute $A$ such that $A \in Y$ and $A \notin X$; there exists a subtree S of T that contains $A$ as an atomic attribute; there exist distinct nested tuples $u_1$ and $u_2$ that are respectively in two (not necessarily distinct) nested relations on S; and there exists an atomic value $v'$ such that $u_1(A) = v$, $u_2(A) = v'$, and $v = v'$.
Furthermore, there exist distinct tuples $t_1$ and $t_2$ in the total unnesting of r such that $t_1(\mathrm{Aset}(S))$ and $t_2(\mathrm{Aset}(S))$ are tuples in the total unnesting of $u_1$ and $u_2$, respectively.

First assume that the redundancy is caused by an FD. Without loss of generality, we may assume that $Y = A$ and thus that the FD is $X \to A$. Because $v$ is a redundant atomic value in r caused by $X \to A$, by the FD part of Definition 2.3.3, $t_1(X) = t_2(X)$. $A$ is an atomic attribute in S, therefore $A$ is in the root node of the subtree S of T. Let the node that contains $A$ be denoted by $N_A$. Now, either $t_1(\mathrm{Ancestor}(N_A)) = t_2(\mathrm{Ancestor}(N_A))$ or $t_1(\mathrm{Ancestor}(N_A)) \ne t_2(\mathrm{Ancestor}(N_A))$.

Case 1. Suppose $t_1(\mathrm{Ancestor}(N_A)) \ne t_2(\mathrm{Ancestor}(N_A))$. Because $A \notin X$, $X \to A$ is nontrivial. Because $t_1(X) = t_2(X)$ but $t_1(\mathrm{Ancestor}(N_A)) \ne t_2(\mathrm{Ancestor}(N_A))$, $X \not\to \mathrm{Ancestor}(N_A)$. Thus there exists a nontrivial FD $X \to A$ that holds for T, but $X \not\to \mathrm{Ancestor}(N_A)$, where $N_A$ is the node in T that contains $A$. This contradicts Condition 2 of NNF.

Case 2. Suppose $t_1(\mathrm{Ancestor}(N_A)) = t_2(\mathrm{Ancestor}(N_A))$. Because r is a nested relation on T, $A$ is an attribute in node $N_A$ of T, and $t_1$ and $t_2$ are distinct tuples in the total unnesting of r such that $t_1(\mathrm{Ancestor}(N_A)) = t_2(\mathrm{Ancestor}(N_A))$, by Lemma 4.6, $t_1(A)$ and $t_2(A)$ are in the same nested tuple of a single nested relation under the nested relation scheme whose set of atomic attributes is $N_A$. S is the nested relation scheme whose set of atomic attributes is $N_A$, therefore $t_1(A)$ and $t_2(A)$ are in the same nested tuple under S. $A \in N_A$ and $N_A$ is the root node of subtree S, therefore $A \in \mathrm{Aset}(S)$. Because $A \in \mathrm{Aset}(S)$ and $t_1(\mathrm{Aset}(S))$ and $t_2(\mathrm{Aset}(S))$ are tuples in the total unnesting of $u_1$ and $u_2$, respectively, $t_1(A)$ and $t_2(A)$ are respectively in $u_1$ and $u_2$. $t_1(A)$ is in $u_1$, $t_2(A)$ is in $u_2$, and $u_1$ and $u_2$ are distinct nested tuples under S, thus $t_1(A)$ and $t_2(A)$ are in distinct nested tuples under S, a contradiction to our earlier established fact that $t_1(A)$ and $t_2(A)$ are in the same nested tuple under S.

The redundancy cannot be caused by an FD, and we thus assume that the dependency that causes the redundancy is the MVD $X \twoheadrightarrow Y$. Because $v$ is a redundant atomic value in r caused by $X \twoheadrightarrow Y$, by the MVD part of Definition 2.3.3, $t_1(X) = t_2(X)$, $t_1(Y) = t_2(Y)$, and $t_1(Z) \ne t_2(Z)$ where $Z = \mathrm{Aset}(T) - (XY)$. $Z = \mathrm{Aset}(T) - (XY)$ and $t_1(Z) \ne t_2(Z)$, thus $Z \ne \emptyset$ and $XY \ne \mathrm{Aset}(T)$. $A \in Y - X$, therefore $Y \not\subseteq X$. Because $XY \ne \mathrm{Aset}(T)$ and $Y \not\subseteq X$, $X \twoheadrightarrow Y$ is a nontrivial MVD on Aset(T). $A \in \mathrm{Aset}(T)$, and $A$ is in a node of T. Let $N_A$ be the node of T that contains $A$. $X \twoheadrightarrow Y$ holds for T given $M \cup F$, therefore $X \twoheadrightarrow Y \in D$ and thus D implies $X \twoheadrightarrow Y$. By Condition 1 of NNF, D is equivalent to $\mathrm{MVD}(T) \cup \mathrm{FD}(T)$ on Aset(T), thus $\mathrm{MVD}(T) \cup \mathrm{FD}(T)$ implies $X \twoheadrightarrow Y$ on Aset(T). Hence, by Lemma 4.5, MVD(T) implies $X^+_{\mathrm{MVD}(T) \cup \mathrm{FD}(T)} \twoheadrightarrow Y$ on Aset(T). Let $X^+ = X^+_{\mathrm{MVD}(T) \cup \mathrm{FD}(T)}$. Then MVD(T) implies $X^+ \twoheadrightarrow Y$ and thus $X^+ \twoheadrightarrow Y \in \mathrm{MVD}(T)^+$ on Aset(T).
By Lemma 4.4, $D^+ = D$ on Aset(T), and by Condition 1, $\mathrm{MVD}(T) \cup \mathrm{FD}(T)$ is equivalent to D on Aset(T); therefore each MVD and FD in $(\mathrm{MVD}(T) \cup \mathrm{FD}(T))^+$ on Aset(T) is in D. Because $X \to X^+$ is implied by $\mathrm{MVD}(T) \cup \mathrm{FD}(T)$ on Aset(T) and each MVD and FD in $(\mathrm{MVD}(T) \cup \mathrm{FD}(T))^+$ on Aset(T) is in D, $X \to X^+$ is in D. Every attribute of every MVD and FD in D is in Aset(T) and $X \to X^+$ is in D, thus $XX^+ \subseteq \mathrm{Aset}(T)$. Because $X \to X^+$, $XX^+ \subseteq \mathrm{Aset}(T)$, and $t_1(X) = t_2(X)$, we have $t_1(X^+) = t_2(X^+)$. Also, $A \notin X^+$; otherwise, $X \to A$ holds for T, and by an argument similar to the preceding FD case, we can arrive at a contradiction. Furthermore, $Z \not\subseteq X^+$; otherwise, $X \to Z$ holds for T, which, because $t_1(X) = t_2(X)$, makes $t_1(Z) = t_2(Z)$, which contradicts $t_1(Z) \ne t_2(Z)$. Now let $Z' = \mathrm{Aset}(T) - (X^+ Y)$. Then $Z' \ne \emptyset$ because $Z = \mathrm{Aset}(T) - (XY)$, $Z \not\subseteq X^+$, $Z \ne \emptyset$, and $Z' = \mathrm{Aset}(T) - (X^+ Y)$. $A \in Y$ and $A \notin X^+$, therefore $Y \not\subseteq X^+$; and because $Z' \ne \emptyset$, $X^+ \twoheadrightarrow Y$ is a nontrivial MVD on Aset(T). We now have $X^+ Y \subseteq \mathrm{Aset}(T)$, $X^+ \twoheadrightarrow Y$ is a nontrivial MVD in $\mathrm{MVD}(T)^+$ on Aset(T), $A \in Y$, $A \notin X^+$, and $A$ is in node $N_A$ of T; thus by Lemma 4.3, $\mathrm{Ancestor}(N_A) \subseteq X^+ Y$. Because $t_1(X^+) = t_2(X^+)$ and $t_1(Y) = t_2(Y)$, $t_1(X^+ Y) = t_2(X^+ Y)$. $t_1(X^+ Y) = t_2(X^+ Y)$ and $\mathrm{Ancestor}(N_A) \subseteq X^+ Y$, therefore $t_1(\mathrm{Ancestor}(N_A)) = t_2(\mathrm{Ancestor}(N_A))$. Because $t_1(\mathrm{Ancestor}(N_A)) = t_2(\mathrm{Ancestor}(N_A))$, we may use the argument in Case 2 for FDs to obtain a contradiction. □

5. POTENTIAL REDUNDANCY AND NNF

If we could prove the converse of Theorem 4.1, we would have a precise characterization of potential redundancy in nested relations in terms of NNF. Unfortunately, the converse is not true. With a small adjustment, however, we can make it true. The problem is that we might have a scheme tree that is not consistent with the given FDs and MVDs. We define consistency as follows.

Definition 5.1. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. Let D be the set of MVDs and FDs that hold for T with respect to M and F. A scheme tree T is consistent with M and F if D implies MVD(T) on Aset(T).

Example 5.1. As a counterexample to show that the converse of Theorem 4.1 does not hold and to motivate our desire for consistency, consider the following. Let $U = ABC$. Let M be the empty set of MVDs and F be the empty set of FDs. Let T be the scheme tree of the nested relation scheme A(B)*(C)*. Now $\mathrm{MVD}(T) = \{A \twoheadrightarrow B,\ A \twoheadrightarrow C\}$, and we can let $\mathrm{FD}(T) = \emptyset$. Inasmuch as both M and F are empty, the set D of FDs and MVDs that hold for T includes only trivial dependencies. Thus D is not equivalent to $\mathrm{MVD}(T) \cup \mathrm{FD}(T)$ and hence T is not in NNF. However, the only constraints that apply are trivial, therefore there is no potential redundancy. Thus we have a scheme tree that has no potential redundancy, but is not in NNF. T, however, is not consistent because the trivial dependencies are insufficient to imply MVD(T), which contains nontrivial dependencies.

Intuitively, we should not want any scheme tree that implies nontrivial MVDs unless the implied nontrivial MVDs are given or implied by given constraints. That is, we should only be interested in consistent scheme trees.
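Example 5.1 can be replayed on a concrete relation. The sketch below is an illustration under assumptions made here: the helper `satisfies_mvd` and the two-tuple relation `r` are not from the paper. Because M and F are empty, every flat relation over ABC is valid, so a single relation that violates $A \twoheadrightarrow B$ witnesses that the dependencies holding for T do not imply MVD(T), which is what makes T inconsistent in the sense of Definition 5.1.

```python
def satisfies_mvd(rel, R, X, Y):
    """True if rel (a set of tuples over the ordered attributes R) satisfies X ->> Y."""
    idx = {a: i for i, a in enumerate(R)}
    Z = {a for a in R if a not in X and a not in Y}
    for t1 in rel:
        for t2 in rel:
            if all(t1[idx[a]] == t2[idx[a]] for a in X):
                # Required tuple: Z-values from t2, all other values from t1.
                need = tuple(t2[idx[a]] if a in Z else t1[idx[a]] for a in R)
                if need not in rel:
                    return False
    return True

R = ("A", "B", "C")
# MVD(T) for T = A(B)*(C)*, as stated in Example 5.1.
mvd_T = [({"A"}, {"B"}), ({"A"}, {"C"})]
# A valid relation, because M and F impose no constraints at all.
r = {(1, 1, 1), (1, 2, 2)}

violated = [(x, y) for x, y in mvd_T if not satisfies_mvd(r, R, x, y)]
print(violated)
```

Both MVDs of MVD(T) are reported as violated by `r`, so this relation cannot be the total unnesting of any nested relation on T.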
The consistency requirement therefore does not turn out to be a problem, and we can have what we want. If we assume that scheme trees are consistent with a given set of MVDs and FDs, we can obtain a precise characterization of potential redundancy in nested relation schemes. For then, as we show in this section, a consistent scheme tree T has no potential redundancy if and only if T is in NNF. We now proceed to prove this result. We first prove in Theorem 5.1 that our NNF definition not only implies that a scheme tree has no potential redundancy, but also that it is consistent. This result follows almost immediately from Theorem 4.1.

THEOREM 5.1. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. T is consistent with M and F and has no potential redundancy with respect to $M \cup F$ if T is in NNF with respect to $M \cup F$.

PROOF. By Theorem 4.1, T has no potential redundancy with respect to $M \cup F$. By Condition 1 of NNF, D is equivalent to $\mathrm{MVD}(T) \cup \mathrm{FD}(T)$ on Aset(T). Thus D implies MVD(T) on Aset(T) and hence T is consistent with M and F. □

Before proceeding with the proof of the converse of Theorem 5.1, we need a result about paths in scheme trees, which we prove in Lemma 5.1, and a result about potential redundancy, which we prove in Lemma 5.2. We then prove that if we allow only consistent scheme trees and some scheme tree does not satisfy Condition 1 of NNF, it has potential redundancy (Lemma 5.3), and that if it does not satisfy Condition 2 of NNF, it has potential redundancy (Lemma 5.4).

LEMMA 5.1. Let U be a set of attributes. Let T be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. Let $X \twoheadrightarrow Y$ be an MVD on Aset(T). If MVD(T) does not imply $X \twoheadrightarrow Y$ on Aset(T), then there is a path $p$ of T whose set of attributes is $P$ such that $(Y - X) \cap P$ and $(\mathrm{Aset}(T) - (XY)) \cap P$ are both nonempty proper subsets of $P$.

PROOF. Throughout the proof, all implications (and thus also all closures of attributes) are taken with respect to Aset(T). Furthermore, because by Lemma 4.4, $D^+ = D$ on Aset(T), all implications on Aset(T) remain on Aset(T). We therefore omit references to Aset(T), leaving this understood without explicit mention. $X \twoheadrightarrow Y$ must be nontrivial; otherwise, MVD(T) implies $X \twoheadrightarrow Y$. Assume that the paths of T are $p_1, \ldots, p_n$, $n \ge 1$. Let the set of attributes of $p_i$ be $P_i$, $1 \le i \le n$. Let $Z = \mathrm{Aset}(T) - (XY)$. We proceed by contradiction, and thus assume that for all $i$, $1 \le i \le n$, either $(Y - X) \cap P_i$ is not a nonempty proper subset of $P_i$ or $Z \cap P_i$ is not a nonempty proper subset of $P_i$. Thus for all $i$, $1 \le i \le n$, either $P_i \subseteq XY$ or $P_i \subseteq XZ$. Because $P_i \subseteq XY$ or $P_i \subseteq XZ$ for all $i$, $1 \le i \le n$, if there is no $P_i \subseteq XZ$, $1 \le i \le n$, then $P_i \subseteq XY$, $1 \le i \le n$. But if $P_i \subseteq XY$, $1 \le i \le n$, then because $P_1 \cup \cdots \cup P_n = \mathrm{Aset}(T)$, $XY = \mathrm{Aset}(T)$, and thus $X \twoheadrightarrow Y$ is trivial.
Similarly, if there is no $P_i \subseteq XY$, $1 \le i \le n$, then $X \twoheadrightarrow Z$ is trivial. Thus (after reindexing, if necessary) there is an index $q$, $1 \le q < n$, such that $P_i \subseteq XY$ for each $i$, $1 \le i \le q$, and $P_j \subseteq XZ$ for each $j$, $q + 1 \le j \le n$. Let $V = P_1 \cup \cdots \cup P_q$ and $W = P_{q+1} \cup \cdots \cup P_n$.

We next show that MVD(T) implies $(V \cap W) \twoheadrightarrow V$. For any scheme tree, there is a one-to-one correspondence between leaf nodes and paths. Thus if we let $L_i$ be the leaf node of path $p_i$, $1 \le i \le n$ (under the possible reindexing), $L_i \subseteq V$ for $1 \le i \le q$ and $L_i \not\subseteq V \cap W$ for $1 \le i \le q$. Every path in any scheme tree includes the root, and the root of T is in $V \cap W$. Thus, for every path $p_i$, $1 \le i \le q$, we know that the leaf $L_i$ of $p_i$ is not in $V \cap W$ and that the root of $p_i$ is in $V \cap W$. Hence for every path $p_i$, $1 \le i \le q$, there exists a lowest-level node $N_i$ that is not the leaf and is in $V \cap W$. None of the nodes $N_i$, $1 \le i \le q$, is a leaf, therefore each has one or more children. Let $N_i'$, $1 \le i \le q$, be the child of $N_i$ on path $p_i$. Now by the definition of MVD(T), $\mathrm{Ancestor}(N_i) \twoheadrightarrow \mathrm{Descendant}(N_i') \in \mathrm{MVD}(T)$, $1 \le i \le q$. Because $N_i \subseteq V \cap W$, by the definition of $V$ and $W$, $\mathrm{Ancestor}(N_i) \subseteq V \cap W$, $1 \le i \le q$. Because $\mathrm{Ancestor}(N_i) \subseteq V \cap W$, $1 \le i \le q$, by M2 (augmentation), MVD(T) implies $V \cap W \twoheadrightarrow \mathrm{Descendant}(N_i')$, $1 \le i \le q$. By M1 (reflexivity), MVD(T) implies $V \cap W \twoheadrightarrow V \cap W$. By our construction process, $(V \cap W) \cup \mathrm{Descendant}(N_1') \cup \cdots \cup \mathrm{Descendant}(N_q') = V$. Thus by applying the MVD union rule to the MVDs $V \cap W \twoheadrightarrow \mathrm{Descendant}(N_i')$, $1 \le i \le q$, and $V \cap W \twoheadrightarrow V \cap W$, and substituting $V$ for the right-hand side of the result, MVD(T) implies $V \cap W \twoheadrightarrow V$.

To finish the proof, we prove that on Aset(T), $X \twoheadrightarrow Y$ follows from $V \cap W \twoheadrightarrow V$, and thus we will have a contradiction. We first prove $V \cap W \subseteq X$ and $V - X = Y - X$. Because $V \subseteq XY$ and $W \subseteq XZ$, $(V \cap W) \subseteq (XY \cap XZ)$. $Z = \mathrm{Aset}(T) - (XY)$, thus $Z \cap XY = \emptyset$. Therefore, $XY \cap XZ = X$. Hence $(V \cap W) \subseteq X$. $V \subseteq XY$, therefore $V - X \subseteq XY - X$. However, $XY - X = Y - X$ and thus $V - X \subseteq Y - X$. Assume $Y - X \not\subseteq V - X$; then there is an attribute $A \in Y - X$ such that $A \notin V - X$. Because $A \in Y - X$, $A \notin X$. $A \notin V - X$ and $A \notin X$, thus $A \notin V$. Let the path containing $A$ be $p$ and let $P$ be the set of attributes of $p$. Hence $A \in P$. $P \not\subseteq V$; otherwise $A \in V$, which contradicts $A \notin V$. In addition, $P \not\subseteq XY$; otherwise $P \subseteq V$, as $V$ is the union of paths that are subsets of $XY$. Hence there is an attribute $B \in P$ that is not in $XY$. Because $A \in Y - X$, $B \notin XY$, and both $A$ and $B$ are attributes in $P$, $(Y - X) \cap P$ and $(\mathrm{Aset}(T) - (XY)) \cap P$ are both nonempty proper subsets of $P$, which contradicts the hypothesis we made at the beginning of this proof. Therefore, $Y - X \subseteq V - X$. Thus because $V - X \subseteq Y - X$, $V - X = Y - X$. MVD(T) implies $V \cap W \twoheadrightarrow V$ and $V \cap W \subseteq X$, therefore by M2 (augmentation), MVD(T) implies $X \twoheadrightarrow V$. Because MVD(T) implies $X \twoheadrightarrow X$ and $X \twoheadrightarrow V$, by M3 (transitivity), MVD(T) implies $X \twoheadrightarrow V - X$. Thus, because $V - X = Y - X$, MVD(T) implies $X \twoheadrightarrow Y - X$. Now by augmenting both sides of $X \twoheadrightarrow Y - X$ with $X \cap Y$, we have that MVD(T) implies $X \twoheadrightarrow Y$, a contradiction. □

LEMMA 5.2. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U.
Let T be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$ and T is consistent with M and F. Let D be the set of MVDs and FDs that holds for T with respect to M and F. Let $A$ and $B$ be attributes in Aset(T), and let $X$ and $Z$ be sets of attributes in Aset(T). If $A$ is in node $N_A$ of T, $B \in \mathrm{Ancestor}(N_A)$, $A \notin Z$, $A \notin X$, $B \in Z$, and $Z \in \mathrm{DEP}_D(X)$ such that $X \not\to Z$, then T has potential redundancy with respect to $M \cup F$. (Here $\mathrm{DEP}_D(X)$ is the dependency basis for $X$ with respect to D on Aset(T) [Maier 1983].)

PROOF. As we have done before, we omit references to Aset(T), inasmuch as all implications are taken with respect to and remain on Aset(T). To establish potential redundancy for T with respect to $M \cup F$, we must exhibit a valid nested relation for T with respect to $M \cup F$ that has a redundant atomic value. Consider a flat relation r on Aset(T) that has two tuples $t_1$ and $t_2$. Let $t_1$ be a row of all 1s, $t_2(\mathrm{Aset}(T) - Z) = t_1(\mathrm{Aset}(T) - Z)$, and $t_2(Z)$ be all 2s. Because $Z \in \mathrm{DEP}_D(X)$ and $X \not\to Z$, r is valid with respect to $M \cup F$. We now establish the following groups of essential facts, and then conclude.

(1) Because T is consistent with M and F, D implies MVD(T). r is a flat relation and is valid with respect to $M \cup F$, therefore r satisfies every FD and MVD that holds for T with respect to M and F, and thus r satisfies D. r satisfies D and D implies MVD(T), therefore r satisfies MVD(T). r satisfies MVD(T) and r is defined on Aset(T), thus we can nest r according to T to obtain a nested relation q on T. Because r is the total unnesting of q and r satisfies every FD and MVD that holds for T with respect to M and F, q is valid with respect to $M \cup F$.

(2) Because $B \in Z$, $t_1(B) = 1$ and $t_2(B) = 2$. $A \notin Z$ and $A \notin X$, therefore $A \in \mathrm{Aset}(T) - (XZ)$. $A \in \mathrm{Aset}(T) - (XZ)$, thus $t_1(A) = t_2(A)$ and both are 1. Let S be the nested relation scheme in T whose set of atomic attributes is $N_A$. Then, because $t_1(B) = 1$ and $t_2(B) = 2$ and $B \in \mathrm{Ancestor}(N_A)$, $t_1(A)$ and $t_2(A)$ are in distinct nested tuples $u_1$ and $u_2$ under S in q.

(3) Let $Y = \mathrm{Aset}(T) - (XZ)$. $Z \in \mathrm{DEP}_D(X)$, thus D implies $X \twoheadrightarrow Y$. Therefore, by Lemma 4.4, $X \twoheadrightarrow Y$ is in D. Because $A \in \mathrm{Aset}(T) - (XZ)$ and $Y = \mathrm{Aset}(T) - (XZ)$, $A \in Y$ and $A \notin X$. $Z \in \mathrm{DEP}_D(X)$ and $X \not\to Z$, thus $Z \cap X = \emptyset$. $Z \cap X = \emptyset$ and $t_1(\mathrm{Aset}(T) - Z) = t_2(\mathrm{Aset}(T) - Z)$, therefore $t_1(X) = t_2(X)$. Because $Y = \mathrm{Aset}(T) - (XZ)$, $t_1(Y) = t_2(Y)$. Now, according to Definition 2.3.3, atomic value $u_1(A)$ is a redundant atomic value in q caused by $X \twoheadrightarrow Y$.

There exists a redundant atomic value in a valid nested relation for T caused by $X \twoheadrightarrow Y$, which holds for T with respect to M and F, therefore T has potential redundancy with respect to $M \cup F$. □

LEMMA 5.3. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$ and T is consistent with M and F.
If D is the set of MVDs and FDs that holds for T with respect to M and F and D is not equivalent to $\mathrm{MVD}(T) \cup \mathrm{FD}(T)$ on Aset(T), then T has potential redundancy with respect to $M \cup F$.

PROOF. As we have done before, we omit references to Aset(T). Because T is consistent with M and F, D implies MVD(T). By definition of FD(T) and D, FD(T) is equivalent to the set of FDs in D and thus D implies $\mathrm{MVD}(T) \cup \mathrm{FD}(T)$. However, D is not equivalent to $\mathrm{MVD}(T) \cup \mathrm{FD}(T)$, therefore $\mathrm{MVD}(T) \cup \mathrm{FD}(T)$ does not imply D. Thus there is an FD or MVD in D that is not implied by $\mathrm{MVD}(T) \cup \mathrm{FD}(T)$. Because FD(T) is equivalent to the set of FDs in D, there is an MVD $X \twoheadrightarrow Y$ in D that is not implied by $\mathrm{MVD}(T) \cup \mathrm{FD}(T)$. Let $X^+ = X^+_{\mathrm{MVD}(T) \cup \mathrm{FD}(T)}$. $\mathrm{MVD}(T) \cup \mathrm{FD}(T)$ does not imply $X \twoheadrightarrow Y$, thus by Lemma 4.5, MVD(T) does not imply $X^+ \twoheadrightarrow Y$. Therefore, by Lemma 5.1, there is a path $p$ of T whose set of attributes is $P$


More information

Constraints: Functional Dependencies

Constraints: Functional Dependencies Constraints: Functional Dependencies Spring 2018 School of Computer Science University of Waterloo Databases CS348 (University of Waterloo) Functional Dependencies 1 / 32 Schema Design When we get a relational

More information

SCHEMA NORMALIZATION. CS 564- Fall 2015

SCHEMA NORMALIZATION. CS 564- Fall 2015 SCHEMA NORMALIZATION CS 564- Fall 2015 HOW TO BUILD A DB APPLICATION Pick an application Figure out what to model (ER model) Output: ER diagram Transform the ER diagram to a relational schema Refine the

More information

Data Bases Data Mining Foundations of databases: from functional dependencies to normal forms

Data Bases Data Mining Foundations of databases: from functional dependencies to normal forms Data Bases Data Mining Foundations of databases: from functional dependencies to normal forms Database Group http://liris.cnrs.fr/ecoquery/dokuwiki/doku.php?id=enseignement: dbdm:start March 1, 2017 Exemple

More information

Schema Refinement and Normal Forms

Schema Refinement and Normal Forms Schema Refinement and Normal Forms UMass Amherst Feb 14, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke, Dan Suciu 1 Relational Schema Design Conceptual Design name Product buys Person price name

More information

CS54100: Database Systems

CS54100: Database Systems CS54100: Database Systems Keys and Dependencies 18 January 2012 Prof. Chris Clifton Functional Dependencies X A = assertion about a relation R that whenever two tuples agree on all the attributes of X,

More information

Databases 2012 Normalization

Databases 2012 Normalization Databases 2012 Christian S. Jensen Computer Science, Aarhus University Overview Review of redundancy anomalies and decomposition Boyce-Codd Normal Form Motivation for Third Normal Form Third Normal Form

More information

Denotational Semantics

Denotational Semantics 5 Denotational Semantics In the operational approach, we were interested in how a program is executed. This is contrary to the denotational approach, where we are merely interested in the effect of executing

More information

Lectures 6. Lecture 6: Design Theory

Lectures 6. Lecture 6: Design Theory Lectures 6 Lecture 6: Design Theory Lecture 6 Announcements Solutions to PS1 are posted online. Grades coming soon! Project part 1 is out. Check your groups and let us know if you have any issues. We have

More information

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007 Schema Refinement and Normal Forms Yanlei Diao UMass Amherst April 10, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke 1 The Evils of Redundancy Redundancy is at the root of several problems associated

More information

Relational Database Design

Relational Database Design CSL 451 Introduction to Database Systems Relational Database Design Department of Computer Science and Engineering Indian Institute of Technology Ropar Narayanan (CK) Chatapuram Krishnan! Recap - Boyce-Codd

More information

On the logical Implication of Multivalued Dependencies with Null Values

On the logical Implication of Multivalued Dependencies with Null Values On the logical Implication of Multivalued Dependencies with Null Values Sebastian Link Department of Information Systems, Information Science Research Centre Massey University, Palmerston North, New Zealand

More information

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018 CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018 Announcement Read Chapter 14 and 15 You must self-study these chapters Too huge to cover in Lectures Project 2 Part 1 due tonight Agenda 1.

More information

Relational-Database Design

Relational-Database Design C H A P T E R 7 Relational-Database Design Exercises 7.2 Answer: A decomposition {R 1, R 2 } is a lossless-join decomposition if R 1 R 2 R 1 or R 1 R 2 R 2. Let R 1 =(A, B, C), R 2 =(A, D, E), and R 1

More information

Schema Refinement and Normal Forms

Schema Refinement and Normal Forms Schema Refinement and Normal Forms Chapter 19 Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 1 The Evils of Redundancy Redundancy is at the root of several problems associated with relational

More information

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies? Normalization Introduction What problems are caused by redundancy? UVic C SC 370 Dr. Daniel M. German Department of Computer Science What are functional dependencies? What are normal forms? What are the

More information

Schema Refinement and Normal Forms

Schema Refinement and Normal Forms Schema Refinement and Normal Forms Chapter 19 Quiz #2 Next Thursday Comp 521 Files and Databases Fall 2012 1 The Evils of Redundancy v Redundancy is at the root of several problems associated with relational

More information

Functional Dependencies

Functional Dependencies Chapter 7 Functional Dependencies 7.1 Introduction 7.2 Proofs and Functional Dependencies 7.3 Keys and Functional Dependencies 7.4 Covers 7.5 Tableaux 7.6 Exercises 7.7 Bibliographical Comments 7.1 Introduction

More information

CS 173: Induction. Madhusudan Parthasarathy University of Illinois at Urbana-Champaign. February 7, 2016

CS 173: Induction. Madhusudan Parthasarathy University of Illinois at Urbana-Champaign. February 7, 2016 CS 173: Induction Madhusudan Parthasarathy University of Illinois at Urbana-Champaign 1 Induction February 7, 016 This chapter covers mathematical induction, and is an alternative resource to the one in

More information

Chapter 8: Relational Database Design

Chapter 8: Relational Database Design Chapter 8: Relational Database Design Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 8: Relational Database Design Features of Good Relational Design Atomic Domains

More information

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004 Schema Refinement and Normal Forms CIS 330, Spring 2004 Lecture 11 March 2, 2004 1 The Evils of Redundancy Redundancy is at the root of several problems associated with relational schemas: redundant storage,

More information

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17 Normalization October 5, 2017 Chapter 19 Pacific University 1 Description A Real Estate agent wants to track offers made on properties. Each customer has a first and last name. Each property has a size,

More information

Chapter 7: Relational Database Design

Chapter 7: Relational Database Design Chapter 7: Relational Database Design Chapter 7: Relational Database Design! First Normal Form! Pitfalls in Relational Database Design! Functional Dependencies! Decomposition! Boyce-Codd Normal Form! Third

More information

CSIT5300: Advanced Database Systems

CSIT5300: Advanced Database Systems CSIT5300: Advanced Database Systems L05: Functional Dependencies Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong SAR, China

More information

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd. The Evils of Redundancy Schema Refinement and Normal Forms INFO 330, Fall 2006 1 Redundancy is at the root of several problems associated with relational schemas: redundant storage, insert/delete/update

More information

On a problem of Fagin concerning multivalued dependencies in relational databases

On a problem of Fagin concerning multivalued dependencies in relational databases Theoretical Computer Science 353 (2006) 53 62 www.elsevier.com/locate/tcs On a problem of Fagin concerning multivalued dependencies in relational databases Sven Hartmann, Sebastian Link,1 Department of

More information

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd. The Evils of Redundancy Schema Refinement and Normal Forms Chapter 19 Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke 1 Redundancy is at the root of several problems associated with relational

More information

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram Schema Refinement and Normal Forms Chapter 19 Database Management Systems, R. Ramakrishnan and J. Gehrke 1 The Evils of Redundancy Redundancy is at the root of several problems associated with relational

More information

MORE ON CONTINUOUS FUNCTIONS AND SETS

MORE ON CONTINUOUS FUNCTIONS AND SETS Chapter 6 MORE ON CONTINUOUS FUNCTIONS AND SETS This chapter can be considered enrichment material containing also several more advanced topics and may be skipped in its entirety. You can proceed directly

More information

Functional Dependencies & Normalization. Dr. Bassam Hammo

Functional Dependencies & Normalization. Dr. Bassam Hammo Functional Dependencies & Normalization Dr. Bassam Hammo Redundancy and Normalisation Redundant Data Can be determined from other data in the database Leads to various problems INSERT anomalies UPDATE

More information

Schema Refinement and Normal Forms. Chapter 19

Schema Refinement and Normal Forms. Chapter 19 Schema Refinement and Normal Forms Chapter 19 1 Review: Database Design Requirements Analysis user needs; what must the database do? Conceptual Design high level descr. (often done w/er model) Logical

More information

Information Systems (Informationssysteme)

Information Systems (Informationssysteme) Information Systems (Informationssysteme) Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Summer 2015 c Jens Teubner Information Systems Summer 2015 1 Part VII Schema Normalization c Jens Teubner

More information

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19 Schema Refinement and Normal Forms [R&G] Chapter 19 CS432 1 The Evils of Redundancy Redundancy is at the root of several problems associated with relational schemas: redundant storage, insert/delete/update

More information

Chapter 7: Relational Database Design. Chapter 7: Relational Database Design

Chapter 7: Relational Database Design. Chapter 7: Relational Database Design Chapter 7: Relational Database Design Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design Functional Dependencies Decomposition Boyce-Codd Normal Form Third Normal

More information

Schema Refinement and Normal Forms Chapter 19

Schema Refinement and Normal Forms Chapter 19 Schema Refinement and Normal Forms Chapter 19 Instructor: Vladimir Zadorozhny vladimir@sis.pitt.edu Information Science Program School of Information Sciences, University of Pittsburgh Database Management

More information

Schema Refinement & Normalization Theory

Schema Refinement & Normalization Theory Schema Refinement & Normalization Theory Functional Dependencies Week 13 1 What s the Problem Consider relation obtained (call it SNLRHW) Hourly_Emps(ssn, name, lot, rating, hrly_wage, hrs_worked) What

More information

Strongly chordal and chordal bipartite graphs are sandwich monotone

Strongly chordal and chordal bipartite graphs are sandwich monotone Strongly chordal and chordal bipartite graphs are sandwich monotone Pinar Heggernes Federico Mancini Charis Papadopoulos R. Sritharan Abstract A graph class is sandwich monotone if, for every pair of its

More information

Functional Dependency and Algorithmic Decomposition

Functional Dependency and Algorithmic Decomposition Functional Dependency and Algorithmic Decomposition In this section we introduce some new mathematical concepts relating to functional dependency and, along the way, show their practical use in relational

More information

CHAPTER 8: EXPLORING R

CHAPTER 8: EXPLORING R CHAPTER 8: EXPLORING R LECTURE NOTES FOR MATH 378 (CSUSM, SPRING 2009). WAYNE AITKEN In the previous chapter we discussed the need for a complete ordered field. The field Q is not complete, so we constructed

More information

Introduction to Real Analysis Alternative Chapter 1

Introduction to Real Analysis Alternative Chapter 1 Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces

More information

Guaranteeing No Interaction Between Functional Dependencies and Tree-Like Inclusion Dependencies

Guaranteeing No Interaction Between Functional Dependencies and Tree-Like Inclusion Dependencies Guaranteeing No Interaction Between Functional Dependencies and Tree-Like Inclusion Dependencies Mark Levene Department of Computer Science University College London Gower Street London WC1E 6BT, U.K.

More information

ALGEBRA. 1. Some elementary number theory 1.1. Primes and divisibility. We denote the collection of integers

ALGEBRA. 1. Some elementary number theory 1.1. Primes and divisibility. We denote the collection of integers ALGEBRA CHRISTIAN REMLING 1. Some elementary number theory 1.1. Primes and divisibility. We denote the collection of integers by Z = {..., 2, 1, 0, 1,...}. Given a, b Z, we write a b if b = ac for some

More information

An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees

An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees Francesc Rosselló 1, Gabriel Valiente 2 1 Department of Mathematics and Computer Science, Research Institute

More information

CHAPTER 7. Connectedness

CHAPTER 7. Connectedness CHAPTER 7 Connectedness 7.1. Connected topological spaces Definition 7.1. A topological space (X, T X ) is said to be connected if there is no continuous surjection f : X {0, 1} where the two point set

More information

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design Applied Databases Handout 2a. Functional Dependencies and Normal Forms 20 Oct 2008 Functional Dependencies This is the most mathematical part of the course. Functional dependencies provide an alternative

More information

Test Generation for Designs with Multiple Clocks

Test Generation for Designs with Multiple Clocks 39.1 Test Generation for Designs with Multiple Clocks Xijiang Lin and Rob Thompson Mentor Graphics Corp. 8005 SW Boeckman Rd. Wilsonville, OR 97070 Abstract To improve the system performance, designs with

More information

Design by Example for SQL Tables with Functional Dependencies

Design by Example for SQL Tables with Functional Dependencies VLDB Journal manuscript No. (will be inserted by the editor) Design by Example for SQL Tables with Functional Dependencies Sven Hartmann Markus Kirchberg Sebastian Link Received: date / Accepted: date

More information

Chapter 3 Design Theory for Relational Databases

Chapter 3 Design Theory for Relational Databases 1 Chapter 3 Design Theory for Relational Databases Contents Functional Dependencies Decompositions Normal Forms (BCNF, 3NF) Multivalued Dependencies (and 4NF) Reasoning About FD s + MVD s 2 Our example

More information

7 RC Simulates RA. Lemma: For every RA expression E(A 1... A k ) there exists a DRC formula F with F V (F ) = {A 1,..., A k } and

7 RC Simulates RA. Lemma: For every RA expression E(A 1... A k ) there exists a DRC formula F with F V (F ) = {A 1,..., A k } and 7 RC Simulates RA. We now show that DRC (and hence TRC) is at least as expressive as RA. That is, given an RA expression E that mentions at most C, there is an equivalent DRC expression E that mentions

More information

32 Divisibility Theory in Integral Domains

32 Divisibility Theory in Integral Domains 3 Divisibility Theory in Integral Domains As we have already mentioned, the ring of integers is the prototype of integral domains. There is a divisibility relation on * : an integer b is said to be divisible

More information

Databases Lecture 8. Timothy G. Griffin. Computer Laboratory University of Cambridge, UK. Databases, Lent 2009

Databases Lecture 8. Timothy G. Griffin. Computer Laboratory University of Cambridge, UK. Databases, Lent 2009 Databases Lecture 8 Timothy G. Griffin Computer Laboratory University of Cambridge, UK Databases, Lent 2009 T. Griffin (cl.cam.ac.uk) Databases Lecture 8 DB 2009 1 / 15 Lecture 08: Multivalued Dependencies

More information

Design Theory for Relational Databases

Design Theory for Relational Databases Design Theory for Relational Databases Keys: formal definition K is a superkey for relation R if K functionally determines all attributes of R K is a key for R if K is a superkey, but no proper subset

More information

Mathematics 114L Spring 2018 D.A. Martin. Mathematical Logic

Mathematics 114L Spring 2018 D.A. Martin. Mathematical Logic Mathematics 114L Spring 2018 D.A. Martin Mathematical Logic 1 First-Order Languages. Symbols. All first-order languages we consider will have the following symbols: (i) variables v 1, v 2, v 3,... ; (ii)

More information

Schema Refinement. Feb 4, 2010

Schema Refinement. Feb 4, 2010 Schema Refinement Feb 4, 2010 1 Relational Schema Design Conceptual Design name Product buys Person price name ssn ER Model Logical design Relational Schema plus Integrity Constraints Schema Refinement

More information

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101 CSC 261/461 Database Systems Lecture 8 Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101 Agenda 1. Database Design 2. Normal forms & functional dependencies 3. Finding functional dependencies

More information

Functional Dependencies

Functional Dependencies Cleveland State University CIS 611 Relational Databases Prepared by Victor Matos Functional Dependencies Source: The Theory of Relational Databases D. Maier, Ed. Computer Science Press Available at: http://www.dbis.informatik.hu-berlin.de/~freytag/maier/

More information

COSC 430 Advanced Database Topics. Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago

COSC 430 Advanced Database Topics. Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago COSC 430 Advanced Database Topics Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago Learning objectives and references You should be able to: define the elements of the relational

More information

Gap Embedding for Well-Quasi-Orderings 1

Gap Embedding for Well-Quasi-Orderings 1 WoLLIC 2003 Preliminary Version Gap Embedding for Well-Quasi-Orderings 1 Nachum Dershowitz 2 and Iddo Tzameret 3 School of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel Abstract Given a

More information

Information Flow on Directed Acyclic Graphs

Information Flow on Directed Acyclic Graphs Information Flow on Directed Acyclic Graphs Michael Donders, Sara Miner More, and Pavel Naumov Department of Mathematics and Computer Science McDaniel College, Westminster, Maryland 21157, USA {msd002,smore,pnaumov}@mcdaniel.edu

More information

CSC 261/461 Database Systems Lecture 11

CSC 261/461 Database Systems Lecture 11 CSC 261/461 Database Systems Lecture 11 Fall 2017 Announcement Read the textbook! Chapter 8: Will cover later; But self-study the chapter Everything except Section 8.4 Chapter 14: Section 14.1 14.5 Chapter

More information

Global Database Design based on Storage Space and Update Time Minimization

Global Database Design based on Storage Space and Update Time Minimization Journal of Universal Computer Science, vol. 15, no. 1 (2009), 195-240 submitted: 11/1/08, accepted: 15/8/08, appeared: 1/1/09 J.UCS Global Database Design based on Storage Space and Update Time Minimization

More information

CLIQUES IN THE UNION OF GRAPHS

CLIQUES IN THE UNION OF GRAPHS CLIQUES IN THE UNION OF GRAPHS RON AHARONI, ELI BERGER, MARIA CHUDNOVSKY, AND JUBA ZIANI Abstract. Let B and R be two simple graphs with vertex set V, and let G(B, R) be the simple graph with vertex set

More information

Lecture Notes on Inductive Definitions

Lecture Notes on Inductive Definitions Lecture Notes on Inductive Definitions 15-312: Foundations of Programming Languages Frank Pfenning Lecture 2 August 28, 2003 These supplementary notes review the notion of an inductive definition and give

More information

Lecture Notes on Inductive Definitions

Lecture Notes on Inductive Definitions Lecture Notes on Inductive Definitions 15-312: Foundations of Programming Languages Frank Pfenning Lecture 2 September 2, 2004 These supplementary notes review the notion of an inductive definition and

More information

Lecture Notes on The Curry-Howard Isomorphism

Lecture Notes on The Curry-Howard Isomorphism Lecture Notes on The Curry-Howard Isomorphism 15-312: Foundations of Programming Languages Frank Pfenning Lecture 27 ecember 4, 2003 In this lecture we explore an interesting connection between logic and

More information

Rose-Hulman Undergraduate Mathematics Journal

Rose-Hulman Undergraduate Mathematics Journal Rose-Hulman Undergraduate Mathematics Journal Volume 17 Issue 1 Article 5 Reversing A Doodle Bryan A. Curtis Metropolitan State University of Denver Follow this and additional works at: http://scholar.rose-hulman.edu/rhumj

More information

Cographs; chordal graphs and tree decompositions

Cographs; chordal graphs and tree decompositions Cographs; chordal graphs and tree decompositions Zdeněk Dvořák September 14, 2015 Let us now proceed with some more interesting graph classes closed on induced subgraphs. 1 Cographs The class of cographs

More information

BCNF revisited: 40 Years Normal Forms

BCNF revisited: 40 Years Normal Forms Full set of slides BCNF revisited: 40 Years Normal Forms Faculty of Computer Science Technion - IIT, Haifa janos@cs.technion.ac.il www.cs.technion.ac.il/ janos 1 Full set of slides Acknowledgements Based

More information

Schema Refinement and Normal Forms. Why schema refinement?

Schema Refinement and Normal Forms. Why schema refinement? Schema Refinement and Normal Forms Why schema refinement? Consider relation obtained from Hourly_Emps: Hourly_Emps (sin,rating,hourly_wages,hourly_worked) Problems: Update Anomaly: Can we change the wages

More information

2. Prime and Maximal Ideals

2. Prime and Maximal Ideals 18 Andreas Gathmann 2. Prime and Maximal Ideals There are two special kinds of ideals that are of particular importance, both algebraically and geometrically: the so-called prime and maximal ideals. Let

More information

TR : Tableaux for the Logic of Proofs

TR : Tableaux for the Logic of Proofs City University of New York (CUNY) CUNY Academic Works Computer Science Technical Reports Graduate Center 2004 TR-2004001: Tableaux for the Logic of Proofs Bryan Renne Follow this and additional works

More information

Exercises 1 - Solutions

Exercises 1 - Solutions Exercises 1 - Solutions SAV 2013 1 PL validity For each of the following propositional logic formulae determine whether it is valid or not. If it is valid prove it, otherwise give a counterexample. Note

More information

CS322: Database Systems Normalization

CS322: Database Systems Normalization CS322: Database Systems Normalization Dr. Manas Khatua Assistant Professor Dept. of CSE IIT Jodhpur E-mail: manaskhatua@iitj.ac.in Introduction The normalization process takes a relation schema through

More information

CS411 Notes 3 Induction and Recursion

CS411 Notes 3 Induction and Recursion CS411 Notes 3 Induction and Recursion A. Demers 5 Feb 2001 These notes present inductive techniques for defining sets and subsets, for defining functions over sets, and for proving that a property holds

More information

Finding Compact Scheme Forests in. P. Thanisch. Department of Computer Science, University of Edinburgh,

Finding Compact Scheme Forests in. P. Thanisch. Department of Computer Science, University of Edinburgh, Finding Compact Scheme Forests in Nested Normal Form is NP-Hard P. Thanisch Department of Computer Science, University of Edinburgh, King's Buildings, Mayeld Road, Edinburgh EH9 3JZ, Scotland G. Loizou

More information

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and private study only. The thesis may not be reproduced elsewhere

More information

PETER A. CHOLAK, PETER GERDES, AND KAREN LANGE

PETER A. CHOLAK, PETER GERDES, AND KAREN LANGE D-MAXIMAL SETS PETER A. CHOLAK, PETER GERDES, AND KAREN LANGE Abstract. Soare [23] proved that the maximal sets form an orbit in E. We consider here D-maximal sets, generalizations of maximal sets introduced

More information

Measures and Measure Spaces

Measures and Measure Spaces Chapter 2 Measures and Measure Spaces In summarizing the flaws of the Riemann integral we can focus on two main points: 1) Many nice functions are not Riemann integrable. 2) The Riemann integral does not

More information

Ordinary Interactive Small-Step Algorithms, III

Ordinary Interactive Small-Step Algorithms, III Ordinary Interactive Small-Step Algorithms, III ANDREAS BLASS University of Michigan and YURI GUREVICH Microsoft Research This is the third in a series of three papers extending the proof of the Abstract

More information

Corrigendum to On the undecidability of implications between embedded multivalued database dependencies [Inform. and Comput. 122 (1995) ]

Corrigendum to On the undecidability of implications between embedded multivalued database dependencies [Inform. and Comput. 122 (1995) ] Information and Computation 204 (2006) 1847 1851 www.elsevier.com/locate/ic Corrigendum Corrigendum to On the undecidability of implications between embedded multivalued database dependencies [Inform.

More information

CS 186, Fall 2002, Lecture 6 R&G Chapter 15

CS 186, Fall 2002, Lecture 6 R&G Chapter 15 Schema Refinement and Normalization CS 186, Fall 2002, Lecture 6 R&G Chapter 15 Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus Functional Dependencies (Review)

More information

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen LECTURE 4: DESIGN THEORIES (FUNCTIONAL DEPENDENCIES) Design theory E/R diagrams are high-level design Formal theory for

More information

Math 421, Homework #6 Solutions. (1) Let E R n Show that = (E c ) o, i.e. the complement of the closure is the interior of the complement.

Math 421, Homework #6 Solutions. (1) Let E R n Show that = (E c ) o, i.e. the complement of the closure is the interior of the complement. Math 421, Homework #6 Solutions (1) Let E R n Show that (Ē) c = (E c ) o, i.e. the complement of the closure is the interior of the complement. 1 Proof. Before giving the proof we recall characterizations

More information

Incomplete version for students of easllc2012 only. 6.6 The Model Existence Game 99

Incomplete version for students of easllc2012 only. 6.6 The Model Existence Game 99 98 First-Order Logic 6.6 The Model Existence Game In this section we learn a new game associated with trying to construct a model for a sentence or a set of sentences. This is of fundamental importance

More information

k-blocks: a connectivity invariant for graphs

k-blocks: a connectivity invariant for graphs 1 k-blocks: a connectivity invariant for graphs J. Carmesin R. Diestel M. Hamann F. Hundertmark June 17, 2014 Abstract A k-block in a graph G is a maximal set of at least k vertices no two of which can

More information