
A Normal Form for Precisely Characterizing Redundancy in Nested Relations

WAI YIN MOK, YIU-KAI NG, and DAVID W. EMBLEY
Brigham Young University

We give a straightforward definition for redundancy in individual nested relations and define a new normal form that precisely characterizes redundancy for nested relations. We base our definition of redundancy on an arbitrary set of functional and multivalued dependencies, and show that our definition of nested normal form generalizes standard relational normalization theory. In addition, we give a condition that can prevent an unwanted structural anomaly in nested relations, namely, embedded nested relations with at most one tuple. Like other normal forms, our nested normal form can serve as a guide for database design.

Categories and Subject Descriptors: H.2.1 [Database Management]: Logical Design: data models; normal forms

General Terms: Design, Theory

Additional Key Words and Phrases: Database design, data redundancy, functional and multivalued dependencies, nested normal form, nested relations, normalization theory, scheme trees

1. INTRODUCTION

Although normalization theory for flat relations has a long research history, its extension to nested relations is much more recent. Partition Normal Form (PNF) [Roth et al. 1988], which guarantees expected properties for nesting and unnesting and for keys of nested relations, has been well accepted. Indeed, nested relations are sometimes defined such that only PNF relations are allowed,¹ and for Abiteboul and Bidoit [1986], the definition predates PNF. A normal form for nested relation schemes that detects potential redundancy and the possible update anomalies that accompany redundancy, however, has not been widely accepted, even though some have been proposed [Ozsoyoglu and Yuan 1987, 1989; Roth and Korth 1987].
Although these earlier proposals provided guidance for the design of nested relation schemes, they did not succeed in precisely characterizing potential redundancy. In this article we propose a new normal form for individual nested relation schemes that completely characterizes redundancy with respect to any given set of functional dependencies (FDs) and multivalued dependencies (MVDs). The result we present is a generalization of standard relational normalization theory.

¹ See Abiteboul and Bidoit [1986], Ozsoyoglu and Yuan [1987, 1989], and Roth and Korth [1987].
Much of the work on this paper was done while W. Y. Mok was at Hong Kong Polytechnic.
Authors' address: Department of Computer Science, Brigham Young University, Provo, UT 84602.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
© 1996 ACM 0362-5915/96/0300-0077 $03.50
ACM Transactions on Database Systems, Vol. 21, No. 1, March 1996, Pages 77-106.

We proceed as follows. In Section 2, we provide our basic definitions for nested relations. Like Abiteboul and Bidoit [1986], Ozsoyoglu and Yuan [1987, 1989], and Roth and Korth [1987], we define our nested relations to be in PNF. In Section 2 we also give carefully specified redundancy definitions. As illustrations for our redundancy definitions, we give examples, which we use later to show that none of the earlier definitions fully detects redundancy. In Section 3, we present our definition, which we call NNF (Nested Normal Form). As we illustrate our definition, we also compare it to earlier definitions and show that ours can provide greater flexibility in how attributes may be clustered in nested relation schemes. In Section 4, we present a theorem guaranteeing that NNF detects potential redundancy. In Section 5, we investigate the converse of this theorem. We show that a nested relation scheme that is not consistent with the given set of MVDs and FDs, as we define consistency, is automatically not in our normal form. In addition, we are able to show that if a nested relation scheme is consistent with the given set of MVDs and FDs and there is no potential redundancy, then the nested relation scheme satisfies our definition of NNF. In Section 6, we show that our definition of NNF is a generalization of standard relational normalization theory. In particular, we show that 4th Normal Form (4NF), as defined by Fagin [1977], is a special case of NNF, and that Boyce-Codd Normal Form (BCNF) is also a special case when we limit the dependencies to FDs.
Thus, like other normal forms, our definition of NNF can provide a guide to database design. It also has the drawbacks of these other normal forms and, in this sense, is not a panacea for database design. We therefore comment on what our definition does and does not provide for the designer. In Section 7, we present a condition that can prevent an unwanted structural characteristic of nested relations, which we call singleton buckets because a nested relation represented by a singleton bucket allows at most one tuple. We then prove that this condition does indeed prevent singleton buckets. Although this condition has nothing to do with redundancy, it is in harmony with earlier definitions [Ozsoyoglu and Yuan 1989; Roth and Korth 1987] that also disallow singleton buckets. In Section 8, we present our conclusions.

2. BASIC DEFINITIONS AND PROPERTIES

2.1. Nested Relations

A nested relation allows each tuple component to be either atomic or another nested relation, which may itself be nested several levels deep. As in Abiteboul and Bidoit [1986], Ozsoyoglu and Yuan [1987, 1989], and Roth and Korth [1987], we are only interested in nested relations that are in PNF.

Thus, in a nested relation, there can never be distinct tuples that agree on the atomic attributes of either the nested relation itself or of any nested relation embedded within it [Atzeni and De Antonellis 1993].

Definition 2.1.1. Let $U$ be a set of attributes. A nested relation scheme is recursively defined as follows:
(1) If $X$ is a nonempty subset of $U$, then $X$ is a nested relation scheme over the set of attributes $X$.
(2) If $X, X_1, \ldots, X_n$ are pairwise disjoint, nonempty subsets of $U$, and $R_1, \ldots, R_n$ are nested relation schemes over $X_1, \ldots, X_n$ respectively, then $X(R_1)^* \cdots (R_n)^*$ is a nested relation scheme over $X X_1 \cdots X_n$.

Definition 2.1.2. Let $R$ be a nested relation scheme over a nonempty set of attributes $Z$. Let the domain of an attribute $A \in Z$ be denoted by $\mathrm{dom}(A)$. A nested relation over $R$ is recursively defined as follows:
(1) If $R$ has the form $X$, where $X$ is a set of attributes $\{A_1, \ldots, A_n\}$, $n \ge 1$, then $r$ is a nested relation over $R$ if $r$ is a (possibly empty) set of functions $\{t_1, \ldots, t_m\}$ where each function $t_i$, $1 \le i \le m$, maps $A_j$ to an element in $\mathrm{dom}(A_j)$, $1 \le j \le n$.
(2) If $R$ has the form $X(R_1)^* \cdots (R_m)^*$, $m \ge 1$, where $X$ is a set of attributes $\{A_1, \ldots, A_n\}$, $n \ge 1$, then $r$ is a nested relation over $R$ if
(a) $r$ is a (possibly empty) set of functions $\{t_1, \ldots, t_p\}$ where each function $t_i$, $1 \le i \le p$, maps $A_j$ to an element in $\mathrm{dom}(A_j)$, $1 \le j \le n$, and maps $R_k$ to a nested relation over $R_k$, $1 \le k \le m$, and
(b) $t_i \in r$ and $t_j \in r$ and $t_i(X) = t_j(X)$ implies $t_i = t_j$, $1 \le i, j \le p$.
Each function of a nested relation $r$ over nested relation scheme $R$ is a nested tuple of $r$.

Example 2.1.1. Figure 1 shows a nested relation. Its scheme is Dept Chair (Prof (Hobby)* (Matriculation (Student (Interest)*)*)*)*, and it contains two nested tuples. As in Abiteboul and Bidoit [1986], we draw a bucket for each embedded nested relation. Each bucket also contains nested tuples of its own.
For example, ⟨Young, {Chess, Soccer}⟩ and ⟨Barker, {Skiing}⟩ are nested tuples in the first bucket under the embedded nested relation scheme Student (Interest)*. Notice that, as required, PNF is satisfied. Thus the values for the atomic attributes Dept Chair differ, and in each bucket the atomic values differ.

Definition 2.1.3. Let $R$ be a nested relation scheme. Let $r$ be a nested relation on $R$. The total unnesting of $r$ is recursively defined as follows:
(1) If $R$ has the form $X$, where $X$ is a set of attributes, then $r$ is the total unnesting of $r$.
(2) If $R$ has the form $X(R_1)^* \cdots (R_n)^*$, where $X_i$ is the set of attributes in $R_i$, $1 \le i \le n$, then the total unnesting of $r$ is $\{t \mid$ there exists a nested tuple $u \in r$ such that $t(X) = u(X)$ and $t(X_i)$ is a tuple in the total unnesting of $u(R_i)$, $1 \le i \le n\}$.

[Fig. 1. Nested relation.]

Definition 2.1.4. Let $R$ be a nested relation scheme. Let $r$ be a nested relation on $R$. Let $t$ be a nested tuple of $r$. The total unnesting of $t$ is defined as the total unnesting of $q$, where $q$ is a nested relation containing the single nested tuple $t$.

Example 2.1.2. Figure 2 shows the total unnesting of the nested relation in Figure 1. Observe that the first two tuples in the total unnesting contain the total unnesting of the nested tuple ⟨Young, {Chess, Soccer}⟩.

2.2. Scheme Trees

We can graphically represent a nested relation scheme by a tree, called a scheme tree. A scheme tree captures the logical structure of a nested relation scheme and explicitly represents a set of MVDs. Scheme trees have been used for earlier normal form definitions for nested relations [Ozsoyoglu and Yuan 1987, 1989; Roth and Korth 1987]. We use them here for the same purpose.

Definition 2.2.1. A scheme tree $T$ corresponding to a nested relation scheme $R$ is recursively defined as follows:
(1) If $R$ has the form $X$, then $T$ is a single-node scheme tree whose root node is the set of attributes $X$.
(2) If $R$ has the form $X(R_1)^* \cdots (R_n)^*$, then the root node of $T$ is the set of attributes $X$, and a child of the root of $T$ is the root of the scheme tree $T_i$, where $T_i$ is the corresponding scheme tree for the nested relation scheme $R_i$, $1 \le i \le n$.
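The recursive total unnesting of Definitions 2.1.3 and 2.1.4 can be sketched in Python. This is only an illustrative sketch, not the authors' formulation: the encoding of a nested tuple as a pair `(atoms, buckets)`, the helper `nt`, and the function names are assumptions made here, and the sample data is the CS/Jane fragment of the relation described in Examples 2.1.1 and 2.1.2.

```python
from itertools import product

def nt(atoms, *buckets):
    """Hypothetical encoding of a nested tuple: a dict of atomic values plus
    one list of nested tuples (a bucket) per embedded nested relation."""
    return (atoms, list(buckets))

def unnest_tuple(t):
    """Total unnesting of one nested tuple (Definition 2.1.4)."""
    atoms, buckets = t
    per_bucket = [unnest_relation(b) for b in buckets]
    rows = []
    # One flat row per combination of flat rows taken from each bucket.
    # An empty bucket yields no combination, hence no flat row.
    for combo in product(*per_bucket):
        row = dict(atoms)
        for part in combo:
            row.update(part)
        rows.append(row)
    return rows

def unnest_relation(r):
    """Total unnesting of a nested relation (Definition 2.1.3)."""
    rows = []
    for t in r:
        rows.extend(unnest_tuple(t))
    return rows

# Fragment of the nested relation of Example 2.1.1 (Prof Jane in Dept CS).
jane = nt({"Prof": "Jane"},
          [nt({"Hobby": "Skiing"})],
          [nt({"Matriculation": "Ph.D."},
              [nt({"Student": "Young"},
                  [nt({"Interest": "Chess"}), nt({"Interest": "Soccer"})]),
               nt({"Student": "Barker"}, [nt({"Interest": "Skiing"})])]),
           nt({"Matriculation": "M.S."},
              [nt({"Student": "Adams"}, [nt({"Interest": "Skiing"})])])])
cs = nt({"Dept": "CS", "Chair": "Turing"}, [jane])

rows = unnest_relation([cs])
for row in rows:
    print(row)
```

Under this encoding the fragment unnests to four flat tuples, the first two of which come from the nested tuple ⟨Young, {Chess, Soccer}⟩, as Example 2.1.2 observes.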

Dept  Chair   Prof   Hobby   Matriculation  Student  Interest
CS    Turing  Jane   Skiing  Ph.D.          Young    Chess
CS    Turing  Jane   Skiing  Ph.D.          Young    Soccer
CS    Turing  Jane   Skiing  Ph.D.          Barker   Skiing
CS    Turing  Jane   Skiing  M.S.           Adams    Skiing
CS    Turing  Pat    Hiking  Ph.D.          Lee      Travel
Math  Polya   Steve  Dance   M.S.           Carter   Travel
Math  Polya   Steve  Dance   M.S.           Carter   Skiing
Math  Polya   Steve  Hiking  M.S.           Carter   Travel
Math  Polya   Steve  Hiking  M.S.           Carter   Skiing

Fig. 2. Total unnesting of the nested relation in Fig. 1.

The one-to-one correspondence between a scheme tree and a nested relation scheme, along with the definition of a nested relation scheme, imposes several properties on a scheme tree. Let $T$ be a scheme tree. We denote the set of attributes in $T$ by $\mathrm{Aset}(T)$. Observe that the atomic attributes of a nested relation scheme, at any level of nesting, constitute a node in a scheme tree. Observe further that because Definition 2.1.1 requires nonempty sets of attributes, every node in $T$ consists of a nonempty set of attributes. Furthermore, because the sets of attributes corresponding to nodes in $T$ are pairwise disjoint and include all the attributes of $T$, the nodes in $T$ are pairwise disjoint, and their union is $\mathrm{Aset}(T)$.

Let $N$ be a node in $T$. Notationally, $\mathrm{Ancestor}(N)$ denotes the union of attributes in all ancestors of $N$, including $N$. Similarly, $\mathrm{Descendant}(N)$ denotes the union of attributes in all descendants of $N$, including $N$. In a scheme tree $T$ each edge $(V, W)$, where $V$ is the parent of $W$, denotes an MVD $\mathrm{Ancestor}(V) \twoheadrightarrow \mathrm{Descendant}(W)$. Notationally, we use $\mathrm{MVD}(T)$ to denote the union of all the MVDs represented by the edges in $T$. By construction, each MVD in $\mathrm{MVD}(T)$ is satisfied in the total unnesting of any nested relation for $T$. Because FDs are also of interest, we use $\mathrm{FD}(T)$ to denote any set of FDs equivalent to all FDs $X \to Y$ implied by a given set of FDs and MVDs over a set of attributes $U$ such that $\mathrm{Aset}(T) \subseteq U$ and $XY \subseteq \mathrm{Aset}(T)$.
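The edge-to-MVD correspondence just described can be sketched as a small tree walk. The `Node` class and the function names below are hypothetical, introduced only for illustration; the tree built at the end is the scheme tree of the running example (Dept Chair over Prof, with Prof over Hobby and Matriculation, then Student, then Interest).

```python
class Node:
    """Hypothetical scheme-tree node: a nonempty attribute set plus child nodes."""
    def __init__(self, attrs, children=()):
        self.attrs = frozenset(attrs)
        self.children = list(children)

def descendant(node):
    """Descendant(N): attributes of N and of every node below it."""
    out = set(node.attrs)
    for child in node.children:
        out |= descendant(child)
    return frozenset(out)

def mvd_of_tree(root):
    """MVD(T): one pair (Ancestor(V), Descendant(W)) per edge (V, W)."""
    mvds = []
    def walk(v, ancestor_v):
        for w in v.children:
            mvds.append((ancestor_v, descendant(w)))
            walk(w, ancestor_v | w.attrs)
    walk(root, root.attrs)
    return mvds

# Scheme tree of the running example.
tree = Node({"Dept", "Chair"}, [
    Node({"Prof"}, [
        Node({"Hobby"}),
        Node({"Matriculation"}, [Node({"Student"}, [Node({"Interest"})])]),
    ])
])

aset = descendant(tree)          # Aset(T)
mvds = mvd_of_tree(tree)         # MVD(T)
for lhs, rhs in mvds:
    print(sorted(lhs), "->>", sorted(rhs))
```

For this tree the walk produces five MVDs, one per edge, which can be compared with the set MVD(T) listed in Figure 3.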
(Note that because a set of FDs $F$ together with a set of MVDs $M$ can imply FDs not implied by $F$ alone, $\mathrm{FD}(T)$, in general, is not equivalent to the set of FDs in $F$ whose left-hand side is a subset of $\mathrm{Aset}(T)$ and whose right-hand side is restricted to $\mathrm{Aset}(T)$.)

Example 2.2.1. Figure 3 shows the scheme tree $T$ for the scheme of the nested relation in Figure 1. Figure 3 also gives the set of attributes in $\mathrm{Aset}(T)$ and the set of dependencies $\mathrm{MVD}(T)$. Observe that each of the MVDs in $\mathrm{MVD}(T)$ is satisfied in the unnested relation in Figure 2.

2.3. Data Redundancy

Data redundancy is a concern in database design. Redundant data can lead to higher storage and access cost. It can lead to update anomalies, forcing multiple copies of the same data value to be updated when one copy changes, and it can lead to data inconsistency if all copies do not agree.

T =     Dept Chair
            |
          Prof
         /    \
     Hobby   Matriculation
                  |
               Student
                  |
               Interest

$\mathrm{Aset}(T)$ = Dept Chair Prof Hobby Matriculation Student Interest

$\mathrm{MVD}(T)$ = {Dept Chair $\twoheadrightarrow$ Prof Hobby Matriculation Student Interest,
          Dept Chair Prof $\twoheadrightarrow$ Hobby,
          Dept Chair Prof $\twoheadrightarrow$ Matriculation Student Interest,
          Dept Chair Prof Matriculation $\twoheadrightarrow$ Student Interest,
          Dept Chair Prof Matriculation Student $\twoheadrightarrow$ Interest}

Fig. 3. Scheme tree $T$, $\mathrm{Aset}(T)$, and $\mathrm{MVD}(T)$ for the nested relation scheme in Fig. 1.

Except in rare cases, such as Vincent and Srinivasan [1992], papers and textbooks on normalization fail to provide rigorous definitions for redundancy and thus also fail to prove that normalization removes redundancy as expected. Offered instead are motivating examples to illustrate redundancy removal. Thus in the vast body of research literature on normalization, we have mostly only rigorous syntactic justifications for normalization; what we are missing are rigorous semantic justifications. Besides only providing for syntactic characterizations, a danger of not treating redundancy formally is that the examples may be misleading. Indeed, as we show in the following, the definition for 4NF found in most textbooks does not detect potential redundancy for all cases even though some readers of these books are led to believe that it does.

In creating definitions for redundancy, we should try to find simple and intuitive characterizations, but creating a simple and intuitive definition for redundancy is more difficult than one might at first think. Any definition will involve a sophisticated statement, and there are many possible approaches one might use. Our notion of redundancy is based on the idea that an atomic value $v$ in a nested or flat relation is redundant if we can erase $v$, and then from what remains and from a single FD or MVD that holds, determine what $v$ must have been.

$U$ = {Dept, Chair, Prof, Hobby, Hobby-Equipment, Matriculation, Student, Interest}
$F$ = {Student $\to$ Matriculation, Student $\to$ Prof, Prof $\to$ Dept, Dept $\to$ Chair}
$M$ = {Student $\twoheadrightarrow$ Interest, Prof $\twoheadrightarrow$ Hobby Hobby-Equipment, Hobby $\twoheadrightarrow$ Hobby-Equipment}

Fig. 4. Some given constraints over a set of attributes.

The way we define "holds" is important. Here, we adapt Fagin's [1977] definition, and we explain it thoroughly before proceeding with our definition of redundancy.

Definition 2.3.1. Let $U$ be a set of attributes. Let $M$ be a set of MVDs over $U$ and $F$ be a set of FDs over $U$. Let $T$ be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. An MVD $X \twoheadrightarrow Y$ holds for $T$ with respect to $M$ and $F$ if $X \subseteq \mathrm{Aset}(T)$ and there exists a set of attributes $Z \subseteq U$ such that $Y = Z \cap \mathrm{Aset}(T)$ and $M \cup F$ implies $X \twoheadrightarrow Z$ on $U$. An FD $X \to Y$ holds for $T$ with respect to $M$ and $F$ if $XY \subseteq \mathrm{Aset}(T)$ and $M \cup F$ implies $X \to Y$ on $U$.

This definition is motivated by the following lemma, which is Theorem 5 in Fagin [1977].

LEMMA 2.3.1. Let $U$ be a set of attributes and $R \subseteq U$. Let $M$ be a set of MVDs over $U$ and $F$ be a set of FDs over $U$. Let $X \subseteq R$, $Z \subseteq U$, and $Y = Z \cap R$. If $M \cup F$ implies $X \twoheadrightarrow Z$ on $U$, then $M \cup F$ implies $X \twoheadrightarrow Y$ on $R$.

PROOF. Fagin [1977]. □

Example 2.3.1. Figure 4 shows a given set of attributes $U$, a given set of FDs $F$ over $U$, and a given set of MVDs $M$ over $U$. All the FDs in $F$ in Figure 4 hold for the scheme tree $T$ in Figure 3, as do all the FDs implied by $M \cup F$. Not all the MVDs in $M$ hold for $T$. In particular, neither Hobby $\twoheadrightarrow$ Hobby-Equipment nor Prof $\twoheadrightarrow$ Hobby Hobby-Equipment holds for $T$. Because Hobby Hobby-Equipment $\cap$ $\mathrm{Aset}(T)$ = Hobby, however, Prof $\twoheadrightarrow$ Hobby does hold for $T$. Although Prof $\twoheadrightarrow$ Hobby holds for $T$, observe that it is not implied by $M \cup F$ on $U$.

As illustrated in Example 2.3.1, certain MVDs hold for a relation scheme even when they are not implied by a given set of FDs and MVDs. It is those that hold that are of interest to us. We now return to our task of defining redundancy.
Because our definition depends on the validity of a nested relation, however, we must first define what it means for a relation to be valid for a given set of FDs and MVDs.

Definition 2.3.2. Let $U$ be a set of attributes. Let $M$ be a set of MVDs over $U$ and $F$ be a set of FDs over $U$. Let $T$ be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. Let $r$ be a nested relation on $T$. Nested relation $r$ is valid with respect to $M \cup F$ if, in the total unnesting of $r$, every FD and every MVD that holds for $T$ with respect to $M$ and $F$ is satisfied.

We now define redundancy. The definition has two parts: FD redundancy and MVD redundancy.

Definition 2.3.3. Let $U$ be a set of attributes. Let $M$ be a set of MVDs over $U$ and $F$ be a set of FDs over $U$. Let $T$ be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. Let $XY \subseteq \mathrm{Aset}(T)$, and let $d$ be an FD $X \to Y$ or an MVD $X \twoheadrightarrow Y$ that holds for $T$ with respect to $M$ and $F$ and has an attribute $A \in Y$ with $A \notin X$. Let $r$ be a nonempty nested relation on $T$ that is valid with respect to $M \cup F$. Let $S$ be a subtree of $T$ that contains $A$ as an atomic attribute, and let $s_1, \ldots, s_n$ be the nested relations over $S$ in $r$. Let $u_1$ and $u_2$ be distinct nested tuples of $s_i$ and $s_j$, respectively, $1 \le i, j \le n$, such that $u_1(A) = v$, $u_2(A) = v'$, and $v = v'$. (Note that $i = j$ is possible, so that $s_i$ and $s_j$ may either be the same nested relation under $S$ or may be different nested relations under $S$.) Let $t_1$ and $t_2$ be distinct tuples in the total unnesting of $r$ such that $t_1(\mathrm{Aset}(S))$ and $t_2(\mathrm{Aset}(S))$ are tuples in the total unnesting of $u_1$ and $u_2$, respectively.
FD redundancy, when $d$ is $X \to Y$: If $t_1(X) = t_2(X)$, then atomic value $v$ is a redundant atomic value in $r$ caused by $X \to Y$.
MVD redundancy, when $d$ is $X \twoheadrightarrow Y$: If $t_1(X) = t_2(X)$, $t_1(Y) = t_2(Y)$, and $t_1(Z) \ne t_2(Z)$, where $Z = \mathrm{Aset}(T) - XY$, then atomic value $v$ is a redundant atomic value in $r$ caused by $X \twoheadrightarrow Y$.

Example 2.3.2. Let Student $\to$ Dept and consider the nested relation and its total unnesting in Figure 5. Both appearances of Math are redundant in both the nested and unnested relation. We can see this formally as follows. Let $t_1$ be the third tuple and $t_2$ be the last tuple in the unnested relation. Now we have $t_1$(Student) = $t_2$(Student), and thus Math in the third tuple of the unnested relation is redundant.
Because Math in the third tuple of the unnested relation comes from the first nested tuple in the nested relation, Math in the first nested tuple of the nested relation is redundant. By reversing $t_1$ and $t_2$, we can see formally that Math in the second nested tuple of the nested relation and in the last tuple of the unnested relation are also redundant.

It is possible for a value not to be redundant in a nested relation and yet be redundant in the total unnesting of the relation. Indeed, this is often the reason we create nested relations: to remove redundant values.

Example 2.3.3. Suppose Student $\twoheadrightarrow$ Interest and we allow students to have multiple majors. Now consider the nested relation and its total unnesting in Figure 6. Observe that Skiing is redundant in the unnested relation. However, in the nested relation, Skiing is not redundant because it appears only once.
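For flat (already unnested) relations, the two conditions of Definition 2.3.3 reduce to a search over pairs of distinct tuples. The following Python sketch makes that concrete under stated assumptions: it handles only the flat case, it assumes the relation is valid and that the dependency passed in holds, and the function names are hypothetical. The two sample relations are the unnested relations of Figures 5 and 6.

```python
from itertools import permutations

def agree(t1, t2, attrs):
    """True when two flat tuples (dicts) agree on every attribute in attrs."""
    return all(t1[a] == t2[a] for a in attrs)

def fd_redundant_rows(rows, X, A):
    """Indices of rows whose A-value is redundant because of the FD X -> A
    (flat case only): some other row agrees on X and therefore on A."""
    return sorted({i for (i, t1), (_, t2) in permutations(enumerate(rows), 2)
                   if agree(t1, t2, X) and t1[A] == t2[A]})

def mvd_redundant_rows(rows, X, Y, A):
    """Indices of rows whose A-value is redundant because of the MVD X ->> Y
    (flat case only): some other row agrees on X and Y but differs on the rest."""
    assert A in Y and A not in X
    Z = set(rows[0]) - set(X) - set(Y)
    return sorted({i for (i, t1), (_, t2) in permutations(enumerate(rows), 2)
                   if agree(t1, t2, X) and agree(t1, t2, Y)
                   and not agree(t1, t2, Z)})

def make(rows):
    """Build flat tuples over the attributes Interest, Dept, Student."""
    return [dict(zip(("Interest", "Dept", "Student"), r)) for r in rows]

# Unnested relation of Figure 5, with the FD Student -> Dept.
fig5 = make([("Skiing", "CS", "Barker"), ("Skiing", "CS", "Adams"),
             ("Skiing", "Math", "Carter"), ("Travel", "CS", "Lee"),
             ("Travel", "Math", "Carter")])
# Unnested relation of Figure 6, with the MVD Student ->> Interest.
fig6 = make([("Skiing", "CS", "Barker"), ("Skiing", "CS", "Adams"),
             ("Skiing", "Math", "Barker"), ("Travel", "Math", "Carter")])

fd_hits = fd_redundant_rows(fig5, ["Student"], "Dept")
mvd_hits = mvd_redundant_rows(fig6, ["Student"], ["Interest"], "Interest")
print(fd_hits, mvd_hits)
```

With zero-based indices, the FD search flags the third and last tuples of Figure 5 (the two Math values of Example 2.3.2), and the MVD search flags the two Barker tuples of Figure 6 (the Skiing values of Example 2.3.3). The sketch says nothing about the nested relations themselves, where Definition 2.3.3 additionally requires distinct nested tuples $u_1$ and $u_2$.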

Student $\to$ Dept

Interest (Dept (Student)*)*
⟨Skiing, {⟨CS, {Barker, Adams}⟩, ⟨Math, {Carter}⟩}⟩
⟨Travel, {⟨CS, {Lee}⟩, ⟨Math, {Carter}⟩}⟩

Interest  Dept  Student
Skiing    CS    Barker
Skiing    CS    Adams
Skiing    Math  Carter
Travel    CS    Lee
Travel    Math  Carter

Fig. 5. Redundancy caused by an FD.

Student $\twoheadrightarrow$ Interest

Interest (Dept (Student)*)*
⟨Skiing, {⟨CS, {Barker, Adams}⟩, ⟨Math, {Barker}⟩}⟩
⟨Travel, {⟨Math, {Carter}⟩}⟩

Interest  Dept  Student
Skiing    CS    Barker
Skiing    CS    Adams
Skiing    Math  Barker
Travel    Math  Carter

Fig. 6. Elimination of redundancy by nesting.

Example 2.3.4. Let Prof $\twoheadrightarrow$ Dept Student and Prof $\twoheadrightarrow$ Hobby Hobby-Equipment and consider the flat relation in Figure 7. Because the scheme of the relation in Figure 7 is Prof Student Hobby and Prof $\twoheadrightarrow$ Dept Student and Prof $\twoheadrightarrow$ Hobby Hobby-Equipment, by Lemma 2.3.1 Prof $\twoheadrightarrow$ Student and Prof $\twoheadrightarrow$ Hobby hold for the scheme Prof Student Hobby. But now all the data values under Student and Hobby are redundant, as can be seen formally by appropriately picking two distinct tuples and choosing which attribute and value to consider. For example, let $t_1$ be the first tuple and $t_2$ be the second tuple; then Young in $t_1$ is redundant because $t_1$(Prof) = $t_2$(Prof), $t_1$(Student) = $t_2$(Student), and $t_1$(Hobby) $\ne$ $t_2$(Hobby).

As an aside, we observe here that by the common definition of 4NF found in most textbooks (e.g., [Korth and Silberschatz 1991; Maier 1983]) the relation scheme in Figure 7 is in 4NF. This is because no nontrivial MVD, given or implied, applies to the scheme, where "applies" means that the set of attributes that constitutes the MVD is a subset of the scheme. In particular for Example 2.3.4, neither Prof $\twoheadrightarrow$ Student nor Prof $\twoheadrightarrow$ Hobby is implied by or is in the given set of MVDs {Prof $\twoheadrightarrow$ Dept Student, Prof $\twoheadrightarrow$ Hobby Hobby-Equipment}. According to the original definition given by Fagin [1977], however, the relation scheme in Figure 7 is not in 4NF. Fagin's definition not only considers all nontrivial MVDs that are given or implied (without regard to the scheme under consideration), but also the MVDs that hold when the scheme is considered.

Example 2.3.5. To show an example of a nested relation (with embedded relations) that has redundancy caused by an MVD, we present one more example of redundancy. Let $U$ = {Prof, Article-Title, Publication-Location} and let Prof $\twoheadrightarrow$ Article-Title and Article-Title $\twoheadrightarrow$ Prof be the MVDs. (Note that Example 2 in Beeri and Kifer [1986] has exactly the same characteristics.) Consider the nested relation and its total unnesting in Figure 8.
Based on the MVD Article-Title $\twoheadrightarrow$ Publication-Location, which holds for the nested relation scheme in Figure 8, all the values under Publication-Location in the nested relation are redundant. We can see formally, for example, that the last Hong Kong value under (Publication-Location)* is redundant by letting $t_1$ be the last tuple and $t_2$ be the 4th tuple in the unnested relation. Thus $t_1$(Article-Title) = $t_2$(Article-Title), $t_1$(Publication-Location) = $t_2$(Publication-Location), and $t_1$(Prof) $\ne$ $t_2$(Prof).

Definition 2.3.3 tells us what it means for an individual atomic value to be redundant in a nested relation for an FD or MVD that holds. Our next definition ties together the notion of a redundant data value in a nested relation and the notion of a nested relation scheme that permits valid nested relations that contain redundancy. It is this definition that allows us to later show that our normal form definition detects redundancy.

Definition 2.3.4. Let $U$ be a set of attributes. Let $M$ be a set of MVDs over $U$ and $F$ be a set of FDs over $U$. Let $T$ be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. $T$ is said to have potential redundancy with respect to $M \cup F$ if there exists a redundant atomic value in any valid nested relation for $T$ caused by either an FD or an MVD that holds for $T$ with respect to $M$ and $F$.
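Definition 2.3.4 speaks of valid nested relations, and validity (Definition 2.3.2) is checked on the total unnesting. As a hedged illustration, the sketch below tests whether a flat relation satisfies a given FD or MVD and applies it to the unnested relation of Figure 8. The function names are hypothetical, and the sketch checks only the dependencies passed to it, not every dependency that holds for the scheme.

```python
def satisfies_fd(rows, X, Y):
    """True when tuples that agree on X also agree on Y."""
    seen = {}
    for t in rows:
        key = tuple(t[a] for a in X)
        val = tuple(t[a] for a in Y)
        if seen.setdefault(key, val) != val:
            return False
    return True

def satisfies_mvd(rows, X, Y):
    """True when, for all t1, t2 agreeing on X, the tuple taking X and Y
    from t1 and the remaining attributes from t2 is also in the relation."""
    attrs = list(rows[0])
    present = {tuple(t[a] for a in attrs) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in X):
                mixed = tuple(t1[a] if (a in X or a in Y) else t2[a]
                              for a in attrs)
                if mixed not in present:
                    return False
    return True

# Unnested relation of Figure 8.
cols = ("Prof", "Article-Title", "Publication-Location")
fig8 = [dict(zip(cols, r)) for r in [
    ("Steve", "Programming in C++", "USA"),
    ("Steve", "Programming in Ada", "USA"),
    ("Steve", "Programming in C++", "Hong Kong"),
    ("Steve", "Programming in Ada", "Hong Kong"),
    ("Pat", "Programming in Ada", "USA"),
    ("Pat", "Programming in Ada", "Hong Kong"),
]]

checks = {
    "Prof ->> Article-Title": satisfies_mvd(fig8, ["Prof"], ["Article-Title"]),
    "Article-Title ->> Prof": satisfies_mvd(fig8, ["Article-Title"], ["Prof"]),
    "Article-Title ->> Publication-Location":
        satisfies_mvd(fig8, ["Article-Title"], ["Publication-Location"]),
}
print(checks)
# Dropping the last tuple breaks Article-Title ->> Prof.
print(satisfies_mvd(fig8[:-1], ["Article-Title"], ["Prof"]))
```

All three MVDs are satisfied by the relation of Figure 8, which is consistent with Example 2.3.5 treating it as a valid relation in which the Publication-Location values are nonetheless redundant.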

Redundancy in Nested Relations. 87 Prof + Prof + Dept Student Hobby Hobby-Equipment Prof Student Hobby Fig. 7. A flat relation with redundancy Jane Young Reading Jane Young Skiing Jane Barker Reading Jane Barker Skiing Prof ~ Article-Title Article-Title + Prof Prof ( Article-Tfile )* (Publication-Location)* Steve Pat lx&xn&lw Programming in Ada W&&!9_l Prof Steve Steve Steve Steve Pat Pat Article-Tfile Programming in C++ Programming in Ada Programming in C++ Programming in Ada Programming in Ada Programming in Ada Publication-Location USA USA Hong Kong Hong Kong USA Hong Kong Fig.8. Redundancycausedby a MVD Example 2.3.6, Because the nested relations in Figures 5, 7, and 8 all have redundancy, the nested relation schemes in Figure 5, 7, and 8 are all said to have potential redundancy with respect to the FDs and MVDs given for the examples. 3. NESTED NORMAL FORM we motivate the need for a new normal-form definition for nested relations by making certain observations about the examples we have presented. First, if we are given the FDs and MVDs in Figure 4, none of the earlier normal-form definitions [Ozsoyoglu and Yuan 1987, 1989; Roth and Korth 1987] allow the ACM Transactionson DatabaseSystems,Vol 21, No. 1, March 1996.

scheme of the nested relation in Figure 1, which is also equivalent to the scheme tree T in Figure 3. They therefore do not allow the nested relation in Figure 1 even though it is a good clustering for this application and has no redundancy. For a scheme tree T to satisfy the earlier normal-form definitions, T must satisfy four conditions. It turns out that T in Figure 3 does not satisfy the fourth condition for any of these previous definitions. One requirement of the fourth condition insists that the root of a scheme tree be the left-hand side of a reduced nontrivial MVD, but all (implied) MVDs with Dept Chair as the left-hand side are trivial. In fact, this is not the only violation. In particular, Matriculation cannot be an inner node of T in Figure 3. For the definitions in Ozsoyoglu and Yuan [1987, 1989], there are even partial MVDs in T because of the edges (Prof, Hobby) and (Student, Interest). Because there are unnecessary conditions in these previous normal-form definitions, they all restrict attribute clustering and design flexibility, as these examples show. In fact, these conditions can lead to unnecessary decompositions of schemes.

Second, all the earlier definitions [Ozsoyoglu and Yuan 1987, 1989; Roth and Korth 1987] allow the scheme of the nested relation in Figure 8, but as pointed out in Example 2.3.5, the nested relation has redundancy. We can see that the earlier definitions allow the scheme of the nested relation in Figure 8 as follows. Let T be the scheme tree for the scheme of the nested relation in Figure 8, and assume we are given the set of MVDs, M = {Prof →→ Article-Title, Article-Title →→ Prof}, and the empty set of FDs. When there are no FDs, all three earlier definitions are equivalent. Now observe that MVD(T) = {Prof →→ Article-Title, Prof →→ Publication-Location}. Because M implies MVD(T), the first condition of their definitions is satisfied.
Because M does not imply an MVD with a left-hand side that is a proper subset of Prof, T has no partial MVDs, and thus their second condition is satisfied. Because Article-Title →→ Prof, T has no transitive MVDs, and thus their third condition is satisfied. Each node in the scheme tree for Figure 8 is a single attribute, therefore there can be no decomposition of nodes, and thus their fourth condition is satisfied.

We now give our definition for Nested Normal Form.

Definition 3.1. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that Aset(T) ⊆ U. T is in Nested Normal Form (NNF) with respect to M ∪ F if the following conditions are satisfied.

(1) If D is the set of MVDs and FDs that hold for T with respect to M ∪ F, then D is equivalent to MVD(T) ∪ FD(T) on Aset(T).

(2) For each nontrivial FD X → A that holds for T with respect to M ∪ F, X → Ancestor(N_A) also holds with respect to M ∪ F, where N_A is the node in T that contains A.

Example 3.1. Suppose we are given U, F, and M as in Figure 4. Then the scheme tree T in Figure 3 is in NNF. We can see this from our definition as follows. We have observed in Example 2.3.1 that Hobby →→ Hobby-Equipment does not hold for Aset(T). The set of MVDs and FDs that do hold for T is

equivalent to F ∪ {Student →→ Interest, Prof →→ Hobby} considered on Aset(T). This set is thus equivalent to D in the NNF definition. MVD(T) is the set of MVDs in Figure 3, and we can let F in Figure 4 be FD(T). We can convince ourselves that Condition 1 is satisfied by applying a few standard MVD and FD derivation rules. For example, we can derive Prof →→ Hobby from MVD(T) and FD(T) by using the FDs in FD(T) to obtain Prof → Dept Chair Prof, converting this derived FD into an MVD, and then applying transitivity with Dept Chair Prof →→ Hobby in MVD(T) to obtain Prof →→ Hobby. To convince ourselves that Condition 2 is satisfied, we consider Student → Matriculation, which holds for T, and observe that Ancestor(Matriculation) = Matriculation Prof Dept Chair and that Student → Matriculation Prof Dept Chair is implied. Hence Student → Ancestor(Matriculation). We similarly check each nontrivial FD in FD(T), which is sufficient to ensure that Condition 2 is satisfied.

Example 3.1 not only illustrates NNF in a nontrivial case, but also shows that our definition accepts the nested relation scheme in Figure 1, which we consider to be good, but which is rejected by all the earlier definitions as previously explained. We now continue by giving two more examples, one that violates Condition 1 of NNF and one that violates Condition 2. Our example that violates Condition 1 also shows that NNF detects the problem of the nested relation scheme in Figure 8. NNF therefore recognizes a scheme that allows redundancy but that the earlier definitions fail to detect.

Example 3.2. Let U = {Prof, Article-Title, Publication-Location}, M = {Prof →→ Article-Title, Article-Title →→ Prof}, and F = ∅. As in Figure 8, let T be Prof(Article-Title)*(Publication-Location)*. T does not satisfy Condition 1 because MVD(T) ∪ FD(T), which is {Prof →→ Article-Title, Prof →→ Publication-Location}, is not equivalent to the set of FDs and MVDs that hold for T.
For example, we cannot derive Article-Title →→ Prof from {Prof →→ Article-Title, Prof →→ Publication-Location}. Thus Condition 1 is not satisfied.

Example 3.3. Let U = {Interest, Dept, Student}, M = ∅, and F = {Student → Dept}. As in Figure 5, let T = Interest(Dept(Student)*)*. T does not satisfy Condition 2 because Student → Dept is a nontrivial FD that holds for T and Ancestor(Dept) = Dept Interest, but Student → Dept Interest does not hold.

4. NNF AND POTENTIAL REDUNDANCY

In this section we prove one of our main results. In particular, we prove that a nested relation whose scheme is in NNF for a given set of FDs and MVDs cannot have redundancy with respect to the given FDs and MVDs.

Many of the lemmas here depend on a set of FD and MVD derivation rules. We use the following rules, where V, W, X, Y, Z, and Z′ are all subsets of a set of attributes R:

FD derivation rules:
F1: (reflexivity) Y ⊆ X implies X → Y.
F2: (augmentation) X → Y and V ⊆ W imply XW → YV.
F3: (transitivity) X → Y and Y → Z imply X → Z.

MVD derivation rules:
M1: (reflexivity) Y ⊆ X implies X →→ Y.
M2: (augmentation) X →→ Y and Z′ ⊆ Z imply XZ →→ YZ′.
M3: (transitivity) X →→ Y and Y →→ Z imply X →→ Z − Y.
M4: (complementation) X →→ Y implies X →→ R − (XY).
M5: (trivial complementation) X →→ R − X.

Combined FD and MVD derivation rules:
C1: (replication) X → Y implies X →→ Y.
C2: (coalescence) X →→ Y, Z → W, W ⊆ Y, and Y ∩ Z = ∅ imply X → W.

These FD and MVD derivation rules are sound and complete [Beeri et al. 1977], but not minimal. Indeed, part of what we show is that M1 (reflexivity) is not needed: the derivation rules remain sound and complete without it. The more common choice, of course, is to retain M1 and omit M5. For our proofs about scheme trees, however, it is often required that our MVDs stretch from root to leaf. We therefore use the alternative choice for trivial MVDs. Because this choice is not common, we prove in Lemma 4.1 that it is possible. In addition, we add a corollary that tailors the lemma to the case in which only MVDs are given.

LEMMA 4.1. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let Z →→ W be an MVD implied by M ∪ F on U. There exists an (M ∪ F)-based derivation sequence for Z →→ W on U that uses only F1–F3, M2–M5, and C1–C2.

PROOF. Because Z →→ W is implied by M ∪ F on U and the derivation rules F1–F3, M1–M5, C1, and C2 are sound and complete, there exists a derivation sequence S for Z →→ W on U using these rules. If S does not include M1, we are done. Otherwise, we replace each usage of M1, that is, each step of the form X →→ Y by M1 (reflexivity) where Y ⊆ X, by the following sequence of derivation rules:

X → Y, by F1 (reflexivity) because Y ⊆ X.
X →→ Y, by C1 (replication). □

COROLLARY. If F = ∅, there exists an M-based derivation sequence for Z →→ W that uses only the MVD rules M2–M5.

PROOF. Because M1–M4 are sound and complete when no FDs are given, there exists a derivation sequence S for Z →→ W that uses only M1–M4.
If S does not include M1, we are done. Otherwise, we replace each usage of M1 by the following sequence of derivation rules:

X →→ R − X, by M5 (trivial complementation).
X →→ R − (X(R − X)), by M4 (complementation).
X →→ ∅, because R − (X(R − X)) = ∅.
XY →→ Y, by M2 (augmentation).
X →→ Y, because Y ⊆ X. □
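As a small aside on the FD rules F1–F3 listed above, the standard attribute-closure computation applies them mechanically. The sketch below is an illustration only: it handles FDs alone (no MVDs, hence no use of C2), which matches Example 3.3, where M = ∅. The function names `fd_closure` and `fd_implied` are hypothetical.

```python
def fd_closure(attrs, fds):
    """Closure of attrs under a list of FDs given as (lhs, rhs) set pairs.

    Repeatedly adds the right-hand side of any FD whose left-hand side is
    already contained in the closure; this is the usual fixpoint that
    corresponds to applying reflexivity, augmentation, and transitivity.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def fd_implied(lhs, rhs, fds):
    """True if lhs -> rhs follows from fds by F1-F3."""
    return set(rhs) <= fd_closure(lhs, fds)


if __name__ == "__main__":
    # Example 3.3: F = {Student -> Dept}, Ancestor(Dept) = Dept Interest.
    F = [({"Student"}, {"Dept"})]
    print(fd_implied({"Student"}, {"Dept"}, F))              # the given FD
    print(fd_implied({"Student"}, {"Dept", "Interest"}, F))  # Condition 2 test
```

For Example 3.3 the first check succeeds and the second fails, which is exactly the violation of Condition 2 noted there.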

Lemma 4.2 guarantees us that if an attribute of a node N in a scheme tree is in the right-hand side, but not the left-hand side, of a nontrivial MVD and if the MVD is implied by the MVDs of the scheme tree, then all the attributes in N are included in the MVD.

LEMMA 4.2. Let U be a set of attributes. Let T be a scheme tree such that Aset(T) ⊆ U. Let XY ⊆ Aset(T). Let X →→ Y be an MVD in MVD(T)⁺ on Aset(T) such that A ∈ Y and A ∉ X. Let A be in node N of T, then N ⊆ XY.

PROOF. Because X →→ Y is an MVD in MVD(T)⁺ on Aset(T) and MVD(T) consists only of MVDs, there exists an MVD(T)-based derivation sequence S for X →→ Y on Aset(T) that, by the Corollary to Lemma 4.1, uses only the MVD rules M2–M5. We show by induction on the number of MVDs n in S that for every MVD X →→ Y in S, if A is an attribute in node N of T such that A ∈ Y and A ∉ X, then N ⊆ XY. Thus because X →→ Y is in S, N ⊆ XY.

Basis: Suppose n = 1. Because only rules M2–M5 are used and M2–M4 require antecedents, X →→ Y is either given or introduced by M5 (trivial complementation). If X →→ Y is given, then X →→ Y ∈ MVD(T), and thus Y ⊇ N. If X →→ Y is introduced by M5 (trivial complementation), XY = Aset(T), and thus N ⊆ XY.

Induction: X →→ Y can be introduced by any of the MVD derivation rules M2–M5 or as a given MVD in MVD(T), and therefore we have five cases to consider. Because the cases for given MVDs and M5 (trivial complementation) have no antecedents, they are the same as in the basis. Therefore, we can reduce the cases to three.

(1) X →→ Y is introduced by M2 (augmentation). Let V →→ W be the antecedent MVD and let Z′ ⊆ Z such that X = VZ and Y = WZ′. If A ∈ Y − X, then because Z′ ⊆ Z, A ∈ W and A ∉ V. By the induction hypothesis, N ⊆ VW and thus N ⊆ XY.

(2) X →→ Y is introduced by M3 (transitivity). Let V →→ W and W →→ Z be the antecedent MVDs. Thus X = V and Y = Z − W. Now assume there exists an attribute A in node N of T such that A ∈ Y and A ∉ X, but N ⊈ XY.
Then there exists an attribute B such that B ∈ N and B ∉ XY. Because B ∉ XY and X = V and Y = Z − W, B ∉ V and either B ∈ W or B ∈ Aset(T) − (VWZ). Suppose B ∈ W, then because B ∈ W and B ∉ V, by the induction hypothesis, N ⊆ VW. A ∈ N, therefore A ∈ VW. But because A ∉ X and X = V, A ∉ V; and because A ∈ Y and Y = Z − W, A ∉ W. Thus A ∉ VW, a contradiction. We therefore suppose that B ∈ Aset(T) − (VWZ). But now we have A ∈ Y, Y = Z − W, and therefore A ∈ Z, A ∉ W, and A ∈ N. Therefore, by the induction hypothesis, we have N ⊆ WZ. Because B ∈ N, B ∈ WZ. However, B ∈ Aset(T) − (VWZ) and thus B ∉ WZ, a contradiction.

(3) X →→ Y is introduced by M4 (complementation). Let V →→ W be the antecedent MVD. Thus X = V and Y = Aset(T) − (VW). Now assume there exists an attribute A in node N of T such that A ∈ Y

and A ∉ X, but N ⊈ XY. Then there exists an attribute B such that B ∈ N and B ∉ XY. Hence B ∉ V(Aset(T) − (VW)) and thus B ∈ W and B ∉ V. Because B ∈ N, by the induction hypothesis, N ⊆ VW. A ∈ N, therefore A ∈ VW. However, because A ∈ Y and Y = Aset(T) − (VW), A ∉ VW, a contradiction. □

Lemma 4.3 extends Lemma 4.2 to guarantee not only that a node is included, but also that all ancestors and descendants of the node are included. That is, Lemma 4.3 guarantees us that if an attribute of a node in a scheme tree is in the right-hand side of a nontrivial MVD (but not in the left-hand side), and if the MVD is implied by the MVDs of the scheme tree, then both the ancestors and the descendants of the node are included in the MVD.

LEMMA 4.3. Let U be a set of attributes. Let T be a scheme tree such that Aset(T) ⊆ U. Let XY ⊆ Aset(T). Let X →→ Y be a nontrivial MVD in MVD(T)⁺ on Aset(T). Let A be an attribute such that A ∈ Y and A ∉ X, and let A be in node N of T. Then both Ancestor(N) ⊆ XY and Descendant(N) ⊆ XY.

PROOF. As in Lemma 4.2, we show by induction on the number of MVDs n in the MVD(T)-based derivation sequence S for X →→ Y on Aset(T) that for every MVD X →→ Y in S, if A is an attribute such that A ∈ Y and A ∉ X, and if A is in node N of T, then both Ancestor(N) ⊆ XY and Descendant(N) ⊆ XY.

Basis: Suppose n = 1. Because S has no MVD introduced by M1 (reflexivity) and there is only one MVD X →→ Y in S, X →→ Y is given or is introduced by M5 (trivial complementation). If X →→ Y is given, then X →→ Y ∈ MVD(T). As argued in Lemma 4.2, Y ⊇ N, and since X →→ Y ∈ MVD(T), both Ancestor(N) ⊆ XY and Descendant(N) ⊆ XY. If X →→ Y is introduced by M5 (trivial complementation), XY = Aset(T). Hence every node is a subset of XY, and thus both Ancestor(N) ⊆ XY and Descendant(N) ⊆ XY.

Induction: As in Lemma 4.2, we have only three cases to consider.

(1) X →→ Y is introduced by M2 (augmentation). The argument is similar to the proof of Case 1 in Lemma 4.2.
(2) X →→ Y is introduced by M3 (transitivity). Let V →→ W and W →→ Z be the antecedent MVDs. Thus X = V and Y = Z − W. Let A be an attribute in node N of T such that A ∈ Y and A ∉ X. Hence, by Lemma 4.2, N ⊆ XY. We claim that Ancestor(N) ⊆ XY. If not, then there exists an attribute B ∈ Ancestor(N) such that B ∉ XY. Because B ∉ XY and X = V and Y = Z − W, B ∉ V(Z − W). Thus B ∉ V and either B ∈ W or B ∈ Aset(T) − (VWZ). We first assume that B ∈ W. Because B ∉ V and B ∈ W, by Lemma 4.2, B is in a node N′ such that N′ ⊆ VW. B ∈ Ancestor(N) and B ∈ N′, thus N ⊆ Descendant(N′) and A ∈

Descendant(N′). By the induction hypothesis, Descendant(N′) ⊆ VW. Thus A ∈ Descendant(N′), and therefore A ∈ VW. But because A ∈ Y and Y = Z − W, A ∉ W, and because A ∉ X and X = V, A ∉ V. Thus A ∉ VW, a contradiction. Thus we assume that B ∈ Aset(T) − (VWZ). Because A ∈ Y and Y = Z − W, A ∈ Z and A ∉ W. Thus by the induction hypothesis, Ancestor(N) ⊆ WZ. B ∈ Ancestor(N) and Ancestor(N) ⊆ WZ, and therefore B ∈ WZ. Thus B ∉ Aset(T) − (VWZ), a contradiction. By an identical argument with Descendant replacing Ancestor and vice versa, Descendant(N) ⊆ XY.

(3) X →→ Y is introduced by M4 (complementation). The argument is similar to the proof of Case 3 in Lemma 4.2. □

Lemma 4.4 tells us that the set D of dependencies that holds for a scheme tree T is the closure of itself on Aset(T).

LEMMA 4.4. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that Aset(T) ⊆ U. Let D be the set of dependencies in (M ∪ F)⁺ that hold for T. Then D⁺ = D on Aset(T).

PROOF. The strategy for proving this result is to show that none of the inference rules can add a new dependency that is not already in D. All the cases except for C2 are straightforward, therefore we just prove that C2 cannot add a new FD. Let X →→ Y be an MVD in D, and let Z → W be an FD in D such that W ⊆ Y and Y ∩ Z = ∅. Thus X →→ Y and Z → W hold for T. Because X →→ Y holds for T, XY ⊆ Aset(T) and there exists an MVD X →→ Y′ in (M ∪ F)⁺ such that Y = Y′ ∩ Aset(T). Z → W holds for T, and therefore Z → W is in (M ∪ F)⁺ and ZW ⊆ Aset(T). Because W ⊆ Y and Y ⊆ Y′, W ⊆ Y′. Y = Y′ ∩ Aset(T), Z ⊆ Aset(T), and Y ∩ Z = ∅, therefore Y′ ∩ Z = ∅. Hence X → W is in (M ∪ F)⁺. Because XW ⊆ Aset(T), X → W is already in D. □

Lemma 4.5 provides an interesting result: the given FDs can be disregarded if we are only interested in certain implied MVDs.
In particular, if MVD(T) and FD(T) imply an MVD X →→ Y, then if we close the left-hand side of the MVD under MVD(T) and FD(T) on Aset(T) to obtain X⁺, MVD(T) alone is sufficient to imply X⁺ →→ Y. The converse also holds, and although we do not need the converse for Theorem 4.1, we provide it here because we need it later for a lemma required for Theorem 5.2.

LEMMA 4.5. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that Aset(T) ⊆ U. Let XY ⊆ Aset(T), and let X⁺ denote the closure of X under MVD(T) ∪ FD(T) on Aset(T). If MVD(T) ∪ FD(T) implies X →→ Y on Aset(T), then MVD(T) implies X⁺ →→ Y on Aset(T), and conversely.

PROOF. The result can be proved by using Theorem 1 in Beeri and Kifer [1986]. □

Lemma 4.6 begins to directly address the redundancy issue in nested relations. We use it twice in Theorem 4.1, and thus we write it separately as a lemma. Before stating and proving Lemma 4.6, we need a definition for a path in a scheme tree.

Definition 4.1. A path of a scheme tree T is a sequence of nodes N_1, ..., N_n where N_1 is the root of T, N_n is a leaf node of T, and N_i is the parent of N_(i+1), 1 ≤ i ≤ n − 1.

LEMMA 4.6. Let T be a scheme tree. Let r be a nested relation on T. Let A be an attribute in node N_A of T. If t1 and t2 are distinct tuples in the total unnesting of r such that t1(Ancestor(N_A)) = t2(Ancestor(N_A)), then t1(A) and t2(A) are in the same nested tuple of a single nested relation under the nested relation scheme whose set of atomic attributes is N_A.

PROOF. Because we only allow PNF nested relations, this result can be proved easily by using Definition 2.1.2. □

THEOREM 4.1. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that Aset(T) ⊆ U. T has no potential redundancy with respect to M ∪ F if T is in NNF with respect to M ∪ F.

PROOF. Assume not; then T has potential redundancy with respect to M ∪ F. Thus by Definition 2.3.4, there exists a redundant atomic value v in a valid nested relation r on T caused by a dependency with left-hand side X and right-hand side Y, which is either an FD X → Y or an MVD X →→ Y, that holds for T with respect to M and F. In either case, by Definition 2.3.3, we have the following: XY ⊆ Aset(T), there exists an attribute A such that A ∈ Y and A ∉ X, there exists a subtree S of T that contains A as an atomic attribute, there exist distinct nested tuples u1 and u2 that are respectively in two (not necessarily distinct) nested relations on S, and there exists an atomic value v′, such that u1(A) = v, u2(A) = v′, and v = v′.
Furthermore, there exist distinct tuples t1 and t2 in the total unnesting of r such that t1(Aset(S)) and t2(Aset(S)) are tuples in the total unnesting of u1 and u2, respectively.

First assume that the redundancy is caused by an FD. Without loss of generality, we may assume that Y = A and thus that the FD is X → A. Because v is a redundant atomic value in r caused by X → A, by the FD part of Definition 2.3.3, t1(X) = t2(X). A is an atomic attribute in S, therefore A is in the root node of the subtree S of T. Let the node that contains A be denoted by N_A. Now, either t1(Ancestor(N_A)) = t2(Ancestor(N_A)) or t1(Ancestor(N_A)) ≠ t2(Ancestor(N_A)).

Case 1. Suppose t1(Ancestor(N_A)) ≠ t2(Ancestor(N_A)). Because A ∉ X, X → A is nontrivial. Because t1(X) = t2(X) but t1(Ancestor(N_A)) ≠ t2(Ancestor(N_A)), X → Ancestor(N_A) does not hold. Thus there exists a nontrivial FD X → A that holds for T for which X → Ancestor(N_A) does not hold, where N_A is the node in T that contains A. This contradicts Condition 2 of NNF.

Case 2. Suppose t1(Ancestor(N_A)) = t2(Ancestor(N_A)). Because r is a nested relation on T, A is an attribute in node N_A of T, and t1 and t2 are distinct tuples in the total unnesting of r such that t1(Ancestor(N_A)) = t2(Ancestor(N_A)), by Lemma 4.6, t1(A) and t2(A) are in the same nested tuple of a single nested relation under the nested relation scheme whose set of atomic attributes is N_A. S is the nested relation scheme whose set of atomic attributes is N_A, therefore t1(A) and t2(A) are in the same nested tuple under S. A ∈ N_A and N_A is the root node of subtree S, therefore A ∈ Aset(S). Because A ∈ Aset(S), and t1(Aset(S)) and t2(Aset(S)) are tuples in the total unnesting of u1 and u2, respectively, t1(A) and t2(A) are respectively in u1 and u2. t1(A) is in u1, t2(A) is in u2, and u1 and u2 are distinct nested tuples under S, thus t1(A) and t2(A) are in distinct nested tuples under S, a contradiction to our earlier established fact that t1(A) and t2(A) are in the same nested tuple under S.

The redundancy cannot be caused by an FD, and we thus assume that the dependency that causes the redundancy is the MVD X →→ Y. Because v is a redundant atomic value in r caused by X →→ Y, by the MVD part of Definition 2.3.3, t1(X) = t2(X), t1(Y) = t2(Y), and t1(Z) ≠ t2(Z) where Z = Aset(T) − (XY). Z = Aset(T) − (XY) and t1(Z) ≠ t2(Z), therefore Z ≠ ∅ and thus XY ≠ Aset(T). A ∈ Y − X, therefore Y ⊈ X. Because XY ≠ Aset(T) and Y ⊈ X, X →→ Y is a nontrivial MVD on Aset(T). A ∈ Aset(T), and A is in a node of T. Let N_A be the node of T that contains A. X →→ Y holds for T given M ∪ F, therefore X →→ Y ∈ D and thus D implies X →→ Y. By Condition 1 of NNF, D is equivalent to MVD(T) ∪ FD(T) on Aset(T), thus MVD(T) ∪ FD(T) implies X →→ Y on Aset(T). Let X⁺ be the closure of X under MVD(T) ∪ FD(T) on Aset(T). Because MVD(T) ∪ FD(T) implies X →→ Y on Aset(T), by Lemma 4.5, MVD(T) implies X⁺ →→ Y on Aset(T), and thus X⁺ →→ Y ∈ MVD(T)⁺ on Aset(T).
By Lemma 4.4, D⁺ = D on Aset(T) and by Condition 1, MVD(T) ∪ FD(T) is equivalent to D on Aset(T), therefore each MVD and FD in (MVD(T) ∪ FD(T))⁺ on Aset(T) is in D. Because X → X⁺ is implied by MVD(T) ∪ FD(T) on Aset(T) and each MVD and FD in (MVD(T) ∪ FD(T))⁺ on Aset(T) is in D, X → X⁺ is in D. Every attribute of every MVD and FD in D is in Aset(T) and X → X⁺ is in D, thus XX⁺ ⊆ Aset(T). Because X → X⁺ and XX⁺ ⊆ Aset(T) and t1(X) = t2(X), t1(X⁺) = t2(X⁺). Also, A ∉ X⁺; otherwise, X → A holds for T, and by an argument similar to the preceding FD case, we can arrive at a contradiction. Furthermore, Z ⊈ X⁺; otherwise, X → Z holds for T, which, because t1(X) = t2(X), makes t1(Z) = t2(Z), which contradicts t1(Z) ≠ t2(Z). Now let Z′ = Aset(T) − (X⁺Y). Then Z′ ≠ ∅ because Z = Aset(T) − (XY), Z ⊈ X⁺, Z ≠ ∅, and Z′ = Aset(T) − (X⁺Y). A ∈ Y and A ∉ X⁺, therefore Y ⊈ X⁺; and because Z′ (= Aset(T) − (X⁺Y)) ≠ ∅, X⁺ →→ Y is a nontrivial MVD on Aset(T). We now have X⁺Y ⊆ Aset(T), X⁺ →→ Y is a nontrivial MVD in MVD(T)⁺ on Aset(T), A ∈ Y, A ∉ X⁺, and A is in node N_A of T, thus by Lemma 4.3 Ancestor(N_A) ⊆ X⁺Y. Because t1(X⁺) = t2(X⁺) and t1(Y) = t2(Y), t1(X⁺Y) = t2(X⁺Y). t1(X⁺Y) = t2(X⁺Y) and Ancestor(N_A) ⊆ X⁺Y, therefore t1(Ancestor(N_A)) = t2(Ancestor(N_A)). Because t1(Ancestor(N_A)) = t2(Ancestor(N_A)), we may use the argument in Case 2 for FDs to obtain a contradiction. □

5. POTENTIAL REDUNDANCY AND NNF

If we could prove the converse of Theorem 4.1, we would have a precise characterization of potential redundancy in nested relations in terms of NNF. Unfortunately, the converse is not true. With a small adjustment, however, we can make it true. The problem is that we might have a scheme tree that is not consistent with the given FDs and MVDs. We define consistency as follows.

Definition 5.1. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that Aset(T) ⊆ U. Let D be the set of MVDs and FDs that hold for T with respect to M and F. A scheme tree T is consistent with M and F if D implies MVD(T) on Aset(T).

Example 5.1. As a counterexample to show that the converse of Theorem 4.1 does not hold and to motivate our desire for consistency, consider the following. Let U = ABC. Let M be the empty set of MVDs and F be the empty set of FDs. Let T be the scheme tree of the nested relation scheme A(B)*(C)*. Now MVD(T) = {A →→ B, A →→ C}, and we can let FD(T) = ∅. Inasmuch as both M and F are empty, the set of FDs and MVDs D that hold for T includes only trivial dependencies. Thus D is not equivalent to MVD(T) ∪ FD(T) and hence T is not in NNF. However, the only constraints that apply are trivial, therefore there is no potential redundancy. Thus we have a scheme tree that has no potential redundancy, but is not in NNF. T, however, is not consistent because the trivial dependencies are insufficient to imply MVD(T), which contains nontrivial dependencies.

Intuitively, we should not want any scheme tree that implies nontrivial MVDs unless the implied nontrivial MVDs are given or implied by given constraints. That is, we should only be interested in consistent scheme trees.
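To make the role of MVD(T) in Example 5.1 concrete, the sketch below computes MVD(T) for a small scheme tree. It assumes the reading of MVD(T) that the surrounding text uses: one MVD Ancestor(N) →→ Descendant(N′) for each edge from a node N to a child N′, with Ancestor and Descendant both including the node itself. The class `Node` and the functions `descendant`, `mvd_of_tree`, and `is_trivial` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    attrs: frozenset
    children: list = field(default_factory=list)


def descendant(node):
    """Attributes of node and of every node below it (assumed inclusive)."""
    result = set(node.attrs)
    for child in node.children:
        result |= descendant(child)
    return frozenset(result)


def mvd_of_tree(root):
    """One MVD (lhs, rhs) per edge: Ancestor(parent) ->> Descendant(child)."""
    mvds = []

    def walk(node, ancestors):
        anc = frozenset(ancestors | node.attrs)
        for child in node.children:
            mvds.append((anc, descendant(child)))
            walk(child, anc)

    walk(root, frozenset())
    return mvds


def is_trivial(lhs, rhs, universe):
    """X ->> Y is trivial on R if Y is a subset of X or XY = R."""
    return rhs <= lhs or (lhs | rhs) == universe


if __name__ == "__main__":
    # Example 5.1: T = A(B)*(C)* over U = ABC.
    t = Node(frozenset("A"), [Node(frozenset("B")), Node(frozenset("C"))])
    for lhs, rhs in mvd_of_tree(t):
        print(sorted(lhs), "->>", sorted(rhs),
              "trivial" if is_trivial(lhs, rhs, frozenset("ABC")) else "nontrivial")
```

On the tree of Example 5.1 this yields A →→ B and A →→ C, both nontrivial on ABC, which is why a set D containing only trivial dependencies cannot imply MVD(T).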
Therefore, the consistency requirement does not turn out to be a problem, and we can have what we want. If we assume that scheme trees are consistent with a given set of MVDs and FDs, we can obtain a precise characterization of potential redundancy in nested relation schemes. For then, as we show in this section, a consistent scheme tree T has no potential redundancy if and only if T is in NNF. We now proceed to prove this result.

We first prove in Theorem 5.1 that our NNF definition not only implies that a scheme tree has no potential redundancy, but also that it is consistent. This result follows almost immediately from Theorem 4.1.

THEOREM 5.1. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that Aset(T) ⊆ U. T is consistent with M and F and has no potential redundancy with respect to M ∪ F if T is in NNF with respect to M ∪ F.

PROOF. By Theorem 4.1, T has no potential redundancy with respect to M ∪ F. By Condition 1 of NNF, D is equivalent to MVD(T) ∪ FD(T) on Aset(T). Thus D implies MVD(T) on Aset(T) and hence T is consistent with M and F. □

Before proceeding with the proof of the converse of Theorem 5.1, we need a result about paths in scheme trees, which we prove in Lemma 5.1, and a result about potential redundancy, which we prove in Lemma 5.2. We then prove that if we allow only consistent scheme trees and some scheme tree does not satisfy Condition 1 of NNF, it has potential redundancy (Lemma 5.3), and that if it does not satisfy Condition 2 of NNF, it has potential redundancy (Lemma 5.4).

LEMMA 5.1. Let U be a set of attributes. Let T be a scheme tree such that Aset(T) ⊆ U. Let X →→ Y be an MVD on Aset(T). If MVD(T) does not imply X →→ Y on Aset(T), then there is a path p of T whose set of attributes is P such that (Y − X) ∩ P and (Aset(T) − (XY)) ∩ P are both nonempty proper subsets of P.

PROOF. Throughout the proof, all implications (and thus also all closures of attributes) are taken with respect to Aset(T). Furthermore, because by Lemma 4.4, D⁺ = D on Aset(T), all implications on Aset(T) remain on Aset(T). We therefore omit references to Aset(T), leaving this understood without explicit mention.

X →→ Y must be nontrivial; otherwise, MVD(T) implies X →→ Y. Assume that the paths of T are p_1, ..., p_n, n ≥ 1. Let the set of attributes of p_i be P_i, 1 ≤ i ≤ n. Let Z = Aset(T) − (XY). We proceed by contradiction, and thus assume that for all i, 1 ≤ i ≤ n, either (Y − X) ∩ P_i is not a nonempty proper subset of P_i or Z ∩ P_i is not a nonempty proper subset of P_i. Thus for all i, 1 ≤ i ≤ n, either P_i ⊆ XY or P_i ⊆ XZ. Because P_i ⊆ XY or P_i ⊆ XZ for all i, 1 ≤ i ≤ n, if there is no P_i ⊆ XZ, 1 ≤ i ≤ n, then P_i ⊆ XY, 1 ≤ i ≤ n. But if P_i ⊆ XY, 1 ≤ i ≤ n, then because P_1 ∪ ... ∪ P_n = Aset(T), XY = Aset(T), and thus X →→ Y is trivial.
Similarly, if there is no P_i ⊆ XY, 1 ≤ i ≤ n, then X →→ Z is trivial. Thus (after reindexing, if necessary) there is an index q, 1 ≤ q < n, such that P_i ⊆ XY, for each i, 1 ≤ i ≤ q, and P_j ⊆ XZ, for each j, q + 1 ≤ j ≤ n. Let V = P_1 ∪ ... ∪ P_q and W = P_(q+1) ∪ ... ∪ P_n.

We next show that MVD(T) implies (V ∩ W) →→ V. For any scheme tree, there is a one-to-one correspondence between leaf nodes and paths. Thus if we let L_i be the leaf node of path p_i, 1 ≤ i ≤ n (under the possible reindexing), L_i ⊆ V for 1 ≤ i ≤ q and L_i ⊈ V ∩ W for 1 ≤ i ≤ q. Every path in any scheme tree includes the root, and the root of T is in V ∩ W. Thus, for every path p_i, 1 ≤ i ≤ q, we know that the leaf L_i of p_i is not in V ∩ W and that the root of p_i is in V ∩ W. Hence for every path p_i, 1 ≤ i ≤ q, there exists a lowest level node N_i that is not the leaf and is in V ∩ W. None of the nodes N_i, 1 ≤ i ≤ q, is a leaf, therefore each has one or more children. Let N_i′, 1 ≤ i ≤ q, be the child of N_i on path p_i. Now by the definition of MVD(T), Ancestor(N_i) →→ Descendant(N_i′) ∈ MVD(T), 1 ≤ i ≤ q. Because N_i

⊆ V ∩ W, by the definition of V and W, Ancestor(N_i) ⊆ V ∩ W, 1 ≤ i ≤ q. Ancestor(N_i) ⊆ V ∩ W, 1 ≤ i ≤ q, thus by M2 (augmentation), MVD(T) implies V ∩ W →→ Descendant(N_i′), 1 ≤ i ≤ q. By M1 (reflexivity), MVD(T) implies V ∩ W →→ V ∩ W. By our construction process, (V ∩ W) ∪ Descendant(N_1′) ∪ ... ∪ Descendant(N_q′) = V. Thus by applying the MVD union rule to the MVDs V ∩ W →→ Descendant(N_i′), 1 ≤ i ≤ q, and V ∩ W →→ V ∩ W and substituting V for the right-hand side of the result, MVD(T) implies V ∩ W →→ V.

To finish the proof, we prove that on Aset(T), X →→ Y follows from V ∩ W →→ V, and thus we will have a contradiction. We first prove V ∩ W ⊆ X and V − X = Y − X. Because V ⊆ XY and W ⊆ XZ, (V ∩ W) ⊆ (XY ∩ XZ). Z = Aset(T) − (XY), thus Z ∩ XY = ∅. Therefore, XY ∩ XZ = X. Hence (V ∩ W) ⊆ X. V ⊆ XY, therefore V − X ⊆ XY − X. However, XY − X = Y − X and thus V − X ⊆ Y − X. Assume Y − X ⊈ V − X; then there is an attribute A ∈ Y − X such that A ∉ V − X. Because A ∈ Y − X, A ∉ X. A ∉ V − X and A ∉ X, thus A ∉ V. Let the path containing A be p and let P be the set of attributes of p. Hence A ∈ P. P ⊈ V; otherwise A ∈ V, which contradicts A ∉ V. In addition, P ⊈ XY; otherwise P ⊆ V, as V is the union of paths that are subsets of XY. Hence there is an attribute B ∈ P that is not in XY. Because A ∈ Y − X, B ∉ XY, and both A and B are attributes in P, (Y − X) ∩ P and (Aset(T) − (XY)) ∩ P are both nonempty proper subsets of P, which contradicts the hypothesis we made at the beginning of this proof. Therefore, Y − X ⊆ V − X. Thus because V − X ⊆ Y − X, V − X = Y − X.

MVD(T) implies V ∩ W →→ V and V ∩ W ⊆ X, therefore by M2 (augmentation), MVD(T) implies X →→ V. Because MVD(T) implies X →→ X and X →→ V, by M3 (transitivity), MVD(T) implies X →→ V − X. Thus, because V − X = Y − X, MVD(T) implies X →→ Y − X. Now by augmenting both sides of X →→ Y − X with X ∩ Y, we have that MVD(T) implies X →→ Y, a contradiction. □

LEMMA 5.2. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U.
Let T be a scheme tree such that Aset(T) ⊆ U and T is consistent with M and F. Let D be the set of MVDs and FDs that holds for T with respect to M and F. Let A and B be attributes in Aset(T), and let X and Z be sets of attributes in Aset(T). If A is in node N_A of T, B ∈ Ancestor(N_A), A ∉ Z, A ∉ X, B ∈ Z, and Z ∈ DEP_D(X) such that X → Z does not hold, then T has potential redundancy with respect to M ∪ F. Here DEP_D(X) is the dependency basis for X with respect to D on Aset(T) [Maier 1983].

PROOF. As we have done before, we omit references to Aset(T), inasmuch as all implications are taken with respect to and remain on Aset(T). To establish potential redundancy for T with respect to M ∪ F, we must exhibit a valid nested relation for T with respect to M ∪ F that has a redundant atomic value. Consider a flat relation r on Aset(T) that has two tuples t1 and t2. Let t1 be a row of all 1s, t2(Aset(T) − Z) = t1(Aset(T) − Z), and t2(Z) be all 2s. Because Z ∈ DEP_D(X) and X → Z does not hold, r is valid with respect to M ∪ F. We now establish the following groups of essential facts, and then conclude.

(1) Because T is consistent with M and F, D implies MVD(T). r is a flat relation and is valid with respect to M ∪ F, therefore r satisfies every FD and MVD that holds for T with respect to M and F, and thus r satisfies D. r satisfies D and D implies MVD(T), therefore r satisfies MVD(T). r satisfies MVD(T) and r is defined on Aset(T), thus we can nest r according to T to obtain a nested relation q on T. Because r is the total unnesting of q and r satisfies every FD and MVD that holds for T with respect to M and F, q is valid with respect to M ∪ F.

(2) Because B ∈ Z, t1(B) = 1 and t2(B) = 2. A ∉ Z and A ∉ X, therefore A ∈ Aset(T) − (XZ). A ∈ Aset(T) − (XZ), thus t1(A) = t2(A) and both are 1s. Let S be the nested relation scheme in T whose set of atomic attributes is N_A. Then, because t1(B) = 1 and t2(B) = 2 and B ∈ Ancestor(N_A), t1(A) and t2(A) are in distinct nested tuples u1 and u2 under S in q.

(3) Let Y = Aset(T) − (XZ). Z ∈ DEP_D(X), thus D implies X →→ Y. Therefore, by Lemma 4.4, X →→ Y is in D. Because A ∈ Aset(T) − (XZ) and Y = Aset(T) − (XZ), A ∈ Y and A ∉ X. Z ∈ DEP_D(X) and X → Z does not hold, thus Z ∩ X = ∅. Z ∩ X = ∅ and t1(Aset(T) − Z) = t2(Aset(T) − Z), therefore t1(X) = t2(X). Because Y = Aset(T) − (XZ), t1(Y) = t2(Y).

Now, according to Definition 2.3.3, atomic value u1(A) is a redundant atomic value in q caused by X →→ Y. There exists a redundant atomic value in a valid nested relation for T caused by X →→ Y, which holds for T with respect to M and F, therefore T has potential redundancy with respect to M ∪ F. □

LEMMA 5.3. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that Aset(T) ⊆ U and T is consistent with M and F.
If D is the set of MVDs and FDs that holds for T with respect to M and F and D is not equivalent to MVD(T) ∪ FD(T) on Aset(T), then T has potential redundancy with respect to M ∪ F.

PROOF. As we have done before, we omit references to Aset(T). Because T is consistent with M and F, D implies MVD(T). By definition of FD(T) and D, FD(T) is equivalent to the set of FDs in D and thus D implies MVD(T) ∪ FD(T). However, D is not equivalent to MVD(T) ∪ FD(T), therefore MVD(T) ∪ FD(T) does not imply D. Thus there is an FD or MVD in D that is not implied by MVD(T) ∪ FD(T). Because FD(T) is equivalent to the set of FDs in D, there is an MVD X →→ Y in D that is not implied by MVD(T) ∪ FD(T). Let X⁺ be the closure of X under MVD(T) ∪ FD(T). MVD(T) ∪ FD(T) does not imply X →→ Y, thus by Lemma 4.5, MVD(T) does not imply X⁺ →→ Y. Therefore, by Lemma 5.1, there is a path p of T whose set of attributes is P