Functional Dependency and Algorithmic Decomposition

Similar documents
UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

Relational Database Design

Functional Dependencies & Normalization. Dr. Bassam Hammo

Constraints: Functional Dependencies

CSC 261/461 Database Systems Lecture 13. Spring 2018

Schema Refinement and Normal Forms

Constraints: Functional Dependencies

Schema Refinement & Normalization Theory

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

Schema Refinement and Normal Forms

Schema Refinement and Normal Forms. Chapter 19

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

SCHEMA NORMALIZATION. CS 564- Fall 2015

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

Schema Refinement: Other Dependencies and Higher Normal Forms

Schema Refinement and Normalization

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

Design theory for relational databases

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

Schema Refinement and Normal Forms Chapter 19

CSIT5300: Advanced Database Systems

Schema Refinement. Feb 4, 2010

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

Schema Refinement and Normal Forms

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

Information Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

Normal Forms. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

CS54100: Database Systems

Schema Refinement and Normal Forms

Relational Design: Characteristics of Well-designed DB

Chapter 3 Design Theory for Relational Databases

Chapter 10. Normalization Ext (from E&N and my editing)

Chapter 11, Relational Database Design Algorithms and Further Dependencies

Database Design and Implementation

A few details using Armstrong s axioms. Supplement to Normalization Lecture Lois Delcambre

Normal Forms Lossless Join.

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

Functional Dependencies. Getting a good DB design Lisa Ball November 2012

Relational Database Design Theory Part II. Announcements (October 12) Review. CPS 116 Introduction to Database Systems

Normal Forms (ii) ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Lossless Joins, Third Normal Form

Design Theory for Relational Databases

CAS CS 460/660 Introduction to Database Systems. Functional Dependencies and Normal Forms 1.1

CS322: Database Systems Normalization

Schema Refinement and Normal Forms. Why schema refinement?

Functional Dependencies and Normalization

Databases 2012 Normalization

Functional Dependencies

Functional Dependency Theory II. Winter Lecture 21

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

Relational Design Theory

CSC 261/461 Database Systems Lecture 12. Spring 2018

COSC 430 Advanced Database Topics. Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago

FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

HKBU: Tutorial 4

Chapter 7: Relational Database Design

CS 186, Fall 2002, Lecture 6 R&G Chapter 15

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Lectures 6. Lecture 6: Design Theory

Chapter 7: Relational Database Design. Chapter 7: Relational Database Design

CSC 261/461 Database Systems Lecture 11

Background: Functional Dependencies. æ We are always talking about a relation R, with a æxed schema èset of attributesè and a

Chapter 3 Design Theory for Relational Databases

Introduction to Data Management. Lecture #6 (Relational DB Design Theory)

Functional Dependencies and Normalization. Instructor: Mohamed Eltabakh

Chapter 8: Relational Database Design

Review: Keys. What is a Functional Dependency? Why use Functional Dependencies? Functional Dependency Properties

CMPT 354: Database System I. Lecture 9. Design Theory

Relational Database Design

Introduction to Data Management. Lecture #6 (Relational Design Theory)

CSE 132B Database Systems Applications

Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies

Desirable properties of decompositions 1. Decomposition of relational schemes. Desirable properties of decompositions 3

INF1383 -Bancos de Dados

Normaliza)on and Func)onal Dependencies

Consider a relation R with attributes ABCDEF GH and functional dependencies S:

Design Theory for Relational Databases

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Functional Dependencies

Data Bases Data Mining Foundations of databases: from functional dependencies to normal forms

Functional Dependencies

12/3/2010 REVIEW ALGEBRA. Exam Su 3:30PM - 6:30PM 2010/12/12 Room C9000

Database Design: Normal Forms as Quality Criteria. Functional Dependencies Normal Forms Design and Normal forms

Relational-Database Design

FUNCTIONAL DEPENDENCY THEORY II. CS121: Relational Databases Fall 2018 Lecture 20

Database Design and Normalization

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Design Theory: Functional Dependencies and Normal Forms, Part I Instructor: Shel Finkelstein

CSE 544 Principles of Database Management Systems

Schema Refinement. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

CSE 344 MAY 16 TH NORMALIZATION

Introduction to Data Management CSE 344

CSE 344 AUGUST 3 RD NORMALIZATION

Consider a relation R with attributes ABCDEF GH and functional dependencies S:

Transcription:

Functional Dependency and Algorithmic Decomposition In this section we introduce some new mathematical concepts relating to functional dependency and, along the way, show their practical use in relational design. Our goal is to have a toolbox to algorithmically generate relations, which meet our criteria for good relational design: Eliminate/reduce redundancy and the possibilities for update anomalies by decomposing relations without losing any information from the original relations (i.e. all of the original attributes must be present in the set of resulting relations; decomposition should preserve functional dependencies and not create spurious data when relations are joined). Closure of a set of FDs: Given a set of FDs we can usually infer additional FDs. For example, consider the following relation and set of FDs holding on the relation: TEACH = {Module_ID, Lecturer_ID, Lect_First_Name, Lect_Last_Name} F = { {Module_ID} {Lecturer_ID}, {Lecturer_ID} {Lect_First_Name, Lect_Last_Name}} Given these FDs we can infer that, for example: {Module_ID} {Lect_First_Name, Lect_Last_Name} In addition, given F, it is also true that, for example: {Lecturer_ID} {Lect_First_Name} We say that these FDs are logically implied by F. We can continue to find other FDs that are logically implied by F. The set of all such FDs (and including those in F) is called the closure of F (denoted F + ). Stating these concepts formally: An FD f is logically implied by a set of FDs F if f holds whenever all FDs in F hold. The closure of F is the set of all FDs that are implied by F. The closure of F is denoted as F +. Database Systems (EE221) - Page 49

In order to fully define logical implication, so that we can compute the closure of a set of FDs, we must have a set of rules (axioms) which govern how an FD can be logically implied by some set of FDs. The Inference Axioms (Armstrong s Axioms) (1) If Y X, then X Y (reflexivity) (2) If Z W and X Y, then XW YZ (augmentation) (3) If X Y and Y Z then X Z (transitivity) Other rules which may be derived from the axioms include: If X Y and YW Z then XW Z If X Z and X Y then X YZ If X YZ then X Y and X Z Using these axioms we can find all FDs logically implied by some set of FDs F and hence find its closure F +. Exercise 11 Using the axioms, find the closure F + of F = {AB CD, C E}. Database Systems (EE221) - Page 50

Application of Closure to Testing Dependency Preservation Looking again at the PROJECTS schema from our previous 3NF example, we had the following functional dependencies: Project-Budget Project Department Dept-Address We decomposed the PROJECTS relation into two relations on which two corresponding sets of FDs held: Project-Budget Department Project Department Dept-Address If we look at all FDs in the new schema taken together: {Project Project-Budget, Project Department, Department Dept-Address} it appears that we have lost one of the original FDs: Project Dept-Address. However, using the axioms, we can infer that Project Dept-Address is logically implied by Project Department and Department Dept-Address so that the FD has not actually been lost. Database Systems (EE221) - Page 51

One of our goals during decomposition is to preserve dependencies. This is important as dependencies represent constraints on the data which must be enforced to maintain the semantics of the relations. Given our definition of closure we can now define precisely how to check if a decomposition is dependency preserving: A decomposition of a relation R, on which the dependencies F hold, into relations R1 and R2 is dependency preserving if: (F R1 F R2 ) + = F + Where F R1 is the set of dependencies that apply between attributes only in R1 and similarly F R2 is the set of dependencies that apply between attributes only in R2. Although the set of dependencies (F R1 U F R2 ) might be different to the original set of dependencies F, we only require that the two sets are equivalent (that is, their closures are the same) for the decomposition to be dependency preserving. For example, the closure of the PROJECTS relation in the previous example is: {Project Project-Budget, Project Department, Department Dept-Address, Project Dept-Address} The closure of the combined FDs in the decomposed tables is the same, so the decomposition is dependency preserving. Database Systems (EE221) - Page 52

Application of Closure to Testing for Lossless Decomposition Consider the following example of decomposition: R1 = ({A, B, C}, {A B, C B}) RELATION: R1 A B C a1 b1 c1 a3 b1 c2 a2 b2 c3 a4 b2 c4 If we decompose this into the relations: R2 = ({A, B}, {A B}) R3 = ({B, C}, {C B}) we would have: RELATION: R2 RELATION: R3 A B B C a1 b1 b1 c1 a3 b1 b1 c2 a2 b2 b2 c3 a4 b2 b2 c4 A B C B Database Systems (EE221) - Page 53

Now, if we join relations R2 and R3 over the common attribute B, the resultant relation would appear as follows: A B C a1 b1 c1 a1 b1 c2 a3 b1 c1 a3 b1 c2 a2 b2 c3 a2 b2 c4 a4 b2 c3 a4 b2 c4 The resultant relation contains more rows (spurious tuples) than existed in the original R1. This is called a lossy decomposition as it causes constraint information to be lost. Lossy decomposition is extremely undesirable and should be avoided at all costs. How to test for lossless/lossy decomposition? Decomposition of R into two relations R 1 and R 2 is lossless, with respect to a set of functional dependencies F, if and only if the closure F + contains: (R 1 R 2 ) R 1 or (R 1 R 2 ) R 2 We can interpret this condition as the condition that common attributes must form a superkey in either R 1 or R 2 for the decomposition to be lossless. Exercise: use the definition above to show that decomposition of R1 = ({A, B, C}, {A B, C B}) into R2 = ({A, B}, {A B}) and R3 = ({B, C}, {C B}) is not lossless. We will now develop an algorithm that guarantees a lossless, dependency-preserving 3NF decomposition. First we need to consider the notions of attribute closure and minimal cover. Database Systems (EE221) - Page 54

Attribute Closure The closure of a set of FDs is of interest, as we have seen, but can be very large and take a long time to compute. There is no efficient method (shortcut) for computing it. For algorithms used in decomposition, typically we only want to check if some given FD X Y is in the closure of a set of FDs F. There is a way of doing this without computing the closure F +. One ad hoc way of answering this is to try to derive X Y from F using the inference axioms. Another is by use of the concept of the closure of an attribute. The attribute closure of X with respect to the set of FDs F is defined as the set of all attributes A such that X A is in F +. Closure of a set of attributes X (denoted X + ) is the set of all the attributes that are functionally dependent on X, given some set of functional dependencies F. The attribute closure is much more efficient to compute than F +. The aim of computing it is to find what attributes depend on a given set of attributes and therefore ought to be together in the same relation. We will use it later to formulate an algorithmic way of decomposing a relation into relations of higher normal form. Computing the Closure of an Attribute The following algorithm is used to compute the closure of a set of attributes X under the set of FDs F. X + := X repeat temp_x + := X + for each FD Y Z in F do if Y X + then X + := X + Z until (temp_x + = X + ) Explanation of Algorithm: If the attribute X determines some attribute Y and in turn Y determines Z, then X also determines Z (axiom (3)) and so Z is added to X +. Continuing, if any set of attributes in X + determine some other attribute(s) then X also determines this attribute(s) and it is added to X +, and so on. Database Systems (EE221) - Page 55

Attribute Closure Applications We can use attribute closure to quickly answer questions about specific FDs, and also to find keys of complex relations. For example, can a given FD X Y be inferred by some set of FDs F? This is the same as asking: is X Y in F +? (difficult to find F + ) Which in turn is the same as asking: is Y in X + given F? (easy to find X + ) Example: Is B G in F + where R = {A, B, C, D, E} and F = {B CD, E F, D E, B A, AD B, F G} Using the algorithm compute B + and then check if it contains G: B + B + B + B + B + B + = B = BCD = BCDE = BCDEA = BCDEAF = BCDEAFG stop Yes G is in B + so B G is in F +. Note that B + contained all the attributes in the relation R and so B must be a key to the relation. Finding the closure of attributes can be used as a general method for checking if multiple attributes taken together are relation keys. For example, is AD a relation key of R? To answer this, find A +, D + and AD +. If neither A + nor D + contain all the attributes in R but AD + does, then AD is a relation key. Exercise 12 Given the relation: R = ({V, W, X, Y, Z}, {W XY, Y V, Z V, X Z}) Compute the X +, showing each step of the algorithm, and from X + determine if (a) X is a key of R (b) X VZ holds on R Database Systems (EE221) - Page 56

Using Attribute Closure to Remove Redundant FDs from a Set of FDs As we ve seen, the closure of a set of attributes X under a set of FDs F is the set of all the attributes that are functionally dependent on X. We can use attribute closure to remove redundant FDs from a set. This will be useful in systematically normalising relations (later). As an example, remove all redundant functional dependencies from F, using the attribute closure algorithm, where: F = {AD B, B C, C D, A B} We can do this by examining each FD in turn to see if it is redundant, i.e. if it can be inferred from the other FDs in F, and can thus be removed from F to form a smaller equivalent set G. We can summarise this algorithm as follows: set G = F for each FD X A in G { } compute X + if X + with respect to the set of dependencies (G (X A)) contains A, then G = (G (X A)) Applying this algorithm to F = {AD B, B C, C D, A B} G = {AD B, B C, C D, A B} For AD B compute AD + under (G (AD B)) AD + = AD AD + = ADB AD + = ADBC AD + = ADBCD stop G = (G (AD B)) = {B C, C D, A B} For B C compute B + under (G (B C)) B + = B stop G = {B C, C D, A B} For C D compute C + under (G (C D)) C + = C stop G = {B C, C D, A B} For A B compute A + under (G (A B)) A + = A stop G = {B C, C D, A B} end Note that depending on the FDs and their order we can sometimes find a different G. Database Systems (EE221) - Page 57

Exercise 13 Remove any redundant FDs from the following sets of FDs, using the attribute closure algorithm: Set 1 A B B C AD C Set 2 XY V ZW V VX Y W Y Z X Set 3 PQ R PS Q QS P PR Q S R Database Systems (EE221) - Page 58

Minimal Cover of a Set of Functional Dependencies There may be further redundancy in a set of FDs in the form of redundant attributes in the determinants of the FDs. If we also remove such attributes we arrive at what is called a minimal cover for the set of FDs. The minimal cover must also have the property of having single attributes on the right hand side of all FDs in the set. Formally: The minimal cover for a set of FDs F is the set of FDs G such that: o G + = F + (that is G and F are equivalent) o and the right hand side of each FD in G is a single attribute o and G is minimal, that is, if we remove any attribute from an FD in G or remove any FD from G, then G + will no longer equal F + We can extend our previous algorithm to obtain a minimal cover algorithm: set G = F replace each FD X {A 1,A 2,...,A n } in G by n FDs X A 1, X A 2,...,X A n for each FD X A in G { compute X + with respect to the set of dependencies (G (X A)) if X + contains A, then G = (G (X A)) } for each remaining FD X A in G that has multiple attributes in X for each attribute B X { compute (X-B) + with respect to (G-(X A)) ((X-B) A) if (X-B) + contains B, then replace X A in G with (X-B) A } The first step simply replaces any FDs with multiple attributes on the rhs with equivalent sets of FDs with rhs containing only one attribute. The second step removes any redundant FDs, using the algorithm we had before. The third step checks if any attribute on the left hand side of any FD can be removed. Note: it is sometimes possible to have more than one valid minimal cover for a given set of FDs. Database Systems (EE221) - Page 59

Example: Using the minimal cover algorithm, find a minimal cover of F = {AB D, B C, AE B, A D, D EF} Step 1. Make right hand sides atomic G = {AB D, B C, AE B, A D, D E, D F} Step 2. Remove any redundant FDs For AB D compute AB + under (G (AB D)) AB + = ABCDEF D in AB + so remove AB D from G G = {B C, AE B, A D, D E, D F} For B C compute B + under (G (B C)) B + = B C not in B + so G = {B C, AE B, A D, D E, D F} For AE B compute AE + under (G (AE B)) AE + = AEDF B not in AE + so G = {B C, AE B, A D, D E, D F} For A D compute A + under (G (A D)) A + = A D not in A + so G = {B C, AE B, A D, D E, D F} For D E compute D + under (G (D E)) D + = DF E not in D + so G = {B C, AE B, A D, D E, D F} For D F compute D + under (G (D F)) D + = DE F not in D + so G = {B C, AE B, A D, D E, D F} Step 3. Remove any redundant left hand side attributes G = {B C, AE B, A D, D E, D F} For AE B For A: compute E + E + wrt {B C, E B, A D, D E, D F} = EBC E + doesn t contain A, so A not redundant in AE B For E: compute A + A + wrt {B C, A B, A D, D E, D F} = ABDEFC A + contains E, so E is redundant in AE B with respect to (G-(AE B) (E B)) with respect to (G-(AE B) (A B)) Minimal closure = {B C, A B, A D, D E, D F} Database Systems (EE221) - Page 60

Exercise 14 Find a minimal cover for the following set of functional dependencies: A BC B C A B AB C AC D Database Systems (EE221) - Page 61

Algorithmic 3NF Normalisation We can use our minimal cover algorithm to systematically decompose relations into a higher normal form, whilst meeting our objectives of preserving dependencies (at least for 3NF) and avoiding loss in the decomposition (for either 3NF or BCNF). Below is an algorithm for decomposing a relation to 3NF. It will always produce a lossless decomposition which is dependency preserving (proof omitted). given F, find a minimal cover G for F for each set of FDs in G of the form {X A 1, X A 2,...,X A n } containing alls FDs in G with the same determinant X { create a relation R = {X,A 1,A 2,...,A n } } place any remaining attributes in a single relation We stop here at 3NF but note that the maths can be applied to systematically normalising to higher normal forms. Note however, that we cannot guarantee that there will always exist a BCNF (or higher normal form) which is lossless and dependency preserving. For this reason, 3NF is often a high enough normal form for many practical applications. Exercise 15 By first finding a minimal cover, normalise the relation R to 3NF. R = ({A, B, C, D, E, F}, {AB C, C A, BC D, ACD B, D EF}) Database Systems (EE221) - Page 62

Summary of Relational Database Design Procedures Two alternative approaches to relational analysis and design: 1: Modelling plus Decomposition/Verification Approach o Gather information about application and analyse o Find functional dependencies between attributes o Make E-R diagram of schema o Refine E-R diagram, e.g. convert many-to-many relationships o Examine resulting tables and normalise if necessary 2: Pure Algorithmic Decomposition Approach o Gather information about application and analyse o Find functional dependencies between attributes o Put all attributes into one large relation and systematically normalise/decompose into final schema. Database Systems (EE221) - Page 63