Chapter 3 Design Theory for Relational Databases

Similar documents
Design Theory for Relational Databases

Design Theory for Relational Databases

Chapter 3 Design Theory for Relational Databases

CS54100: Database Systems

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

Design theory for relational databases

Information Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due

Relational Database Design

Relational Design Theory

Design Theory for Relational Databases. Spring 2011 Instructor: Hassan Khosravi

Lossless Joins, Third Normal Form

UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

Normal Forms (ii) ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Relational Database Design Theory Part II. Announcements (October 12) Review. CPS 116 Introduction to Database Systems

Database Design and Implementation

CSC 261/461 Database Systems Lecture 11

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

CSC 261/461 Database Systems Lecture 13. Spring 2018

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization

CSE 344 AUGUST 6 TH LOSS AND VIEWS

Databases 2012 Normalization

Introduction to Data Management CSE 344

Information Systems (Informationssysteme)

SCHEMA NORMALIZATION. CS 564- Fall 2015

Relational Design Theory II. Detecting Anomalies. Normal Forms. Normalization

CSE 544 Principles of Database Management Systems

INF1383 -Bancos de Dados

Schema Refinement and Normal Forms

Functional Dependencies and Normalization

Databases Lecture 8. Timothy G. Griffin. Computer Laboratory University of Cambridge, UK. Databases, Lent 2009

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

Chapter 11, Relational Database Design Algorithms and Further Dependencies

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

CS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li

CSC 261/461 Database Systems Lecture 12. Spring 2018

DECOMPOSITION & SCHEMA NORMALIZATION

Relational Database Design

Functional Dependency Theory II. Winter Lecture 21

CMPT 354: Database System I. Lecture 9. Design Theory

CSE 344 AUGUST 3 RD NORMALIZATION

CSE 344 MAY 16 TH NORMALIZATION

CS322: Database Systems Normalization

CS 186, Fall 2002, Lecture 6 R&G Chapter 15

Chapter 8: Relational Database Design

Consider a relation R with attributes ABCDEF GH and functional dependencies S:

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

CSE 303: Database. Outline. Lecture 10. First Normal Form (1NF) First Normal Form (1NF) 10/1/2016. Chapter 3: Design Theory of Relational Database

Relational Design: Characteristics of Well-designed DB

Desirable properties of decompositions 1. Decomposition of relational schemes. Desirable properties of decompositions 3

Introduction to Management CSE 344

Relational-Database Design

Introduction to Data Management. Lecture #7 (Relational DB Design Theory II)

Relational Database Design

Comp 5311 Database Management Systems. 5. Functional Dependencies Exercises

Constraints: Functional Dependencies

Functional Dependencies

Normal Forms. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

Schema Refinement. Feb 4, 2010

Schema Refinement and Normal Forms

Chapter 7: Relational Database Design

Background: Functional Dependencies. æ We are always talking about a relation R, with a æxed schema èset of attributesè and a

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Chapter 7: Relational Database Design. Chapter 7: Relational Database Design

Constraints: Functional Dependencies

Problem about anomalies

But RECAP. Why is losslessness important? An Instance of Relation NEWS. Suppose we decompose NEWS into: R1(S#, Sname) R2(City, Status)

Consider a relation R with attributes ABCDEF GH and functional dependencies S:

Schema Refinement and Normal Forms

Functional Dependency and Algorithmic Decomposition

Review: Keys. What is a Functional Dependency? Why use Functional Dependencies? Functional Dependency Properties

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Database Design: Normal Forms as Quality Criteria. Functional Dependencies Normal Forms Design and Normal forms

Schema Refinement & Normalization Theory

FUNCTIONAL DEPENDENCY THEORY II. CS121: Relational Databases Fall 2018 Lecture 20

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

CSE 132B Database Systems Applications

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

Functional Dependencies

FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

Normal Forms Lossless Join.

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies

Introduction to Data Management CSE 344

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

Lectures 6. Lecture 6: Design Theory

CS54100: Database Systems

Database Tutorial 2: Functional Dependencies and Normal Forms

Schema Refinement and Normal Forms. Chapter 19

Schema Refinement and Normal Forms

Schema Refinement: Other Dependencies and Higher Normal Forms

Relational Design Theory I. Functional Dependencies: why? Redundancy and Anomalies I. Functional Dependencies

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

Introduction to Database Systems CSE 414. Lecture 20: Design Theory

Schema Refinement and Normal Forms. Why schema refinement?

Normaliza)on and Func)onal Dependencies

Transcription:

1 Chapter 3 Design Theory for Relational Databases

Contents Functional Dependencies Decompositions Normal Forms (BCNF, 3NF) Multivalued Dependencies (and 4NF) Reasoning About FD s + MVD s 2

Remember our questions: Why do we design relations like the example? -- good design What makes a good relational database schema? -- no redundancy, no Update/delete anomalies, what we can do if it has flaws? -- decomposition New Question: any standards for a good design? Normal forms: a condition on a relation schema that will eliminate problems any standards or methods for a decomposition?

Boyce-Codd Normal Form (BCNF) We say a relation R is in BCNF if whenever X ->Y is a nontrivial FD that holds in R, X is a superkey. Remember: nontrivial means Y is not contained in X. Remember, a superkey is any superset of a key (not necessarily a proper superset). 4

Example Drinkers(name, addr, beersliked, manf, favbeer) FD s: name->addr favbeer, beersliked->manf Only key is {name, beersliked}. In each FD, the left side is not a superkey. Drinkers is not in BCNF 5

Another Example Beers(name, manf, manfaddr) FD s: name->manf, manf->manfaddr Only key is {name}. name->manf does not violate BCNF, but manf->manfaddr does. Beers is not in BCNF 6

Decomposition into BCNF 7 Given: relation R with FD s F. Aim: decompose R to reach BCNF Step 1: Look among the given FD s for a BCNF violation X ->Y. If any FD following from F violates BCNF, then there will surely be an FD in F itself that violates BCNF. Step 2: Compute X +. Not all attributes, or else X is a superkey.

Decomposition into BCNF (cont.) Step 3: Replace R by relations with schemas: 1. R 1 = X +. 2. R 2 = R (X + X ). R1 R-X + X X + -X 8 R 2 R

Decomposition into BCNF (cont.) Step4: Project given FD s F onto the two new relations. given FD s F find all implied FD s pick up those FD s which have only attributes needed. 9

Example: BCNF Decomposition 10 Drinkers(name, addr, beersliked, manf, favbeer) F = name->addr, name -> favbeer, beersliked->manf Step1: Pick BCNF violation name->addr. Step2: Closure the left side: {name} + = {name, addr, favbeer}. Step3: Decomposed relations: 1. Drinkers1(name, addr, favbeer) 2. Drinkers2(name, beersliked, manf) Step4: projecting FD s to drinker1 and drinker2 Step5: check drinker 1 and drinker2 for BCNF

Example -- Continued Given FD s: name->addr, name -> favbeer, beersliked->manf All FD s: same as given FD s For Drinkers1(name, addr, favbeer), relevant FD s are name->addr and name->favbeer. Thus, {name} is the only key and Drinkers1 is in BCNF. 11

Example -- Continued For Drinkers2(name, beersliked, manf), the only FD is beersliked->manf, and the only key is {name, beersliked}. Violation of BCNF. beersliked + = {beersliked, manf}, so we decompose Drinkers2 into: 1. Drinkers3(beersLiked, manf) 2. Drinkers4(name, beersliked) 12

Example -- Concluded The resulting decomposition of Drinkers : 1. Drinkers1(name, addr, favbeer) 2. Drinkers3(beersLiked, manf) 3. Drinkers4(name, beersliked) Notice: Drinkers1 tells us about drinkers, Drinkers3 tells us about beers, and Drinkers4 tells us the relationship between drinkers and the beers they like. 13

Text book on page 89 Classroom Exercise Any two-attribute relation R(A,B) is in BCNF True or False? Three Cases: 1) No nontrivial FD s 2) A B 3) A B, B A

Third Normal Form -- Motivation There is one structure of FD s that causes trouble when we decompose. R(A,B,C) AB ->C and C ->B. Example: A = street address, B = city, C = zip code. There are two keys, {A,B } and {A,C }. C ->B is a BCNF violation, so we must decompose into R1(AC), R2(BC). 15

We Cannot Enforce FD s Cannot enforce the FD AB ->C after decomposition. 16 Original: R(A,B,C) AB ->C and C ->B. Decompose into R1(AC) no FD s R2(BC) with C->B Assume A = street, B = city, and C = zip

A = street, B = city, and C = zip We Cannot Enforce FD s (cont.) C B keeps A C 545 Tech Sq. 02138 545 Tech Sq. 02139 B C Cambridge 02138 Cambridge 02139 Join tuples with equal C (zip codes). A B C 545 Tech Sq. Cambridge 02138 545 Tech Sq. Cambridge 02139 17 FD: A (street) B(city) -> C(zip) is violated by the database as a whole.

3NF Let s Us Avoid This Problem 3 rd Normal Form (3NF) modifies the BCNF condition so we do not have to decompose in this problem situation. An attribute is prime if it is a member of any key. X ->A violates 3NF if and only if X is not a superkey, and also A is not prime. 18

Example: 3NF In our problem situation R(A,B,C) with FD s AB ->C and C ->B, we have keys AB and AC. Thus A, B, and C are each prime. Although C ->B violates BCNF, it does not violate 3NF. R(A,B,C) above is in 3NF, not in BCNF 19

BCNF vs. 3NF BCNF 3NF conditions If X ->Y is a nontrivial FD that holds in R, X is a superkey. If X ->Y is a nontrivial FD that holds in R, X is a superkey, or Y is a prime example R(A,B,C) with A B, A C R(A,B,C) with AB - >C and C ->B. 20 2 NF: no nonkey attribute is dependent on only a portion of the primary key. R(A,B,C) with A B, B C 1 NF: every component of every tuple is an atomic value.

Properties of a decomposition Elimination of anomalies by a decomposition, it needs other two properties: 1. Lossless Join : it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original. 2. Dependency Preservation : it should be possible to check in the projected relations whether all the given FD s are satisfied. 21

Decomposition for 3NF and BCNF We can get (1: Lossless Join ) with a BCNF decomposition. we can t always get (1) and (2 : Dependency Preservation ) with a BCNF decomposition. We can get both (1) and (2) with a 3NF decomposition. 22

Testing for a Lossless Join If we project R onto R 1, R 2,, R k, can we recover R by rejoining? Any tuple in R can be recovered from its projected fragments. So the only question is: when we rejoin, do we ever get back something we didn t have originally? 23

Example Any tuple in R can be recovered from its projected fragments. Not less, not more. As long as FD b c holds, the joining of two projected tuples cannot produce a bogus tuple A B C 1 2 3 4 2 5 A B 1 2 4 2 B C 2 3 2 5 A B C 1 2 3 1 2 5 4 2 3 24 4 2 5

textbook on page 98 The Chase Test (an example) 25 Aim: to prove that a tuple t in the join, using FD s in F, also to be a tuple in R. Method: Suppose tuple t comes back in the join. Then t is the join of projections of some tuples of R, one for each R i of the decomposition. Can we use the given FD s to show that one of these tuples must be t? Belong to R(A,B,C) Decomposed into R 1 (A,B) R 2 (B,C) = Join of R1and R2 Tuple t (a,b,c) a b c1 Because B C, c1= c, therefore a,b,c is in R a2 b c

The Chase Test (method) 1. Start by assuming t = abc. 2. For each i, there is a tuple s i of R that has a, b, c, in the attributes of R i. 3. s i can have any values in other attributes. 4. We ll use the same letter as in t, but with a subscript, for these components. 26

Example: The Chase Let R = ABCD, the given FD s be C->D and B ->A Suppose the decomposition be AB, BC, and CD. Question: Is it a lossless join or not? R(ABCD) == R1(AB) R2(BC) R3(CD) 27

Example: The Chase (cont.) 1. Suppose the tuple t = abcd is the join of tuples projected onto AB, BC, CD. 2. For each i, there is a tuple s I of R that has a, b from R1(AB), b,c from R2(BC) and c,d from R3(CD) 3. a I b I c I d I are any values Aim: t=abcd is also in the R 28

The Tableau A B C D a b c 1 d 1 a a 2 b c d 2 a 3 b 3 c d d 29 Use B ->A Use C->D We ve proved the second tuple must be t.

Summary of the Chase Test method 1. If two rows agree in the left side of a FD, make their right sides agree too. 2. Always replace a subscripted symbol by the corresponding unsubscripted one, if possible. 3. If we ever get an unsubscripted row, we know any tuple in the project-join is in the original (the join is lossless). 4. Otherwise, the final tableau is a counterexample. 30

Example: Lossy Join (more tuples) Same relation R = ABCD and same decomposition: AB, BC, and CD. But with only the FD C->D. 31

These projections rejoin to form abcd. The Tableau A B C D a b c 1 d 1 a 2 b c d 2 a 3 b 3 c d d These three tuples are an example R that shows the join lossy. abcd is not in R. More tuples Use C->D 32

Some results Some decompositions can not keep lossless join (lossy join). Use chase method to find out whether the decomposition is lossy join. BCNF decomposition is lossless join, sometimes it can not keep functional dependencies. Relations with 3NF keep lossless join and also functional dependencies. How to decompose relations to reach 3NF? 33

3NF Synthesis Algorithm We can always construct a decomposition into 3NF relations with a lossless join and dependency preservation. Need minimal basis for the FD s: 1. Right sides are single attributes. 2. No FD can be removed. 3. No attribute can be removed from a left side. 34

Constructing a Minimal Basis 1. Split right sides. 2. Repeatedly try to remove an FD and see if the remaining FD s are equivalent to the original. 3. Repeatedly try to remove an attribute from a left side and see if the resulting FD s are equivalent to the original. 35

3NF Synthesis method 1. Find a minimal basis for F 2. One relation for each FD in the minimal basis. 1. Schema is the union of the left and right sides. 2. X A then (XA) is a schema. 3. If no key is contained in an FD, then add one relation whose schema is some key. 36 Algorithm 3.26 is on pp.103

Example: 3NF Synthesis Relation R = ABCD. FD s A->B and A->C. Key is AD Decomposition: AB and AC from the FD s, plus AD for a key. R is decomposed into R1(AB), R2(AC), R3(AD) 37

Another example Relation R(A,B,C,D,E) and FD s AB C, C B,A D Using 3NF synthesis to decompose: 1) A minimal basis 2) R1(ABC) R2(CB) R3(AD) 3) R2 is a part of R1, delete R2 4) No key is in R1, R3, add a key R4(ABE) R has two keys: ABE, ACE. Add one of them 38

Classroom exercises Given R(A,B,C,D,E) FD s AB C, C B, A D To test R1(ABC), R3(AD), R4(ABE) is in 3NF To test whether functional dependency keeps in the R1,R3,R4 To test the decomposition is lossless. 39

Why It Works (3NF synthesis) Preserves dependencies: each FD from a minimal basis is contained in a relation, thus preserved. Lossless Join: use the chase to show that the row for the relation that contains a key can be made all-unsubscripted variables. Question: Why we say BCNF decomposition, 3NF synthesis? 40

BCNF decomposition algorithm Input: relation R + FDs for R Output: decomposition of R into BCNF relations w With lossless join 1.Compute keys for R 2.Repeat until all relations are in BCNF: Pick any R with A B that violates BCNF Decompose R into R1(A+) and R2(A, rest) Compute FDs for R1 and R2 Compute keys for R1 and R2 41

3NF synthesis Input: relation R + FDs for R Output: decomposition of R into 3NF relations w With lossless join and keep FD s 42 1.Computer key of R 2.Find a minimal basis for F 3.One relation for each FD in the minimal basis. 4.If no key is contained in an FD, then add one relation whose schema is some key.

Summary Conditions of Norm Forms (BCNF, 3NF) The way to decompose in order to reach BCNF The way to decompose in order to reach 3NF The way to test the join is lossless join 43