Database Design and Normalization Chapter 11 (Week 12) EE562 Slides and Modified Slides from Database Management Systems, R. Ramakrishnan 1
1NF: Relation FIRST

S#   Status  City    P#   Qty
S1   20      London  P1   300
S1   20      London  P2   200
S1   20      London  P3   400
S1   20      London  P4   200
S1   20      London  P5   100
S1   20      London  P6   100
S2   10      Paris   P1   300
S2   10      Paris   P2   400
S3   10      Paris   P2   200
S4   20      London  P2   200
S4   20      London  P4   300
S4   20      London  P5   400

Sample tabulation of FIRST

Second Normal Form (2NF)

Transitivity: A → B, B → C implies A → C.

[Figure: FD diagram over S#, Status, City with arrows numbered 1 to 3]

R1 (SECOND)

S#   Status  City
S1   20      London
S2   10      Paris
S3   10      Paris
S4   20      London
S5   30      Athens

Suppose a transitive FD (a dependency between non-key attributes) exists.

[Figure: FD diagram over S#, P#, Qty]

R2 (SP)

S#   P#   Qty
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400

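As a minimal sketch (not part of the original slides), the Python below tests whether a functional dependency holds in one instance, using the SECOND rows tabulated above. The helper name `fd_holds` and the dict-based row layout are assumptions made for illustration. Keep in mind that an instance can only refute an FD; it cannot prove that the FD holds for the schema in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# Rows of SECOND as tabulated on the slide
SECOND = [
    {"S#": "S1", "Status": 20, "City": "London"},
    {"S#": "S2", "Status": 10, "City": "Paris"},
    {"S#": "S3", "Status": 10, "City": "Paris"},
    {"S#": "S4", "Status": 20, "City": "London"},
    {"S#": "S5", "Status": 30, "City": "Athens"},
]

s_to_city = fd_holds(SECOND, ["S#"], ["City"])            # S# -> City
city_to_status = fd_holds(SECOND, ["City"], ["Status"])   # City -> Status
s_to_status = fd_holds(SECOND, ["S#"], ["Status"])        # S# -> Status (transitive)
status_to_s = fd_holds(SECOND, ["Status"], ["S#"])        # refuted: S1 and S4 share status 20
```

The first three checks correspond to the chain S# → City → Status; the last one shows an FD that this instance refutes.
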
Problem with 2NF

- Insertion in SECOND
- Update of SECOND
- Deletion in SECOND

Relation SP has no problem. It is in 3NF.

Third Normal Form (3NF)

1) Full dependency on the primary key.
2) No mutual dependency among non-primary-key attributes.

Decomposition of a Relation Scheme

Suppose that relation R contains attributes A1, ..., An. A decomposition of R consists of replacing R by two or more relations such that:
- Each new relation scheme contains a subset of the attributes of R (and no attributes that do not appear in R), and
- Every attribute of R appears as an attribute of one of the new relations.

Intuitively, decomposing R means we will store instances of the relation schemes produced by the decomposition, instead of instances of R.

Problems with Decompositions

There are three potential problems to consider:
❶ Some queries become more expensive. (joining)
❷ Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation! (lossy decomposition)
❸ Checking some dependencies may require joining the instances of the decomposed relations. (dependency preservation)

Tradeoff: Must consider these issues vs. redundancy.

Lossless Join Decompositions

- Decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r that satisfies F:
    $\pi_X(r) \bowtie \pi_Y(r) = r$
- It is always true that $r \subseteq \pi_X(r) \bowtie \pi_Y(r)$.
- In general, the other direction does not hold! If it does, the decomposition is lossless-join.
- The definition extends to decomposition into 3 or more relations in a straightforward way.
- It is essential that all decompositions used to deal with redundancy be lossless! (Avoids Problem (2).)

[Slide annotation: minimal cover (M.C.), lossless-join]

More on Lossless Join

- The decomposition of R into X and Y is lossless-join w.r.t. F if and only if the closure of F contains:
    $X \cap Y \to X$, or
    $X \cap Y \to Y$
- In particular, the decomposition of R into UV and R - V is lossless-join if U → V holds over R.

Example: an instance of ABC, its projections onto AB and BC, and the join of the projections.

Original instance:
A  B  C
1  2  3
4  5  6
7  2  8

Projection onto AB:
A  B
1  2
4  5
7  2

Projection onto BC:
B  C
2  3
5  6
2  8

Join of the two projections (the last two rows are not in the original instance):
A  B  C
1  2  3
4  5  6
7  2  8
1  2  8
7  2  3

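The ABC example on this slide can be replayed with a short sketch. The helpers `project` and `natural_join` are hypothetical names introduced here for illustration; they operate on plain tuples and are not taken from any library or from the original slides.

```python
def project(rows, attrs):
    """Project dict rows onto attrs; a set removes duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}


def natural_join(left, left_attrs, right, right_attrs):
    """Natural join of two sets of tuples on their common attribute names."""
    common = [a for a in left_attrs if a in right_attrs]
    out_attrs = list(left_attrs) + [a for a in right_attrs if a not in left_attrs]
    result = set()
    for l in left:
        lrow = dict(zip(left_attrs, l))
        for r in right:
            rrow = dict(zip(right_attrs, r))
            if all(lrow[a] == rrow[a] for a in common):
                merged = {**lrow, **rrow}
                result.add(tuple(merged[a] for a in out_attrs))
    return result, out_attrs


# Instance of ABC from the slide
r = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": 5, "C": 6},
    {"A": 7, "B": 2, "C": 8},
]

ab = project(r, ["A", "B"])
bc = project(r, ["B", "C"])
joined, attrs = natural_join(ab, ["A", "B"], bc, ["B", "C"])

original = project(r, ["A", "B", "C"])
spurious = joined - original  # tuples created by the join but absent from r
```

Here `spurious` contains (1, 2, 8) and (7, 2, 3), so this decomposition is lossy for this instance: B, the common attribute, determines neither A nor C.
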
Decomposition from 2NF to 3NF

R(A,B,C)   P.K. (A)   B → C

Decomposition (lossless):
- R1 (B,C)   P.K. (B)
- R2 (A,B)   P.K. (A), F.K. (B) references R1

Additional Consideration: Dependency Preservation Issue

Solution 1 (lossless):
- SC (S#, city): P.K. (S#), F.K. (city)
- CS (city, status): P.K. (city)

Solution 2 (lossless):
- SC (S#, city)
- SS (S#, status)

Solution 2 is bad. We cannot insert the status information for a city unless some supplier is located in that city. The FD city → status is lost (inter-relation database constraint problem).

[Figure: FD diagram over S#, Status, city]

Dependency Preserving Decompositions (Contd.)

- Decomposition of R into X and Y is dependency preserving if $(F_X \cup F_Y)^+ = F^+$, i.e., if we consider only dependencies in the closure $F^+$ that can be checked in X without considering Y, and in Y without considering X, these imply all dependencies in $F^+$.
- Important to consider $F^+$, not F, in this definition:
    ABC: A → B, B → C, C → A, decomposed into AB and BC.
    Is this dependency preserving? Is C → A preserved?
- Dependency preserving does not imply lossless join, and vice-versa.

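The ABC question on this slide can be checked mechanically. The sketch below is an illustration only, assuming single-letter attribute names and brute-force enumeration of attribute subsets (exponential, so suitable only for tiny schemas). The function names `closure`, `project_fds`, and `preserves` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project_fds(fds, sub):
    """FDs of the closure F+ that mention only attributes in sub."""
    sub = set(sub)
    projected = []
    for size in range(1, len(sub) + 1):
        for combo in combinations(sorted(sub), size):
            lhs = frozenset(combo)
            rhs = frozenset(closure(lhs, fds) & sub) - lhs
            if rhs:
                projected.append((lhs, rhs))
    return projected


def preserves(fds, parts):
    """True if the union of the projected FDs implies every FD in fds."""
    union = [fd for part in parts for fd in project_fds(fds, part)]
    return all(rhs <= closure(lhs, union) for lhs, rhs in fds)


def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))


# Slide example: A -> B, B -> C, C -> A, decomposed into AB and BC
F = [fd("A", "B"), fd("B", "C"), fd("C", "A")]
abc_preserved = preserves(F, ["AB", "BC"])

# Contrast case (an assumption for illustration): A -> B, B -> C split into AB and AC
G = [fd("A", "B"), fd("B", "C")]
chain_preserved = preserves(G, ["AB", "AC"])
```

For the slide example the result is True: F+ contains B → A (checkable in AB) and C → B (checkable in BC), and together these imply C → A. In the contrast case the result is False, because B → C cannot be checked in either AB or AC.
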
Decomposition into 3NF

- To ensure dependency preservation, one idea: if X → Y is not preserved, add relation XY.
- Problem is that XY may violate 3NF.
- Refinement: instead of the given set of FDs F, use a minimal cover for F.

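A minimal cover can be computed with the usual three-step textbook procedure: split right-hand sides, drop extraneous left-hand attributes, then drop redundant FDs. The sketch below is one possible rendering, not the slides' own algorithm. The FD set {A → BC, B → C, AB → C} is a hypothetical example, attribute names are assumed to be single letters, and the final step of 3NF synthesis (adding a relation for a candidate key when no schema contains one) is omitted.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: split every right-hand side into single attributes
    cover = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            item = (frozenset(lhs), frozenset([a]))
            if item not in cover:
                cover.append(item)
    # Step 2: remove extraneous attributes from left-hand sides
    for i in range(len(cover)):
        lhs, rhs = cover[i]
        for a in sorted(lhs):
            reduced = lhs - {a}
            if reduced and rhs <= closure(reduced, cover):
                lhs = reduced
                cover[i] = (lhs, rhs)
    cover = list(dict.fromkeys(cover))  # drop duplicates created by step 2
    # Step 3: remove FDs implied by the remaining ones
    result = list(cover)
    for item in cover:
        rest = [g for g in result if g != item]
        if item[1] <= closure(item[0], rest):
            result = rest
    return result


# Hypothetical FD set: A -> BC, B -> C, AB -> C
F = [("A", "BC"), ("B", "C"), ("AB", "C")]
cover = minimal_cover(F)

# One relation schema per FD in the cover (key-relation step omitted)
schemas = [sorted(lhs | rhs) for lhs, rhs in cover]
```

For this input the cover reduces to A → B and B → C, which gives the schemas AB and BC.
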
Lossy Decomposition

S#   Status
S1   20
S2   10
S3   10
S4   20
S5   30

City    Status
Athens  30
London  20
Paris   20
Rome    50
NY      20

Dependency Preservation Issue

Independent projection: an update can be made to either relation without regard for the other.

Theorem: Projections R1 and R2 of R are independent iff:
- Every FD in R can be logically deduced from those given in R1 and R2, and
- The common attributes of R1 and R2 form a candidate key for at least one of them.

BCNF (Boyce/Codd Normal Form)

- 3NF deals with exactly one candidate key (always arrows out of a candidate key).
- Relations with multiple candidate keys:
    - Keys are composite
    - Keys overlap
    - Update anomalies

Multiple Keys with Overlap

[Figure: FD diagram over Sname, S#, P#, Qty]

Determinant

Any attribute (or set of attributes) on which some other attribute is fully functionally dependent.

[Figure: FD diagram over Qty, S#, P#, Status, city]

S#, city, (S#,P#) are determinants.

BCNF (Boyce/Codd Normal Form)

Definition: A relation is in BCNF iff every determinant is a candidate key (the only arrows in the whole diagram are arrows out of candidate keys).

Alternate definition: A relation R(A1, A2, ..., An) is in BCNF iff the existence of a non-trivial FD X → Y implies the existence of FDs X → Ai, for all i = 1, 2, ..., n.

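The alternate definition translates directly into an attribute-closure test: a non-trivial FD X → Y violates BCNF when the closure of X is not the full attribute set. The sketch below applies that test to FIRST. The FD list (S# → City, City → Status, (S#,P#) → Qty) is an assumption consistent with the determinants named in these slides, since the arrows of the original FD diagram are not available here. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Non-trivial FDs whose left-hand side does not determine every attribute."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(all_attrs)
    ]


def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))


# FIRST(S#, Status, City, P#, Qty) with assumed FDs
FIRST_ATTRS = {"S#", "Status", "City", "P#", "Qty"}
FIRST_FDS = [
    fd(["S#"], ["City"]),
    fd(["City"], ["Status"]),
    fd(["S#", "P#"], ["Qty"]),
]
first_violations = bcnf_violations(FIRST_ATTRS, FIRST_FDS)

# SP(S#, P#, Qty): the only FD has the key as its determinant
sp_violations = bcnf_violations({"S#", "P#", "Qty"}, [fd(["S#", "P#"], ["Qty"])])
```

Under these assumed FDs, FIRST has two violating determinants (S# and City), while SP has none.
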
Example of FIRST Relation

[Figure: FD diagram over Qty, S#, P#, Status, city]

S#, city, (S#,P#) are determinants.
- What are the candidate keys?
- Is this relation in BCNF?

BCNF (Boyce/Codd Normal Form)

Ex: SSP (S#, Sname, P#, Qty) (in 3NF)
- Candidate keys are (S#,P#) and (Sname,P#)
- Update anomalies

S#   Sname  P#   Qty
S1   Smith  P1   300
S1   Smith  P2   200
S1   Smith  P3   400
S1   Smith  P4   200

BCNF (Boyce/Codd Normal Form)

Original S relation (S#, Sname, Status, City) is in BCNF.

[Figure: FD diagram over S#, Status, Sname, city]

Reason: The only determinants are candidate keys (although multiple candidate keys exist).

BCNF (Boyce/Codd Normal Form)

3 cases of overlapping composite candidate keys:

- Case A: Two composite candidate keys with an FD
    [Figure: FD diagram over Sname, S#, P#, Qty]
- Case B: Inter-relational constraints; an attribute is a determinant but not a candidate key
    [Figure: FD diagram over S, J, T]
- Case C:
    [Figure: FD diagram over S, J, P]

Case A: BCNF

Case A: Two composite candidate keys with an FD

[Figure: FD diagram over Sname, Qty, S#, P#]

Solution 1:
- SS (S#, Sname)
- SP (S#, P#, Qty)

Solution 2:
- SS (S#, Sname)
- SP (Sname, P#, Qty)

Case B: BCNF

Case B: Inter-relational constraints; an attribute is a determinant but not a candidate key

[Figure: FD diagram over S, J, T]

SJT (Student, Subject, Teacher) (in 3NF but not in BCNF)

Problem: T is a determinant but not a candidate key.

S      J        T
Smith  Math     White
Jones  Math     White
Jones  Physics  Brown
Smith  Physics  Green

Case B: Decomposing into BCNF

Solution: ST (S,T) and TJ (T,J), lossless.

Not an independent projection (inter-relational constraint): the two projections cannot be independently updated.

Note: Sometimes decomposition into BCNF and decomposition into independent relations may conflict.

[Tables from the slide, column values as extracted: TJ with T = Green, Brown, Kirk and J = Physics, Physics, Chem; ST with S = Smith, Jones, Smith, Smith and T = White, Brown, Brown, Don, Kirk]

Case C: BCNF

Case C: Exam (S,J,P)

[Figure: FD diagram over S, J, P]

Candidate keys are the only determinants. No anomalies.

Assumption: ??