Lossless Joins, Third Normal Form
FCDB §3.4–3.5

Dr. Chris Mayfield
Department of Computer Science
James Madison University

Mar 19, 2018

Decomposition wish list

1. Eliminate redundancy and anomalies
2. Recover the original relation exactly (by joining)
3. Joined relation should satisfy the original FDs

BCNF decomposition: Properties 1 and 2, but not always 3
3NF decomposition: Properties 2 and 3, but not always 1
No algorithm can guarantee all three!

Recovering from a decomposition

Suppose R has a schema that violates BCNF

The BCNF algorithm decomposes R into a set {S1, S2, ..., Sk} of new relations, such that:
1. Each relation Si is in BCNF, and
2. The decomposition of R is a lossless join: R = S1 ⋈ S2 ⋈ ... ⋈ Sk
     Every tuple in R is in S1 ⋈ S2 ⋈ ... ⋈ Sk
     Every tuple in S1 ⋈ S2 ⋈ ... ⋈ Sk is in R

Lossless join vs lossy join

Consider R(A, B, C) and B → C with tuples (a, b, c) and (d, b, e)
    The BCNF decomposition is S(A, B) and T(B, C)
    How do we prove that R = S ⋈ T?
    Joining (a, b) from S with (b, e) from T gives (a, b, e); because B → C forces c = e, that tuple is already in R
    BCNF is lossless because of FDs!

Now consider R(A, B, C) without FDs:

    R:  A  B  C
        1  2  3
        4  2  5

    S:  A  B      and      T:  B  C
        1  2                   2  3
        4  2                   2  5

    S ⋈ T = ...?
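
The lossy case can be checked by hand, but a small script makes the spurious tuples visible. The following is a minimal sketch in Python, assuming relations are modeled as sets of tuples; the helper names `project` and `natural_join` are illustrative and not from the slides.

```python
def project(rows, attrs, schema):
    """Project a set of tuples onto attrs (set semantics removes duplicates)."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(s_rows, s_schema, t_rows, t_schema):
    """Natural join of two relations on their shared attribute names."""
    common = [a for a in s_schema if a in t_schema]
    extra = [a for a in t_schema if a not in s_schema]
    result = set()
    for s in s_rows:
        for t in t_rows:
            if all(s[s_schema.index(a)] == t[t_schema.index(a)] for a in common):
                result.add(s + tuple(t[t_schema.index(a)] for a in extra))
    return result


# The instance from the slide: no FD B -> C holds, since B = 2 maps to C = 3 and C = 5.
R_SCHEMA = ["A", "B", "C"]
R = {(1, 2, 3), (4, 2, 5)}

S = project(R, ["A", "B"], R_SCHEMA)
T = project(R, ["B", "C"], R_SCHEMA)
joined = natural_join(S, ["A", "B"], T, ["B", "C"])
spurious = joined - R

print(sorted(joined))    # [(1, 2, 3), (1, 2, 5), (4, 2, 3), (4, 2, 5)]
print(sorted(spurious))  # [(1, 2, 5), (4, 2, 3)]
```

Every tuple of R survives the round trip, so the join never loses rows; the "loss" is the two extra tuples that were not in R.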

Dependency Preservation

What does the following relation model?
    Teach(dept, course, prof, semester, year)

Example university has the following rules:
    Offered either in fall or spring of each year: DC → S
    Each prof teaches 1 course per semester: PSY → DC

What are the keys? {D, C, P, Y} and {P, S, Y}

Decompose using DC → S: T1(D, C, S) and T2(D, C, P, Y)

How do you enforce PSY → DC? (i.e., with a CHECK constraint)
    Not without joining T1 and T2 first

BCNF does not always preserve FDs
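
One way to confirm that PSY → DC is lost is to project the FDs onto each new schema and test whether the original FD still follows from the projected ones. The sketch below assumes FDs are written as pairs of attribute strings; `closure`, `project_fds`, and `preserved` are hypothetical helper names, and the projection enumerates every subset of a schema, so it is only practical for small relations like this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute collections)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(schema, fds):
    """Nontrivial FDs that hold within schema, found by brute force over subsets."""
    projected = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for lhs in combinations(attrs, size):
            rhs = (closure(lhs, fds) & set(schema)) - set(lhs)
            if rhs:
                projected.append((set(lhs), rhs))
    return projected


def preserved(lhs, rhs, fds):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


FDS = [("DC", "S"), ("PSY", "DC")]
T1, T2 = set("DCS"), set("DCPY")

# FDs that can be enforced locally, without joining T1 and T2.
local_fds = project_fds(T1, FDS) + project_fds(T2, FDS)

print(preserved("DC", "S", local_fds))    # True
print(preserved("PSY", "DC", local_fds))  # False
```

Only DC → S survives the projection (it lives entirely inside T1); no nontrivial FD holds within T2, so PSY → DC cannot be derived from the local FDs.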

Third Normal Form (3NF)

R is in Third Normal Form if each nontrivial FD A1 A2 ... An → B1 B2 ... Bm has:
    Either {A1, A2, ..., An} is a superkey for R
    Or each Bi not among the A's is prime (part of some key in R)
(Relaxed version of BCNF)

Teach(D, C, P, S, Y) has FDs DC → S and PSY → DC
    Keys are {D, C, P, Y} and {P, S, Y}
    DC → S violates BCNF (since DC ↛ PY, so DC is not a superkey)
    However, Teach is in 3NF because S is part of a key
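
The two-part test can be applied mechanically to each FD. Here is a minimal sketch, assuming the keys are supplied as on the slide rather than computed; the function name `classify` and the verdict strings are made up for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def classify(schema, fds, keys):
    """Label each given FD according to the BCNF and 3NF conditions."""
    prime = set().union(*(set(k) for k in keys))  # attributes in some key
    report = {}
    for lhs, rhs in fds:
        new_attrs = set(rhs) - set(lhs)           # ignore the trivial part
        if not new_attrs or closure(lhs, fds) >= set(schema):
            verdict = "ok for BCNF and 3NF"       # trivial FD or superkey on the left
        elif new_attrs <= prime:
            verdict = "violates BCNF, allowed by 3NF"
        else:
            verdict = "violates 3NF"
        report[f"{lhs} -> {rhs}"] = verdict
    return report


# Teach(D, C, P, S, Y) with the FDs and keys stated on the slide.
report = classify("DCPSY", [("DC", "S"), ("PSY", "DC")], keys=["DCPY", "PSY"])
for fd, verdict in report.items():
    print(fd, ":", verdict)
```

For Teach, DC → S lands in the middle category: DC is not a superkey, but S belongs to the key {P, S, Y}.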

More 3NF examples

Teach(dept, course, prof, semester, year)
    Each prof teaches 1 course per semester: PSY → DC

What if we change/add other rules?

Offered either in fall or spring of each year, but the semester can change from year to year: DCY → S
    Keys are {P, S, Y} and {D, C, P, Y}; still 3NF

Every time it's offered, each course is taught by at most one prof: DCSY → P
    Keys are {P, S, Y} and {D, C, Y}; still 3NF

Modify the previous constraint: each course is always taught by the same prof: DC → P
    Keys are {P, S, Y} and {D, C, Y}; still 3NF
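
The keys for each variant can be double-checked by brute force. This sketch assumes each new rule is added to the ones before it (with DC → P replacing DCSY → P in the last case), which is the reading under which the keys above come out; `candidate_keys` is an illustrative helper that tries attribute subsets from smallest to largest.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    attrs = sorted(schema)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys


BASE = [("PSY", "DC")]
VARIANTS = {
    "DCY -> S": BASE + [("DCY", "S")],
    "DCY -> S, DCSY -> P": BASE + [("DCY", "S"), ("DCSY", "P")],
    "DCY -> S, DC -> P": BASE + [("DCY", "S"), ("DC", "P")],
}

for label, fds in VARIANTS.items():
    keys = candidate_keys("DCPSY", fds)
    print(label, "=>", ["".join(sorted(k)) for k in keys])
```

In all three variants every attribute appears in some key, so no FD can have a non-prime right side, which is why each schema stays in 3NF.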

Synthesis algorithm for 3NF

input: Relation R with set of FDs F
output: Decomposition of R into 3NF

1. Find a minimal basis G based on F
2. For each FD X → A in G, use XA as the schema for one of the new relations
     Drop relations that are subsets of others
3. If none of the relation schemas is a superkey for R, add a relation whose schema is a key for R

Read section 3.5.3 to understand why this works!
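
Step 1 is the least mechanical part of the algorithm. One common way to compute a minimal basis is to split right sides into single attributes, shrink left sides, and then drop FDs that follow from the rest. The sketch below takes that approach; it is an illustration under those assumptions, not necessarily the procedure used in the textbook, and `minimal_basis` is a hypothetical name. The result can depend on the order in which FDs are examined, since a set of FDs may have several minimal bases.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute collections)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_basis(fds):
    """Return an equivalent list of (frozenset lhs, single attribute) FDs."""
    # 1. Split right sides into single attributes and skip trivial FDs.
    g = []
    for lhs, rhs in fds:
        for a in rhs:
            fd = (frozenset(lhs), a)
            if a not in lhs and fd not in g:
                g.append(fd)
    # 2. Remove left-side attributes that are not needed to derive the right side.
    for i, (lhs, a) in enumerate(g):
        for b in sorted(lhs):
            smaller = g[i][0] - {b}
            if smaller and a in closure(smaller, g):
                g[i] = (smaller, a)
    g = list(dict.fromkeys(g))  # shrinking may have created duplicates
    # 3. Remove FDs that already follow from the remaining ones.
    for fd in list(g):
        rest = [f for f in g if f != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g


# A -> C is redundant here because it follows from A -> B and B -> C.
for lhs, a in minimal_basis([("A", "B"), ("B", "C"), ("A", "C")]):
    print("".join(sorted(lhs)), "->", a)
```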

Example 3.27

Consider R(A, B, C, D, E) with FDs AB → C, C → B, and A → D

1. Is this a minimal basis?
     Find the closure of each left side, using the other FDs
     Can we eliminate A or B from AB → C?
2. Create relations for each FD
     S1(A, B, C)
     S2(B, C)  (drop this one; already in S1)
     S3(A, D)
3. What are the keys of R? {A, B, E} and {A, C, E}
     S4(A, B, E) or S5(A, C, E)
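
Steps 2 and 3 of this example can be reproduced with a short script. The sketch below assumes the FDs passed in are already a minimal basis (as step 1 concludes for this example); `synthesize_3nf` and `find_key` are illustrative names, and `find_key` returns just one key by greedily dropping attributes in alphabetical order.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_key(schema, fds):
    """Return one key of the schema by removing attributes that are not needed."""
    key = set(schema)
    for a in sorted(schema):
        if closure(key - {a}, fds) >= set(schema):
            key.discard(a)
    return key


def synthesize_3nf(schema, basis):
    """Steps 2 and 3 of the synthesis algorithm, given a minimal basis."""
    # Step 2: one schema per FD, then drop schemas contained in another one.
    schemas = []
    for lhs, rhs in basis:
        s = frozenset(lhs) | frozenset(rhs)
        if s not in schemas:
            schemas.append(s)
    schemas = [s for s in schemas if not any(s < t for t in schemas)]
    # Step 3: add a key if no schema is already a superkey for R.
    if not any(closure(s, basis) >= set(schema) for s in schemas):
        schemas.append(frozenset(find_key(schema, basis)))
    return schemas


result = synthesize_3nf("ABCDE", [("AB", "C"), ("C", "B"), ("A", "D")])
print(["".join(sorted(s)) for s in result])  # ['ABC', 'AD', 'ACE']
```

The greedy key search happens to pick {A, C, E}, which corresponds to S5 on the slide; choosing {A, B, E} instead would be equally valid.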
What about 1NF, 2NF, etc? http://www.bkent.net/doc/simple5.htm

Summary of normal forms

1st Normal Form (1NF)
    Each cell contains a single value (no table within a table)
2nd Normal Form (2NF)
    1NF + non-prime attributes depend on all key attributes
3rd Normal Form (3NF)
    2NF + non-prime attributes depend only on key attributes
    Each attribute must be a fact about the key, the whole key, and nothing but the key.
Boyce-Codd Normal Form (BCNF)
    3NF + every left-hand side (determinant) is a superkey
4th Normal Form (4NF)
    BCNF + multivalued dependencies are based on superkeys
5th Normal Form (5NF)
    4NF + join dependencies are a consequence of superkeys

What about performance?

Normal forms:
    prevent update anomalies and data inconsistencies
    penalize retrieval (i.e., several records instead of one)

There is no obligation to fully normalize all records...

Factors affecting normalization:
    Dependency on the entire key
    Presence of mutual constraints
    Independent vs dependent facts
    Single-valued vs multi-valued