Practice and Applications of Data Management CMPSCI 345 Lecture 16: Schema Design and Normalization
Keys } A superkey is a set of a/ributes A 1,..., A n s.t. for any other a/ribute B, we have A 1,..., A n à B } A key is a minimal superkey } I.e. set of a/ributes which is a superkey and for which no subset is a superkey 2
Computing (Super)Keys } Compute X + for all sets X } If X + = all a/ributes, then X is a superkey } List only the minimal X s to get the keys 3
Example Product(name, price, category, color) name, category à price category à color What is the key? 4
Example Product(name, price, category, color) name, category à price category à color What is the key? (name, category) + = { name, category, price, color } Hence (name, category) is a key 5
Examples of Keys Enrollment(student, address, course, room, Ome) Find the keys student à address room, time à course student, course à room, time 6
Eliminating Anomalies Main idea: } X A is OK if X is a (super)key } X A is not OK otherwise 7
Example SSN à Name, City Name SSN PhoneNumber City Fred 123-45-6789 413-555-1234 Amherst Fred 123-45-6789 413-555-6543 Amherst Joe 987-65-4321 908-555-2121 Wesaield Joe 987-65-4321 908-555-1234 Wesaield What is the key? {SSN, PhoneNumber} Hence SSN à Name,City is a bad dependency 8
Key or Keys? Can we have more than one key? Given R(A,B,C) define FD s, s.t. there are two or more keys 9
Key or Keys? Can we have more than one key? Given R(A,B,C) define FD s, s.t. there are two or more keys AB à C BC à A or A à BC B à AC what are the keys here? Can you design FDs such that there are three keys? 10
Boyce-Codd Normal Form (BCNF) A simple condioon for removing anomalies from relaoons: A relaoon R is in BCNF if: If A 1,..., A n à B is a non-trivial dependency in R, then {A 1,..., A n } is a superkey for R In other words: there are no bad FDs Equivalently: for all X, either (X + = X) or (X + = all a/ributes) 11
BCNF Decomposition Algorithm repeat choose A 1,, A m à B 1,, B n that violates BCNF split R into R 1 (A 1,, A m, B 1,, B n ) and R 2 (A 1,, A m, [others]) cononue with both R 1 and R 2 un(l no more violaoons B s A s Others Is there a 2-a/ribute relaoon that is not in BCNF? R 1 R 2 In pracoce, we have a be/er algorithm (coming up) 12
Example (revisited) SSN à Name, City Name SSN PhoneNumber City Fred 123-45-6789 413-555-1234 Amherst Fred 123-45-6789 413-555-6543 Amherst Joe 987-65-4321 908-555-2121 Wesaield Joe 987-65-4321 908-555-1234 Wesaield What is the key? {SSN, PhoneNumber} Hence SSN à Name,City is a bad dependency 13
Example (revisited) SSN à Name, City Name SSN City Fred 123-45-6789 Amherst Joe 987-65-4321 Wesaield SSN PhoneNumber 123-45-6789 413-555-1234 123-45-6789 413-555-6543 987-65-4321 908-555-2121 987-65-4321 908-555-1234 Let s check anomalies: Redundancy? Update? Delete? 14
Example Decomposition Person(name, SSN, age, haircolor, phonenumber) FD1: SSN à name, age FD2: age à haircolor Decompose into BCNF: 15
Example Decomposition Person(name, SSN, age, haircolor, phonenumber) FD1: SSN à name, age FD2: age à haircolor Decompose into BCNF: What is the key? {SSN, phonenumber} But how to decompose? Person(SSN, name, age) Phone(SSN, haircolor, phonenumber) or Person(SSN, name, age, haircolor) Phone(SSN, phonenumber) SSN à name, age, haircolor or. 16
BCNF Decomposition Algorithm BCNF_Decompose(R) find X s.t.: X X + [all a/ributes] if (not found) then R is in BCNF let Y = X + - X let Z = [all a/ributes] - X + decompose R into R 1 (X Y) and R 2 (X Z) cononue to decompose recursively R 1 and R 2 17
Example BCNF Decomposition Person(name, SSN, age, haircolor, phonenumber) FD1: SSN à name, age FD2: age à haircolor Find X s.t.: X X + [all a/ributes] IteraOon 1: Person SSN + = SSN, name, age, haircolor Decompose into: P(SSN, name, age, haircolor) Phone(SSN, phonenumber) IteraOon 2: P age + = age, haircolor Decompose: People(SSN, name, age) Hair(age, haircolor) Phone(SSN, phonenumber) What are the keys? 18
Example R(A,B,C,D) A à B B à C R(A,B,C,D) A + = ABC ABCD R 1 (A,B,C) B + = BC ABC R 2 (A,D) R 11 (B,C) R 12 (A,B) What are the keys? What happens if in R we first pick B +? Or AB +? 19
Decompositions in General R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) R 1 = projecoon of R on A 1,..., A n, B 1,..., B m R 2 = projecoon of R on A 1,..., A n, C 1,..., C p 20
Theory of Decomposition } SomeOmes it is correct: Name Price Category Gizmo 19.99 Gadget OneClick 24.99 Camera Gizmo 19.99 Camera Name à Price Name Price Name Category Gizmo 19.99 OneClick 24.99 Gizmo OneClick Gizmo Gadget Camera Camera Lossless decomposioon 21 x
Incorrect Decomposition } SomeOmes it is not: Name Price Category Gizmo 19.99 Gadget OneClick 24.99 Camera Gizmo 19.99 Camera Name à Price What s incorrect?? Name Category Price Category Gizmo OneClick Gizmo Gadget Camera Camera 19.99 Gadget 24.99 Camera 19.99 Camera Lossy decomposioon 22
Decompositions in General R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) If A 1,..., A n à B 1,..., B m Then the decomposioon is lossless Note: don t need A 1,..., A n à C 1,..., C p BCNF decomposioon is always lossless. WHY? 23
General Decomposition Goals 1. EliminaOon of anomalies BCNF 2. Recoverability of informaoon } Can we get the original relaoon back? 3NF 3. PreservaOon of dependencies } Want to enforce FDs without performing joins SomeOmes cannot decompose into BCNF without losing ability to check some FDs in single relaoon 24 x
BCNF and Dependencies Unit Company Product Unit à Company Company, Product à Unit So, there is a BCNF violaoon, and we decompose. Unit Company Unit à Company Unit Product No FDs In BCNF we lose the FD Company, Product à Unit 25
3NF Motivation A relaoon R is in 3rd normal form if : Whenever there is a nontrivial dep. A 1, A 2,..., A n B for R, then {A 1, A 2,..., A n } is a super-key for R, or B is part of a key. Tradeoffs: BCNF: no anomalies, but may lose some FDs 3NF: keeps all FDs, but may have some anomalies 26