RECAP

So far we have seen:
- How to use functional dependencies to guide the design of relations
- How to modify/decompose relations to achieve 1NF, 2NF and 3NF relations

But:
- How do we make sure the decompositions are lossless (equivalence preserving)?
- Are some decompositions better than others?
- What if there are multiple candidate keys?

Why is losslessness important?

- No information is lost or added implicitly by mistake.
- Any information that can be derived from the original relation can also be derived from the relations that result from the decomposition, and vice versa.
- In other words, you get the same answers to your queries.

An Instance of Relation NEWS

  S#  SNAME  STATUS  CITY
  S1  Smith  20      London
  S2  Jones  10      Paris
  S3  Blake  10      Paris
  S4  Clark  20      London
  S5  Adams  30      Athens

NEWS (S#, SNAME, STATUS, CITY)
S# → SNAME, STATUS, CITY

Suppose we decompose NEWS into:

R1(S#, Sname)
R2(City, Status)

So no attributes are lost.

  R1             R2
  S#  Sname      City    Status
  S1  Smith      London  20
  S2  Jones      Paris   10
  S3  Blake      Athens  30

What is the status or city of Smith?
Would it be good enough if the 2 relations were to have at least one attribute in common?

NEWS (S#, SNAME, STATUS, CITY)
S# → SNAME, STATUS, CITY

R1(S#, Sname, Status)
R2(City, Status)

So no attributes are lost, and R1 and R2 have an attribute in common.
Do you see any problems?

  R1                     R2
  S#  Sname  Status      City    Status
  S1  Smith  20          London  20
  S2  Jones  10          Paris   10
  S3  Blake  30          Athens  30

Now add a row for Rome to R2:

  R1                     R2
  S#  Sname  Status      City    Status
  S1  Smith  20          London  20
  S2  Jones  10          Paris   10
  S3  Blake  30          Athens  30
                         Rome    20

What is the city of Smith? London or Rome?
So still not good enough.

Decomposing R into R1 and R2:
- Do not lose any attributes.
- Make sure R1 and R2 have some attribute(s) in common.
- Some extra condition on the shared attribute(s) is needed to ensure losslessness.
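The Smith question can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: it represents R1 and R2 as lists of dictionaries and recombines them by matching rows on the shared Status attribute. The helper name natural_join is an assumption made for illustration.

```python
def natural_join(r1, r2):
    """Join two relations (lists of dicts) on all attribute names they share."""
    joined = []
    for t1 in r1:
        for t2 in r2:
            shared = set(t1) & set(t2)
            if all(t1[a] == t2[a] for a in shared):
                joined.append({**t1, **t2})
    return joined

# R1(S#, Sname, Status) and R2(City, Status), including the extra Rome row.
r1 = [
    {"S#": "S1", "Sname": "Smith", "Status": 20},
    {"S#": "S2", "Sname": "Jones", "Status": 10},
    {"S#": "S3", "Sname": "Blake", "Status": 30},
]
r2 = [
    {"City": "London", "Status": 20},
    {"City": "Paris", "Status": 10},
    {"City": "Athens", "Status": 30},
    {"City": "Rome", "Status": 20},
]

rejoined = natural_join(r1, r2)
smith_cities = sorted(t["City"] for t in rejoined if t["Sname"] == "Smith")
print(smith_cities)  # ['London', 'Rome']
```

Smith is paired with both London and Rome, so the recombined relation cannot say which city is the right one. Sharing an attribute is therefore not enough on its own.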
Losslessness

Definition: Lossless decomposition
A decomposition of a relation R into relations R1, ..., Rn is lossless (nonloss) if and only if, for every instance of R and the Ri, the natural join of R1, ..., Rn gives the relation R.

Natural Join very briefly

  Teaches               Class
  Lecturer  Course      Course  Class
  fs        logic       logic   msc
  jm        ai          ai      msc
  sd        C++         ai      meng2

Teaches JOIN Class

  Lecturer  Course  Class
  fs        logic   msc
  jm        ai      msc
  jm        ai      meng2

Another Example of JOIN

  Teaches               Class-number
  Lecturer  Course      Number  Class
  fs        logic       60      msc
  jm        ai          100     meng2

Teaches JOIN Class-number

  Lecturer  Course  Number  Class
  fs        logic   60      msc
  fs        logic   100     meng2
  jm        ai      60      msc
  jm        ai      100     meng2

A Lossy Decomposition

  R
  A   B   C
  a1  b1  c1
  a2  b1  c2
  a2  b2  c2
Suppose we decomposed R into R1 and R2. Now consider the JOIN of R1 and R2.

  R1          R2
  A   B       B   C
  a1  b1      b1  c1
  a2  b1      b1  c2
  a2  b2      b2  c2

R1 JOIN R2 versus R

  R1 JOIN R2      R
  A   B   C       A   B   C
  a1  b1  c1      a1  b1  c1
  a1  b1  c2      a2  b1  c2
  a2  b1  c1      a2  b2  c2
  a2  b1  c2
  a2  b2  c2

Theorem: sufficient condition for losslessness
Suppose R is a relation scheme and F is a set of functional dependencies on R. Let R1 and R2 be projections of R such that the union of the sets of attributes of R1 and R2 is equal to the set of attributes of R. This decomposition of R is lossless if at least one of the following fds is in F+:

  R1 ∩ R2 → R1
  R1 ∩ R2 → R2

Example: Lecturer DB

  Lecturer  Course  Number  Class
  fs        logic   60      msc
  fs        logic   100     meng2
  jm        ai      60      msc
  jm        ai      100     meng2

Class → Number
Only key: (Lecturer, Course, Class)

Example contd.

What normal form is Lecturer DB in?

Lecturer(Lecturer, Course, Number, Class)

Decompose to:
Degree(Class, Number)
Teaches(Lecturer, Course, Class)

Degree ∩ Teaches = Class
Class → Degree
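The sufficient condition can be tested by computing the closure of the shared attributes. The sketch below is an illustration in Python, not taken from the slides: the helper names closure and theorem_applies are assumptions, and functional dependencies are written as (left-hand side, right-hand side) pairs of attribute sets. Because the slide states only a sufficient condition, a False result is read as "the theorem does not establish losslessness".

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def theorem_applies(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 follows from fds."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure

# Lecturer DB: Class -> Number, decomposed into Degree and Teaches.
lecturer_fds = [({"Class"}, {"Number"})]
degree = {"Class", "Number"}
teaches = {"Lecturer", "Course", "Class"}
print(theorem_applies(degree, teaches, lecturer_fds))  # True

# R(A, B, C) split into R1(A, B) and R2(B, C) with no dependency on B.
print(theorem_applies({"A", "B"}, {"B", "C"}, []))  # False
```

For the Lecturer DB, the shared attribute Class determines every attribute of Degree, so the condition holds. For the lossy R(A, B, C) example the instance shows that B determines neither A nor C, which is why the join produced extra rows.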
So far we have done:
- How do we make sure the decompositions are lossless (equivalence preserving)?

Still to come:
- Are some decompositions better than others?
- What if there are multiple candidate keys?

Dependency Preservation

It is often possible to decompose a relation in different ways.
Amongst the lossless decompositions some may be better than others.

Example

NEWS (S#, SNAME, STATUS, CITY)
S# → SNAME, STATUS, CITY
CITY → STATUS

Here is an instance of relation NEWS.

  S#  SNAME  STATUS  CITY
  S1  Smith  20      London
  S2  Jones  10      Paris
  S3  Blake  10      Paris
  S4  Clark  20      London
  S5  Adams  30      Athens

NEWS is in 2NF, but not in 3NF. It can be transformed to 3NF by two alternative decompositions.

A: Supplier (S#, SNAME, CITY)
   City-info (CITY, STATUS)

B: Supplier (S#, SNAME, CITY)
   Status-info (S#, STATUS)

Both decompositions are lossless. All resulting relations are in 3NF.

Instance of A

  Supplier                City-info
  S#  SNAME  CITY         CITY    STATUS
  S1  Smith  London       London  20
  S2  Jones  Paris        Paris   10
  S3  Blake  Paris        Athens  30
  S4  Clark  London       Rome    50
  S5  Adams  Athens
Instance of B

  Supplier                Status-info
  S#  SNAME  CITY         S#  STATUS
  S1  Smith  London       S1  20
  S2  Jones  Paris        S2  10
  S3  Blake  Paris        S3  10
  S4  Clark  London       S4  20
  S5  Adams  Athens       S5  30

Which one is better? A or B?

Example: Consider the update
  Change CITY of supplier S1 from London to Paris.
- What does this involve in A?
- What does this involve in B?

In A: All we have to do is change the relevant value in relation Supplier.
In B: We have to update both relations Supplier and Status-info to ensure that the functional dependency CITY → STATUS is maintained.

In A

  Supplier                City-info
  S#  SNAME  CITY         CITY    STATUS
  S1  Smith  London       London  20
  S2  Jones  Paris        Paris   10
  S3  Blake  Paris        Athens  30
  S4  Clark  London       Rome    50
  S5  Adams  Athens

In Supplier: Change <S1, Smith, London> to <S1, Smith, Paris>.

In B

1. In Supplier: Change <S1, Smith, London> to <S1, Smith, Paris>.
2. In Supplier find another row with City=Paris, and read its S#.
3. In Status-info find the Status of that S#.
4. In Status-info change the value of the Status of S1 to this new Status.
Another problem with B is that:

(Assuming the Entity Integrity Rule)
We cannot insert in B the information that a given city has a given status, unless some supplier is located in that city.

In A

A: Supplier (S#, SNAME, CITY)
   City-info (CITY, STATUS)

S# → SNAME
S# → CITY
S# → STATUS

In A, it is the transitive dependency S# → STATUS which is an inter-relational constraint. This constraint is maintained automatically as long as the constraints S# → CITY and CITY → STATUS are maintained in each relation, and these are just primary key constraints in each relation of A.

In B

B: Supplier (S#, SNAME, CITY)
   Status-info (S#, STATUS)

S# → SNAME
S# → CITY
S# → STATUS

The problem with B is that the dependency CITY → STATUS has become an inter-relational constraint.

Formalisation/Generalisation of This Intuition

A relation R with dependencies F is decomposed into relations R1, R2, ..., Rn with dependencies F1, F2, ..., Fn respectively.
Let F' = F1 ∪ F2 ∪ ... ∪ Fn, where each Fi contains the dependencies that mention only attributes of Ri.
In general F' ≠ F.
But if F'+ = F+, then to check F we only need to check F'.

Definition: Dependency-Preserving
A decomposition R1, ..., Rn of R is dependency-preserving if and only if F'+ = F+, where F and F' are defined as above.

Example: In the NEWS example A is dependency preserving, but B is not.

Example:

NEWS (S#, SNAME, STATUS, CITY)
S# → SNAME, STATUS, CITY
CITY → STATUS

A: Supplier (S#, SNAME, CITY)
   City-info (CITY, STATUS)

B: Supplier (S#, SNAME, CITY)
   Status-info (S#, STATUS)

In A

F_supplier = {S# → SNAME, CITY}
F_city-info = {CITY → STATUS}
So F' = F_supplier ∪ F_city-info.
So clearly F'+ = F+.

In B

F_supplier = {S# → SNAME, CITY}
F_status-info = {S# → STATUS}
So F' = F_supplier ∪ F_status-info.
CITY → STATUS is in F+ but not in F'+.
So F'+ ≠ F+.

So far we have done:
- How do we make sure the decompositions are lossless (equivalence preserving)?
- Are some decompositions better than others?

Still to come:
- What if there are multiple candidate keys?
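The dependency-preservation test for decompositions A and B of NEWS can be sketched in Python. This is an illustration under stated assumptions, not the lecture's own procedure: the helper names closure, project and preserves_dependencies are hypothetical, F is taken to be S# → SNAME, STATUS, CITY together with CITY → STATUS, and each Fi is computed by brute force over subsets of Ri, which is only practical for small schemes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def project(fds, scheme):
    """Dependencies implied by fds that mention only attributes of scheme."""
    projected = []
    for size in range(1, len(scheme) + 1):
        for lhs in combinations(sorted(scheme), size):
            rhs = (closure(lhs, fds) & scheme) - set(lhs)
            if rhs:
                projected.append((set(lhs), rhs))
    return projected

def preserves_dependencies(fds, schemes):
    """True if every dependency in fds follows from the union of the projections."""
    f_prime = [fd for scheme in schemes for fd in project(fds, scheme)]
    return all(rhs <= closure(lhs, f_prime) for lhs, rhs in fds)

F = [({"S#"}, {"SNAME", "STATUS", "CITY"}), ({"CITY"}, {"STATUS"})]
A = [{"S#", "SNAME", "CITY"}, {"CITY", "STATUS"}]
B = [{"S#", "SNAME", "CITY"}, {"S#", "STATUS"}]

print(preserves_dependencies(F, A))  # True
print(preserves_dependencies(F, B))  # False
```

Since F' is always contained in F+, it is enough to check that every dependency of F follows from F'. In B the check fails on CITY → STATUS, because no single relation of B contains both CITY and STATUS.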
Generalising 2NF and 3NF: Boyce-Codd Normal Form (BCNF)

For 2NF, get rid of:
[Dependency diagram over attributes A, B, C, D, E, F]

For 3NF, get rid of:
[Dependency diagram]

What we want:
[Dependency diagram over attributes A to H]

With 2NF and 3NF we assumed that the relation has one candidate key.
Now we generalise to cater for multiple candidate keys.
This more general normal form is called the Boyce-Codd Normal Form (BCNF).

[Diagram: candidate keys CK1, CK2, CK3 and attributes attribute1 to attribute4]
Definition: Determinant
A determinant is any attribute, or set of attributes, on which some other attribute is fully functionally dependent.

Example: R(A, B, C, D, E)

  AB → C
  A → B
  C → D
  DE → A

Here A and C are determinants. There are 2 others. Can you see what they are?

Definition: BCNF
A relation is in Boyce-Codd Normal Form (BCNF) if and only if every determinant is a candidate key.

Any relation can be nonloss decomposed into a collection of BCNF relations.

Example:

Enrols(Student#, Subject, Teacher)
Teacher → Subject
(Student#, Subject) → Teacher

An Instance of the Relation Scheme Enrols

  Student#  Subject  Teacher
  100       maths    smith
  101       maths    jones
  102       maths    smith
  103       maths    smith
  104       physics  brown
  101       physics  brown
  100       physics  green

- Each student is taught by several teachers.
- Each teacher teaches only one subject.
- Each student takes several subjects and has only one teacher for a given subject.

What are the candidate keys of Enrols?
What normal form is Enrols in?
What problems do you see in the design of Enrols?
Candidate keys of Enrols are
  (Teacher, Student#)
  (Subject, Student#)

[Diagram: dependencies among Teacher, Student# and Subject]

Problems with Enrols

- We cannot insert the fact that a teacher teaches a certain subject until at least one student enrols for that subject.
- The fact that a teacher teaches a certain subject is recorded with a lot of redundancy, for every student to whom he teaches that subject.

Solution

Decompose Enrols into
  Courses(Teacher, Subject)
  Students(Student#, Teacher)

In Enrols: Teacher is a determinant, but not a candidate key.

In Courses(Teacher, Subject):
The only dependency is Teacher → Subject.
So Teacher is the only determinant.
It is also the only candidate key.
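The candidate keys and the offending determinant of Enrols can be found by brute force. The Python sketch below is an illustration only: candidate_keys and violating_determinants are hypothetical helper names, and the BCNF test assumes that the dependencies passed in are full, nontrivial and stated over the given scheme.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(scheme, fds):
    """Minimal attribute sets whose closure covers the whole scheme."""
    keys = []
    for size in range(1, len(scheme) + 1):
        for combo in combinations(sorted(scheme), size):
            candidate = set(combo)
            covers = closure(candidate, fds) >= scheme
            if covers and not any(key <= candidate for key in keys):
                keys.append(candidate)
    return keys

def violating_determinants(scheme, fds):
    """Left-hand sides of fds that are not candidate keys of the scheme."""
    keys = candidate_keys(scheme, fds)
    return [lhs for lhs, _ in fds if lhs not in keys]

enrols = {"Student#", "Subject", "Teacher"}
enrols_fds = [({"Teacher"}, {"Subject"}), ({"Student#", "Subject"}, {"Teacher"})]

print(len(candidate_keys(enrols, enrols_fds)))  # 2
print(violating_determinants(enrols, enrols_fds) == [{"Teacher"}])  # True

courses = {"Teacher", "Subject"}
print(violating_determinants(courses, [({"Teacher"}, {"Subject"})]))  # []
```

Enrols has the two candidate keys listed above, and Teacher is the one determinant that is not among them. In Courses the only determinant, Teacher, is also the only candidate key, so no violation is reported.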
Exercise

Teacher → Subject
(Student#, Subject) → Teacher

What are the candidate keys and the determinants of Students?

In Students(Student#, Teacher)

Only candidate key: (Student#, Teacher)
No determinant.
So BCNF. WHY??

Exercise

Enrols(Student#, Subject, Teacher)
Courses(Teacher, Subject)
Students(Student#, Teacher)

Teacher → Subject
(Student#, Subject) → Teacher

Is the decomposition lossless?
Is the decomposition dependency preserving?

Exercise

S(S#, Sname, Status, City) with FDs
  S# → Status, City, Sname
  Sname → City, Status, S#

Is S in BCNF?

In S:
Determinants: S# and Sname
Candidate keys: S# and Sname
So all determinants are candidate keys.
So S is in BCNF.

Exercise

SSP(S#, Sname, P#, Qty) with FDs
  S# → Sname
  Sname → S#
  S#, P# → Qty
  Sname, P# → Qty

Is SSP in BCNF?
In SSP:
Determinants include S# and Sname
Candidate keys: (S#, P#) and (Sname, P#)
So there are determinants that are not candidate keys.
So SSP is not in BCNF.

Decomposing SSP to BCNF relations

SSP decomposed into
  S1(S#, Sname)
  S2(S#, P#, Qty)
Lossless? Dependency preserving?

SSP decomposed into
  R1(S#, Sname)
  R2(Sname, P#, Qty)
Lossless? Dependency preserving?

An Algorithm for BCNF Decomposition

Input: A relation R and the closure, F+, of the set of functional dependencies on R.
Output (result): A set of relations Ri, such that each Ri is in BCNF and the decomposition of R into the Ri is lossless.

  begin
    result := { R };
    done := false;
    while (not done) do
      if (there is a scheme Ri in result that is not in BCNF) then
        begin
          let A → B be a nontrivial ffd that holds on Ri,
            such that A is not a candidate key of Ri;
          result := (result - Ri) ∪ (Ri - B) ∪ (A, B);
        end
      else done := true;
  end;

Same algorithm we have been using for 2NF and 3NF

  begin
    result := { R };
    done := false;
    while (not done) do
      if (there is a scheme Ri in result that is not in the required normal form) then
        begin
          let A → B be an fd that holds on Ri,
            that shows Ri is not in the required normal form;
          result := (result - Ri) ∪ (Ri - B) ∪ (A, B);
        end
      else done := true;
  end;

Normalisation - Conclusion

Objectives of normalisation:
- Eliminate redundancy
- Avoid update anomalies
- (From 5NF upwards) Simplify the enforcement of certain integrity constraints
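As a companion to the BCNF decomposition algorithm given earlier, here is a minimal Python sketch of the same loop. It is one possible reading, not the lecture's implementation: the names closure, find_violation and bcnf_decompose are hypothetical, violations are found by brute force over attribute subsets, and for a violating A the sketch takes B to be everything in Ri that A determines, which is a design choice the pseudocode leaves open.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(scheme, fds):
    """Return (A, B) with A -> B nontrivial on scheme and A not a superkey, else None."""
    for size in range(1, len(scheme)):
        for combo in combinations(sorted(scheme), size):
            a = set(combo)
            determined = closure(a, fds) & scheme
            b = determined - a
            if b and determined != scheme:
                return a, b
    return None

def bcnf_decompose(scheme, fds):
    """Split scheme until no relation has a violating dependency."""
    result = [set(scheme)]
    done = False
    while not done:
        done = True
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                a, b = violation
                result.remove(ri)
                result.extend([ri - b, a | b])  # (Ri - B) and (A, B)
                done = False
                break  # restart the scan: result was modified
    return result

ssp = {"S#", "Sname", "P#", "Qty"}
ssp_fds = [
    ({"S#"}, {"Sname"}),
    ({"Sname"}, {"S#"}),
    ({"S#", "P#"}, {"Qty"}),
    ({"Sname", "P#"}, {"Qty"}),
]
for relation in bcnf_decompose(ssp, ssp_fds):
    print(sorted(relation))
```

On SSP this sketch happens to produce the first decomposition shown above, (S#, Sname) and (S#, P#, Qty). A different choice of violating dependency could give the other one, which reflects the point that decompositions are not unique.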
Some Limitations of Normalisation

Full normalisation is not always desirable.

Example:
Customer(Name, Street, City, Postcode)
Postcode → City, Street
So Customer is not in 3NF.

Normalisation often facilitates update, but tends to have an adverse effect on query evaluation.
Related data which may have been retrievable from one relation in an unnormalised schema may have to be retrieved from several relations in the normalised form.

Decomposition into normal forms is not always unique.
But there is little guidance on which decomposition to choose.