Normal Forms, Lossless Join
http://users.encs.concordia.ca/~m_oran/
Types of Normal Forms
A relation schema R is in first normal form (1NF) if the domain of each of its attributes contains only atomic values (no attribute may be composite or multivalued).
Example: the following relation is not in 1NF, because its last attribute is a multivalued set of composite (CourseId, CourseName, Grade) values:
Student (SID, SName, {(CourseId, CourseName, Grade)})
A relation schema R w.r.t. F is in 3NF if, for every FD X → A in F, at least one of the following conditions holds:
A ⊆ X, that is, X → A is a trivial FD, or
X is a superkey, or
if X is not a key, then A is part of some key of R.
To determine whether a relation <R, F> is in 3NF:
check whether the LHS of each nontrivial FD in F is a superkey;
if not, check whether its RHS is part of some key of R.
A relation schema R w.r.t. F is in BCNF if, for every FD X → A in F, at least one of the following conditions holds:
A ⊆ X, that is, X → A is a trivial FD, or
X is a superkey.
To determine whether R w.r.t. F is in BCNF, check whether the LHS X of each nontrivial FD in F is a superkey.
How? Simply compute X+ (w.r.t. F) and check whether X+ = R.
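The closure test above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: FDs are assumed to be (lhs, rhs) pairs of single-letter attribute strings, and `closure` and `bcnf_violations` are hypothetical helper names. Like the slide's recipe, it only inspects the FDs listed in F, which is enough when testing a relation against its own F.

```python
# Hypothetical helpers: FDs are (lhs, rhs) pairs of single-letter attribute strings.

def closure(attrs, fds):
    """Return X+, the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole LHS is in X+, the RHS can be added too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Nontrivial FDs of fds whose LHS is not a superkey (X+ != R)."""
    r = set(relation)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and closure(lhs, fds) != r]


# The example from the next slide.
F = [("ABC", "DE"), ("AB", "D"), ("DE", "ABCF"), ("E", "C")]
print(sorted(closure("ABC", F)))           # ['A', 'B', 'C', 'D', 'E', 'F']
print(len(bcnf_violations("ABCDEFG", F)))  # 4: no LHS contains G
```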
Consider R = {A, B, C, D, E, F, G} with the set of FDs F = {ABC → DE, AB → D, DE → ABCF, E → C}. Is R in BCNF, 3NF, or neither?
Recall:
3NF: for every FD X → A in F, A ⊆ X (X → A is a trivial FD), or X is a superkey, or, if X is not a key, A is part of some key of R.
BCNF: for every FD X → A in F, A ⊆ X (X → A is a trivial FD), or X is a superkey.
We first check whether R w.r.t. F is in BCNF: for each nontrivial FD in F, check whether its LHS X is a superkey by computing X+ (w.r.t. F) and checking whether X+ = R.
Consider ABC → DE: ABC+ = ABCDEF ≠ R (G is missing), so ABC is not a superkey.
So, <R, F> is NOT in BCNF.
Is <R, F> in 3NF?
We have to compute the candidate keys to determine whether <R, F> is in 3NF.
Tricks for Finding Keys using FDs
F = {ABC → DE, AB → D, DE → ABCF, E → C}, R = {A, B, C, D, E, F, G}
If an attribute never appears on the RHS of any FD, it must be part of every key: G.
If an attribute never appears on the LHS of any FD but appears on the RHS of some FD, it cannot be part of any key: F.
We now check whether G alone is a key: G+ = G ≠ R, so G alone is not a key.
F = {ABC → DE, AB → D, DE → ABCF, E → C}, R = {A, B, C, D, E, F, G}
Now we try to find keys by adding more attributes (except F) to G.
Add the LHS of FDs that have a one-attribute LHS (E in E → C):
GE+ = GEC ≠ R
Add the LHS of FDs that have a two-attribute LHS (AB in AB → D, and DE in DE → ABCF):
GAB+ = GABD ≠ R
GDE+ = ABCDEFG = R; it's a key!
Add the LHS of FDs that have a three-attribute LHS (ABC in ABC → DE), without taking supersets of GDE:
GABC+ = ABCDEFG = R [ABC → DE, DE → ABCF]; it's a key!
GABE+ = ABCDEFG = R [AB → D, DE → ABCF]; it's a key!
Any other set we could form either contains one of these keys (and is therefore only a superkey) or still does not determine R, so we can stop here.
The candidate keys are {GDE, GABC, GABE}.
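The key search above can be automated. The sketch below assumes the same (lhs, rhs) string encoding of FDs; `candidate_keys` is a hypothetical helper that applies the two tricks (attributes on no RHS are in every key; attributes only on RHSs are in no key) and then tries the remaining attributes in order of increasing set size, skipping supersets of keys already found.

```python
from itertools import combinations


def closure(attrs, fds):
    """X+ under fds; FDs are (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """All candidate keys of relation, searched in order of increasing size."""
    r = set(relation)
    on_lhs = set().union(*(set(lhs) for lhs, _ in fds))
    on_rhs = set().union(*(set(rhs) for _, rhs in fds))
    core = r - on_rhs            # never on a RHS: in every key (trick 1)
    excluded = on_rhs - on_lhs   # only on RHSs: in no key (trick 2)
    optional = sorted(r - core - excluded)
    keys = []
    for size in range(len(optional) + 1):
        for extra in combinations(optional, size):
            cand = core | set(extra)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is only a superkey
            if closure(cand, fds) == r:
                keys.append(cand)
    return ["".join(sorted(k)) for k in keys]


F1 = [("ABC", "DE"), ("AB", "D"), ("DE", "ABCF"), ("E", "C")]
print(candidate_keys("ABCDEFG", F1))  # ['DEG', 'ABCG', 'ABEG'], i.e. GDE, GABC, GABE
```

The search is exponential in the number of optional attributes, which is fine for slide-sized examples but not for large schemas.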
Is R in 3NF? Recall R = {A, B, C, D, E, F, G}, F = {ABC → DE, AB → D, DE → ABCF, E → C}, and the candidate keys are {GDE, GABC, GABE}.
3NF: for every FD X → A in F, A ⊆ X (X → A is a trivial FD), or X is a superkey, or, if X is not a key, A is part of some key of R.
<R, F> is not in 3NF because:
No FD is trivial.
None of the LHSs (ABC, AB, DE, E) is a superkey, since none of them contains G.
There is an FD whose RHS is not part of any key: in DE → ABCF, attribute F belongs to no candidate key.
So, <R, F> is NOT in 3NF either!
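A 3NF check following the same reasoning might look as follows. It is a sketch under the same encoding assumptions; `third_nf_violations` is a hypothetical name, and the candidate keys are found by plain brute force rather than the tricks above. Each RHS is checked attribute by attribute, because attributes that also appear in the LHS are trivial.

```python
from itertools import combinations


def closure(attrs, fds):
    """X+ under fds; FDs are (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    r = set(relation)
    keys = []
    for size in range(1, len(r) + 1):
        for combo in combinations(sorted(r), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) == r:
                keys.append(cand)
    return keys


def third_nf_violations(relation, fds):
    """(X, A) pairs: X -> A nontrivial, X not a superkey, A in no key."""
    r = set(relation)
    prime = set().union(*candidate_keys(relation, fds))  # attributes in some key
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == r:
            continue  # X is a superkey
        for a in sorted(set(rhs) - set(lhs)):  # attributes of X are trivial
            if a not in prime:
                bad.append((lhs, a))
    return bad


F1 = [("ABC", "DE"), ("AB", "D"), ("DE", "ABCF"), ("E", "C")]
print(third_nf_violations("ABCDEFG", F1))  # [('DE', 'F')]
```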
Binary Decomposition
Binary decomposition can be applied to:
decompose a non-BCNF relation into a collection of BCNF relations;
decompose a non-3NF relation into a collection of 3NF relations.
Basic Steps of Binary Decomposition
Suppose X → A ∈ F is an FD violating the BCNF (resp. 3NF) requirement, where X ⊆ R, A ⊆ R, and X ∩ A = ∅.
Decompose R into XA and R − A.
If either XA or R − A is not in BCNF (resp. 3NF), decompose it further.
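The basic step can be sketched recursively for the BCNF case. This is an illustration under assumptions, not the course's reference code: `find_violation` and `binary_decompose` are hypothetical names, a violating FD X → A is found by closing every subset X of the current schema (so FDs of the projected set are considered, not only those listed in F), and A is taken to be all of (X+ ∩ schema) − X.

```python
from itertools import combinations


def closure(attrs, fds):
    """X+ under fds; FDs are (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(schema, fds):
    """Return (X, A) for an FD X -> A on schema that violates BCNF, else None.

    X+ is computed under the full F and intersected with the schema, which
    equals the closure under the projection of F onto the schema.
    """
    s = set(schema)
    for size in range(1, len(s)):
        for combo in combinations(sorted(s), size):
            x = set(combo)
            implied = closure(x, fds) & s
            a = implied - x
            if a and implied != s:  # nontrivial, and X is not a superkey here
                return x, a
    return None


def binary_decompose(schema, fds):
    """Split schema into XA and schema - A until every piece is in BCNF."""
    violation = find_violation(schema, fds)
    if violation is None:
        return [set(schema)]
    x, a = violation
    return binary_decompose(x | a, fds) + binary_decompose(set(schema) - a, fds)


F = [("ABC", "DE"), ("AB", "D"), ("DE", "ABCF"), ("E", "C")]
for part in binary_decompose("ABCDEFG", F):
    print("".join(sorted(part)))
```

Each split shares X between the two halves and X → XA holds, which is why every step is lossless; the resulting pieces need not preserve every FD of F.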
If R w.r.t. F is not in BCNF, we can always obtain a lossless-join decomposition of R into a collection of BCNF relations. However, it may not be dependency preserving.
If R w.r.t. F is not in 3NF, we can always obtain a decomposition of R into a collection of 3NF relations that is both lossless-join and dependency preserving.
How? For each FD X → A in Lost (the set of FDs that the binary decomposition failed to preserve), create a relation schema XA and add it to the decomposition.
Refinement step: if there are several FDs with the same LHS, e.g., X → A1, X → A2, …, X → Ak, replace these k FDs with a single FD X → A1…Ak and create just one relation with schema XA1…Ak.
Synthesis Approach (applicable for 3NF)
Consider relation schema <R, F>:
Get a canonical cover C of F.
For each FD X → A in C, add schema XA to the decomposition.
If the resulting decomposition is not lossless, fix it by adding one extra relation schema containing just the attributes of some key of R.
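The synthesis steps, including the refinement step of merging FDs with the same LHS, can be sketched as below. The sketch assumes the canonical cover has already been computed and is passed in; `synthesize` and `find_key` are hypothetical names. When no schema contains a key, it adds one key found by greedily dropping attributes from R, so the key it picks may differ from one chosen by hand.

```python
# Hypothetical sketch of the 3NF synthesis approach; FDs are (lhs, rhs) strings.

def closure(attrs, fds):
    """X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_key(relation, fds):
    """Shrink R to one candidate key by dropping attributes it does not need."""
    r = set(relation)
    key = set(r)
    for a in sorted(r):
        if closure(key - {a}, fds) == r:
            key.discard(a)
    return key


def synthesize(relation, cover):
    """One schema per (merged) FD of the canonical cover, plus a key if needed."""
    # Refinement step: X -> A1, ..., X -> Ak become one schema X A1 ... Ak.
    merged = {}
    for lhs, rhs in cover:
        merged.setdefault(frozenset(lhs), set()).update(rhs)
    schemas = [set(lhs) | rhs for lhs, rhs in merged.items()]
    # Lossless fix: if no schema contains a key of R, add one key as a schema.
    r = set(relation)
    if not any(closure(s, cover) == r for s in schemas):
        schemas.append(find_key(relation, cover))
    return schemas


# Canonical cover of this tutorial's eight-attribute example.
cover = [("C", "AD"), ("EC", "H"), ("GH", "A"), ("EG", "A"), ("H", "B"), ("BE", "C")]
for s in synthesize("ABCDEFGH", cover):
    print("".join(sorted(s)))
```

For that example the greedy pass keeps EFGH as the extra schema, whereas the worked solution uses BEFG; either key makes the decomposition lossless.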
Review: Decomposition into 3NF
Binary Decomposition: lossless-join, but may not be dependency preserving. If so, add extra relations XA, one for each FD X → A that was lost.
Synthesis Approach: dependency preserving, but may not be lossless-join. If so, add one extra relation schema that includes the attributes of some key of R.
Consider R = {A, B, C, D, E, F, G, H} with the set of FDs F = {CD → A, EC → H, GHB → AB, C → D, EG → A, H → B, BE → CD, EC → B}.
The candidate keys are {BEFG, CEFG, EFGH}.
Is R w.r.t. F in 3NF? If not, decompose it into relations in 3NF using:
1. Binary Decomposition
2. Synthesis Approach
The candidate keys are {BEFG, CEFG, EFGH}.
No, R w.r.t. F is NOT in 3NF, because CD → A violates the 3NF requirements:
CD → A is not a trivial FD;
CD is not a superkey;
CD is not a key, and A is not part of any key of R either.
1. Binary Decomposition Approach
Considering R, Keys = {BEFG, CEFG, EFGH}, F = {CD → A, EC → H, GHB → AB, C → D, EG → A, H → B, BE → CD, EC → B}
Decomposition #1: CD → A is a violating FD. R is decomposed into R1 and R2.
R1 (A, C, D): we need to project F onto R1:
A+ = A
C+ = CDA (C → DA)
D+ = D
AC+ = ACD (AC → D)
AD+ = AD
CD+ = CDA (CD → A)
So, F1 = {C → DA, AC → D, CD → A}.
R2 (B, C, D, E, F, G, H): we need to project F onto R2. Note that the only difference between R and R2 is attribute A. Attribute A never appears on the LHS of any FD, so removing it only drops A from the RHSs and does not otherwise change the FDs. So, F2 = {EC → H, GHB → B, C → D, H → B, BE → CD, EC → B}, where GHB → B is trivial and can be dropped.
The FDs lost in Decomposition #1 are {GHB → A, EG → A}.
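Projecting F onto a sub-schema by closing every subset, as done for R1 above, can be sketched as follows. `project` is a hypothetical helper; closures are computed under the full F and intersected with the sub-schema. The output is not minimized: for R1 it yields exactly C → AD, AC → D, and CD → A, and the last two are already implied by the first.

```python
from itertools import combinations


def closure(attrs, fds):
    """X+ under fds; FDs are (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project(fds, sub):
    """Projection of fds onto sub: X -> (X+ & sub) - X for every nonempty X in sub."""
    s = sorted(sub)
    out = []
    for size in range(1, len(s) + 1):
        for combo in combinations(s, size):
            x = set(combo)
            rhs = (closure(x, fds) & set(s)) - x  # drop trivial attributes
            if rhs:
                out.append(("".join(combo), "".join(sorted(rhs))))
    return out


F = [("CD", "A"), ("EC", "H"), ("GHB", "AB"), ("C", "D"),
     ("EG", "A"), ("H", "B"), ("BE", "CD"), ("EC", "B")]
print(project(F, "ACD"))  # [('C', 'AD'), ('AC', 'D'), ('CD', 'A')]
```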
Decomposition #1: R is decomposed into R1 and R2:
R1 (A, C, D): F1 = {C → DA, AC → D, CD → A}
R2 (B, C, D, E, F, G, H): F2 = {EC → H, C → D, H → B, BE → CD, EC → B}
Do we need further decomposition?
Decomposition #1: R1 (A, C, D), F1 = {C → DA, AC → D, CD → A}
Since C+ = ACD, C is a key of R1. The LHSs C (in C → DA), CD (in CD → A), and AC (in AC → D) are all keys or superkeys of R1. Therefore, there is no violating FD. (So, we are done with this branch.)
Decomposition #1: R2 (B, C, D, E, F, G, H), F2 = {EC → H, C → D, H → B, BE → CD, EC → B}
Keys of R2 = keys of R = {BEFG, CEFG, EFGH}.
EC → H is not a violating FD, since H is part of a key.
C → D is a violating FD, since C is not a superkey and D is not part of any key. So, further decomposition is needed.
R2 (B, C, D, E, F, G, H), F2 = {EC → H, C → D, H → B, BE → CD, EC → B}
Decomposition #2: C → D is a violating FD. R2 is decomposed into R21 and R22:
R21 (C, D): we need to project F2 onto R21: C+ = CD, D+ = D. So, F21 = {C → D}.
R22 (B, C, E, F, G, H): in general, we should project F2 onto R22. However, the only difference between R2 and R22 is attribute D, and D never appears on the LHS of any FD in F2, so removing it only drops D from the RHSs. So, F22 = {EC → H, H → B, BE → C, EC → B}.
BE → D no longer appears in F21 or F22. It is not really lost, however: BE → C (in F22) and C → D (in F21) together imply it, so BE+ computed under F1 ∪ F21 ∪ F22 still contains D. Overall, the FDs lost are {GHB → A, EG → A}.
Decomposition #2:
R21 (C, D): F21 = {C → D}
R22 (B, C, E, F, G, H): F22 = {EC → H, H → B, BE → C, EC → B}
Do we need further decomposition?
R21 (C, D): F21 = {C → D}. Since C+ = CD, C is a key. Therefore, there is no violating FD. (So, we are done with this branch.)
R22 (B, C, E, F, G, H): F22 = {EC → H, H → B, BE → C, EC → B}
EC → H is not a violating FD, since H is part of a key.
H → B is not a violating FD, since B is part of a key.
BE → C is not a violating FD, since C is part of a key.
EC → B is not a violating FD, since B is part of a key.
So, we are done with this branch.
Overall, we have:
R1 (A, C, D), F1 = {C → DA, AC → D, CD → A}
R21 (C, D), F21 = {C → D}
R22 (B, C, E, F, G, H), F22 = {EC → H, H → B, BE → C, EC → B}
Since R1 includes R21 (and C → DA in F1 already gives C → D), we might want to remove R21.
This is a lossless-join decomposition, but it is not dependency preserving. To make it dependency preserving, we add each lost FD as a new relation. The lost FDs are {GHB → AB, EG → A} (GHB → AB is equivalent to GHB → A, since B ⊆ GHB), so we add two relations:
L1 (A, B, G, H), FL1 = {GHB → AB}
L2 (A, E, G), FL2 = {EG → A}
A third relation L3 (B, D, E) with FL3 = {BE → D} is not needed, because BE → D is already implied by BE → C and C → D.
2. Synthesis Approach
R = {A, B, C, D, E, F, G, H} with the set of FDs F = {CD → A, EC → H, GHB → AB, C → D, EG → A, H → B, BE → CD, EC → B}
The candidate keys are {BEFG, CEFG, EFGH}.
In the last tutorial, we found that the canonical cover for F is:
C = {C → AD, EC → H, GH → A, EG → A, H → B, BE → C}
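Since the canonical cover is taken from an earlier tutorial, here is one way to recompute it: a sketch of the usual minimal-cover procedure (split RHSs, drop extraneous LHS attributes, drop redundant FDs, then merge equal LHSs). `canonical_cover` is a hypothetical name. A canonical cover is not unique in general, and the one produced depends on the order in which FDs are examined; with F listed in the order above, this sketch returns the same cover as the slide.

```python
def closure(attrs, fds):
    """X+ under fds; each FD is (lhs, rhs) with lhs/rhs iterables of attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    """Canonical cover of fds, returned as {LHS: RHS} with sorted attribute strings."""
    # 1. Single-attribute RHSs; trivial FDs (attribute already in the LHS) are dropped.
    cover = [(set(lhs), a) for lhs, rhs in fds for a in rhs if a not in lhs]
    # 2. Drop extraneous LHS attributes: b is extraneous if (LHS - b)+ still has a.
    for i, (lhs, a) in enumerate(cover):
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, cover):
                lhs = lhs - {b}
                cover[i] = (lhs, a)
    # 3. Drop redundant FDs: X -> a is redundant if a is in X+ under the others.
    i = 0
    while i < len(cover):
        lhs, a = cover[i]
        rest = cover[:i] + cover[i + 1:]
        if a in closure(lhs, rest):
            cover = rest
        else:
            i += 1
    # 4. Merge FDs that share a LHS.
    merged = {}
    for lhs, a in cover:
        merged.setdefault("".join(sorted(lhs)), set()).add(a)
    return {lhs: "".join(sorted(rhs)) for lhs, rhs in merged.items()}


F = [("CD", "A"), ("EC", "H"), ("GHB", "AB"), ("C", "D"),
     ("EG", "A"), ("H", "B"), ("BE", "CD"), ("EC", "B")]
print(canonical_cover(F))
```

Note that the LHS EC is printed in sorted form as CE.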
C = {C → AD, EC → H, GH → A, EG → A, H → B, BE → C}
Now we create the relations:
R1 = {A, C, D}, F1 = {C → AD}
R2 = {E, C, H}, F2 = {EC → H}
R3 = {A, G, H}, F3 = {GH → A}
R4 = {A, E, G}, F4 = {EG → A}
R5 = {B, H}, F5 = {H → B}
R6 = {B, C, E}, F6 = {BE → C}
Next, we check whether at least one of these relations contains a key. The candidate keys are {BEFG, CEFG, EFGH}. Since no relation contains any of these keys (attribute F does not appear in any relation at all), this decomposition is not lossless. So, we add an extra relation containing the attributes of one key of R:
R7 = {B, E, F, G}, F7 = {}
Assume R(A, B, C, D, E, F, G) with the set of FDs F = {C → AD, E → G, FG → A, EF → A, G → B, BE → C} is decomposed into the following relations. Check whether this decomposition is lossless-join.
R1 = {A, C, D}, R2 = {E, C, G}, R3 = {A, F, G}, R4 = {A, E, F}, R5 = {B, G}, R6 = {B, C, E}
Step 1: Table Initialization
Row i has the symbol a in each column that belongs to Ri, and a distinct symbol b (e.g., b1b for row 1, column B) elsewhere.
                  A    B    C    D    E    F    G
R1 = {A, C, D}    a    b1b  a    a    b1e  b1f  b1g
R2 = {E, C, G}    b2a  b2b  a    b2d  a    b2f  a
R3 = {A, F, G}    a    b3b  b3c  b3d  b3e  a    a
R4 = {A, E, F}    a    b4b  b4c  b4d  a    a    b4g
R5 = {B, G}       b5a  a    b5c  b5d  b5e  b5f  a
R6 = {B, C, E}    b6a  a    a    b6d  a    b6f  b6g
Round 1: Consider C → AD
R1, R2, and R6 agree on C (a). R1 has a in A and D, so R2 and R6 also get a in A and D:
                  A    B    C    D    E    F    G
R1 = {A, C, D}    a    b1b  a    a    b1e  b1f  b1g
R2 = {E, C, G}    a    b2b  a    a    a    b2f  a
R3 = {A, F, G}    a    b3b  b3c  b3d  b3e  a    a
R4 = {A, E, F}    a    b4b  b4c  b4d  a    a    b4g
R5 = {B, G}       b5a  a    b5c  b5d  b5e  b5f  a
R6 = {B, C, E}    a    a    a    a    a    b6f  b6g
Round 1: Consider E → G
R2, R4, and R6 agree on E (a). R2 has a in G, so R4 and R6 also get a in G:
                  A    B    C    D    E    F    G
R1 = {A, C, D}    a    b1b  a    a    b1e  b1f  b1g
R2 = {E, C, G}    a    b2b  a    a    a    b2f  a
R3 = {A, F, G}    a    b3b  b3c  b3d  b3e  a    a
R4 = {A, E, F}    a    b4b  b4c  b4d  a    a    a
R5 = {B, G}       b5a  a    b5c  b5d  b5e  b5f  a
R6 = {B, C, E}    a    a    a    a    a    b6f  a
Round 1: Consider FG → A
R3 and R4 agree on F and G (a), and both already have a in A, so nothing changes.
Round 1: Consider EF → A
Only R4 has a in both E and F, so nothing changes.
Round 1: Consider G → B
R2, R3, R4, R5, and R6 agree on G (a). R5 and R6 have a in B, so R2, R3, and R4 also get a in B:
                  A    B    C    D    E    F    G
R1 = {A, C, D}    a    b1b  a    a    b1e  b1f  b1g
R2 = {E, C, G}    a    a    a    a    a    b2f  a
R3 = {A, F, G}    a    a    b3c  b3d  b3e  a    a
R4 = {A, E, F}    a    a    b4c  b4d  a    a    a
R5 = {B, G}       b5a  a    b5c  b5d  b5e  b5f  a
R6 = {B, C, E}    a    a    a    a    a    b6f  a
Round 1: Consider BE → C
R2, R4, and R6 agree on B and E (a). R2 and R6 have a in C, so R4 also gets a in C:
                  A    B    C    D    E    F    G
R1 = {A, C, D}    a    b1b  a    a    b1e  b1f  b1g
R2 = {E, C, G}    a    a    a    a    a    b2f  a
R3 = {A, F, G}    a    a    b3c  b3d  b3e  a    a
R4 = {A, E, F}    a    a    a    b4d  a    a    a
R5 = {B, G}       b5a  a    b5c  b5d  b5e  b5f  a
R6 = {B, C, E}    a    a    a    a    a    b6f  a
Round 2: Consider C → AD
R1, R2, R4, and R6 now agree on C (a). R1 has a in D, so R4 also gets a in D:
                  A    B    C    D    E    F    G
R1 = {A, C, D}    a    b1b  a    a    b1e  b1f  b1g
R2 = {E, C, G}    a    a    a    a    a    b2f  a
R3 = {A, F, G}    a    a    b3c  b3d  b3e  a    a
R4 = {A, E, F}    a    a    a    a    a    a    a
R5 = {B, G}       b5a  a    b5c  b5d  b5e  b5f  a
R6 = {B, C, E}    a    a    a    a    a    b6f  a
We don't need to continue, since row R4 now has a in every column. So, this is a lossless-join decomposition.
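The tableau procedure above (often called the chase) can be sketched as below. Assumptions: `chase_lossless` is a hypothetical name, symbols are plain strings ("a" for the distinguished symbol, names like "b2d" otherwise), and whenever two rows agree on an FD's LHS, their RHS symbols are equated throughout that column, preferring "a". Passes over the FDs repeat until some row is all a's or a full pass changes nothing.

```python
def chase_lossless(relation, fds, decomposition):
    """Tableau (chase) test: True if the decomposition is lossless-join."""
    attrs = list(relation)
    col = {c: j for j, c in enumerate(attrs)}
    # Row i: "a" in the columns of R_i, a unique symbol such as "b2d" elsewhere.
    table = [["a" if c in ri else f"b{i + 1}{c.lower()}" for c in attrs]
             for i, ri in enumerate(decomposition)]

    def has_all_a_row():
        return any(all(v == "a" for v in row) for row in table)

    changed = True
    while changed and not has_all_a_row():
        changed = False
        for lhs, rhs in fds:
            for r1 in table:
                for r2 in table:
                    if r1 is r2 or any(r1[col[c]] != r2[col[c]] for c in lhs):
                        continue  # rows must agree on every LHS column
                    for c in rhs:
                        j = col[c]
                        if r1[j] != r2[j]:
                            # Equate the two symbols in the whole column, preferring "a".
                            keep, drop = (r1[j], r2[j]) if r1[j] == "a" else (r2[j], r1[j])
                            for row in table:
                                if row[j] == drop:
                                    row[j] = keep
                            changed = True
    return has_all_a_row()


F = [("C", "AD"), ("E", "G"), ("FG", "A"), ("EF", "A"), ("G", "B"), ("BE", "C")]
print(chase_lossless("ABCDEFG", F, ["ACD", "ECG", "AFG", "AEF", "BG", "BCE"]))  # True
```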
Dependency-Preservation Checking
Let <R, F>, where F = {X1 → Y1, …, Xn → Yn}. Let {R1, …, Rk} be a decomposition of R, and let Fi be the projection of F on Ri.
preserved ← TRUE
for each FD X → Y in F, while preserved == TRUE, do
begin
    compute X+ under F1 ∪ … ∪ Fk;
    if Y ⊄ X+ then { preserved ← FALSE; exit };
end
Example
Consider R = (A, B, C, D), F = {A → B, B → C, C → D}. Is the decomposition {R1, R2} dependency-preserving, where R1 = (A, B), F1 = {A → B}, R2 = (A, C, D), and F2 = {C → D, A → D, A → C}?
Check whether A → B is preserved:
Compute A+ under F1 ∪ F2 = {A → B} ∪ {C → D, A → D, A → C}: A+ = {A, B, C, D}.
Is B ⊆ A+? Yes, so A → B is preserved.
Check whether B → C is preserved:
Compute B+ under F1 ∪ F2 = {A → B} ∪ {C → D, A → D, A → C}: B+ = {B}.
Is C ⊆ B+? No, so B → C is not preserved.
The decomposition is not dependency-preserving.
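The check can also be run without materializing each Fi. The sketch below swaps in a standard equivalent technique: starting from Z = X, it repeatedly adds (Z ∩ Ri)+ ∩ Ri for every Ri, with closures taken under the original F, which yields the same X+ as computing it under F1 ∪ … ∪ Fk. `is_preserved` and `lost_fds` are hypothetical names.

```python
def closure(attrs, fds):
    """X+ under fds; FDs are (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(lhs, rhs, fds, decomposition):
    """True if lhs -> rhs follows from the projections of fds on the R_i."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in map(set, decomposition):
            # What F_i can add inside R_i: (result & R_i)+ under F, kept within R_i.
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


def lost_fds(fds, decomposition):
    """FDs of fds that the decomposition does not preserve."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not is_preserved(lhs, rhs, fds, decomposition)]


F = [("A", "B"), ("B", "C"), ("C", "D")]
print(lost_fds(F, ["AB", "ACD"]))  # [('B', 'C')]
```

Applied to the earlier binary decomposition R1 (A, C, D), R21 (C, D), R22 (B, C, E, F, G, H), this sketch finds BE → CD still preserved (through BE → C and C → D) and reports only GHB → AB and EG → A as not preserved.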