
Information Systems for Engineers
Exercise 8
ETH Zurich, Fall Semester 2017
Hand-out: 24.11.2017
Due: 01.12.2017

1. (Exercise 3.3.1 in [1]) For each of the following relation schemas and sets of FDs, i) indicate BCNF (Boyce-Codd normal form) violations and ii) decompose the relations, as necessary, into collections of relations that are in BCNF (Hint: use Algorithm 3.20 from [1]):

(a) R(A, B, C, D) with FDs:
    A, B → C
    C → D
    D → A

i. Solution: First, we compute the closure of the left-hand side of each FD:
    {A, B}+ = {A, B, C, D}
    {C}+ = {A, C, D}
    {D}+ = {A, D}
The second and third FDs violate BCNF because their left-hand sides, {C} and {D}, are not superkeys.

ii. Solution: Using the violating dependency C → D, we split the initial relation into R1(A, C, D), which holds the closure of {C}, and R2(B, C). Relation R2 has only two attributes and is in BCNF. The projection of the FDs onto R1 can be found from the closures of the subsets of its attributes (computed on relation R):
    {A}+ = {A}
    {C}+ = {A, C, D}
    {D}+ = {A, D}
    {A, C}+ = {A, C, D}
    {A, D}+ = {A, D}
    {C, D}+ = {A, C, D}
From the above, the following non-trivial FDs can be inferred (after ignoring attributes that are not in R1):
    C → A
    C → D
    D → A
A minimal basis is:
    C → D
    D → A
While {C} is a superkey of R1, {D} is not, and therefore R1 is not in BCNF. Based on this violation we decompose R1 into R3(A, D) and R4(C, D). Both have only two attributes and are in BCNF, so we are done. The final decomposition is R2(B, C), R3(A, D) and R4(C, D). Notice that the FD A, B → C is not preserved: its attributes do not appear together in any of the three relations, and it does not follow from the projected FDs C → D and D → A.

iii. Solution: It is interesting to see that, although not all FDs are preserved, it is still possible to recreate the original relation as R = R2 ⋈ R3 ⋈ R4. For example, let us assume that this is an instance of R:

    R:
    A   B   C   D
    a1  b1  c1  d1
    a1  b2  c1  d1
    a1  b3  c2  d2
    a2  b3  c3  d3
    a2  b4  c4  d3

Using set semantics, i.e. removing duplicate tuples, the projections of R onto R2, R3 and R4 are:

    R2:         R3:         R4:
    B   C       A   D       C   D
    b1  c1      a1  d1      c1  d1
    b2  c1      a1  d2      c2  d2
    b3  c2      a2  d3      c3  d3
    b3  c3                  c4  d3
    b4  c4

Joining R3 and R4 on D gives an intermediate relation R5:

    R5 = R3 ⋈ R4:
    A   C   D
    a1  c1  d1
    a1  c2  d2
    a2  c3  d3
    a2  c4  d3

Joining R5 and R2 on C gives back the original instance:

    R5 ⋈ R2:
    A   B   C   D
    a1  b1  c1  d1
    a1  b2  c1  d1
    a1  b3  c2  d2
    a2  b3  c3  d3
    a2  b4  c4  d3
    = R

On the other hand, if the original relation is decomposed without following the algorithm, there is no guarantee that it can be reconstructed. As an example, if R is decomposed into the relations S1(A, B), S2(A, C) and S3(A, D), the following problem occurs:

    S1:         S2:         S3:
    A   B       A   C       A   D
    a1  b1      a1  c1      a1  d1
    a1  b2      a1  c2      a1  d2
    a1  b3      a2  c3      a2  d3
    a2  b3      a2  c4
    a2  b4

    S1 ⋈ S2:
    A   B   C
    a1  b1  c1
    a1  b1  c2  *
    a1  b2  c1
    a1  b2  c2  *
    a1  b3  c1  *
    a1  b3  c2
    a2  b3  c3
    a2  b3  c4  *
    a2  b4  c3  *
    a2  b4  c4

The rows marked with * do not exist in the given instance of R. Together with the original rows they violate the FD A, B → C; for example, (a1, b1) now appears with both c1 and c2.

(b) R(A, B, C, D) with FDs:
    B → C
    B → D

i. Solution: We compute the closure of the left-hand side of the FDs:
    {B}+ = {B, C, D}
Since A is not in the closure, {B} is not a superkey and both FDs violate BCNF.

ii. Solution: We decompose the relation R into R1(B, C, D) and R2(A, B). Relation R2 has only two attributes and is therefore in BCNF. We project the FDs onto relation R1:
    {B}+ = {B, C, D}
    {C}+ = {C}
    {D}+ = {D}
    {C, D}+ = {C, D}
Notice that we do not need to compute the closures of subsets such as {B, C} and {B, D}: they contain B, so they cannot add any FDs that do not already follow from the closure of {B}. The following non-trivial FD can be inferred:
    B → C, D
B is a superkey of R1, therefore the relation is in BCNF. Notice that in this case the original FDs happen to be preserved.
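The closure computations that drive parts (a) and (b) can be checked mechanically. The following is a minimal Python sketch, not part of the original hand-out; the helper names `fd`, `closure` and `bcnf_violations` are illustrative choices, and attributes are assumed to be single letters. It only tests the given FDs against the full schema; checking a decomposed relation would additionally require projecting the FDs, as done by hand above.

```python
def fd(lhs, rhs):
    """Build an FD from two attribute strings, e.g. fd("AB", "C") for A, B -> C."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its left-hand side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left-hand side is not a superkey of `schema`."""
    attrs = set(schema)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not attrs <= closure(lhs, fds)]


# Exercise 1(a): A, B -> C;  C -> D;  D -> A
fds_a = [fd("AB", "C"), fd("C", "D"), fd("D", "A")]
print(sorted(closure("C", fds_a)))          # ['A', 'C', 'D']
print(sorted(closure("D", fds_a)))          # ['A', 'D']
print(len(bcnf_violations("ABCD", fds_a)))  # 2

# Exercise 1(b): B -> C;  B -> D
fds_b = [fd("B", "C"), fd("B", "D")]
print(sorted(closure("B", fds_b)))          # ['B', 'C', 'D']
print(len(bcnf_violations("ABCD", fds_b)))  # 2
```

The fixed-point loop mirrors the manual procedure: keep applying FDs whose left-hand side is already inside the current set until nothing new is added.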

(c) R(A, B, C, D) with FDs:
    A, B → C
    B, C → D
    C, D → A
    A, D → B

i. Solution: We compute the closure of the left-hand sides of the FDs:
    {A, B}+ = {A, B, C, D}
    {B, C}+ = {A, B, C, D}
    {C, D}+ = {A, B, C, D}
    {A, D}+ = {A, B, C, D}
Every left-hand side is a superkey, so R is in BCNF and no action is necessary.

ii. Solution: No action necessary.

(d) R(A, B, C, D) with FDs:
    A → B
    B → C
    C → D
    D → A

i. Solution: We compute the closure of the left-hand sides of the FDs:
    {A}+ = {A, B, C, D}
    {B}+ = {A, B, C, D}
    {C}+ = {A, B, C, D}
    {D}+ = {A, B, C, D}
Every left-hand side is a superkey, so R is in BCNF and no action is necessary.

ii. Solution: No action necessary.

(e) R(A, B, C, D, E) with FDs:
    A, B → C
    D, E → C
    B → D

i. Solution: We compute the closure of the left-hand sides of the FDs:
    {A, B}+ = {A, B, C, D}
    {D, E}+ = {C, D, E}
    {B}+ = {B, D}
None of the left-hand sides is a superkey, so all three FDs violate BCNF.

ii. Solution: By selecting the first FD, A, B → C, we decompose R into R1(A, B, C, D) and R2(A, B, E). We project the FDs onto relation R1:
    {A}+ = {A}
    {B}+ = {B, D}
    {C}+ = {C}
    {D}+ = {D}
    {A, B}+ = {A, B, C, D}
    ...
From these closures we obtain A, B → C, D and B → D, which reduce to the basis:
    A, B → C
    B → D
The second FD violates BCNF because B is not a superkey of R1. Therefore, we decompose R1 into the relations R3(B, D) and R4(A, B, C). R3 has two attributes and is in BCNF. The projection of the FDs of R1 (and not of R) onto R4 is:
    {A}+ = {A}
    {B}+ = {B, D}
    {C}+ = {C}
    ...
which yields:
    A, B → C
{A, B} is a superkey of R4 and the relation is therefore in BCNF. If we compute the closures of the subsets of attributes of R2, we do not get any non-trivial FDs. Because of this, relation R2 is also in BCNF. Notice that in this case the FD D, E → C is not preserved.

(f) R(A, B, C, D, E) with FDs:
    A, B → C
    C → D
    D → B
    D → E

i. Solution: We compute the closure of the left-hand sides of the FDs:
    {A, B}+ = {A, B, C, D, E}
    {C}+ = {B, C, D, E}
    {D}+ = {B, D, E}
The FDs with left-hand sides {C} and {D} violate BCNF, because neither closure contains A.

ii. Solution: By selecting the second FD, C → D, we decompose R into R1(B, C, D, E) and R2(A, C). Relation R2 has two attributes and is in BCNF. We project the FDs onto relation R1:
    {B}+ = {B}
    {C}+ = {B, C, D, E}
    {D}+ = {B, D, E}
    {E}+ = {E}
    ...
From these closures we obtain:
    C → B, D, E
    D → B, E
The second FD violates BCNF because D is not a superkey of R1. Therefore, we decompose R1 into the relations R3(B, D, E) and R4(C, D). R4 has two attributes and is in BCNF. The projection of the FDs of R1 (and not of R) onto R3 is:
    {B}+ = {B}
    {D}+ = {B, D, E}
    {E}+ = {E}
    ...
which yields:
    D → B, E
{D} is a superkey of R3 and the relation is therefore in BCNF. Notice that in this case the FD A, B → C is not preserved.

2. Show that the transitivity, reflexivity and augmentation rules can be used to implement the combining rule.

Solution: We consider two FDs:
    A → B
    A → C
Use the augmentation rule with C on the first FD, and with A on the second FD:
    A, C → B, C
    A → A, C
Now use the transitivity rule (A → A, C → B, C):
    A → B, C
This is exactly the result we would get by applying the combining rule to the two original FDs.

3. Show that the transitivity, reflexivity and augmentation rules can be used to implement the splitting rule.

Solution: Consider the FD:
    A → B, C

Use reflexivity to construct the following two FDs:
    B, C → B
    B, C → C
We can now use transitivity (A → B, C → X, for X = B and X = C) to get:
    A → B
    A → C
This is exactly the result we would get by applying the splitting rule to the original FD.

4. (Exercise 3.4.1 in [1]) Let R(A, B, C, D, E) be decomposed into relations with the following three sets of attributes: {A, B, C}, {B, C, D}, and {A, C, E}. For each of the following sets of FDs, use the chase test to tell whether the decomposition of R is lossless. For those that are not lossless, give an example of an instance of R that returns more than R when projected onto the decomposed relations and rejoined.

Solution: The tableau for the relation R and the proposed decomposition has one row per decomposed relation. Below, R1, R2 and R3 denote the projections onto {A, B, C}, {B, C, D} and {A, C, E}:

    A   B   C   D   E
    a   b   c   d1  e1      (row for {A, B, C})
    a1  b   c   d   e2      (row for {B, C, D})
    a   b1  c   d2  e       (row for {A, C, E})

(a) B → E and C, E → A:

Solution: Using B → E (the first two rows agree on B), the tableau changes to:

    A   B   C   D   E
    a   b   c   d1  e1
    a1  b   c   d   e1
    a   b1  c   d2  e

Using C, E → A (the first two rows now agree on C and E), the tableau changes to:

    A   B   C   D   E
    a   b   c   d1  e1
    a   b   c   d   e1
    a   b1  c   d2  e

There is no other FD to apply and there is no row consisting only of unsubscripted symbols. Therefore, the decomposition is not lossless. An example showcasing the problem is the following natural join:

    R1:             R3:
    A   B   C       A   C   E
    a   b   c       a   c   e1
    a1  b   c       a1  c   e2
    a   b1  c       a   c   e

    R1 ⋈ R3:
    A   B   C   E
    a   b   c   e1
    a   b   c   e
    a1  b   c   e2
    a   b1  c   e1
    a   b1  c   e

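In the resulting relation, the FD B → E is violated.

(b) A, C → E and B, C → D:

Solution: Using A, C → E (the first and third rows agree on A and C), the tableau changes to:

    A   B   C   D   E
    a   b   c   d1  e
    a1  b   c   d   e2
    a   b1  c   d2  e

Using B, C → D (the first two rows agree on B and C), the tableau changes to:

    A   B   C   D   E
    a   b   c   d   e
    a1  b   c   d   e2
    a   b1  c   d2  e

The first row has no subscripted symbols and the decomposition is lossless.

(c) A → D, D → E and B → D:

Solution: Using A → D (the first and third rows agree on A), the tableau changes to:

    A   B   C   D   E
    a   b   c   d1  e1
    a1  b   c   d   e2
    a   b1  c   d1  e

Using D → E (the first and third rows now agree on D), the tableau changes to:

    A   B   C   D   E
    a   b   c   d1  e
    a1  b   c   d   e2
    a   b1  c   d1  e

Using B → D (the first two rows agree on B), the tableau changes to:

    A   B   C   D   E
    a   b   c   d   e
    a1  b   c   d   e2
    a   b1  c   d   e

The first row has no subscripted symbols and the decomposition is lossless.

(d) A → D, C, D → E and E → D:

Solution: Using A → D (the first and third rows agree on A), the tableau changes to:

    A   B   C   D   E
    a   b   c   d1  e1
    a1  b   c   d   e2
    a   b1  c   d1  e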

Using C, D → E (the first and third rows agree on C and D), the tableau changes to:

    A   B   C   D   E
    a   b   c   d1  e
    a1  b   c   d   e2
    a   b1  c   d1  e

There are no further changes to be made: E → D applies only to the first and third rows, which already agree on D. There is also no row consisting only of unsubscripted symbols. Therefore, the decomposition is not lossless. An example showcasing the problem is the following natural join, where R2 and R3 are the projections onto {B, C, D} and {A, C, E}:

    R2:             R3:
    B   C   D       A   C   E
    b   c   d1      a   c   e1
    b   c   d       a1  c   e2
    b1  c   d2      a   c   e

    R2 ⋈ R3:
    A   B   C   D   E
    a   b   c   d1  e1
    a1  b   c   d1  e2
    a   b   c   d1  e
    a   b   c   d   e1
    a1  b   c   d   e2
    a   b   c   d   e
    a   b1  c   d2  e1
    a1  b1  c   d2  e2
    a   b1  c   d2  e

In the resulting relation, the FD E → D is violated: for example, the tuples (a, b, c, d1, e) and (a, b, c, d, e) agree on E but differ on D.

5. (Exercise 3.4.2 in [1]) For each of the sets of FDs in Exercise 3.4.1, are dependencies preserved by the decomposition?

(a) B → E and C, E → A:
Solution: B → E is not preserved.

(b) A, C → E and B, C → D:
Solution: All FDs are preserved.

(c) A → D, D → E and B → D:
Solution: A → D and D → E are not preserved.

(d) A → D, C, D → E and E → D:
Solution: None of the FDs is preserved.

References

[1] Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom. Database Systems: The Complete Book, Second Edition.
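The answers to Exercise 5 can be cross-checked with a standard dependency-preservation test that avoids computing the full projection of the FDs. The Python sketch below is an illustration and not part of the original hand-out; the names `fd`, `closure` and `is_preserved` are hypothetical helpers, and attributes are assumed to be single letters. For an FD X → Y it grows a set Z, starting from X, by repeatedly adding whatever each decomposed relation can derive from the part of Z it contains. The FD is preserved exactly when Y ends up inside Z.

```python
def fd(lhs, rhs):
    """Build an FD from two attribute strings, e.g. fd("CE", "A") for C, E -> A."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(dep, fds, decomposition):
    """Check whether `dep` follows from the FDs of `fds` projected onto `decomposition`."""
    lhs, rhs = dep
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for relation in decomposition:
            attrs = set(relation)
            # Attributes derivable inside this relation from the part of z it contains.
            gained = closure(z & attrs, fds) & attrs
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z


decomposition = ["ABC", "BCD", "ACE"]
exercises = {
    "a": [fd("B", "E"), fd("CE", "A")],
    "b": [fd("AC", "E"), fd("BC", "D")],
    "c": [fd("A", "D"), fd("D", "E"), fd("B", "D")],
    "d": [fd("A", "D"), fd("CD", "E"), fd("E", "D")],
}
for name, fds in exercises.items():
    lost = [(sorted(l), sorted(r)) for l, r in fds
            if not is_preserved((l, r), fds, decomposition)]
    print(name, "not preserved:", lost)
```

Under these assumptions the loop reports B → E for (a), nothing for (b), A → D and D → E for (c), and all three FDs for (d), which agrees with the solutions above.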