INF1383 -Bancos de Dados Prof. Sérgio Lifschitz DI PUC-Rio Eng. Computação, Sistemas de Informação e Ciência da Computação Projeto de BD e Formas Normais Alguns slides são baseados ou modificados dos originais dos livros Elmasri and Navathe, Fundamentals of Database Systems Pearson Education, Inc. e Database System Concepts, McGraw Hill Silberschatz, Korth and Sudarshan Slide 1 Relational Database Design Finding database schemes with good properties Example: Database for information on suppliers, parts supplied, and shipments Information consists of: S#: supplier number SNAME: supplier name SCITY: supplier city P#: part number PNAME: part name PCITY: city where part is stored QTY: quantity of shipment Slide 2
Relational Database Design One possible database schema BAD [S#, SNAME, SCITY, P#, PNAME, PCITY, QTY] Why BAD is bad: redundancy SCITY is determined by S# However, the same SCITY could appear many times with the same S# in this relation Functional dependency: S# SCITY update anomalies SCITY could be changed in one place but not in another, for the same S#, resulting in inconsistency Slide 3 Relational Database Design Why BAD is bad: insertion anomalies cannot record supplier info unless that supplier supplies a part deletion anomalies by deleting parts we can lose supplier info Solution New schema: S[S#, SNAME, SCITY] Key: S# P[P#, PNAME, PCITY] Key: P# SP[S#, P#, QTY] Key: S# P# No other functional dependencies besides key dependencies Normal forms: nice forms for schemas Slide 4
Decomposing Relation Schemes Let R be a relation scheme A decomposition of R is a set ρ={r 1,, R k } of relation schemes such that: att(r) = k i = 1 att(r i ) Example {S, P, SP} is a decomposition of the relation schema BAD (see earlier example) Slide 5 Conditions for a Reasonable Decomposition Do S, P, SP contain the same info as BAD? BAD = π S (BAD) >< π P (BAD) >< π SP (BAD)? Dependencies for BAD can be enforced by local dependencies on S, P, SP? Let F be the set of FDs for BAD F + denotes the set of FDs implied by F π X (F + ) = {V W F + VW X} Is π S (F + ) π P (F + ) π SP (F + ) equivalent to F? Slide 6
Functional-Dependency Theory We now consider the formal theory that tells us which functional dependencies are implied logically by a given set of functional dependencies. We then develop algorithms to generate lossless decompositions into BCNF and 3NF We then develop algorithms to test if a decomposition is dependency-preserving Slide 7 Functional Dependencies Functional dependency over R: expression X Y where X, Y att(r) A relation R satisfies X Y iff whenever two tuples in R agree on X, they also agree on Y e.g. SCHEDULE THEATER TITLE Roxy Artplex Tropa Rambo Satisfies THEATER TITLE SCHEDULE THEATER TITLE Roxy Tropa Artplex Rambo Artplex Platoon Violates THEATER TITLE, satisfies TITLE THEATER 8 Slide 8
Lossless-join Decomposition For the case of R = (R 1, R 2 ), we require that for all possible relations r on schema R r = R1 (r ) R2 (r ) A decomposition of R into R 1 and R 2 is lossless join if and only if at least one of the following dependencies is in F + : R 1 R 2 R 1 R 1 R 2 R 2 Slide 9 Dependency Preservation Let F i be the set of dependencies F + that include only attributes in R i. A decomposition is dependency preserving, if (F 1 F 2 F n ) + = F + If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive. Slide 10
Technical Problem Checking if X A is implied by a set F of fds (F = X A) Notation: F + = {X Y F = X Y} Needed to compute keys and check satisfaction of BCNF and other normal forms. Solution: closure of a set of attributes wrt F: X + X + = {A F = X A} (all attributes determined by X) So, X A F + iff A X + 11 Slide 11 Example of Closure Computation R = ABCDEF F = {A C, BC D AD E} X = AB X (0) = AB X (1) = ABC X (2) = ABCD X (3) = ABCDE X (4) = X (3) X + = ABCDE To check if X is key in R: X + = R A A o B o A o B o C o A B C D o o o o B C D o o o o E o 12 Slide 12
Computing the closure of a set of attributes wrt a set of fd s Let F be a set of fd s over R, and x R. The following computes the closure x + of x wrt FL 1. X (0) X 2. X (i+1) X (i) {Z V Z F, V X (i) } We get: X (0) X (1) X (i) X (i+1) att(r) Since att(r) is finite, there must exist a k 0 such that X (k) = X (k+1) (and thus, X (j) = X (k) j k). Then, X + = X (k) Slide 13 Dependency Preserving Decompositions Notation: if F is a set of fd s over R and X R then π X (F + ) = {V W F + VW X}. these are the fds that apply to the set of attributes X ( local to X) Definition: Let ρ = (R 1,, R k ) be a decomposition for R and F a set of fd s over R. Then ρ preserves F iff the set of local fds k i=1π Ri (F + ) is equivalent to F In other words, all fds in F are implied by k i=1π Ri (F + ) 14 Slide 14
Normal Forms Terminology: Let R be a relation schema and F a set of fd s over R. Key: X att(r) such that X att(r) F +. Minimal key: X att(r) s.t. X att(r) F + and there is no Y X such that Y att(r) F +. A att(r) is prime: A X where X is a minimal key A is non-prime: A is not a member of any minimal key. 15 Slide 15 Purpose of normal forms Eliminate problems of redundancy and anomalies. Normal forms: first, second, third, Boyce-Codd Boyce-Codd Normal Form (BCNF) A relation scheme R is in BCNF wrt a set of fd s F over R, iff whenever X A F + and A X, X is a key for R Only nontrivial dependencies are those induced by keys 16 Slide 16
Example BAD(S#, P#, SNAME, PNAME, SCITY, PCITY, QTY) not in BCNF wrt F = S# SNAME SCITY P# PNAME PCITY S# P# QTY S(S#, SCITY, SNAME) is in BCNF wrt S# SNAME SCITY P(P#, PCITY, PNAME) is in BCNF wrt P# PNAME PCITY SP(S# P# QTY) is in BCNF wrt S# P# QTY 17 Slide 17 Decomposition of a relation schema into BCNF relation schemas, with lossless join Any relation scheme has a decomposition into BCNF relation schemes, with lossless join not always dependency preserving Example R = CITY ZIP ST F = CITY ST ZIP, ZIP CITY R has no decomposition into BCNF schemas, which preserves F (CITY ST ZIP is never preserved) Slide 18
Algorithm to obtain a lossless join decomposition of a given R wrt F Start with ρ = {R} Apply recursively the following procedure: for each S in ρ not in BCNF, let X A F + be such that XA S, A X, X is not a key of S replace S by S 1 and S 2, where S 1 = XA S 2 = S A until all relation schemes are in BCNF Note S S 1 S 2 lossless join: X A F + X is a key for S 1 Slide 19 Note a) R R 1 R i R k Lossless join wrt F S i1 S i2 S im Lossless join (for R i ) wrt π Ri (F+) R 1 R R 2 R i-1 S i1 S im S i2 R k Lossless join wrt F b) If ρ is a lossless join decomposition of R wrt F and ρ ρ then ρ is a lossless join decomposition as well. Slide 20
Example R = C T H R S G C = course T = teacher H = hour R = room S = student G = grade F: C T HR C HT R CS G HS R only key: HS Slide 21 Example CTHRSG Key = HS, C T CS G, HR C, HS R, TH R key = CS CSG CS G CTHRS key = SH C T TH R HR C HS R CT key = C C T CHRS key = SH CH R HR C HS R CHR key = CH, CH R HR HR C CHS key = SH SH C Slide 22
Remarks (drawbacks) Decomposition is not unique CHRS could be decomposed into CHR and CHS or CHR and RHS The decomposition does not preserve TH R local fds: G = {CS G C T CH R HR C SH C} which does not imply TH R Slide 23 BCNF and Dependency Preservation It is not always possible to get a BCNF decomposition that is dependency preserving R = (J, K, L ) F = {JK L L K } Two candidate keys = JK and JL R is not in BCNF Any decomposition of R will fail to preserve JK L This implies that testing for JK L requires a join Slide 24
Efficiency Is there an efficient algorithm for BCNF decomposition? Most likely NO: It is NP-complete to decide whether a relation scheme R is in BCNF wrt a set of fd s. Any algorithm for BCNF is most likely exponential 25 Slide 25 3NF Problem with BCNF: Not every relation schema can be decomposed into BCNF relation schemas which preserve the dependencies and have lossless join. Third Normal Form A relation scheme R is in Third Normal Form wrt a set F of fd s over R, if whenever X A holds in R and A X then either X is a key or A is prime Weaker than BCNF 26 Slide 26
Third Normal Form: Motivation There are some situations where BCNF is not dependency preserving, and efficient checking for FD violation on updates is important Solution: define a weaker normal form, called Third Normal Form (3NF) Allows some redundancy (with resultant problems; we will see examples later) But functional dependencies can be checked on individual relations without computing a join. There is always a lossless-join, dependency-preserving decomposition into 3NF. Slide 27 Computing a 3NF decomposition Minimal covers A set F of fd s is minimal iff 1) X Y F Y is a single attribute 2) for no X A F is F {X A} equivalent to F (no fd can be removed from F) 3) if X A F then there is no proper subset Z of X such that F {X A} {Z A} is equivalent to F (attributes cannot be removed from fd s in F) Fact:. Every set of fd s is equivalent to some minimal set of fds (called a minimal cover). 28 Slide 28
Dependency preserving decompositions into Third Normal Form R a schema F minimal set of fd s over R ρ = {XA 1 A m X A i F} {B R B does not occur in F} Example R = CTHRSG (see earlier example) F = C T CS G HR C HS R (minimal) HT R Then ρ = {CT, HRC, HTR, CSG, HRS} Slide 29 3NF Decomposition Algorithm (Cont.) 3NF decomposition algorithm ensures: each relation schema R i is in 3NF decomposition is dependency preserving and lossless-join Slide 30
Formal Results Theorem. ρ preserves F and each R i in ρ is in 3NF wrt π Ri (F + ). To obtain decomposition that also has lossless join: add to ρ a set K where K is a minimal key for R unless there already is a piece in the decomposition whose attributes form a key for R Theorem ρ {K} is a 3NF decomposition which is dependency preserving and has lossless join. 31 Slide 31 Extraneous Attributes Consider a set F of FDs and the functional dependency α β in F. Attribute A is extraneous in α if A α and F logically implies (F {α β}) {(α A) β}. Attribute A is extraneous in β if A β and the set of functional dependencies (F {α β}) {α (β A)} logically implies F. Example: Given F = {A C, AB C } B is extraneous in AB C because {A C, AB C} logically implies A C (i.e. the result of dropping B from AB C). Example: Given F = {A C, AB CD} C is extraneous in AB CD since AB C can be inferred even after deleting C Slide 32
Testing if an Attribute is Extraneous Consider a set F of functional dependencies and the functional dependency α β in F. To test if attribute A α is extraneous in α 1. compute ({α} A) + using the dependencies in F 2. check that ({α} A) + contains A; if it does, A is extraneous To test if attribute A β is extraneous in β 1. compute α + using only the dependencies in F = (F {α β}) {α (β A)}, 2. check that α + contains A; if it does, A is extraneous Slide 33 Canonical (Minimal) Cover Sets of functional dependencies may have redundant dependencies that can be inferred from the others For example: A C is redundant in: {A B, B C} Parts of a functional dependency may be redundant E.g.: on RHS: {A B, B C, A CD} can be simplified to {A B, B C, A D} E.g.: on LHS: {A B, B C, AC D} can be simplified to {A B, B C, A D} Intuitively, a canonical cover of F is a minimal set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies Slide 34
Canonical (minimal) Cover A canonical cover for F is a set of dependencies F c such that F logically implies all dependencies in F c, and F c logically implies all dependencies in F, and No functional dependency in F c contains an extraneous attribute, and Each left side of functional dependency in F c is unique. To compute a canonical cover for F: repeat Use the union rule to replace any dependencies in F α 1 β 1 and α 1 β 2 with α 1 β 1 β 2 Find a functional dependency α β with an extraneous attribute either in α or in β If an extraneous attribute is found, delete it from α β until F does not change Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied Slide 35 Comparison of BCNF and 3NF It is always possible to decompose a relation into a set of relations that are in 3NF such that: the decomposition is lossless the dependencies are preserved It is always possible to decompose a relation into a set of relations that are in BCNF such that: the decomposition is lossless it may not be possible to preserve dependencies. Slide 36
Design Goals Goal for a relational database design is: BCNF. Lossless join. Dependency preservation. If we cannot achieve this, we accept one of Lack of dependency preservation Redundancy due to use of 3NF Interestingly, SQL does not provide a direct way of specifying functional dependencies other than superkeys. Can specify FDs using assertions, but they are expensive to test Even if we had a dependency preserving decomposition, using SQL we would not be able to efficiently test a functional dependency whose left hand side is not a key. Slide 37 A (simplified) overview of schema design 1. Choose attributes in R 2. Specify dependencies F (tool: Armstrong relations ) 3. Find a lossless-join, dependency preserving decomposition into Normal Form schemas BCNF, if possible 3NF, if not BCNF 38 Slide 38