Relational Design: Characteristics of Well-designed DB

1. Minimal duplication

   Consider table newfaculty (result of Faculty ⋈ Teach ⋈ Course):

   Id     Lname   Off  Bldg    Phone  Salary  Numb  Dept  Lvl  MaxSz
   20000  Cotts   103  DuPont  1234   45000   867   EE    5    25
   20000  Cotts   103  DuPont  1234   45000   652   CIS   5    25
   00333  Garth   423  DuPont  4321   87000   323   EE    3    25
   00333  Garth   423  DuPont  4321   87000   413   EE    4    25
   55555  Jones   211  Ewing   9876   55000   230   MATH  2    60
   20001  Clarke  103  DuPont  1235   96000   120   AST   1    60
   20001  Clarke  103  DuPont  1235   96000   450   AST   4    15
   ...

2. Represent all info in specs
3. Prevent info from being lost

Duplication can be minimized by decomposition. Consider tables newfac:

   Id     Lname   Off  Bldg    Phone  Salary
   20000  Cotts   103  DuPont  1234   45000
   00333  Garth   423  DuPont  4321   87000
   55555  Jones   211  Ewing   9876   55000
   20001  Clarke  103  DuPont  1235   96000
   ...

and newcourse:

   Off  Bldg    Numb  Dept  Lvl  MaxSz
   103  DuPont  867   EE    5    25
   103  DuPont  652   CIS   5    25
   423  DuPont  323   EE    3    25
   423  DuPont  413   EE    4    25
   211  Ewing   230   MATH  2    60
   103  DuPont  120   AST   1    60
   103  DuPont  450   AST   4    15
   ...

Now consider their join newfac ⋈ newcourse:

   Id     Lname   Off  Bldg    Phone  Salary  Numb  Dept  Lvl  MaxSz
   20000  Cotts   103  DuPont  1234   45000   867   EE    5    25
   20000  Cotts   103  DuPont  1234   45000   652   CIS   5    25
   20000  Cotts   103  DuPont  1234   45000   120   AST   1    60
   20000  Cotts   103  DuPont  1234   45000   450   AST   4    15
   00333  Garth   423  DuPont  4321   87000   323   EE    3    25
   00333  Garth   423  DuPont  4321   87000   413   EE    4    25
   55555  Jones   211  Ewing   9876   55000   230   MATH  2    60
   20001  Clarke  103  DuPont  1235   96000   120   AST   1    60
   20001  Clarke  103  DuPont  1235   96000   450   AST   4    15
   20001  Clarke  103  DuPont  1235   96000   867   EE    5    25
   20001  Clarke  103  DuPont  1235   96000   652   CIS   5    25
   ...

The join contains tuples that are not in newfaculty (e.g., Cotts teaching AST 120): Cotts and Clarke share office 103 DuPont, so Off and Bldg do not identify a single faculty member, and the information about who teaches which course has been lost.
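The effect of joining newfac and newcourse can be reproduced with a minimal Python sketch. The helper names `project` and `natural_join` and the list-of-dicts encoding are assumptions made for illustration; only the seven rows shown on the slide are used.

```python
COLS = ["Id", "Lname", "Off", "Bldg", "Phone", "Salary",
        "Numb", "Dept", "Lvl", "MaxSz"]
DATA = [
    ("20000", "Cotts", 103, "DuPont", 1234, 45000, 867, "EE", 5, 25),
    ("20000", "Cotts", 103, "DuPont", 1234, 45000, 652, "CIS", 5, 25),
    ("00333", "Garth", 423, "DuPont", 4321, 87000, 323, "EE", 3, 25),
    ("00333", "Garth", 423, "DuPont", 4321, 87000, 413, "EE", 4, 25),
    ("55555", "Jones", 211, "Ewing", 9876, 55000, 230, "MATH", 2, 60),
    ("20001", "Clarke", 103, "DuPont", 1235, 96000, 120, "AST", 1, 60),
    ("20001", "Clarke", 103, "DuPont", 1235, 96000, 450, "AST", 4, 15),
]
newfaculty = [dict(zip(COLS, row)) for row in DATA]


def project(rows, attrs):
    """Project dict rows onto attrs, dropping duplicate rows."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(r, s):
    """Natural join of two lists of dict rows on their shared attribute names."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in common)]


newfac = project(newfaculty, COLS[:6])                   # Id .. Salary
newcourse = project(newfaculty, COLS[2:4] + COLS[6:])    # Off, Bldg, Numb .. MaxSz
joined = natural_join(newfac, newcourse)                 # joins on Off, Bldg
spurious = [row for row in joined if row not in newfaculty]

print(len(newfaculty), len(joined), len(spurious))  # 7 11 4
```

The four extra rows are exactly the Cotts/AST and Clarke/EE, CIS combinations listed in the joined table.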
Relational Design: Functional Dependencies (FDs)

An FD expresses a constraint on the values of a set of attributes imposed by another set

Formally: Let X, Y ⊆ R
Y is functionally dependent on X iff for all tuples t1, t2 ∈ r(R), t1[Y] = t2[Y] whenever t1[X] = t2[X]
Denoted X → Y

Relational Design: Closure of a Set of FDs

Generally, given a set of FDs, there are additional FDs that can be derived from the set

The closure of a set of FDs F: the set of FDs derivable from F
Denoted F⁺

Armstrong's Axioms allow computation of F⁺
1. Reflexivity rule: Given a set of attributes X and Y ⊆ X, then X → Y
2. Augmentation rule: If X → Y, and Z is a set of attributes, then XZ → YZ
   (or: if X → Y, then XZ → Y)
3. Transitivity rule: If X → Y, and Y → Z, then X → Z

Supplemental axioms:
1. Union rule: If X → Y, and X → Z, then X → YZ
2. Decomposition rule: If X → YZ, then X → Y and X → Z
3. Pseudotransitivity rule: If X → Y, and YW → Z, then XW → Z

Full family of FDs: A set of FDs F is said to be a full family of FDs if F = F⁺

Relational Design: Closure of a Set of Attributes

Closure of X under F
Let X be a set of attributes
The closure of X under F is the set of attributes functionally dependent on X, as determined by applying the set of FDs F to X
Denoted X⁺

To determine X⁺ with respect to F:

    X⁺ ← X
    do {
        oldX⁺ ← X⁺
        for (each Y → Z ∈ F)
            if (Y ⊆ X⁺)
                X⁺ ← X⁺ ∪ Z
    } until (oldX⁺ = X⁺)
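The attribute-closure loop above translates almost line for line into Python. This is a sketch under assumed conventions: an FD is a `(lhs, rhs)` pair, each side is an iterable of attribute names, and the sample FD set over single-letter attributes is hypothetical.

```python
def closure(attrs, fds):
    """Return X+: every attribute functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs; each side is an iterable of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:                       # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)       # Y is inside X+, so add Z
                changed = True
    return result


# Hypothetical FD set: A -> B, B -> C, CD -> E
F = [("A", "B"), ("B", "C"), ("CD", "E")]

print(sorted(closure("A", F)))    # ['A', 'B', 'C']
print(sorted(closure("AD", F)))   # ['A', 'B', 'C', 'D', 'E']
```

Because a string is iterated character by character, single-letter attribute names can be written as plain strings; longer names would need tuples such as `("Snum",)`.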
K ⊆ schema R is a superkey of R if for all tuples t1, t2 ∈ r(R), t1 = t2 whenever t1[K] = t2[K]
I.e., K → R

Full functional dependency: Y is fully functionally dependent on X in FD X → Y if there is no proper subset of X on which Y is dependent
I.e., for any Z ⊂ X, Z ↛ Y
X is said to be irreducible

C ⊆ R is a candidate key of R if C → R and C is irreducible

To find a key K for scheme R:

    K ← R
    for (each ai ∈ K) {
        T ← (K − ai)⁺
        if (T = R)
            K ← K − ai
    }

Relational Design: Equivalence of sets of FDs

Given sets of FDs F and G
F covers G if G ⊆ F⁺
F and G are equivalent if F covers G and G covers F
I.e., F⁺ = G⁺

To determine whether X → Y ∈ F⁺, compute X⁺ WRT F
If Y ⊆ X⁺, then X → Y ∈ F⁺

To determine equivalence of F and G:
1. For each X → Y ∈ F, compute X⁺ WRT G
   (a) If Y ⊆ X⁺, then X → Y ∈ G⁺
   (b) If this fails for any FD, stop: G does not cover F
2. For each A → B ∈ G, compute A⁺ WRT F
   (a) If B ⊆ A⁺, then A → B ∈ F⁺
   (b) If this fails for any FD, stop: F does not cover G
3. If both steps succeed for all FDs, F ≡ G
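Both the key-finding loop and the equivalence test reduce to attribute closures, as the following sketch shows. The function names and the FD sets F, G, and H over single-letter attributes are hypothetical; FDs are assumed to be `(lhs, rhs)` pairs of attribute iterables.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_key(schema, fds):
    """Shrink the full schema to one candidate key by dropping attributes."""
    key = set(schema)
    for a in sorted(schema):
        # Drop a if the remaining attributes still determine all of R.
        if closure(key - {a}, fds) >= set(schema):
            key.discard(a)
    return key


def covers(f, g):
    """F covers G if every FD in G follows from F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    return covers(f, g) and covers(g, f)


F = [("A", "B"), ("B", "C")]      # A -> B, B -> C
G = [("A", "BC"), ("B", "C")]     # A -> BC, B -> C
H = [("A", "B")]                  # A -> B only

print(sorted(find_key("ABCD", F)))   # ['A', 'D']
print(equivalent(F, G))              # True
print(equivalent(F, H))              # False
```

The loop returns one candidate key only; which one depends on the order in which attributes are tried (alphabetical here).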
Relational Design: Minimal (Canonical) Covers

Minimal cover of a set of FDs F: smallest set of FDs that is equivalent to F
Denoted Fc

A minimal cover has the following properties:
1. Every FD has a single attribute on the right side
2. No left-hand side has extraneous attributes
   I.e., every left-hand side is irreducible
   a is extraneous in X if (Fc − {X → y}) ∪ {(X − a) → y} ≡ F
3. No FD is redundant
   I.e., X → y is redundant if (Fc − {X → y}) ≡ F

To compute the minimal cover G for F:
1. G ← F
2. Replace each FD of the form X → a1, a2, ..., an by X → a1, X → a2, ..., X → an
3. For each FD X → a ∈ G, delete all extraneous attributes from X
4. Delete each redundant FD X → a from G

Relational Design: Decomposition

A decomposition of a scheme R is a set of subschemas derived from R

Formally: Let R be a relational scheme
Then {R1, R2, ..., Rn} is a decomposition of R if R1 ∪ R2 ∪ ... ∪ Rn = R

Relational Design: Lossless Join Decomposition

Formally: Given
1. Scheme R,
2. Relation r(R),
3. Decomposition D = {R1, R2, ..., Rn}, and
4. Relations r1(R1), r2(R2), ..., rn(Rn), where ri = π_Ri(r)

Then D is a lossless join decomposition of R if r1 ⋈ r2 ⋈ ... ⋈ rn = r

A decomposition {R1, R2} of R is lossless if F⁺ contains either
1. R1 ∩ R2 → (R1 − R2), or
2. R1 ∩ R2 → (R2 − R1)
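The two-schema lossless test needs only the closure of the shared attributes. The sketch below assumes a hypothetical schema R(A, B, C) with the single FD A → B; the function name `lossless_binary` is illustrative.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """True if (R1 n R2) -> (R1 - R2) or (R1 n R2) -> (R2 - R1) follows from fds."""
    r1, r2 = set(r1), set(r2)
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


F = [("A", "B")]   # hypothetical FD set on R(A, B, C): A -> B

print(lossless_binary("AB", "AC", F))   # True: the shared attribute A determines B
print(lossless_binary("AB", "BC", F))   # False: B determines neither A nor C
```

Checking Y ⊆ (R1 ∩ R2)⁺ is the same as checking that R1 ∩ R2 → Y is in F⁺, so F⁺ never has to be materialized.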
Algorithm to determine whether decomposition is lossless:

Given
1. A set of FDs F,
2. schema R(A1, A2, ..., An), and
3. decomposition D = {R1, R2, ..., Rk}

Steps:
1. Construct a table with n columns and k rows
   Rows correspond to the k subschemas Ri
   Columns correspond to the n attributes Aj
2. In table[i, j], put aj if Aj ∈ Ri
   Otherwise, put bij
3. For each FD α → β ∈ F
   Look for 2 rows that have matching values for every Aj ∈ α
   Set the column values that correspond to the attributes in β to the same values for these 2 rows
   The goal is to replace bij with aj
4. Continue until either
   (a) No more changes can be made, or
   (b) A row contains a1, a2, ..., an
5. If a row contains a1, a2, ..., an, the decomposition is lossless

Relational Design: Dependency Preservation - Motivation

Consider R:

   Snum  City    Status
   s1    London  20
   s2    Paris   10
   s3    Paris   10
   s4    London  20

and FDs
1. Snum → City
2. City → Status

Now consider the following decompositions:

1. S1:                  S2:
   Snum  City           City    Status
   s1    London         London  20
   s2    Paris          Paris   10
   s3    Paris
   s4    London

2. T1:                  T2:
   Snum  City           Snum  Status
   s1    London         s1    20
   s2    Paris          s2    10
   s3    Paris          s3    10
   s4    London         s4    20

Both decompositions are LLJ (lossless join)
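The table-based lossless test can be sketched in Python and applied to decompositions S and T of the supplier relation. The symbol encoding (`"a"` for aj, `("b", i)` for bij) and the function name `is_lossless` are assumptions; the third decomposition in the usage lines is a hypothetical lossy one added for contrast.

```python
def is_lossless(attrs, decomposition, fds):
    """Tableau test: True if some row ends up holding only distinguished symbols."""
    # table[i][A] is "a" when A is in Ri, otherwise the row-specific symbol ("b", i).
    table = [{A: ("a" if A in Ri else ("b", i)) for A in attrs}
             for i, Ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in table:
                for r2 in table:
                    if r1 is r2 or any(r1[A] != r2[A] for A in lhs):
                        continue
                    for B in rhs:
                        if r1[B] == r2[B]:
                            continue
                        # Equate the two symbols, preferring the distinguished one.
                        keep = "a" if "a" in (r1[B], r2[B]) else min(r1[B], r2[B])
                        drop = {r1[B], r2[B]} - {keep}
                        for row in table:          # rename throughout column B
                            if row[B] in drop:
                                row[B] = keep
                        changed = True
    return any(all(v == "a" for v in row.values()) for row in table)


R = ["Snum", "City", "Status"]
F = [(("Snum",), ("City",)), (("City",), ("Status",))]

S = [{"Snum", "City"}, {"City", "Status"}]
T = [{"Snum", "City"}, {"Snum", "Status"}]
U = [{"Snum", "Status"}, {"City", "Status"}]   # hypothetical lossy decomposition

print(is_lossless(R, S, F), is_lossless(R, T, F), is_lossless(R, U, F))
# True True False
```

Each change merges two symbols in one column, so the number of distinct symbols strictly decreases and the loop terminates.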
Suppose you wanted to insert the data (s5, London, 30) into each decomposition

For decomposition S, this would require inserting
1. <s5, London> into S1, and
2. <London, 30> into S2
The insert into S2 would violate FD 2

For decomposition T, this would require inserting
1. <s5, London> into T1, and
2. <s5, 30> into T2
The fact that FD 2 is violated is not obvious from an examination of the individual tables
The only way to determine whether FD 2 is violated in T is to join T1 and T2

Relational Design: Dependency Preservation

Restriction of a set of FDs
Given
1. set of FDs F,
2. schema R,
3. decomposition D = {R1, R2, ...}

The restriction of F to Ri is the set of FDs in F⁺ that are wholly contained in Ri
I.e., X → Y ∈ Fi if XY ⊆ Ri and X → Y ∈ F⁺
Denoted Fi

Let F′ = F1 ∪ F2 ∪ ... ∪ Fn
Generally, F′ ≠ F
But if F′⁺ = F⁺, checking against F′ is equivalent to checking against F

Dependency preservation
A decomposition is dependency preserving if F′⁺ = F⁺

Rissanen's Theorem: A decomposition {R1, R2} of R is DP if
1. {F1 ∪ F2}⁺ = F⁺
2. R1 ∩ R2 is a candidate key of R1 or R2

Relational Design: Dependency Preservation - Algorithms

Given scheme R, decomposition D = {R1, R2, ...}, and F
To determine dependency preservation of F:

    compute F⁺
    F′ ← ∅
    for (each Ri in D) {
        Fi ← restriction of F to Ri
        F′ ← F′ ∪ Fi
    }
    compute F′⁺
    if (F′⁺ = F⁺)
        return TRUE
    else
        return FALSE
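For small schemas the restriction-based test can be run directly. The sketch below is a simplification, not the slide's procedure verbatim: instead of materializing F⁺, it builds a generating set for each Fi from closures of subsets of Ri, then checks that F′ covers F (which suffices because F′ ⊆ F⁺ always holds). Function names are hypothetical, and the subset enumeration is exponential in the size of Ri.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def restriction(ri, fds):
    """FDs X -> (X+ n Ri) - X for every non-empty X in Ri; they generate Fi."""
    ri = sorted(ri)
    result = []
    for n in range(1, len(ri) + 1):
        for x in combinations(ri, n):
            rhs = (closure(x, fds) & set(ri)) - set(x)
            if rhs:
                result.append((x, tuple(sorted(rhs))))
    return result


def preserves_dependencies(decomposition, fds):
    """True if the union of the restrictions (F') implies every FD in F."""
    f_prime = [fd for ri in decomposition for fd in restriction(ri, fds)]
    return all(set(rhs) <= closure(lhs, f_prime) for lhs, rhs in fds)


F = [(("Snum",), ("City",)), (("City",), ("Status",))]
S = [{"Snum", "City"}, {"City", "Status"}]
T = [{"Snum", "City"}, {"Snum", "Status"}]

print(preserves_dependencies(S, F))   # True
print(preserves_dependencies(T, F))   # False: City -> Status is lost
```

This matches the insert example: in S the violation of FD 2 is caught inside S2, while in T no single table can detect it.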
To determine dependency preservation of α → β in F:

    oldresult ← ∅
    result ← α
    while (oldresult != result) {
        oldresult ← result
        for (each Ri) {
            I ← result ∩ Ri
            C ← I⁺
            T ← C ∩ Ri
            result ← result ∪ T
        }
    }
    if (β ⊆ result)
        return TRUE
    else
        return FALSE

Relational Design: Normalization - Intro

A normal form is a set of constraints on a DB schema

Normal forms:

   Form           Alt Name      Restrictiveness  Duplication
   1NF                          least            most
   2NF
   3NF
   Boyce-Codd NF
   4NF
   5NF            Project-Join
   6NF            Domain-Key    most             least

Except for 1NF, normal forms are based on dependencies
(2NF, 3NF, BCNF: FDs; 4NF: MVDs; 5NF: JDs; 6NF: DK)

Relational Design: Normalization - 1NF

A schema R is in 1NF if all attributes are atomic

Composites: flatten (a la ER-RM mapping)

Multivalued:
1. Decompose into 2 tables (a la ER-RM mapping):
   R1 contains PK + MV attribute
   R2 contains R − MV attribute
2. Use 1 table: for each key, have one row for each value of the MV attribute
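The two ways of removing a multivalued attribute can be illustrated on a small in-memory table. Everything in this sketch is hypothetical: the faculty rows, the multivalued `Phones` attribute, and the helper names are made up for illustration.

```python
def split_multivalued(rows, pk, mv):
    """Option 1: R1 holds PK + one MV value per row; R2 holds R without the MV attribute."""
    r1 = [{pk: row[pk], mv: value} for row in rows for value in row[mv]]
    r2 = [{a: v for a, v in row.items() if a != mv} for row in rows]
    return r1, r2


def flatten_multivalued(rows, mv):
    """Option 2: keep one table, with one row per value of the MV attribute."""
    return [{**row, mv: value} for row in rows for value in row[mv]]


# Hypothetical non-1NF table: Phones holds a list of values.
faculty = [
    {"Id": "20000", "Lname": "Cotts", "Phones": [1234, 1299]},
    {"Id": "55555", "Lname": "Jones", "Phones": [9876]},
]

r1, r2 = split_multivalued(faculty, "Id", "Phones")
flat = flatten_multivalued(faculty, "Phones")

print(r1)    # three (Id, Phones) rows, each with a single phone value
print(r2)    # two rows without the Phones attribute
print(flat)  # three rows; Lname is repeated for Cotts
```

Option 2 repeats the single-valued attributes once per phone value, which is the kind of duplication the higher normal forms then remove.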
Relational Design: Normalization - 2NF

A non-prime attribute is an attribute that is not part of any candidate key

An attribute is fully dependent on a set of attributes if it is not dependent on any proper subset of those attributes

A schema R is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on the PK of R

Alternative definition: No non-prime attribute is partially dependent on the PK

To normalize to 2NF:
1. For every schema R in which FD X → Y ∈ F⁺ violates 2NF
   (a) Replace schema R with schemas
       i.  R1 = X ∪ Y
       ii. R2 = R − Y

Relational Design: Normalization - 3NF

Transitive dependency
Y is transitively dependent on PK X if there is a Z such that
1. X → Z,
2. Z → Y, and
3. Z ⊄ PK
X → Y is a transitive dependency on the PK

A schema R is in 3NF if it is in 2NF and every non-prime attribute is non-transitively dependent on the PK of R

To normalize to 3NF:
1. For every table R in which FD Z → Y violates 3NF
   (a) Create 2 tables:
       i.  R1 = Z ∪ Y
       ii. R2 = R − Y

Codd's definition of 3NF is based on 2NF

Given schema R and set of FDs F, create a 3NF decomposition directly from 1NF by:
1. Find the minimal cover Fc of F
2. For each unique set of attributes X appearing on the left-hand side of FDs X → Yi ∈ Fc
   (a) Create a schema consisting of X ∪ Y1 ∪ Y2 ∪ ... ∪ Yn
3. Create a schema containing any attributes of R not included in the previous step
4. If none of the schemas created contains a candidate key
   (a) Create a schema containing a candidate key

Resulting DB schema guaranteed to be
1. Lossless join
2. Dependency preserving

Not unique
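The four synthesis steps can be sketched end to end, including a compact minimal-cover routine. Function names are hypothetical, FDs are assumed to be `(lhs, rhs)` pairs of attribute iterables, and the result depends on the order in which FDs and attributes are processed, which is one reason the output is not unique.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Split right-hand sides into single attributes.
    g = [(frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in rhs]
    # Delete extraneous left-hand-side attributes.
    for i, (lhs, rhs) in enumerate(g):
        for a in sorted(lhs):
            smaller = lhs - {a}
            if smaller and rhs <= closure(smaller, g):
                lhs = smaller
                g[i] = (lhs, rhs)
    # Delete redundant FDs (after dropping exact duplicates).
    g = list(dict.fromkeys(g))
    for fd in list(g):
        rest = [x for x in g if x != fd]
        if fd[1] <= closure(fd[0], rest):
            g = rest
    return g


def find_key(schema, fds):
    key = set(schema)
    for a in sorted(schema):
        if closure(key - {a}, fds) >= set(schema):
            key.discard(a)
    return key


def synthesize_3nf(schema, fds):
    schema = set(schema)
    fc = minimal_cover(fds)                                  # step 1
    groups = {}
    for lhs, rhs in fc:                                      # step 2: group by LHS
        groups.setdefault(lhs, set(lhs)).update(rhs)
    schemas = [frozenset(s) for s in groups.values()]
    leftover = schema - set().union(*schemas)                # step 3
    if leftover:
        schemas.append(frozenset(leftover))
    # Step 4: a schema contains a candidate key exactly when it is a superkey of R.
    if not any(closure(s, fc) >= schema for s in schemas):
        schemas.append(frozenset(find_key(schema, fc)))
    return schemas


supplier = synthesize_3nf(
    ["Snum", "City", "Status"],
    [(("Snum",), ("City",)), (("City",), ("Status",))],
)
print([sorted(s) for s in supplier])   # [['City', 'Snum'], ['City', 'Status']]

# Hypothetical schema R(A, B, C, D) with A -> BC, B -> C
print([sorted(s) for s in synthesize_3nf("ABCD", [("A", "BC"), ("B", "C")])])
```

For the supplier relation the sketch reproduces decomposition S. In the second call the redundant FD A → C is dropped, D is left over, and the key schema {A, D} is added in step 4.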
Relational Design: Normalization - General Definitions for 2NF and 3NF

A schema R is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on every CK of R

A schema R is in 3NF if it is in 2NF and every non-prime attribute is non-transitively dependent on every CK of R

Alternative 3NF def: A schema R is in 3NF if for every non-trivial FD X → Y ∈ F⁺, either
1. X is a superkey of R, or
2. Every attribute in Y is prime

Relational Design: Normalization - BCNF

A schema R is in BCNF if every attribute is fully functionally dependent on every CK of R

Alternative BCNF def: A schema R is in BCNF if, for every non-trivial FD X → Y ∈ F⁺, X is a superkey of R

To normalize to BCNF:
1. For every schema R in which FD X → Y ∈ F⁺ violates BCNF
   (a) Replace schema R with schemas
       i.  R1 = X ∪ Y
       ii. R2 = R − Y

Resulting DB schema guaranteed to be
- Lossless join

Resulting DB schema not guaranteed to be
- Dependency preserving
- Unique
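The BCNF replacement step can be applied recursively until no violation remains. The sketch below assumes FDs are `(lhs, rhs)` pairs of attribute iterables; it searches for violations by enumerating subsets of each schema, which is exponential and only suitable for small examples. The function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violation(schema, fds):
    """Return (X, Y) where X -> Y holds on schema and X is not a superkey, else None."""
    schema = set(schema)
    for n in range(1, len(schema)):
        for x in combinations(sorted(schema), n):
            x_plus = closure(x, fds)
            y = (x_plus & schema) - set(x)       # non-trivial part within schema
            if y and not x_plus >= schema:
                return set(x), y
    return None


def bcnf_decompose(schema, fds):
    """Replace R by R1 = X u Y and R2 = R - Y until every schema is in BCNF."""
    violation = bcnf_violation(schema, fds)
    if violation is None:
        return [frozenset(schema)]
    x, y = violation
    return bcnf_decompose(x | y, fds) + bcnf_decompose(set(schema) - y, fds)


F = [(("Snum",), ("City",)), (("City",), ("Status",))]
result = bcnf_decompose({"Snum", "City", "Status"}, F)
print([sorted(s) for s in result])   # [['City', 'Status'], ['City', 'Snum']]
```

For the supplier relation the violating FD is City → Status, and the result is decomposition S. A different search order can pick a different violating FD first, which is why the outcome is not guaranteed to be unique.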