Functional Dependencies

Functional dependencies (FDs) provide a framework for the systematic design and optimization of relational schemas.
They generalize the notion of keys.
They are crucial in obtaining correctly normalized schemas.
Definitions

In a relation R, suppose there is a set of attributes A1, A2, ..., An and an attribute B such that any two tuples that have the same values for A1, A2, ..., An also have the same value for B. Then A1, A2, ..., An functionally determine B. A functional dependency (FD) of this form is written as:

A1, A2, ..., An → B

Functional dependencies define properties of the schema and not of any particular instance. The dependency must hold for all tuples in every instance of the schema.

If A1, A2, ..., An uniquely determine several attributes, the dependencies can be combined into one expression. The FDs

A1, A2, ..., An → B1
A1, A2, ..., An → B2
A1, A2, ..., An → B3
...
A1, A2, ..., An → Bm

are written together as:

A1, A2, ..., An → B1 B2 B3 ... Bm
Keys revisited: if a subset of attributes can uniquely determine the entire tuple, then it is called a super-key. Minimal super-keys and candidate keys can be defined analogously.

Consider the relation:

Movies (title, year, length, filmtype, studio, star)

We can identify some FDs, such as the following:

title, year → length
title, year → filmtype

However, note that title, year → star may not always be true!
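The Movies example can be made concrete with a small check on sample data. The following is a minimal sketch, not part of the original slides: the function name `fd_holds` and the sample rows are hypothetical. Note that a single instance can only refute an FD; passing the check on one instance does not prove that the FD holds for the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample instance of Movies (values are illustrative only).
movies = [
    {"title": "Film A", "year": 1990, "length": 120,
     "filmtype": "color", "studio": "S1", "star": "Star X"},
    {"title": "Film A", "year": 1990, "length": 120,
     "filmtype": "color", "studio": "S1", "star": "Star Y"},
    {"title": "Film B", "year": 1995, "length": 95,
     "filmtype": "color", "studio": "S2", "star": "Star X"},
]

print(fd_holds(movies, ["title", "year"], ["length"]))  # True
print(fd_holds(movies, ["title", "year"], ["star"]))    # False
```

The second call fails because the same movie appears with two different stars, which is exactly why title, year → star is not an FD of Movies.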
Reasoning about FDs

Transitivity: in any relation R, if A → B and B → C, then the FD A → C also holds for R.
Example: if Employee_Number → Job and Job → Salary, then Employee_Number → Salary.

Two FDs S = A → B and T = C → D are said to be equivalent if the set of relation instances satisfying S is the same as the set of relation instances satisfying T.
We say that S follows from T if every relation instance that satisfies T also satisfies S.
FDs S and T are equivalent if S follows from T and T follows from S.
Trivial Functional Dependencies

Note that in the Movies relation:

title, year → title

An FD whose right-hand side (RHS) is contained in its left-hand side (LHS) is called a trivial FD. If at least one attribute of the RHS is not contained in the LHS, the FD is called non-trivial, and if none of the attributes of the RHS is contained in the LHS, it is called a completely non-trivial FD.

Closure of FDs

In any relation R, let A be a set of attributes of R. The closure of A under the FDs of R, written closure(A), is the set of all attributes that are eventually determined by A.

Let: A → B; B → C, D; B, D → E
Then closure(A) = {A, B, C, D, E}
Adding attributes to closure(A): if A' ⊆ closure(A) and A' → F, then closure(A) = closure(A) ∪ F.

Computing closure of FDs

Given a relation R and a set of attributes A, closure(A) is computed by the following algorithm:
1. Initially, closure(A) = A.
2. For every A' ⊆ closure(A), if there exists an FD of the form A' → B and B ⊄ closure(A), then closure(A) = closure(A) ∪ B.
3. Repeat step 2 until no more attributes can be added to closure(A).

The closure of a set of attributes A is denoted by A+. Note that if A+ is the set of all attributes of R, then A is a super-key of R.
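The closure algorithm above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `closure` and the representation of an FD as a pair of attribute sets are assumptions made here, not something the slides prescribe.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)          # step 1: start with the attributes themselves
    changed = True
    while changed:               # step 3: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:     # step 2: apply every FD whose LHS is covered
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The FDs from the closure example: A -> B; B -> C, D; B, D -> E
fds = [({"A"}, {"B"}), ({"B"}, {"C", "D"}), ({"B", "D"}, {"E"})]
print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C', 'D', 'E']
```

Instead of enumerating every subset A' of the closure, the loop iterates over the stated FDs and applies each one whose left-hand side is already covered, which has the same effect.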
Inferred FDs

In a relation R, let A, B, C and D be sets of attributes of R such that: A → B; B → C; and C → D.
Also let D_A ⊆ D be the part of D that lies in A (so D_A ⊆ A), and let D' = D − D_A. Given this, we can infer a non-trivial FD: A → D'.

FDs that are specified are called stated FDs, and FDs that are derived are called inferred FDs.

A given set of FDs from which the set of all FDs for a relation can be inferred is called a basis of the relation. If no proper subset of the basis is also a basis, then it is said to be a minimal basis for the relation.
Armstrong's Axioms

For computing the set of FDs that follow from a given set of FDs, the following rules, called Armstrong's axioms, are useful:
1. Reflexivity: if B ⊆ A, then A → B.
2. Augmentation: if A → B, then A C → B C. Note also that if A → B, then A C → B for any set of attributes C.
3. Transitivity: if A → B and B → C, then A → C.

Projecting FDs

Let R be a relation and F(R) be the set of all FDs in R. Suppose relation S is projected from R by removing some attributes. How can we infer F(S)? The FDs that belong to F(S) are those which:
1. Follow from F(R)
2. Involve only attributes of S
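The two conditions for F(S) can be turned into a brute-force procedure: compute the closure of every subset of the attributes of S under F(R) and keep only the attributes that belong to S. The sketch below assumes a hypothetical relation R(A, B, C, D) with F(R) = {A → B, B → C, C → D} projected onto S(A, C, D). The names `closure` and `project_fds` are illustrative, and the approach is exponential in the number of attributes, so it is only suitable for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute-set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(fds, sub_attrs):
    """Return the non-trivial FDs (lhs, attribute) that hold on sub_attrs."""
    sub_attrs = set(sub_attrs)
    projected = []
    for size in range(1, len(sub_attrs) + 1):
        for lhs in combinations(sorted(sub_attrs), size):
            # condition 1: follows from F(R); condition 2: stays inside S
            determined = closure(lhs, fds) & sub_attrs
            for attr in sorted(determined - set(lhs)):
                projected.append((lhs, attr))
    return projected


fds_r = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"D"})]
for lhs, attr in project_fds(fds_r, {"A", "C", "D"}):
    print(",".join(lhs), "->", attr)
```

The result also lists redundant FDs such as A, C → D; reducing it to a minimal basis is a separate step that this sketch does not attempt.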
Given a relation R (A, B, C, D) and F(R) = {A → B, B → C, C → D}, suppose S is projected from R as S (A, C, D). What is F(S)?

To compute F(S), start by computing the closures of all attributes in S.
In R, A+ = {A, B, C, D}, which gives A → B, A → C and A → D.
In S, only the FDs over attributes of S remain: A → C and A → D.
Similarly, C+ = {C, D} gives C → D, and D+ = {D} gives no non-trivial FD.
Since A+ contains all attributes of S, it is not required to compute (AC)+, (AD)+ or (ACD)+.

Designing Relational Schemas

In a carelessly designed relational schema, functional dependencies are not properly accounted for. This leads to the following problems:
1. Redundancy: information is repeated across tuples.
2. Update anomalies: if information is repeated across tuples, then an update of any such information has to be performed across all tuples containing the information.
3. Deletion anomalies: if information is repeated across tuples, deletion of information has to be performed across all these tuples.
Consider the Movies (title, year, length, studio, star) relation, where:

title, year → length
title, year → studio

but title, year → star need not be true. For each star of a given movie, the title, year, length and studio information has to be repeated. If any of these values has to be updated or deleted, the change must be applied to all tuples where it occurs.

Decomposition

Anomalies are removed from a relation R(A) by decomposing it into other relations S(B) and T(C), where B, C ⊆ A, such that there are no anomalies in S and T.
A relation that is free of these anomalies is said to be in Boyce-Codd Normal Form (BCNF). A BCNF relation has the following property: a relation R(A) is in BCNF if, whenever a non-trivial FD of the form A' → A'' holds in R(A), A' is a super-key for R.
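The BCNF property can be tested with attribute closures: an FD violates BCNF when it is non-trivial and the closure of its left-hand side does not cover the whole relation. The following sketch assumes FDs are given as pairs of attribute sets; the names `closure` and `bcnf_violations` are hypothetical. It checks only the stated FDs, which is enough for the relation they were stated on, but a relation obtained by decomposition would need its projected FDs instead.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute-set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the stated non-trivial FDs whose LHS is not a super-key."""
    attrs = set(attrs)
    violations = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:
            continue  # trivial FDs never violate BCNF
        if not attrs <= closure(lhs, fds):
            violations.append((lhs, rhs))
    return violations


movies = {"title", "year", "length", "studio", "star"}
movie_fds = [({"title", "year"}, {"length"}), ({"title", "year"}, {"studio"})]
print(len(bcnf_violations(movies, movie_fds)))  # 2
```

Both stated FDs are reported because (title, year)+ does not contain star, so (title, year) is not a super-key of Movies.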
In a given relation R(A), let there be a functional dependency of the form A' → A'' which violates BCNF. In order to bring R into BCNF, decompose R as follows:
1. Let B be the set of all attributes that lie on the RHS of any FD that has A' on the LHS.
2. Take the set of attributes A' ∪ B and form a separate relation.
3. Retain A' along with A − (A' ∪ B) to form the other part of the decomposed relation R.

Example: consider the Movies (title, year, length, studio, star) relation. Here the following FD holds:

title, year → length, studio

However, this is a BCNF-violating FD: (title, year) is not a super-key, since the attribute star is not in (title, year)+. To decompose Movies, take (title, year) along with (length, studio) and put them in a separate relation. Retain (title, year) along with (star) to form the other relation.
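A single decomposition step can be sketched as follows. The function name `decompose_on` is hypothetical, and as a simplifying assumption the sketch collects B through the closure of the violating left-hand side, which also picks up attributes reached transitively. For the Movies example this gives the same result as collecting the right-hand sides directly.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute-set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def decompose_on(attrs, fds, lhs):
    """Split attrs into two schemas around a BCNF-violating LHS."""
    attrs, lhs = set(attrs), set(lhs)
    determined = closure(lhs, fds) & attrs   # the LHS together with B
    first = determined                       # new relation: LHS and B
    second = lhs | (attrs - determined)      # LHS plus everything else
    return first, second


movies = {"title", "year", "length", "studio", "star"}
movie_fds = [({"title", "year"}, {"length", "studio"})]
movies1, movies2 = decompose_on(movies, movie_fds, {"title", "year"})
print(sorted(movies1))  # ['length', 'studio', 'title', 'year']
print(sorted(movies2))  # ['star', 'title', 'year']
```

A full BCNF decomposition would repeat this step on each resulting relation, using the FDs projected onto it, until no violations remain; the sketch covers one step only.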
Hence Movies (title, year, length, studio, star) is decomposed into:
Movies1 (title, year, length, studio)
Movies2 (title, year, star)

2-attribute Relations

Any 2-attribute relation of the form R(A, B) is always in BCNF. To prove this, consider the following cases:
1. There are no FDs between A and B. In this case only trivial FDs exist and R is in BCNF.
2. A → B, but there is no FD of the form B → A. In this case, A is the key and R is in BCNF.
3. B → A, but there is no FD of the form A → B. This is symmetric to the case above; here, B is the key.
4. A → B and B → A. Both A and B are keys, so this does not violate the BCNF condition.
Third Normal Form (3NF)

Sometimes, BCNF-violating FDs cannot be removed from a relation without losing information. Consider the relation Drama (title, theater, city) having the following FDs:

FD1: title, city → theater (title and city form the key, as they uniquely determine theater)
FD2: theater → city (each drama theater has a unique name across cities)

FD2 violates BCNF since {theater} is not a key of Drama.

Based on FD2, if we decompose Drama into the relations Drama1 (title, theater) and Drama2 (theater, city), the result will be incorrect! This is because FD1 can no longer be enforced: in the join of Drama1 and Drama2, (title, city) is no longer guaranteed to be a key.
Consider the example tables:

Drama1
Title   Theater
Jeans   Naz
Jeans   Jude Brave Golden
Troy    Naz

Drama2
Theater             City
Naz                 Lahore
Jude Brave Golden   Karachi

A join between Drama1 and Drama2 gives the table:

Title   Theater             City
Jeans   Naz                 Lahore
Jeans   Jude Brave Golden   Karachi
Troy    Naz                 Lahore

Note that (theater, city) no longer uniquely determines title!
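The join of Drama1 and Drama2 can be reproduced with a small natural-join helper. This is a sketch for illustration: the function `natural_join` and the representation of rows as dictionaries are assumptions, and the data is taken from the example tables.

```python
def natural_join(left, right):
    """Natural join of two lists of dict rows on their shared column names."""
    joined = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[c] == r[c] for c in shared):
                joined.append({**l, **r})
    return joined


drama1 = [
    {"title": "Jeans", "theater": "Naz"},
    {"title": "Jeans", "theater": "Jude Brave Golden"},
    {"title": "Troy", "theater": "Naz"},
]
drama2 = [
    {"theater": "Naz", "city": "Lahore"},
    {"theater": "Jude Brave Golden", "city": "Karachi"},
]

for row in natural_join(drama1, drama2):
    print(row["title"], "|", row["theater"], "|", row["city"])
```

Because each of Drama1 and Drama2 is maintained separately, nothing in either table on its own prevents an insertion that would make two rows of the join share the same (title, city) pair.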
Discrepancies in the previous example occurred because of the FD theater → city, where theater by itself is not a key, but city is part of a key! To accommodate such cases, the third normal form (3NF) is used, which relaxes BCNF as follows:

Any relation R is said to be in 3NF if, for any non-trivial FD of the form A → B, either A is a super-key, or B is a member of some key.

An attribute that is a member of a key is called a prime attribute.

Multi-valued Dependencies

In some cases, even if a relation is in BCNF, there could still be redundancies. Consider the relation Drama (title, theater, star, genre). Drama is in BCNF. A given drama may have many stars. For every entry of star, the theater and genre attributes have to be repeated.
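Returning to the 3NF condition stated above, it can be checked mechanically once the candidate keys are known. The sketch below finds the candidate keys by brute force and then tests each stated FD. The names `candidate_keys` and `is_3nf` are hypothetical, the search is exponential in the number of attributes, and only the stated FDs are examined.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute-set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force the minimal attribute sets whose closure covers attrs."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if attrs <= closure(candidate, fds):
                keys.append(candidate)
    return keys


def is_3nf(attrs, fds):
    """True if every stated FD has a super-key LHS or only prime RHS attributes."""
    attrs = set(attrs)
    prime = set().union(*candidate_keys(attrs, fds))
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if attrs <= closure(lhs, fds):
            continue  # LHS is a super-key
        if not (rhs - lhs) <= prime:
            return False
    return True


drama = {"title", "theater", "city"}
drama_fds = [({"title", "city"}, {"theater"}), ({"theater"}, {"city"})]
print(is_3nf(drama, drama_fds))  # True
```

For the three-attribute Drama relation, the keys found are (title, city) and (title, theater), so every attribute is prime and the relation is in 3NF even though theater → city violates BCNF.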
The notation for a multivalued dependency is a double-headed arrow between two attributes, A ↠ B. In English, a multivalued dependency means that if I know a value of A, I can determine a subset of B values. This relationship was also axiomatized by Beeri, Fagin, and Howard (1977). Their axioms are:

Reflexive: X ↠ X
Augmentation: if X ↠ Y, then XZ ↠ Y
Union: if X ↠ Y and X ↠ Z, then X ↠ YZ
Projection: if X ↠ Y and X ↠ Z, then X ↠ (Y ∩ Z) and X ↠ (Y − Z)
Transitivity: if X ↠ Y and Y ↠ Z, then X ↠ (Z − Y)
Pseudotransitivity: if X ↠ Y and YW ↠ Z, then XW ↠ (Z − YW)
Complement: if X ↠ Y and Z = (R − XY), then X ↠ Z
Replication: if X → Y, then X ↠ Y
Coalescence: if X ↠ Y and Z → W, where W ⊆ Y and Y ∩ Z = Ø, then X → W
In a given relation R(A), we say that there is a multi-valued dependency (MVD) if the following condition holds. Let A' ⊆ A and B ⊆ A, and suppose each value of A' determines a set of values of B. If B is independent of all the other attributes of R, that is, those in A − (A' ∪ B), then the dependency is said to be a multi-valued dependency, denoted by:

A' ↠ B

Equivalently, whenever two tuples agree on A', the relation also contains the tuple that takes its B values from the first and all remaining attribute values from the second.

Fourth Normal Form (4NF)

Roughly, a relation that has no non-trivial multi-valued dependencies is said to be in fourth normal form (4NF). In a given relation R(A), the MVD A' ↠ B is said to be non-trivial if:

B ⊄ A' and A' ∪ B ≠ A

More precisely, a relation R(A) is said to be in 4NF if, for every non-trivial MVD of the form A' ↠ B, A' is a super-key.
Example

Consider a table of departments, their projects (jobs), and the parts they stock. The MVDs in the table would be:

department ↠ projects
department ↠ parts

Assume that department d1 works on jobs j1 and j2 with parts p1 and p2; that department d2 works on jobs j3, j4, and j5 with parts p2 and p4; and that department d3 works on job j2 only, with parts p5 and p6. The table would look like this:

department   job   part
d1           j1    p1
d1           j1    p2
d1           j2    p1
d1           j2    p2
d2           j3    p2
d2           j3    p4
d2           j4    p2
d2           j4    p4
d2           j5    p2
d2           j5    p4
d3           j2    p5
d3           j2    p6
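Whether an MVD holds on a given instance can be checked by grouping rows on the left-hand side and verifying that, within each group, every dependent value appears with every combination of the remaining attributes. The sketch below applies this to the department table; the function name `mvd_holds` is hypothetical, and as with FDs, a check on one instance can refute an MVD but cannot establish it for the schema.

```python
ATTRS = ("department", "job", "part")

TABLE = [
    ("d1", "j1", "p1"), ("d1", "j1", "p2"), ("d1", "j2", "p1"), ("d1", "j2", "p2"),
    ("d2", "j3", "p2"), ("d2", "j3", "p4"), ("d2", "j4", "p2"), ("d2", "j4", "p4"),
    ("d2", "j5", "p2"), ("d2", "j5", "p4"), ("d3", "j2", "p5"), ("d3", "j2", "p6"),
]


def mvd_holds(rows, attrs, x, y):
    """Check X ->> Y on one instance of a relation over attrs."""
    rest = [a for a in attrs if a not in x and a not in y]
    groups = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        z_val = tuple(row[a] for a in rest)
        ys, zs, pairs = groups.setdefault(x_val, (set(), set(), set()))
        ys.add(y_val)
        zs.add(z_val)
        pairs.add((y_val, z_val))
    # Within each group the (Y, rest) pairs must form a full cross product.
    return all(len(pairs) == len(ys) * len(zs)
               for ys, zs, pairs in groups.values())


rows = [dict(zip(ATTRS, values)) for values in TABLE]
print(mvd_holds(rows, ATTRS, ["department"], ["job"]))       # True
print(mvd_holds(rows[:-1], ATTRS, ["department"], ["job"]))  # True
print(mvd_holds(rows[:3], ATTRS, ["department"], ["job"]))   # False
```

Dropping the last row leaves d3 with a single job and a single part, which is still a full cross product. Keeping only the first three rows leaves d1 with j2 paired with p1 but not with p2, so the MVD fails.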
If you want to add a part to a department, you must create more than one new row. Likewise, removing a part or a job from a row can destroy information. Updating a part or job name will also require multiple rows to be changed. The solution is to split this table into two tables, one with (department, projects) in it and one with (department, parts) in it. Informally, the definition of 4NF is that we have no more than one MVD in a table. If a table is in 4NF, it is also in BCNF.

Relationship between NFs

4NF ⊂ BCNF ⊂ 3NF

Note that 4NF implies BCNF, which in turn implies 3NF.
Join Dependencies

A join dependency is a further generalization of MVDs. A join dependency (JD) ⋈{R1, ..., Rn} is said to hold over a relation R if R1, ..., Rn is a lossless-join decomposition of R.
An MVD X ↠ Y over a relation R can be expressed as the join dependency ⋈{XY, X(R − Y)}.
Unlike FDs and MVDs, there is no set of sound and complete inference rules for JDs.

course       teacher   book
Physics101   Green     Mechanics
Physics101   Green     Optics
Physics101   Brown     Mechanics
Physics101   Brown     Optics
Math301      Green     Mechanics
Math301      Green     Vectors
Math301      Green     Geometry

As an example, in the CTB relation, the MVD C ↠ T can be expressed as the join dependency ⋈{CT, CB}.
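The lossless-join condition behind a join dependency can be tested on an instance by projecting onto each component and joining the projections back together. The following sketch does this for the CTB relation; the helper names `project`, `natural_join` and `join_dependency_holds` are hypothetical, and the components are assumed to cover all attributes of the relation.

```python
def project(rows, cols):
    """Project dict rows onto cols, removing duplicate rows."""
    unique = {tuple(row[c] for c in cols) for row in rows}
    return [dict(zip(cols, values)) for values in unique]


def natural_join(left, right):
    """Natural join of two lists of dict rows on their shared column names."""
    joined = []
    for l in left:
        for r in right:
            if all(l[c] == r[c] for c in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined


def join_dependency_holds(rows, components):
    """True if joining the projections onto components gives back rows."""
    result = project(rows, components[0])
    for cols in components[1:]:
        result = natural_join(result, project(rows, cols))
    as_set = lambda rs: {frozenset(r.items()) for r in rs}
    return as_set(result) == as_set(rows)


COLS = ("course", "teacher", "book")
ctb = [dict(zip(COLS, values)) for values in [
    ("Physics101", "Green", "Mechanics"), ("Physics101", "Green", "Optics"),
    ("Physics101", "Brown", "Mechanics"), ("Physics101", "Brown", "Optics"),
    ("Math301", "Green", "Mechanics"), ("Math301", "Green", "Vectors"),
    ("Math301", "Green", "Geometry"),
]]

print(join_dependency_holds(ctb, [("course", "teacher"), ("course", "book")]))   # True
print(join_dependency_holds(ctb, [("course", "teacher"), ("teacher", "book")]))  # False
```

The second decomposition is lossy: joining on teacher produces spurious rows such as (Physics101, Green, Vectors) that are not in the original table.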
SELECT BS.buyer, SL.seller, BL.lender
  FROM BuyerLender AS BL, SellerLender AS SL, BuyerSeller AS BS
 WHERE BL.buyer = BS.buyer
   AND BL.lender = SL.lender
   AND SL.seller = BS.seller;
Fifth Normal Form (5NF)

Fifth normal form, also called the join-projection normal form (JPNF) or the projection-join normal form (PJNF), is based on the idea of a lossless join, or the lack of a join-projection anomaly. This problem occurs when you have an n-way relationship, where n > 2. A quick check for 5NF is to see whether the table is in 3NF and all the candidate keys are single columns. This is a sufficient condition, not a necessary one.

Domain-Key Normal Form (DKNF)

Domain-key normal form was proposed by Ron Fagin (1981). The idea is that if all the constraints implied by domain restrictions and by key conditions are true, then the database is in at least 5NF. The interesting part of Fagin's paper is that there is no mention of functional dependencies, multivalued dependencies, or join dependencies. This is currently considered the strongest normal form possible. The problem is that his paper does not tell you how you can achieve DKNF, and it shows that in some cases it is impossible.