CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen LECTURE 4: DESIGN THEORIES (FUNCTIONAL DEPENDENCIES)
Design theory E/R diagrams are high-level design Formal theory for relational design Evaluate the design for potential problems Modify the relations to avoid them 2
Functional Dependencies (FD) Not exactly like a regular function f(x)=y Generalize the idea of a key: one (or more) attribute(s) identifies the value(s) of one (or more) other attribute(s) Identify the constraints of the system 3
FD Formal Definition X A is an assertion about a relation R that whenever two tuples of R agree on all the attributes of X, then they must also agree on the attribute A. Say X A holds in R. Convention:, X, Y, Z represent sets of attributes; A, B, C, represent single attributes. Convention: no set formers in sets of attributes, just ABC, rather than {A,B,C}. 4
Example Title Year Length Genre studioname starname Forrest Gump 1994 142 Drama Paramount Tom Hanks Titanic 1997 195 Romance 20 th Century Fox Kate Winslet Titanic 1997 195 Romance 20 th Century Fox Leonardo DiCaprio Terminator-2 1991 154 Action Carolco Pictures Arnold Schwarzenegger Reasonable FD s to assert: 1. Title Year Length 2. Title starname Genre 3. Title Year starname length Genre studioname 5
FD s and Keys Keys can be defined using FD s A set of attributes {A 1, A 2,, A n } is a key if: It functionally determines all other attributes of a relation, R No proper subset of {A 1, A 2,, A n } functionally determines all other attributes of R Superkeys can also be defined: A set of attributes {A 1, A 2,, A n } is a superkey if: It functionally determines all other attributes of a relation, R Definitions may vary based on book 6
Example Movies1(Title, Year, Length, Genre, studioname, starname) {Ttile, Year, starname} is a key because together these attributes determine all the other attributes and is minimal. Title Year Length Genre studioname Superkeys: {Ttile, Year, starname, studioname} 7
E/R and Relational Keys Keys in E/R concern entities. Keys in relations concern tuples. Usually, one tuple corresponds to one entity, so the ideas are the same. But --- in poor relational designs, one entity can become several tuples, so E/R keys and Relational keys are different. 8
Rules About FD s (1) Splitting/Combining Rule A 1 A 2 A n B 1 B 2 B m is equivalent to A 1 A 2 A n B 1 A 1 A 2 A n B 2... A 1 A 2 A n B m Example: Title Year Length Genre Is equivalent to Title Year Length, Title Year Genre
Trivial Functional Dependency If B 1 B 2 B m A 1 A 2 A n then A 1 A 2 A n B 1 B 2 Example: B m Title Title Year starname Therefore Title Year starname Title
Rules About FD s (2) Trivial-dependency rule A 1 A 2 A n B 1 B 2 B m is equivalent to A 1 A 2 A n C 1 C 2 C k Where the C s are B - A Example: Title Year Year Length Genre is equivalent to Title Year Length Genre
Rules About FD s (3) Transitivity Rule A 1 A 2 A n B 1 B 2 B m and B 1 B 2 B m C 1 C 2 C k Then A 1 A 2 A n C 1 C 2 C k Example: Movies2(Title, Year, Length, Genre, studioname, studioaddress) Title Year studioname studioname studioaddress Then Title Year studioaddress
Closure Test X Initial set of attributes A closure For a set of attributes Y, closure set {Y} + contains all the attributes that are functionally dependent on Y based on a set of FD s S An easier way to test Y B is to check if B {Y} + Basis: Y + = Y. Induction: Look for an FD whose left side X is a subset of the current Y +. If the FD is X -> A, add A to Y +. Continue until Y + cannot be changed. 13
Y + 14
X Y + 15
X A Y + 16
X A Y + new Y + 17
Example A relation with attributes A,B,C,D,E and F S = {AB C, BC AD, D E and CF B} We want to find X={A,B} + Step 1: Split rights S = {AB C, BC A, BC D, D E and CF B} Step 2: Initialize X={A,B}. First iteration: add C to X because AB C Second iteration: add D to X because BC D Third iteration: add E to X because D E Stop Output: {A,B} + = {A,B,C,D,E}
Anomalies 1. Redundancy 2. Update Anomalies 3. Deletion Anomalies Title Year Length Genre studioname starname Forrest Gump 1994 142 Drama Paramount Tom Hanks Titanic 1997 195 Romance 20 th Century Fox Kate Winslet Titanic 1997 195 Romance 20 th Century Fox Leonardo DiCaprio Terminator-2 1991 154 Action Carolco Pictures Arnold Schwarzenegger Solution: Decompose the relation
Types of Decompositions 1. Boyce-Codd Normal Form 2. Third Normal Form 3. Fourth Normal Form Details: Next Class
Finding All Implied FD s Motivation: normalization, the process where we break a relation schema into two or more schemas. Example: ABCD with FD s AB C, C D, and D A. Decompose into ABC, AD. What FD s hold in ABC? Not only AB C, but also C A! 21
Basic Idea 1. Start with given FD s and find all nontrivial FD s that follow from the given FD s. Nontrivial = left and right sides disjoint. 2. Restrict to those FD s that involve only attributes of the projected schema. 22
Simple, Exponential Algorithm 1. For each subset of attributes X, compute X +. 2. Add X A for all A in X + \ X. 3. However, drop XY A whenever we discover X A. Because XY A follows from X A in any projection. 4. Finally, use only FD s involving projected attributes. 23
A Few Tricks 1. No need to compute the closure of the empty set or of the set of all attributes. 2. If we find X + = all attributes, so is the closure of any superset of X. 24
Example ABC with FD s A B and B C. Project onto AC. A + =ABC ; yields A B, A C. We do not need to compute AB + or AC +. B + =BC ; yields B C. C + =C ; yields nothing. BC + =BC ; yields nothing. 25
Example --- Continued Resulting FD s: A B, A C, and B C. Only FD that involves a subset of {A,C}. 26