Constraints: Functional Dependencies

Constraints: Functional Dependencies Fall 2017 School of Computer Science University of Waterloo Databases CS348 (University of Waterloo) Functional Dependencies 1 / 42

Schema Design
When we get a relational schema, how do we know if it's any good? What should we watch for?
- What are the allowed instances of the schema?
- Does the structure capture the data?
- Is it too hard to query?
- Is it too hard to update?
- Is there redundant information all over the place?

Change Anomalies
Assume we are given the E-R diagram
[E-R diagram: Supplied_Items with attributes Sno, Sname, City, Ino, Iname, Price]

Change Anomalies (cont.)
Supplied_Items
Sno  Sname  City  Ino  Iname  Price
S1   Magna  Ajax  I1   Bolt   0.50
S1   Magna  Ajax  I2   Nut    0.25
S1   Magna  Ajax  I3   Screw  0.30
S2   Budd   Hull  I3   Screw  0.40
Problems:
1. Update problems (changing the name of a supplier)
2. Insert problems (new item without a supplier)
3. Delete problems (Budd no longer supplies screws)
4. Likely increase in space requirements

Change Anomalies (cont.)
Compare to
Item
Ino  Iname
I1   Bolt
I2   Nut
I3   Screw
Supplier
Sno  Sname  City
S1   Magna  Ajax
S2   Budd   Hull
Supplies
Sno  Ino  Price
S1   I1   0.50
S1   I2   0.25
S1   I3   0.30
S2   I3   0.40
Decomposition seems to be better...

Change Anomalies (cont.)
But the other extreme is also undesirable: information about relationships can be lost.
Snos (Sno): S1, S2
Inums (Inum): I1, I2, I3
Snames (Sname): Magna, Budd
Inames (Iname): Bolt, Nut, Screw
Cities (City): Ajax, Hull
Prices (Price): 0.50, 0.25, 0.30, 0.40
... so how do we know how much we can decompose?

How to Find and Fix Anomalies?
Detection: How do we know an anomaly exists?
  Certain families of integrity constraints postulate regularities in schema instances that lead to anomalies.
Repair: How can we fix it?
  Certain schema decompositions avoid the anomalies while retaining all information in the instances.

Integrity Constraints
Idea: allow only well-behaved instances of the schema. The relational structure (= selection of relations) is often not sufficient to capture all of these.
- restrict values of an attribute
- describe dependencies between attributes
    - in a single relation (bad)
    - between relations (good)
- postulate the existence of values in the database
- ...
Dependencies between attributes in a single relation lead to improvements in schema design.

Functional Dependencies (FDs)
Idea: to express the fact that, in a relation schema, (values of) a set of attributes uniquely determine (values of) another set of attributes.
Definition: Let $R$ be a relation schema, and $X, Y \subseteq R$ sets of attributes. The functional dependency $X \to Y$ is the formula
$$\forall v_1, \ldots, v_k, w_1, \ldots, w_k.\ R(v_1, \ldots, v_k) \land R(w_1, \ldots, w_k) \land \Big(\bigwedge_{j \in X} v_j = w_j\Big) \to \Big(\bigwedge_{i \in Y} v_i = w_i\Big)$$
We say that (the set of attributes) $X$ functionally determines $Y$ (in $R$).
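
As an illustration of the definition, here is a minimal Python sketch that checks whether an FD holds in one given instance. The function name `satisfies_fd` and the list-of-dicts row layout are assumptions made for this example; the data is the Supplied_Items instance from the Change Anomalies slides.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in the given instance.

    rows: list of dicts mapping attribute name to value (hypothetical layout)
    lhs, rhs: lists of attribute names
    """
    seen = {}  # X-values mapped to the Y-values first observed with them
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two rows that agree on X must also agree on Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


ATTRS = ("Sno", "Sname", "City", "Ino", "Iname", "Price")
SUPPLIED_ITEMS = [
    dict(zip(ATTRS, t))
    for t in [
        ("S1", "Magna", "Ajax", "I1", "Bolt", 0.50),
        ("S1", "Magna", "Ajax", "I2", "Nut", 0.25),
        ("S1", "Magna", "Ajax", "I3", "Screw", 0.30),
        ("S2", "Budd", "Hull", "I3", "Screw", 0.40),
    ]
]

print(satisfies_fd(SUPPLIED_ITEMS, ["Sno"], ["Sname", "City"]))
print(satisfies_fd(SUPPLIED_ITEMS, ["Ino"], ["Price"]))
```

Note that such a check can only refute an FD: an FD is a constraint on all allowed instances, so one instance that happens to satisfy it does not establish it.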

Examples of Functional Dependencies
Consider the following relation schema:
EmpProj(SIN, PNum, Hours, EName, PName, PLoc, Allowance)
- SIN determines employee name: $\text{SIN} \to \text{EName}$
- project number determines project name and location: $\text{PNum} \to \text{PName}, \text{PLoc}$
- allowances are always the same for the same number of hours at the same location: $\text{PLoc}, \text{Hours} \to \text{Allowance}$

Implication for FDs
How do we know what additional FDs hold in a schema?
A set $F$ logically implies an FD $X \to Y$ if $X \to Y$ holds in all instances of $R$ that satisfy $F$.
The closure $F^+$ of $F$ is the set of all functional dependencies that are logically implied by $F$.
Clearly $F \subseteq F^+$, but what else is in $F^+$?
For example: if $F = \{A \to B, B \to C\}$, then $F^+ \supseteq \{A \to B, B \to C, A \to C\}$; it also contains trivial FDs such as $AB \to A$ and augmented ones such as $AC \to BC$.

Reasoning About FDs
Logical implications can be derived by using inference rules called Armstrong's axioms:
(reflexivity)   $Y \subseteq X \Rightarrow X \to Y$
(augmentation)  $X \to Y \Rightarrow XZ \to YZ$
(transitivity)  $X \to Y,\ Y \to Z \Rightarrow X \to Z$
The axioms are
- sound (anything derived from $F$ is in $F^+$)
- complete (anything in $F^+$ can be derived)
Additional rules can be derived:
(union)          $X \to Y,\ X \to Z \Rightarrow X \to YZ$
(decomposition)  $X \to YZ \Rightarrow X \to Y$
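
The slide states the union and decomposition rules without showing why they follow from the axioms. One possible derivation, using only the three axioms, is sketched here; the step numbering is local to this sketch.

```latex
\begin{align*}
&\text{Union: given } X \to Y \text{ and } X \to Z \\
&\quad 1.\ X \to XY   && \text{(augmentation of } X \to Y \text{ by } X\text{)} \\
&\quad 2.\ XY \to YZ  && \text{(augmentation of } X \to Z \text{ by } Y\text{)} \\
&\quad 3.\ X \to YZ   && \text{(transitivity, 1 and 2)} \\[4pt]
&\text{Decomposition: given } X \to YZ \\
&\quad 1.\ YZ \to Y   && \text{(reflexivity, since } Y \subseteq YZ\text{)} \\
&\quad 2.\ X \to Y    && \text{(transitivity, } X \to YZ \text{ and 1)}
\end{align*}
```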

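Reasoning (example)
Example: $F$ consists of
  $\text{SIN}, \text{PNum} \to \text{Hours}$
  $\text{SIN} \to \text{EName}$
  $\text{PNum} \to \text{PName}, \text{PLoc}$
  $\text{PLoc}, \text{Hours} \to \text{Allowance}$
A derivation of $\text{SIN}, \text{PNum} \to \text{Allowance}$:
1. $\text{SIN}, \text{PNum} \to \text{Hours}$ ($\in F$)
2. $\text{PNum} \to \text{PName}, \text{PLoc}$ ($\in F$)
3. $\text{PLoc}, \text{Hours} \to \text{Allowance}$ ($\in F$)
4. $\text{SIN}, \text{PNum} \to \text{PNum}$ (reflexivity)
5. $\text{SIN}, \text{PNum} \to \text{PName}, \text{PLoc}$ (transitivity, 4 and 2)
6. $\text{SIN}, \text{PNum} \to \text{PLoc}$ (decomposition, 5)
7. $\text{SIN}, \text{PNum} \to \text{PLoc}, \text{Hours}$ (union, 6 and 1)
8. $\text{SIN}, \text{PNum} \to \text{Allowance}$ (transitivity, 7 and 3)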

Keys: formal definition
Definition:
$K \subseteq R$ is a superkey for relation schema $R$ if the dependency $K \to R$ holds on $R$.
$K \subseteq R$ is a candidate key for relation schema $R$ if $K$ is a superkey and no proper subset of $K$ is a superkey.
Primary key = a candidate key chosen by the DBA.

Efficient Reasoning
How can we figure out quickly whether an FD is implied by $F$?
A mechanical and more efficient way of using Armstrong's axioms:
function ComputeX+(X, F)
begin
  $X^+ := X$;
  while true do
    if there exists $(Y \to Z) \in F$ such that
       (1) $Y \subseteq X^+$, and
       (2) $Z \not\subseteq X^+$
    then $X^+ := X^+ \cup Z$
    else exit;
  return $X^+$;
end
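
The ComputeX+ procedure can be sketched in Python as follows. The name `compute_closure` and the representation of FDs as (left-hand side, right-hand side) pairs of attribute sets are choices made for this sketch, not part of the slides; the FDs are those of the EmpProj example.

```python
def compute_closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, following ComputeX+.

    fds: list of (lhs, rhs) pairs, each a set of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # (1) Y is contained in X+ and (2) Z still adds something new.
            if set(lhs).issubset(closure) and not set(rhs).issubset(closure):
                closure |= set(rhs)
                changed = True
    return frozenset(closure)


EMP_PROJ_FDS = [
    ({"SIN", "PNum"}, {"Hours"}),
    ({"SIN"}, {"EName"}),
    ({"PNum"}, {"PName", "PLoc"}),
    ({"PLoc", "Hours"}, {"Allowance"}),
]

print(sorted(compute_closure({"SIN", "PNum"}, EMP_PROJ_FDS)))
print(sorted(compute_closure({"SIN"}, EMP_PROJ_FDS)))
```

The loop stops once a full pass over the FDs adds nothing, which corresponds to the "else exit" branch of the pseudocode.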

Efficient Reasoning (cont.)
Let $R$ be a relational schema and $F$ a set of functional dependencies on $R$. Then
Theorem: $X$ is a superkey of $R$ if and only if $\text{ComputeX}^+(X, F) = R$
Theorem: $X \to Y \in F^+$ if and only if $Y \subseteq \text{ComputeX}^+(X, F)$
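
Both theorems turn directly into tests built on the closure computation. The sketch below assumes FDs are given as pairs of attribute sets; the helper names (`closure`, `is_superkey`, `implies`, `candidate_keys`) are hypothetical, and the candidate-key search is a plain brute-force enumeration added for illustration rather than anything prescribed by the slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    # First theorem: X is a superkey iff X+ = R.
    return closure(attrs, fds).issuperset(schema)


def implies(fds, lhs, rhs):
    # Second theorem: X -> Y is in F+ iff Y is a subset of X+.
    return set(rhs).issubset(closure(lhs, fds))


def candidate_keys(schema, fds):
    """Brute-force search, smallest subsets first (exponential in |schema|)."""
    keys = []
    for size in range(len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = frozenset(combo)
            # Supersets of a key already found are superkeys but not minimal.
            if any(key.issubset(candidate) for key in keys):
                continue
            if is_superkey(candidate, schema, fds):
                keys.append(candidate)
    return keys


EMP_PROJ = {"SIN", "PNum", "Hours", "EName", "PName", "PLoc", "Allowance"}
EMP_PROJ_FDS = [
    ({"SIN", "PNum"}, {"Hours"}),
    ({"SIN"}, {"EName"}),
    ({"PNum"}, {"PName", "PLoc"}),
    ({"PLoc", "Hours"}, {"Allowance"}),
]

print(implies(EMP_PROJ_FDS, {"SIN", "PNum"}, {"Allowance"}))
print(candidate_keys(EMP_PROJ, EMP_PROJ_FDS))
```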

Computing a Decomposition
Decomposition: Let $R$ be a relation schema (= set of attributes). The collection $\{R_1, \ldots, R_n\}$ of relation schemas is a decomposition of $R$ if
$$R = R_1 \cup R_2 \cup \cdots \cup R_n$$
A good decomposition does not
- lose information
- complicate checking of constraints
- contain anomalies (or at least contains fewer anomalies)

Lossless-Join Decompositions
We should be able to construct the instance of the original table from the instances of the tables in the decomposition.
Example: Consider replacing
Marks
Student  Assignment  Group  Mark
Ann      A1          G1     80
Ann      A2          G3     60
Bob      A1          G2     60
by decomposing it into two tables
SGM
Student  Group  Mark
Ann      G1     80
Ann      G3     60
Bob      G2     60
AM
Assignment  Mark
A1          80
A2          60
A1          60

Lossless-Join Decompositions (cont.)
But computing the natural join of SGM and AM produces
Student  Assignment  Group  Mark
Ann      A1          G1     80
Ann      A2          G3     60
Ann      A1          G3     60   !
Bob      A2          G2     60   !
Bob      A1          G2     60
... and we get extra data (spurious tuples, marked !) and would therefore lose information if we were to replace Marks by SGM and AM.
If re-joining SGM and AM would always produce exactly the tuples in Marks, then we call SGM and AM a lossless-join decomposition.
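
The spurious tuples can be reproduced with a small Python sketch. The helpers `project` and `natural_join` and the list-of-dicts row layout are assumptions made for this example.

```python
def project(rows, attrs):
    """Duplicate-free projection of rows (dicts) onto attrs."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(rows1, rows2):
    """Natural join: combine rows that agree on all shared attributes."""
    result = []
    for r1 in rows1:
        for r2 in rows2:
            common = r1.keys() & r2.keys()
            if all(r1[a] == r2[a] for a in common):
                result.append({**r1, **r2})
    return result


MARKS = [
    {"Student": "Ann", "Assignment": "A1", "Group": "G1", "Mark": 80},
    {"Student": "Ann", "Assignment": "A2", "Group": "G3", "Mark": 60},
    {"Student": "Bob", "Assignment": "A1", "Group": "G2", "Mark": 60},
]

SGM = project(MARKS, ["Student", "Group", "Mark"])
AM = project(MARKS, ["Assignment", "Mark"])

rejoined = natural_join(SGM, AM)
spurious = [row for row in rejoined if row not in MARKS]

for row in spurious:
    print(row)
```

The join always returns at least the original tuples; the decomposition is lossy exactly when it returns more, as it does here.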

Lossless-Join Decompositions (cont.)
A decomposition $\{R_1, R_2\}$ of $R$ is lossless if and only if the common attributes of $R_1$ and $R_2$ form a superkey for either schema, that is,
$$R_1 \cap R_2 \to R_1 \quad \text{or} \quad R_1 \cap R_2 \to R_2$$
Example: In the previous example we had
$R = \{\text{Student}, \text{Assignment}, \text{Group}, \text{Mark}\}$,
$F = \{\text{Student}, \text{Assignment} \to \text{Group}, \text{Mark}\}$,
$R_1 = \{\text{Student}, \text{Group}, \text{Mark}\}$,
$R_2 = \{\text{Assignment}, \text{Mark}\}$.
The decomposition $\{R_1, R_2\}$ is lossy because $R_1 \cap R_2$ (= $\{\text{Mark}\}$) is not a superkey of either SGM or AM.
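
The superkey condition for a two-way decomposition can be tested with the attribute closure. A minimal sketch, assuming FDs are pairs of attribute sets; `is_lossless_binary` is a hypothetical name. It is applied to the Marks example and to the Proj/Dept/Div schema from the Dependency Preservation slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if the common attributes determine all of r1 or all of r2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return common_closure.issuperset(r1) or common_closure.issuperset(r2)


MARKS_FDS = [({"Student", "Assignment"}, {"Group", "Mark"})]
print(is_lossless_binary({"Student", "Group", "Mark"},
                         {"Assignment", "Mark"}, MARKS_FDS))

COMPANY_FDS = [({"Proj"}, {"Dept"}), ({"Dept"}, {"Div"}), ({"Proj"}, {"Div"})]
print(is_lossless_binary({"Proj", "Dept"}, {"Dept", "Div"}, COMPANY_FDS))
print(is_lossless_binary({"Proj", "Dept"}, {"Proj", "Div"}, COMPANY_FDS))
```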

Dependency Preservation
How do we test/enforce constraints on the decomposed schema?
Example: A table for a company database could be
R(Proj, Dept, Div)
with FD1: $\text{Proj} \to \text{Dept}$, FD2: $\text{Dept} \to \text{Div}$, and FD3: $\text{Proj} \to \text{Div}$,
and two decompositions
$D_1$ = {R1[Proj, Dept], R2[Dept, Div]}
$D_2$ = {R1[Proj, Dept], R3[Proj, Div]}
Both are lossless. (Why?)

Dependency Preservation (cont.)
Which decomposition is better?
Decomposition $D_1$ lets us test FD1 on table R1 and FD2 on table R2; if they are both satisfied, FD3 is automatically satisfied.
In decomposition $D_2$ we can test FD1 on table R1 and FD3 on table R3. Dependency FD2 is an interrelational constraint: testing it requires joining tables R1 and R3.
$D_1$ is better!
A decomposition $D = \{R_1, \ldots, R_n\}$ of $R$ is dependency preserving if there is a set $F'$ of functional dependencies equivalent to $F$, none of which is interrelational in $D$.
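
The slides give the definition but no test for it. One standard textbook test, which is not taken from these slides, checks each FD $X \to Y$ by growing $X$ using only closures restricted to individual schemas of the decomposition. A sketch with hypothetical names:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """True if every FD in fds follows from FDs local to single schemas."""
    for lhs, rhs in fds:
        reachable = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in decomposition:
                schema = set(schema)
                # Attributes derivable from inside this one schema only.
                gained = closure(reachable & schema, fds) & schema
                if not gained.issubset(reachable):
                    reachable |= gained
                    changed = True
        if not set(rhs).issubset(reachable):
            return False
    return True


COMPANY_FDS = [({"Proj"}, {"Dept"}), ({"Dept"}, {"Div"}), ({"Proj"}, {"Div"})]
D1 = [{"Proj", "Dept"}, {"Dept", "Div"}]
D2 = [{"Proj", "Dept"}, {"Proj", "Div"}]

print(preserves_dependencies(D1, COMPANY_FDS))
print(preserves_dependencies(D2, COMPANY_FDS))
```

For $D_2$ the check fails on FD2: starting from Dept, neither R1 nor R3 alone lets us reach Div.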

Avoiding Anomalies
What is a good relational database schema?
Rule of thumb: independent facts in separate tables:
  "Each relation schema should consist of a primary key and a set of mutually independent attributes."
This is achieved by transforming a schema to a normal form.
Goals:
- Intuitive and straightforward changes
- Anomaly-free/nonredundant representation of data
We discuss:
- Boyce-Codd Normal Form (BCNF)
- Third Normal Form (3NF)
... both based on the notion of functional dependency.

Boyce-Codd Normal Form (BCNF)
Schema $R$ is in BCNF (w.r.t. $F$) if and only if whenever $(X \to Y) \in F^+$ and $XY \subseteq R$, then either
- $(X \to Y)$ is trivial (i.e., $Y \subseteq X$), or
- $X$ is a superkey of $R$.
A database schema $\{R_1, \ldots, R_n\}$ is in BCNF if each relation schema $R_i$ is in BCNF.
Formalization of the goal that independent relationships are stored in separate tables.

BCNF (cont.)
Why does BCNF avoid redundancy?
For the schema Supplied_Items we had an FD: $\text{Sno} \to \text{Sname}, \text{City}$
Therefore: supplier name Magna and city Ajax must be repeated for each item supplied by supplier S1.
Assume the above FD holds over a schema $R$ that is in BCNF. Then:
- Sno is a superkey for $R$
- each Sno value appears on one row only
- no need to repeat Sname and City values

Lossless-Join BCNF Decomposition
function ComputeBCNF(R, F)
begin
  Result := $\{R\}$;
  while some $R_i \in$ Result and $(X \to Y) \in F^+$ violate the BCNF condition do
  begin
    Replace $R_i$ by $R_i - (Y - X)$;
    Add the schema $XY$ to Result;
  end;
  return Result;
end
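
A brute-force Python sketch of ComputeBCNF follows. Finding a violating FD in $F^+$ is done here by trying every proper subset $X$ of $R_i$ and comparing $X^+ \cap R_i$ with $X$ and $R_i$, which is exponential and meant only for small examples. The function names are hypothetical, and the FDs $\text{Ino} \to \text{Iname}$ and $\text{Sno}, \text{Ino} \to \text{Price}$ for Supplied_Items are assumptions read off the sample data; the slides state only $\text{Sno} \to \text{Sname}, \text{City}$.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result


def find_violation(schema, fds):
    """Return (X, Y) with X -> Y violating BCNF on schema, or None."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            determined = frozenset(closure(x, fds)) & schema
            # Violation: X determines something new but not all of schema.
            if determined != x and determined != schema:
                return x, determined - x
    return None


def compute_bcnf(schema, fds):
    pending = [frozenset(schema)]
    result = set()
    while pending:
        current = pending.pop()
        violation = find_violation(current, fds)
        if violation is None:
            result.add(current)
        else:
            x, y = violation             # y is disjoint from x here
            pending.append(current - y)  # R_i - (Y - X)
            pending.append(x | y)        # XY
    return result


SUPPLIED_ITEMS = {"Sno", "Sname", "City", "Ino", "Iname", "Price"}
SUPPLIED_FDS = [
    ({"Sno"}, {"Sname", "City"}),
    ({"Ino"}, {"Iname"}),          # assumed from the sample data
    ({"Sno", "Ino"}, {"Price"}),   # assumed from the sample data
]

for schema in compute_bcnf(SUPPLIED_ITEMS, SUPPLIED_FDS):
    print(sorted(schema))
```

Under these assumed FDs the result matches the Item, Supplier and Supplies tables shown earlier.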

Lossless-Join BCNF Decomposition
- No efficient procedure to do this exists.
- Results depend on the sequence of FDs used to decompose the relations.
- It is possible that no lossless-join, dependency-preserving BCNF decomposition exists: consider $R = \{A, B, C\}$ and $F = \{AB \to C, C \to B\}$.

Third Normal Form (3NF)
Schema $R$ is in 3NF (w.r.t. $F$) if and only if whenever $(X \to Y) \in F^+$ and $XY \subseteq R$, then either
- $(X \to Y)$ is trivial, or
- $X$ is a superkey of $R$, or
- each attribute of $Y$ is contained in a candidate key of $R$.
A schema $\{R_1, \ldots, R_n\}$ is in 3NF if each relation schema $R_i$ is in 3NF.
3NF is looser than BCNF:
- it allows more redundancy (e.g., $R = \{A, B, C\}$ with $F = \{AB \to C, C \to B\}$ is in 3NF but not in BCNF)
- a lossless-join, dependency-preserving decomposition into 3NF relation schemas always exists.

Minimal Cover
Definition: Two sets of dependencies $F$ and $G$ are equivalent iff $F^+ = G^+$.
There are different sets of functional dependencies that have the same logical implications. Simple sets are desirable.
Definition: A set of dependencies $G$ is minimal if
1. every right-hand side of a dependency in $G$ is a single attribute,
2. for no $X \to A$ is the set $G - \{X \to A\}$ equivalent to $G$, and
3. for no $X \to A$ and $Z$ a proper subset of $X$ is the set $(G - \{X \to A\}) \cup \{Z \to A\}$ equivalent to $G$.
Theorem: For every set of dependencies $F$ there is an equivalent minimal set of dependencies (minimal cover).

Finding Minimal Covers
A minimal cover for $F$ can be computed in 3 steps (+ optimization). Note that each step must be repeated until it no longer succeeds in updating $F$.
Step 1. Replace $X \to YZ$ with the pair $X \to Y$ and $X \to Z$.
Step 2. Remove $X \to A$ from $F$ if $A \in \text{ComputeX}^+(X, F - \{X \to A\})$.
Step 3. Remove $A$ from the left-hand side of $X \to B$ in $F$ if $B$ is in $\text{ComputeX}^+(X - \{A\}, F)$.
[we have a minimal cover here]
Step 4. Replace $X \to Y$ and $X \to Z$ in $F$ by $X \to YZ$.
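
Steps 1 to 3 can be sketched in Python as below. This sketch applies Step 3 (shrinking left-hand sides) before Step 2 (dropping redundant FDs), a common ordering under which one pass of each suffices; that ordering, the function names, and the (left-hand side, single attribute) representation of the result are choices made for this example. Step 4 is omitted.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result


def as_pairs(cover):
    """Convert (lhs, attribute) entries back to (lhs, rhs-set) pairs."""
    return [(lhs, {a}) for lhs, a in cover]


def minimal_cover(fds):
    # Step 1: split right-hand sides into single attributes.
    cover = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), a)
            if fd not in cover:
                cover.append(fd)

    # Step 3: drop left-hand-side attributes that are not needed.
    for i in range(len(cover)):
        lhs, a = cover[i]
        for b in sorted(lhs):
            reduced = lhs - {b}
            if a in closure(reduced, as_pairs(cover)):
                lhs = reduced
                cover[i] = (lhs, a)
    cover = list(dict.fromkeys(cover))  # shrinking may create duplicates

    # Step 2: drop FDs implied by the remaining ones.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] in closure(fd[0], as_pairs(rest)):
            cover = rest
    return cover


EXAMPLE_FDS = [
    ({"A"}, {"B", "C"}),
    ({"B"}, {"C"}),
    ({"A"}, {"B"}),
    ({"A", "B"}, {"C"}),
]

for lhs, a in minimal_cover(EXAMPLE_FDS):
    print(sorted(lhs), "->", a)
```

The example set of FDs is made up for illustration; it reduces to the two dependencies $A \to B$ and $B \to C$.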

Computing a 3NF Decomposition
A lossless-join 3NF decomposition that is dependency preserving can be efficiently computed:
function Compute3NF(R, F)
begin
  Result := $\emptyset$;
  F := a minimal cover for F;
  for each $(X \to Y) \in F$ do
    Result := Result $\cup\ \{XY\}$;
  if there is no $R_i \in$ Result such that $R_i$ contains a candidate key for $R$ then
  begin
    compute a candidate key $K$ for $R$;
    Result := Result $\cup\ \{K\}$;
  end;
  return Result;
end
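
A minimal Python sketch of Compute3NF, under the assumption that the FDs passed in already form a minimal cover (computing one is a separate step). FDs with the same left-hand side are merged first, as in Step 4 of the minimal-cover procedure. The names are hypothetical, and the test "some $R_i$ contains a candidate key" is implemented as "some $R_i$ is a superkey of $R$", which is equivalent.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs).issubset(result) and not set(rhs).issubset(result):
                result |= set(rhs)
                changed = True
    return result


def compute_3nf(schema, cover):
    """cover is assumed to be a minimal cover of the FDs on schema."""
    schema = frozenset(schema)

    # Merge FDs that share a left-hand side, then build one schema XY each.
    merged = {}
    for lhs, rhs in cover:
        merged.setdefault(frozenset(lhs), set()).update(rhs)
    result = [lhs | rhs for lhs, rhs in merged.items()]

    # Add a candidate key if no schema so far is a superkey of R.
    if not any(closure(ri, cover).issuperset(schema) for ri in result):
        key = set(schema)
        for a in sorted(schema):
            # Drop a if the remaining attributes still determine all of R.
            if closure(key - {a}, cover).issuperset(schema):
                key.remove(a)
        result.append(frozenset(key))
    return set(result)


ABC_COVER = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
for ri in compute_3nf({"A", "B", "C"}, ABC_COVER):
    print(sorted(ri))
```

Like the pseudocode, this sketch does not remove a schema that is contained in another one, so the example yields both {A, B, C} and {B, C}.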

Summary
Functional dependencies provide clues towards elimination of (some) redundancies in a relational schema.
Goals: to decompose relational schemas in such a way that the decomposition is
(1) lossless-join
(2) dependency preserving
(3) BCNF (and if we fail here, at least 3NF)

Beyond Functional Dependencies
There exist anomalies/redundancies in relational schemas that cannot be captured by FDs.
Example: consider the following table:
Course         Teacher  Book
Math           Smith    Algebra
Math           Smith    Calculus
Math           Jones    Algebra
Math           Jones    Calculus
Advanced Math  Smith    Calculus
Physics        Black    Mechanics
Physics        Black    Optics
There are no (non-trivial) FDs that hold on this scheme; therefore the scheme (Course, Set-of-teachers, Set-of-books) is in BCNF.

Multivalued Dependencies (MVD)
The CTB table contains redundant information because:
whenever $(c, t_1, b_1) \in \text{CTB}$ and $(c, t_2, b_2) \in \text{CTB}$, then also $(c, t_1, b_2) \in \text{CTB}$ and, by symmetry, $(c, t_2, b_1) \in \text{CTB}$.
We say that a multivalued dependency (MVD) $C \twoheadrightarrow T$ (and $C \twoheadrightarrow B$ as well) holds on CTB:
given a course, the set of teachers and the set of books are uniquely determined and independent.
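
The "swap" condition above can be checked mechanically on an instance. A minimal sketch, with the hypothetical helper `satisfies_mvd` and rows stored as tuples in a fixed column order; the data is the CTB table from the previous slide.

```python
def satisfies_mvd(rows, attrs, lhs, rhs):
    """Return True if the MVD lhs ->> rhs holds in the given instance.

    rows: set of tuples; attrs: tuple of attribute names in column order.
    """
    x = [attrs.index(a) for a in lhs]
    y = [attrs.index(a) for a in rhs]
    rows = set(rows)
    for t1 in rows:
        for t2 in rows:
            if all(t1[i] == t2[i] for i in x):
                # Take the Y-values from t1 and every other value from t2.
                mixed = tuple(
                    t1[i] if i in y else t2[i] for i in range(len(attrs))
                )
                if mixed not in rows:
                    return False
    return True


CTB_ATTRS = ("Course", "Teacher", "Book")
CTB = {
    ("Math", "Smith", "Algebra"),
    ("Math", "Smith", "Calculus"),
    ("Math", "Jones", "Algebra"),
    ("Math", "Jones", "Calculus"),
    ("Advanced Math", "Smith", "Calculus"),
    ("Physics", "Black", "Mechanics"),
    ("Physics", "Black", "Optics"),
}

print(satisfies_mvd(CTB, CTB_ATTRS, ["Course"], ["Teacher"]))
print(satisfies_mvd(CTB, CTB_ATTRS, ["Teacher"], ["Course"]))
```

As with FDs, a check on one instance can refute an MVD but cannot establish it as a constraint on the schema.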

Another Example
Course  Teacher  Hour  Room  Student  Grade
CS101   Jones    M-9   2222  Smith    A
CS101   Jones    W-9   3333  Smith    A
CS101   Jones    F-9   2222  Smith    A
CS101   Jones    M-9   2222  Black    B
CS101   Jones    W-9   3333  Black    B
CS101   Jones    F-9   2222  Black    B
FDs: $C \to T$, $CS \to G$, $HR \to C$, $HT \to R$, and $HS \to R$
MVDs: $C \twoheadrightarrow HR$

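Axioms for MVDs
1. $Y \subseteq X \Rightarrow X \twoheadrightarrow Y$ (reflexivity)
2. $X \twoheadrightarrow Y \Rightarrow X \twoheadrightarrow (R - Y)$ (complementation)
3. $X \twoheadrightarrow Y \Rightarrow XZ \twoheadrightarrow YZ$ (augmentation)
4. $X \twoheadrightarrow Y,\ Y \twoheadrightarrow Z \Rightarrow X \twoheadrightarrow (Z - Y)$ (transitivity)
5. $X \to Y \Rightarrow X \twoheadrightarrow Y$ (conversion)
6. $X \twoheadrightarrow Y,\ XY \to Z \Rightarrow X \to (Z - Y)$ (interaction)
Theorem: The axioms for FDs together with (1)-(6) are sound and complete for logical implication of FDs and MVDs.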

Example
In the CTHRSG schema, $C \twoheadrightarrow SG$ can be derived as follows:
1. $C \twoheadrightarrow HR$
2. $C \twoheadrightarrow T$ (from $C \to T$)
3. $C \twoheadrightarrow CTSG$ (complementation of (1))
4. $C \twoheadrightarrow CT$ (augmentation of (2) by $C$)
5. $CT \twoheadrightarrow CTSG$ (augmentation of (3) by $T$)
6. $C \twoheadrightarrow SG$ (transitivity on (4) and (5))

Dependency Basis
Definition: A dependency basis for $X$ with respect to a set of FDs and MVDs $F$ is a partition of $R - X$ into sets $Y_1, \ldots, Y_k$ such that $F \models X \twoheadrightarrow Z$ if and only if $Z - X$ is a union of some of the $Y_i$'s.
- Unlike for FDs, we can't split right-hand sides of MVDs into single attributes (cf. minimal cover).
- The dependency basis of $X$ w.r.t. $F$ can be computed in PTIME [Beeri80].
The dependency basis of CTHRSG with respect to $C$ is $[T, HR, SG]$.

Lossless-Join Decomposition
Similarly to the FD case, we want to decompose the schema to avoid anomalies.
A lossless-join decomposition $(R_1, R_2)$ of $R$ with respect to a set of MVDs $F$:
$$F \models (R_1 \cap R_2) \twoheadrightarrow (R_1 - R_2)$$
or, by symmetry,
$$F \models (R_1 \cap R_2) \twoheadrightarrow (R_2 - R_1)$$
This condition implies the one for FDs (if only FDs appear in $F$).

Fourth Normal Form (4NF)
Definition: Let $R$ be a relation schema and $F$ a set of FDs and MVDs. Schema $R$ is in 4NF if and only if whenever $(X \twoheadrightarrow Y) \in F^+$ and $XY \subseteq R$, then either
- $(X \twoheadrightarrow Y)$ is trivial ($Y \subseteq X$ or $XY = R$), or
- $X$ is a superkey of $R$.
A database schema $\{R_1, \ldots, R_n\}$ is in 4NF if each relation schema $R_i$ is in 4NF.
Use a BCNF-like decomposition procedure to obtain a lossless-join decomposition into 4NF.

Example
The CTB schema can be decomposed to 4NF (using $C \twoheadrightarrow T$) as follows:
Course         Teacher
Math           Smith
Math           Jones
Physics        Black
Advanced Math  Smith
and
Course         Book
Math           Algebra
Math           Calculus
Physics        Mechanics
Physics        Optics
Advanced Math  Calculus
No FDs here!
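
That this decomposition loses nothing can be confirmed on the sample instance with a short Python sketch: project CTB onto (Course, Teacher) and (Course, Book), then join the projections on Course. The variable names are chosen for this example.

```python
CTB = {
    ("Math", "Smith", "Algebra"),
    ("Math", "Smith", "Calculus"),
    ("Math", "Jones", "Algebra"),
    ("Math", "Jones", "Calculus"),
    ("Advanced Math", "Smith", "Calculus"),
    ("Physics", "Black", "Mechanics"),
    ("Physics", "Black", "Optics"),
}

# Projections onto (Course, Teacher) and (Course, Book).
CT = {(course, teacher) for course, teacher, _book in CTB}
CB = {(course, book) for course, _teacher, book in CTB}

# Natural join of the two projections on Course.
rejoined = {
    (course, teacher, book)
    for course, teacher in CT
    for other_course, book in CB
    if course == other_course
}

print(len(CT), len(CB), len(rejoined))
print(rejoined == CTB)
```

The join reproduces exactly the original seven tuples, while the two projections store each teacher and each book only once per course.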

Other Dependencies (last round)
Join Dependency (on $R$): $\bowtie[R_1, \ldots, R_k]$ holds if
$$\forall x.\ R(x) \leftrightarrow R_1(x_1) \land \ldots \land R_k(x_k)$$
where $\forall x_i.\ R_i(x_i) \leftrightarrow \exists y.\ R(x_i, y)$ holds for all $1 \le i \le k$.
- generalization of an MVD: $X \twoheadrightarrow Y$ is the same as $\bowtie[XY, X(R - Y)]$
- cannot be simulated by MVDs
- no axiomatization exists
Project-Join NF (5NF): $\bowtie[R_1, \ldots, R_k]$ implies $R_i$ is a key.