Functional Dependencies

Functional dependencies (FDs):
- provide a framework for the systematic design and optimization of relational schemas;
- generalize the notion of keys;
- are crucial for obtaining correctly normalized schemas.

Definitions

Suppose that, in a relation R, there is a set of attributes A1, A2, ..., An and an attribute B such that any two tuples with the same values for A1, A2, ..., An also have the same value for B. We then say that A1, A2, ..., An functionally determine B, and write this functional dependency (FD) as:

A1, A2, ..., An → B

Functional dependencies define properties of the schema, not of any particular instance: the dependency must hold in every legal instance of R.

If A1, A2, ..., An uniquely determine several attributes B1, B2, ..., Bm, the individual FDs

A1, A2, ..., An → B1
A1, A2, ..., An → B2
...
A1, A2, ..., An → Bm

can all be combined into one expression:

A1, A2, ..., An → B1 B2 ... Bm
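Since an FD is a property of the schema, a single instance can only refute it, never prove it. As a minimal sketch (Python; the function `fd_holds` and the Movies rows are hypothetical, not part of the original), the check below reports whether an instance contains two tuples that agree on the left-hand side but differ on the right-hand side:

```python
from typing import Iterable, Mapping, Sequence

Row = Mapping[str, object]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return False if two rows agree on `lhs` but differ on `rhs`."""
    seen: dict = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        # and returns that stored value on every later call.
        if seen.setdefault(key, value) != value:
            return False  # counterexample: the FD is violated
    return True  # no counterexample in this instance


# Hypothetical Movies rows: a movie appears once per star.
movies = [
    {"title": "Movie X", "year": 2001, "length": 120, "star": "Star P"},
    {"title": "Movie X", "year": 2001, "length": 120, "star": "Star Q"},
    {"title": "Movie Y", "year": 2001, "length": 95, "star": "Star P"},
]

print(fd_holds(movies, ["title", "year"], ["length"]))  # True
print(fd_holds(movies, ["title", "year"], ["star"]))    # False
```

A `True` result only means this instance contains no counterexample; the FD itself must hold in every legal instance.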

Keys revisited: a set of attributes that functionally determines all attributes of a relation (and hence the entire tuple) is called a superkey. A candidate key is a minimal superkey, i.e., a superkey none of whose proper subsets is also a superkey.

Consider the relation:

Movies (title, year, length, filmtype, studio, star)

We can identify the following FDs:

title, year → length
title, year → filmtype

However, note that title, year → star need not hold, since a movie can have several stars.

Reasoning about FDs

Transitivity: in any relation R, if A → B and B → C, then the FD A → C also holds in R. For example, if Employee_Number → Job and Job → Salary, then Employee_Number → Salary.

Two FDs (or, more generally, two sets of FDs) S and T are said to be equivalent if the set of relation instances satisfying S is the same as the set of relation instances satisfying T. We say that S follows from T if every relation instance that satisfies T also satisfies S. Hence S and T are equivalent if and only if S follows from T and T follows from S.
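An FD X → Y follows from a set of FDs F exactly when Y is contained in the closure X+ of X under F, computed by the algorithm on the following slides. The Python sketch below, with hypothetical helper names (`fd`, `closure`, `follows_all`, `equivalent`), uses that test to check whether one set of FDs follows from another and whether two sets are equivalent, on the Employee_Number example:

```python
FD = tuple[frozenset, frozenset]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from comma-separated attribute names."""
    return (frozenset(a.strip() for a in lhs.split(",")),
            frozenset(a.strip() for a in rhs.split(",")))


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def follows_all(s, t) -> bool:
    """True if every FD X -> Y in s follows from t, i.e. Y is in X+ under t."""
    return all(rhs <= closure(lhs, t) for lhs, rhs in s)


def equivalent(s, t) -> bool:
    """Two sets of FDs are equivalent if each follows from the other."""
    return follows_all(s, t) and follows_all(t, s)


T = [fd("Employee_Number", "Job"), fd("Job", "Salary")]
S = [fd("Employee_Number", "Salary")]

print(follows_all(S, T))     # True: obtained by transitivity
print(follows_all(T, S))     # False: Job -> Salary cannot be derived from S
print(equivalent(T, T + S))  # True
```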

Trivial Functional Dependencies

Note that in the Movies relation, title, year → title always holds. An FD whose right-hand side is contained in its left-hand side is called trivial. If at least one attribute on the right-hand side is not contained in the left-hand side, the FD is called nontrivial; if none of the right-hand-side attributes are contained in the left-hand side, it is called completely nontrivial.

Closure of Attribute Sets

In a relation R, let A be a set of attributes of R. The closure of A is the set of all attributes that are functionally determined by A, directly or through a chain of FDs. For example, given

A → B; B → C, D; B, D → E

the closure of A is {A, B, C, D, E}.
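The three categories can be told apart with two set comparisons. A small Python sketch (the name `classify_fd` is hypothetical); it returns the most specific label, so an FD reported as completely nontrivial is also nontrivial:

```python
def classify_fd(lhs: set, rhs: set) -> str:
    """Classify the FD lhs -> rhs using the definitions above."""
    if rhs <= lhs:
        return "trivial"                # every RHS attribute is on the LHS
    if rhs.isdisjoint(lhs):
        return "completely nontrivial"  # no RHS attribute is on the LHS
    return "nontrivial"                 # some, but not all, RHS attributes are on the LHS


print(classify_fd({"title", "year"}, {"title"}))           # trivial
print(classify_fd({"title", "year"}, {"year", "length"}))  # nontrivial
print(classify_fd({"title", "year"}, {"length"}))          # completely nontrivial
```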

Adding attributes to the closure: if A' ⊆ closure(A) and A' → F, then closure(A) = closure(A) ∪ F.

Computing the closure of a set of attributes

Given a relation R, its set of FDs, and a set of attributes A, closure(A) is computed by the following algorithm:
1. Initially, closure(A) = A.
2. For every FD of the form A' → B such that A' ⊆ closure(A) and B ⊄ closure(A), set closure(A) = closure(A) ∪ B.
3. Repeat step 2 until no more attributes can be added to closure(A).

The closure of a set of attributes A is denoted by A+. Note that if A+ is the set of all attributes of R, then A is a superkey of R.
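The algorithm above translates almost line for line into code. This is a sketch, assuming FDs are represented as pairs of frozensets of attribute names (the helpers `fd` and `closure` are hypothetical names), run on the example A → B; B → C, D; B, D → E. The superkey check assumes, for illustration only, that R has exactly the attributes A to E:

```python
FD = tuple[frozenset, frozenset]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from comma-separated attribute names, e.g. fd("B, D", "E")."""
    return (frozenset(a.strip() for a in lhs.split(",")),
            frozenset(a.strip() for a in rhs.split(",")))


def closure(attrs, fds: list[FD]) -> set:
    result = set(attrs)               # step 1: closure(A) = A
    changed = True
    while changed:                    # step 3: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:          # step 2: apply every FD whose LHS is covered
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


F = [fd("A", "B"), fd("B", "C, D"), fd("B, D", "E")]
R = {"A", "B", "C", "D", "E"}  # assumed attribute set of R

print(sorted(closure({"A"}, F)))  # ['A', 'B', 'C', 'D', 'E']
print(closure({"A"}, F) == R)     # True, so A is a superkey of R
print(sorted(closure({"C"}, F)))  # ['C']
```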

Inferred FDs

In a relation R, suppose A, B, C and D are sets of attributes of R such that A → B, B → C, and C → D. Let D_A = D ∩ A be the part of D that already lies in A, and let D' = D − D_A. From these FDs we can infer the nontrivial FD A → D' (provided D' is nonempty).

FDs that are specified explicitly are called stated FDs, and FDs that are derived from them are called inferred FDs.

A set of FDs from which all FDs of a relation can be inferred is called a basis for the relation. If no proper subset of a basis is itself a basis, it is called a minimal basis. (Some textbooks use a stricter definition that additionally requires every right-hand side to be a single attribute and no left-hand-side attribute to be removable.)
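Under the stricter textbook definition of a minimal basis (single-attribute right-hand sides, no removable left-hand-side attribute, no redundant FD), a basis can be computed in three steps. A Python sketch under that assumption; the helper names and the FD set F are made up for illustration:

```python
FD = tuple[frozenset, frozenset]


def fd(lhs: str, rhs: str) -> FD:
    return (frozenset(a.strip() for a in lhs.split(",")),
            frozenset(a.strip() for a in rhs.split(",")))


def closure(attrs, fds) -> set:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_basis(fds: list[FD]) -> list[FD]:
    # Step 1: split right-hand sides into single attributes.
    basis = [(lhs, frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: drop left-hand-side attributes that are not needed. Testing
    # against `basis` is safe because each reduction keeps the set equivalent.
    reduced = []
    for lhs, rhs in basis:
        smaller = set(lhs)
        for b in sorted(lhs):
            if rhs <= closure(smaller - {b}, basis):
                smaller.discard(b)
        reduced.append((frozenset(smaller), rhs))
    # Step 3: drop FDs that follow from the remaining ones.
    result = list(dict.fromkeys(reduced))  # remove duplicates, keep order
    for f in list(result):
        others = [g for g in result if g != f]
        if f[1] <= closure(f[0], others):
            result = others
    return result


def show(fds):
    return [", ".join(sorted(l)) + " -> " + ", ".join(sorted(r)) for l, r in fds]


F = [fd("A", "B, C"), fd("B", "C"), fd("A, B", "D")]
print(show(minimal_basis(F)))  # ['A -> B', 'B -> C', 'A -> D']
```

Examining the FDs in a different order can produce a different, equally valid minimal basis.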

Armstrong's Axioms

To compute the set of FDs that follow from a given set of FDs, the following rules, called Armstrong's axioms, are useful:
1. Reflexivity: if B ⊆ A, then A → B.
2. Augmentation: if A → B, then A C → B C for any set of attributes C. Note also that if A → B, then A C → B for any set of attributes C.
3. Transitivity: if A → B and B → C, then A → C.

Projecting FDs

Let R be a relation and F(R) the set of all FDs that hold in R. Suppose relation S is obtained from R by projection, i.e., by removing some attributes. How can we infer F(S)? The FDs that belong to F(S) are exactly those which:
1. follow from F(R), and
2. involve only attributes of S.

Consider a relation R(A, B, C, D) with F(R) = {A → B, B → C, C → D}, and suppose S(A, C, D) is projected from R. What is F(S)?

To compute F(S), compute the closure of each subset of the attributes of S and keep only the attributes that belong to S:
- In R, A+ = {A, B, C, D}, giving A → B, A → C and A → D. Restricted to S, this yields A → C and A → D.
- C+ = {C, D}, giving C → D.
- D+ = {D}, which yields nothing.
- Since A+ already contains all attributes of S, it is not necessary to compute (AC)+, (AD)+ or (ACD)+: every FD they produce follows from A → C, D.
- (CD)+ = {C, D}, which yields nothing new.

Hence F(S) = {A → C, A → D, C → D}. Because A → D follows from A → C and C → D by transitivity, a minimal basis for F(S) is {A → C, C → D}.

Designing Relational Schemas

In a carelessly designed relational schema, FDs hold whose left-hand sides are not keys. This leads to the following problems:
1. Redundancy: information is repeated unnecessarily across tuples.
2. Update anomalies: if information is repeated across tuples, an update to that information must be performed on every tuple that contains it; updating only some of them leaves the data inconsistent.
3. Deletion anomalies: deleting a tuple can unintentionally destroy other information that was stored only in that tuple.
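The projection procedure above can be sketched in Python as follows. The helper names are hypothetical; the function enumerates subsets X of S, computes X+ under F(R), keeps only the attributes of S, and skips supersets of any X whose closure already covers S (the shortcut used above). The output is not minimized: A → C, D stands for A → C and A → D.

```python
from itertools import combinations

FD = tuple[frozenset, frozenset]


def fd(lhs: str, rhs: str) -> FD:
    return (frozenset(a.strip() for a in lhs.split(",")),
            frozenset(a.strip() for a in rhs.split(",")))


def closure(attrs, fds) -> set:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project_fds(fds: list[FD], s_attrs) -> list[FD]:
    """Nontrivial FDs X -> (X+ restricted to S) - X for subsets X of S."""
    s = frozenset(s_attrs)
    covering = []   # subsets X of S whose closure already contains all of S
    projected = []
    for k in range(1, len(s) + 1):
        for combo in combinations(sorted(s), k):
            x = frozenset(combo)
            if any(c <= x for c in covering):
                continue  # shortcut: supersets of such an X add nothing new
            x_plus = closure(x, fds)
            new = frozenset(x_plus & s) - x  # keep only attributes of S
            if new:
                projected.append((x, new))
            if s <= x_plus:
                covering.append(x)
    return projected


def show(fds):
    return [", ".join(sorted(l)) + " -> " + ", ".join(sorted(r)) for l, r in fds]


F_R = [fd("A", "B"), fd("B", "C"), fd("C", "D")]
print(show(project_fds(F_R, {"A", "C", "D"})))  # ['A -> C, D', 'C -> D']
```

The number of subsets grows exponentially with the number of attributes of S, so this is practical only for small relations.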

Consider the Movies (title, year, length, studio, star) relation, where:

title, year → length
title, year → studio

but title, year → star need not hold. For each star of a given movie, the title, year, length and studio information has to be repeated. If any of these values is updated, every tuple in which it occurs must be changed.

Decomposition

Anomalies are removed from a relation R(A) by decomposing it into relations S(B) and T(C), where B ∪ C = A, such that S and T are free of these anomalies. Boyce-Codd Normal Form (BCNF) is a condition that guarantees this for anomalies caused by FDs. A relation R(A) is in BCNF if, whenever a nontrivial FD A' → B holds in R, A' is a superkey of R.
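The BCNF condition can be checked with attribute closures: an FD violates BCNF when it is nontrivial and the closure of its left-hand side does not contain every attribute of R. A Python sketch with hypothetical helper names; it inspects only the given FDs, which is sufficient for the original relation R (if no given FD violates BCNF, no implied FD does) but not for relations produced by decomposition:

```python
FD = tuple[frozenset, frozenset]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from comma-separated attribute names."""
    return (frozenset(a.strip() for a in lhs.split(",")),
            frozenset(a.strip() for a in rhs.split(",")))


def closure(attrs, fds) -> set:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds: list[FD]) -> list[FD]:
    """Given FDs that are nontrivial and whose LHS is not a superkey."""
    r = set(attrs)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs                # nontrivial
            and not r <= closure(lhs, fds)]  # LHS is not a superkey of R


movie = ["title", "year", "length", "studio", "star"]
F = [fd("title, year", "length, studio")]
for lhs, rhs in bcnf_violations(movie, F):
    print(sorted(lhs), "->", sorted(rhs))
# ['title', 'year'] -> ['length', 'studio']
```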

In a relation R(A), let A' → B be a nontrivial FD that violates BCNF (A' is not a superkey). To bring R into BCNF, decompose R as follows:
1. Let B be the set of all attributes that appear on the right-hand side of any FD with A' on the left-hand side (more generally, B = A'+ − A').
2. Form one relation from the attributes A' ∪ B.
3. Form the other relation from A' together with the remaining attributes, A − (A' ∪ B).

Example: Consider the Movies (title, year, length, studio, star) relation. Here the following FD holds:

title, year → length, studio

This FD violates BCNF, since (title, year) is not a superkey: the attribute star is not in (title, year)+. To decompose Movies, put (title, year) together with (length, studio) in one relation, and (title, year) together with (star) in the other.

Hence Movies (title, year, length, studio, star) is decomposed into:

Movies1 (title, year, length, studio)
Movies2 (title, year, star)

2-attribute Relations

Any 2-attribute relation R(A, B) is always in BCNF. To prove this, consider the following cases:
1. There are no nontrivial FDs between A and B. Only trivial FDs hold, so R is in BCNF.
2. A → B holds, but B → A does not. Then A is the key, and R is in BCNF.
3. B → A holds, but A → B does not. This is symmetric to the case above; B is the key.
4. Both A → B and B → A hold. Then both A and B are keys, and neither FD violates the BCNF condition.
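The decomposition step can be applied repeatedly until every part is in BCNF. The Python sketch below (hypothetical names) looks for a violation by computing the closure of each subset X of the current relation under the original FDs, restricted to that relation, so FDs never need to be projected explicitly. It splits into R1 = X+ and R2 = X ∪ (R − X+), as described above. Enumerating subsets is exponential, which is acceptable only for small examples:

```python
from itertools import combinations

FD = tuple[frozenset, frozenset]


def fd(lhs: str, rhs: str) -> FD:
    return (frozenset(a.strip() for a in lhs.split(",")),
            frozenset(a.strip() for a in rhs.split(",")))


def closure(attrs, fds) -> set:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(r: frozenset, fds):
    """Return (X, X+ restricted to r) for some X violating BCNF in r, else None."""
    for k in range(1, len(r)):
        for combo in combinations(sorted(r), k):
            x = frozenset(combo)
            x_plus = frozenset(closure(x, fds)) & r
            if x_plus != x and x_plus != r:  # nontrivial FD, X not a superkey
                return x, x_plus
    return None


def bcnf_decompose(attrs, fds):
    r = frozenset(attrs)
    violation = find_violation(r, fds)
    if violation is None:
        return [r]
    x, x_plus = violation
    r1 = x_plus              # X together with everything it determines
    r2 = x | (r - x_plus)    # X together with the remaining attributes
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


movie = ["title", "year", "length", "studio", "star"]
F = [fd("title, year", "length, studio")]
for rel in bcnf_decompose(movie, F):
    print(sorted(rel))
# ['length', 'studio', 'title', 'year']
# ['star', 'title', 'year']
```

For any two-attribute relation, `find_violation` only examines single attributes, whose closure within R is either the attribute itself or all of R, so it never reports a violation; this matches the 2-attribute argument above.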

Third Normal Form (3NF)

Sometimes a BCNF-violating FD cannot be removed by decomposition without losing the ability to enforce another FD. Consider the relation Drama (title, theater, city) with the following FDs:

FD1: title, city → theater (title and city together uniquely determine theater, so they form a key)
FD2: theater → city (each drama theater has a unique name across cities)

FD2 violates BCNF, since {theater} is not a superkey of Drama.

Based on FD2, we could decompose Drama into Drama1 (title, theater) and Drama2 (theater, city). This decomposition is problematic: FD1 involves attributes of both relations, so it can no longer be enforced in either relation alone, and the join of Drama1 and Drama2 may violate it, i.e., (title, city) may no longer be a key of the join.

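Consider the example tables:

Drama1
Title   Theater
Jeans   Naz
Jeans   Jude Brave Golden
Troy    Naz

Drama2
Theater             City
Naz                 Lahore
Jude Brave Golden   Karachi

A join between Drama1 and Drama2 gives the table:

Title   Theater             City
Jeans   Naz                 Lahore
Jeans   Jude Brave Golden   Karachi
Troy    Naz                 Lahore

In this particular instance FD1 still happens to hold, but nothing in Drama1 or Drama2 prevents a violation. If, for example, Drama2 instead recorded Jude Brave Golden as being in Lahore, both relations would still satisfy their own FDs, yet the join would contain (Jeans, Naz, Lahore) and (Jeans, Jude Brave Golden, Lahore), so (title, city) would no longer uniquely determine theater.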

Discrepancies like this arise because of the FD theater → city, where theater is not a superkey but city is part of a key (title, city). To accommodate such cases, third normal form (3NF) relaxes the BCNF condition as follows: a relation R is in 3NF if, for every nontrivial FD A → B, either A is a superkey of R, or every attribute of B that is not in A is a member of some key. An attribute that is a member of some key is called a prime attribute. In Drama, city is prime, so Drama is in 3NF even though it is not in BCNF.

Multi-valued Dependencies

In some cases, a relation in BCNF can still contain redundancy. Consider the relation Drama (title, theater, star, genre), which is in BCNF. A given drama may have many stars, and for every star the theater and genre values have to be repeated.
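For the Drama (title, theater, city) example, the 3NF condition can be checked mechanically: find the candidate keys, collect the prime attributes, and test every nontrivial FD. A Python sketch with hypothetical helper names; like the usual textbook test, it examines only the given FDs:

```python
from itertools import combinations

FD = tuple[frozenset, frozenset]


def fd(lhs: str, rhs: str) -> FD:
    return (frozenset(a.strip() for a in lhs.split(",")),
            frozenset(a.strip() for a in rhs.split(",")))


def closure(attrs, fds) -> set:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    r = frozenset(attrs)
    keys = []
    for k in range(1, len(r) + 1):
        for combo in combinations(sorted(r), k):
            x = frozenset(combo)
            if any(key <= x for key in keys):
                continue  # contains a smaller key, so not minimal
            if r <= closure(x, fds):
                keys.append(x)
    return keys


def is_3nf(attrs, fds) -> bool:
    r = frozenset(attrs)
    prime = frozenset().union(*candidate_keys(r, fds))
    for lhs, rhs in fds:
        if rhs <= lhs or r <= closure(lhs, fds):
            continue          # trivial, or LHS is a superkey
        if not (rhs - lhs) <= prime:
            return False      # a non-prime attribute on the RHS
    return True


drama = ["title", "theater", "city"]
F = [fd("title, city", "theater"), fd("theater", "city")]
print([sorted(k) for k in candidate_keys(drama, F)])
# [['city', 'title'], ['theater', 'title']]
print(is_3nf(drama, F))  # True
```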

The notation for a multivalued dependency is a double-headed arrow between two sets of attributes, A ↠ B. Informally, A ↠ B means that a value of A determines a set of B values, independently of the values of the other attributes.

Multivalued dependencies, together with FDs, were axiomatized by Beeri, Fagin, and Howard (1977). The rules are:
- Reflexivity: X ↠ X
- Augmentation: if X ↠ Y, then XZ ↠ Y
- Union: if X ↠ Y and X ↠ Z, then X ↠ YZ
- Projection: if X ↠ Y and X ↠ Z, then X ↠ (Y ∩ Z) and X ↠ (Y − Z)
- Transitivity: if X ↠ Y and Y ↠ Z, then X ↠ (Z − Y)
- Pseudotransitivity: if X ↠ Y and YW ↠ Z, then XW ↠ (Z − YW)
- Complement: if X ↠ Y and Z = R − XY, then X ↠ Z
- Replication: if X → Y, then X ↠ Y
- Coalescence: if X ↠ Y and Z → W, where W ⊆ Y and Y ∩ Z = ∅, then X → W

In a relation R(A), let A' and B be subsets of A. The multivalued dependency (MVD) A' ↠ B holds if, for every value of A', the set of associated B values is independent of the values of the remaining attributes A − A' − B. Formally: whenever two tuples t and u agree on A', R also contains a tuple v that agrees with t on A' ∪ B and with u on A − A' − B.

Fourth Normal Form (4NF)

In a relation R(A), the MVD A' ↠ B is nontrivial if B ⊄ A' and A' ∪ B ≠ A. A relation R(A) is in fourth normal form (4NF) if, for every nontrivial MVD A' ↠ B, A' is a superkey of R. In other words, the only nontrivial MVDs allowed are those whose left-hand side is a superkey.

Example

Consider a table of departments, the jobs (projects) they work on, and the parts they stock. The MVDs in the table are:

department ↠ job
department ↠ part

Assume that department d1 works on jobs j1 and j2 with parts p1 and p2; department d2 works on jobs j3, j4 and j5 with parts p2 and p4; and department d3 works on job j2 only, with parts p5 and p6. The table looks like this:

department  job  part
d1          j1   p1
d1          j1   p2
d1          j2   p1
d1          j2   p2
d2          j3   p2
d2          j3   p4
d2          j4   p2
d2          j4   p4
d2          j5   p2
d2          j5   p4
d3          j2   p5
d3          j2   p6
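The formal MVD definition can be tested on an instance: within each department, the (job, part) pairs must form the full cross product of that department's jobs and parts. A Python sketch; `mvd_holds` is a hypothetical name, and the rows are the ones in the table above:

```python
from collections import defaultdict

ATTRS = ("department", "job", "part")
TABLE = [
    ("d1", "j1", "p1"), ("d1", "j1", "p2"), ("d1", "j2", "p1"), ("d1", "j2", "p2"),
    ("d2", "j3", "p2"), ("d2", "j3", "p4"), ("d2", "j4", "p2"), ("d2", "j4", "p4"),
    ("d2", "j5", "p2"), ("d2", "j5", "p4"),
    ("d3", "j2", "p5"), ("d3", "j2", "p6"),
]


def mvd_holds(rows, attrs, x, y):
    """Check X ->> Y on an instance: within each X-group, the (Y, Z) pairs
    must form the full cross product of the Y values and the Z values."""
    z = [a for a in attrs if a not in x and a not in y]
    idx = {a: i for i, a in enumerate(attrs)}
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[idx[a]] for a in x)
        groups[key].add((tuple(row[idx[a]] for a in y),
                         tuple(row[idx[a]] for a in z)))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if len(pairs) != len(ys) * len(zs):  # pairs is a subset of ys x zs
            return False
    return True


print(mvd_holds(TABLE, ATTRS, ["department"], ["job"]))      # True
print(mvd_holds(TABLE[1:], ATTRS, ["department"], ["job"]))  # False
```

Removing a single row, as in the second call, breaks the MVD; this is why rows in such a table have to be inserted and deleted in groups.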

If you want to add a part to a department, you must create more than one new row (one for each of the department's jobs). Likewise, removing a part or a job from a row can destroy information, and updating a part or job name requires changing multiple rows.

The solution is to split this table into two tables, one with (department, job) and one with (department, part). Informally, each resulting table records a single independent multivalued fact; formally, neither table has a nontrivial MVD, so both are in 4NF. If a table is in 4NF, it is also in BCNF.

Relationship between NFs

Note that 4NF implies BCNF, which in turn implies 3NF.

Join Dependencies

A join dependency (JD) is a further generalization of MVDs. A join dependency ⋈{R1, ..., Rn} is said to hold over a relation R if R1, ..., Rn is a lossless-join decomposition of R. An MVD X ↠ Y over a relation R can be expressed as the join dependency ⋈{XY, X(R − Y)}. Unlike FDs and MVDs, there is no set of sound and complete inference rules for JDs.

course      teacher  book
Physics101  Green    Mechanics
Physics101  Green    Optics
Physics101  Brown    Mechanics
Physics101  Brown    Optics
Math301     Green    Mechanics
Math301     Green    Vectors
Math301     Green    Geometry

As an example, in the CTB relation above, the MVD C ↠ T can be expressed as the join dependency ⋈{CT, CB}.
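A join dependency can be checked on an instance by projecting the relation onto each component and joining the projections back together; the JD holds when the join reproduces the original rows exactly (the join always contains them). A Python sketch with hypothetical helper names, assuming the components together cover all attributes, using the CTB relation:

```python
from functools import reduce

ATTRS = ("course", "teacher", "book")
CTB = {
    ("Physics101", "Green", "Mechanics"),
    ("Physics101", "Green", "Optics"),
    ("Physics101", "Brown", "Mechanics"),
    ("Physics101", "Brown", "Optics"),
    ("Math301", "Green", "Mechanics"),
    ("Math301", "Green", "Vectors"),
    ("Math301", "Green", "Geometry"),
}


def project(attrs, rows, onto):
    """Projection: keep the columns in `onto` (duplicates vanish in the set)."""
    idx = [attrs.index(a) for a in onto]
    return tuple(onto), {tuple(row[i] for i in idx) for row in rows}


def natural_join(left, right):
    """Natural join of two (attribute tuple, set of rows) pairs."""
    (la, lrows), (ra, rrows) = left, right
    common = [a for a in la if a in ra]
    extra = [a for a in ra if a not in la]
    out = set()
    for lrow in lrows:
        for rrow in rrows:
            if all(lrow[la.index(a)] == rrow[ra.index(a)] for a in common):
                out.add(lrow + tuple(rrow[ra.index(a)] for a in extra))
    return tuple(la) + tuple(extra), out


def jd_holds(attrs, rows, components):
    """The JD holds if joining the projections gives back exactly `rows`."""
    joined_attrs, joined = reduce(natural_join,
                                  [project(attrs, rows, c) for c in components])
    order = [joined_attrs.index(a) for a in attrs]  # restore column order
    return {tuple(row[i] for i in order) for row in joined} == set(rows)


print(jd_holds(ATTRS, CTB, [("course", "teacher"), ("course", "book")]))   # True
print(jd_holds(ATTRS, CTB, [("course", "teacher"), ("teacher", "book")]))  # False
```

The second call fails because the join produces spurious rows such as (Physics101, Green, Vectors).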


SELECT BS.buyer, SL.seller, BL.lender
  FROM BuyerLender AS BL,
       SellerLender AS SL,
       BuyerSeller AS BS
 WHERE BL.buyer = BS.buyer
   AND BL.lender = SL.lender
   AND SL.seller = BS.seller;

Fifth Normal Form (5NF)

Fifth normal form, also called the join-projection normal form (JPNF) or the projection-join normal form, is based on the idea of a lossless join, i.e., the absence of a join-projection anomaly. A relation is in 5NF if every nontrivial join dependency that holds in it is implied by its candidate keys. The problem arises when you have an n-way relationship with n > 2. A quick check for 5NF: if a table is in 3NF and all of its candidate keys are single columns, then it is in 5NF. This condition is sufficient but not necessary, so a table that fails it may still be in 5NF.

Domain-Key Normal Form (DKNF)

Domain-key normal form was proposed by Ron Fagin (1981). The idea is that if all constraints are implied by domain restrictions and key conditions, then the database is in at least 5NF. The interesting part of Fagin's paper is that it does not mention functional dependencies, multivalued dependencies, or join dependencies. DKNF is currently considered the strongest normal form possible. The problem is that the paper does not tell you how to achieve DKNF, and it shows that in some cases it is impossible.
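The quick 5NF check can be written out as a sketch that reuses a candidate-key search and a 3NF test (all helper names are hypothetical; the Employee relation is made up for illustration). A `False` result only means the sufficient condition is not met, not that the relation violates 5NF:

```python
from itertools import combinations

FD = tuple[frozenset, frozenset]


def fd(lhs: str, rhs: str) -> FD:
    return (frozenset(a.strip() for a in lhs.split(",")),
            frozenset(a.strip() for a in rhs.split(",")))


def closure(attrs, fds) -> set:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    r = frozenset(attrs)
    keys = []
    for k in range(1, len(r) + 1):
        for combo in combinations(sorted(r), k):
            x = frozenset(combo)
            if not any(key <= x for key in keys) and r <= closure(x, fds):
                keys.append(x)  # minimal set that determines all of R
    return keys


def is_3nf(attrs, fds) -> bool:
    r = frozenset(attrs)
    prime = frozenset().union(*candidate_keys(r, fds))
    return all(rhs <= lhs or r <= closure(lhs, fds) or (rhs - lhs) <= prime
               for lhs, rhs in fds)


def simple_key_5nf_test(attrs, fds) -> bool:
    """True: in 3NF with only single-attribute candidate keys, hence in 5NF.
    False: the sufficient condition is not met (5NF is not decided)."""
    keys = candidate_keys(attrs, fds)
    return is_3nf(attrs, fds) and all(len(k) == 1 for k in keys)


# Hypothetical relation Employee(emp_id, name, dept).
emp = ["emp_id", "name", "dept"]
print(simple_key_5nf_test(emp, [fd("emp_id", "name, dept")]))  # True

drama = ["title", "theater", "city"]
F = [fd("title, city", "theater"), fd("theater", "city")]
print(simple_key_5nf_test(drama, F))  # False: composite keys, test not met
```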