Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

Similar documents
Relational Database Design

UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

SCHEMA NORMALIZATION. CS 564- Fall 2015

Schema Refinement: Other Dependencies and Higher Normal Forms

Schema Refinement and Normal Forms. Why schema refinement?

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

Design Theory for Relational Databases

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

Information Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due

Constraints: Functional Dependencies

Lectures 6. Lecture 6: Design Theory

Schema Refinement. Feb 4, 2010

CS54100: Database Systems

Schema Refinement and Normal Forms

Relational Design: Characteristics of Well-designed DB

Review: Keys. What is a Functional Dependency? Why use Functional Dependencies? Functional Dependency Properties

Constraints: Functional Dependencies

Schema Refinement & Normalization Theory

Lossless Joins, Third Normal Form

Functional Dependencies and Normalization

Relational Database Design Theory Part II. Announcements (October 12) Review. CPS 116 Introduction to Database Systems

Functional Dependency and Algorithmic Decomposition

CSIT5300: Advanced Database Systems

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

Normal Forms. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

Schema Refinement and Normal Forms

CSC 261/461 Database Systems Lecture 11

Schema Refinement and Normal Forms Chapter 19

Database Design and Implementation

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Relational Design Theory

Relational-Database Design

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

INF1383 -Bancos de Dados

CS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li

FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

Information Systems (Informationssysteme)

Schema Refinement and Normal Forms

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

DECOMPOSITION & SCHEMA NORMALIZATION

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

Relational Design Theory I. Functional Dependencies: why? Redundancy and Anomalies I. Functional Dependencies

Schema Refinement. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Functional Dependencies & Normalization. Dr. Bassam Hammo

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

Schema Refinement and Normal Forms. Chapter 19

Chapter 7: Relational Database Design

Database Design and Normalization

Databases 2012 Normalization

Chapter 3 Design Theory for Relational Databases

Chapter 10. Normalization Ext (from E&N and my editing)

CS322: Database Systems Normalization

CMPT 354: Database System I. Lecture 9. Design Theory

Chapter 7: Relational Database Design. Chapter 7: Relational Database Design

Functional Dependencies

Data Bases Data Mining Foundations of databases: from functional dependencies to normal forms

Schema Refinement and Normalization

Chapter 8: Relational Database Design

Schema Refinement and Normal Forms

Functional Dependencies

Background: Functional Dependencies. æ We are always talking about a relation R, with a æxed schema èset of attributesè and a

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

Chapter 3 Design Theory for Relational Databases

Design theory for relational databases

CSE 132B Database Systems Applications

Introduction to Data Management CSE 344

Normaliza)on and Func)onal Dependencies

CSE 344 AUGUST 3 RD NORMALIZATION

Chapter 11, Relational Database Design Algorithms and Further Dependencies

CSC 261/461 Database Systems Lecture 13. Spring 2018

Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies

CS 186, Fall 2002, Lecture 6 R&G Chapter 15

Kapitel 3: Formal Design

Normal Forms (ii) ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Introduction to Data Management. Lecture #7 (Relational DB Design Theory II)

Design Theory for Relational Databases. Spring 2011 Instructor: Hassan Khosravi

Lecture #7 (Relational Design Theory, cont d.)

A few details using Armstrong s axioms. Supplement to Normalization Lecture Lois Delcambre

COSC 430 Advanced Database Topics. Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago

Introduction to Data Management. Lecture #6 (Relational Design Theory)

CSC 261/461 Database Systems Lecture 12. Spring 2018

Design Theory for Relational Databases

Consider a relation R with attributes ABCDEF GH and functional dependencies S:

L13: Normalization. CS3200 Database design (sp18 s2) 2/26/2018

Database Design: Normal Forms as Quality Criteria. Functional Dependencies Normal Forms Design and Normal forms

Relational Design Theory II. Detecting Anomalies. Normal Forms. Normalization

CAS CS 460/660 Introduction to Database Systems. Functional Dependencies and Normal Forms 1.1

Normal Forms 1. ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Relational Database Design

Database System Concepts, 5th Ed.! Silberschatz, Korth and Sudarshan See for conditions on re-use "

But RECAP. Why is losslessness important? An Instance of Relation NEWS. Suppose we decompose NEWS into: R1(S#, Sname) R2(City, Status)

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization

Lecture 6 Relational Database Design

Transcription:

Applied Databases Handout 2a. Functional Dependencies and Normal Forms 20 Oct 2008 Functional Dependencies This is the most mathematical part of the course. Functional dependencies provide an alternative approach to database design. Why you need to understand them: 1. They are a mathematical, rigourous formulation. 2. They are good for checking database design and anomalies in database design. Why you don t want to understand them: 1. They are a mathematical, rigourous formulation. 2. The approach is incomplete and, if extended, gets far too instense. AD 2a.1 Not all designs are equally good! An example of the bad design Why is this design bad? Data(Id, Name, Address, CId, Description, Grade) And why is this design good? Student(Id, Name, Address) Course(CId, Description) Enrolled(Id, CId, Grade) Id Name Address CId Description Grade 124 Knox Troon Phil2 Plato A 234 McKay Skye Phil2 Plato B 789 Brown Arran Math2 Topology C 124 Knox Troon Math2 Topology A 789 Brown Arran Eng3 Chaucer B Some information is redundant, e.g. Name and Address. Without null values, some information cannot be represented, e.g, a student taking no courses. AD 2a.2 AD 2a.3

Functional Dependencies Example Recall that a key is a set of attribute names. If two tuples agree on the a key, they agree everywhere (they are the same). In our bad design, Id is not a key, but if two tuples agree on Id then they agree on Address, even though the tuples may be different. We say Id determines Address written Id Address. A functional dependency is a constraint on instances. Here are some functional dependencies that we expect to hold in our student-course database: Id Name, Address CId Description Id, CId Grade Note that an instance of any schema (good or bad) should be constrained by these dependencies. A functional dependency X Y is simply a pair of sets. We often use sloppy notation A,B C,D or AB CD when we mean {A,B} {C,D} Functional dependencies (fd s) are integrity constraints that subsume keys. AD 2a.4 AD 2a.5 Definition The Basic Intuition in Relational Design Def. Given a set of attributes R, and subsets X, Y of R, an instance rof R satisfies the functional dependency X Y if for any tuples t 1, t 2 in r, whenever t 1 [X] = t 2 [X] then t 1 [Y] = t 2 [Y]. (We use t[x] to mean the projection of the tuple t on attributes X) We say X functionally determines Y or X determines Y A database design is good if all fd s are of the form K R, where K is a key for R. Example: our bad design is bad because Id Address, but Id is not a key for the table. But it s not quite this simple. A A always holds, but we don t expect any attribute A to be a key! A superkey (a superset of a key) is simply a set X such that X R A key can now be defined, somewhat perversely, as a minimimal superkey. AD 2a.6 AD 2a.7

There are lots of functional dependencies! Consequences of Armstrong s Axioms Functional dependencies generate other functional dependencies, using Armstrong s Axioms : 1. Reflexivity: if Y X then X Y (These are called trivial dependencies.) Example: Name, Address Address 2. Augmentation: if X Y then X W Y W Example: Given CId Description, then CId,Id Description,Id. Also, CId Description,CId 3. Transitivity: if X Y and Y Z then X Z Example: Given Id,CId CId and CId Description, then Id, CId Description 1. Union: if X Y and X Z then X Y Z. 2. Pseudotransitivity: if X Y and W Y Z then X W Z. 3. Decomposition: if X Y and Z Y then X Z Try to prove these using Armstrong s Axioms! AD 2a.8 AD 2a.9 An example Closure of an fd set Proof of union. 1. X Y and X Z [Assumption] 2. X X Y [Assumption and augmentation] 3. X Y Z Y [Assumption and augmentation] 4. X Y Z [2, 3 and transitivity] Def. The closure F + of an fd set F is given by {X Y X Y can be deduced from F Armstrong s axioms} Def. Two fd sets F,G are equvalent if F + = G +. Unfortunately, the closure of an fd set is huge (how big?) so this is not a good way to test whether two fd sets are equivalent. A better way is to test whether each fd in one set follows from the other fd set and vice versa. AD 2a.10 AD 2a.11

Closure of an attribute set Implication of a fd Given a fd set set F, the closure X + of an attribute set X is given by: X + = {Y X Y F + } Example. What are the the following? {Id} + {Id,Address} + {Id,CId} + {Id,Grade} + Is X Y F +? ( Is X Y implied by the fd set F ) can be answered by checking whether Y is a subset of X +. X + can be computed as follows: X + := X while there is a fd U V in F such that U X + and V X + X + := X + V Try this with Id,CId Description,Grade AD 2a.12 AD 2a.13 The general goal, to repeat Lossless join decomposition No embedded functional dependencies. For example the table (Id, Name, CId) is not a good design, because {Id,CId} is the key; Id alone is not a key. R 1,R 2,...,R k is a lossless join decomposition with respect to a fd set F if, for every intance of R that satisfies F, Why don t we decompose into, say, {Id, Name, Address, Grade} and {CId, Description}? Or into {Id, Name, Address}, {CId, Description} and {Grade}? We need some conditions on decomposition. Example: π R1 (r) π R2 (r)...π Rk (r) = r Id Name Address CId Description Grade 124 Knox Troon Phil2 Plato A 234 McKay Skye Phil2 Plato B What happens if we decompose on {Id, Name, Address} and {CId, Description, Grade} or on {Id, Name, Address, Description, Grade} and {CId, Description}? AD 2a.14 AD 2a.15

Testing for a lossless join Dependency Preservation Fact. R 1,R 2 is a lossless join decomposition of R with respect to F if at least one of the following dependencies is in F + : Example: with respect to the fd set (R 1 R 2 ) R 1 R 2 (R 1 R 2 ) R 2 R 1 Id Name, Address CId Description Id, CId Grade is {Id, Name, Address} and {Id, CId, Description, Grade} a lossless decomposition? Given a fd set F, we d like a decomposition to preserve F. Roughly speaking we want each X Y in F to be contained within one of the attribute sets of our decomposition. Def. The projection of an fd set F onto a set of attributes Z, F Z is given by: F Z = {X Y X Y F + and X Y Z} A decomposition R 1,R 2,... R k is dependency preserving if F + = (F R1 F R2... F Rk ) + If a decomposition is dependency preserving, then we can easily check that an update on an instance R i does not violate F by just checking that it doesn t violate those fd s in F Ri. AD 2a.16 AD 2a.17 Example 1 Example 2 The scheme: {Class, Time, Room} The scheme: {Student, Time, Room, Course, Grade} The fd set: Class Room Room,Time Class The fd set: Student, Time Room Student, Course Grade The decomposition: {Class, Room} and {Room, Time} Is it lossless? Is it dependency preserving? The decomposition: {Student, Time, Room} and {Student, Course, Grade} It it lossless? Is it dependency preserving? AD 2a.18 AD 2a.19

Relational Database Design Normal forms Earlier we stated that the idea in analysing fd sets is to find a design (a decomposition) such that for each non-trivial dependency X Y (non-trivial means Y X), X is a superkey for some relation scheme in our decomposition. Example 1 shows that it is not possible to achieve this and to preserve dependencies. This leads to two notions of normal forms... Boyce-Codd Normal Form (BCNF) For every relation scheme R in the decomposition, and for every X A that holds on R (that is, X {A} R, either A X (it is trivial), or X is a superkey for R. Third Normal Form (3NF) For every relation scheme R and for every X A that holds on R, A X (it is trivial), or X is a superkey for R, or A is a member of some key of R (A is prime ) AD 2a.20 AD 2a.21 Observations on Normal Forms So what s this all for? BCNF is stronger than 3NF. BCNF is clearly desirable, but example 1 shows that it is not always achievable. There are algorithms to obtain a BCNF lossless join decomposition a 3NF lossless join, dependency preserving decomposition Even though there are algorithms for designing databases this way, they are hardly ever used. People normally use E-R diagrams and the like. But... Automated procedures (or human procedures) for generating relational schemas from diagrams often mess up. Further decomposition is sometimes needed (or sometimes thet decompose too much, so merging is needed) Understanding fd s is a good sanity check on your design. It s important to have these criteria. Bad design w.r.t. these criteria often means that there is redundancy or loss of information. For efficiency we sometimes design redundant schemes deliberately. Fd analysis allows us to identify the redundancy. AD 2a.22 AD 2a.23

Functional dependencies review Redundancy and update anomalies. Functional dependencies. Implication of fd s and Armstrong s axioms. Closure and equivalence of fd sets. Lossless join decomposition and dependency preservation. BCNF and 3NF. AD 2a.24