
Transcription:

RECAP

So far we have seen:
- How to use functional dependencies to guide the design of relations
- How to modify/decompose relations to achieve 1NF, 2NF and 3NF relations

But
- How do we make sure the decompositions are lossless (equivalence preserving)?
- Are some decompositions better than others?
- What if there are multiple candidate keys?

Why is losslessness important?

No information is lost or added implicitly by mistake. Any information that can be derived from the original relation can also be derived from the relations that result from the decomposition, and vice versa. In other words, you get the same answers to your queries.

An Instance of Relation NEWS

S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   10       Paris
S4   Clark   20       London
S5   Adams   30       Athens

Suppose we decompose NEWS into:

NEWS (S#, SNAME, STATUS, CITY)
S# → SNAME, STATUS, CITY

R1(S#, Sname)
R2(City, Status)

So no attributes are lost.

R1
S#   Sname
S1   Smith
S2   Jones
S3   Blake

R2
City     Status
London   20
Paris    10
Athens   30

What is the status or city of Smith?

Would it be good enough if the 2 relations were to have at least one attribute in common?

NEWS (S#, SNAME, STATUS, CITY)
S# → SNAME, STATUS, CITY

R1(S#, Sname, Status)
R2(City, Status)

So no attributes are lost, and R1 and R2 have an attribute in common. Do you see any problems?

R1
S#   Sname   Status
S1   Smith   20
S2   Jones   10
S3   Blake   30

R2
City     Status
London   20
Paris    10
Athens   30

Now suppose R2 also contains the row (Rome, 20):

R2
City     Status
London   20
Paris    10
Athens   30
Rome     20

What is the city of Smith? London or Rome? So still not good enough.

Decomposing R into R1 and R2:
- Do not lose any attributes.
- Make sure R1 and R2 have some attribute(s) in common.
- Some extra condition on the shared attribute(s) is needed to ensure losslessness.
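The ambiguity about Smith's city can be reproduced with a few lines of Python. This is only an illustrative sketch: relations are modelled as lists of dicts, and `natural_join` is a hypothetical helper written for this example, not part of the lecture material.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts (rows)."""
    out = []
    for t in r:
        for u in s:
            common = set(t) & set(u)
            # Rows combine only if they agree on every shared attribute.
            if all(t[a] == u[a] for a in common):
                row = {**t, **u}
                if row not in out:
                    out.append(row)
    return out

# R1(S#, Sname, Status) and R2(City, Status) as on the slides, with (Rome, 20) added.
r1 = [
    {"S#": "S1", "Sname": "Smith", "Status": 20},
    {"S#": "S2", "Sname": "Jones", "Status": 10},
    {"S#": "S3", "Sname": "Blake", "Status": 30},
]
r2 = [
    {"City": "London", "Status": 20},
    {"City": "Paris", "Status": 10},
    {"City": "Athens", "Status": 30},
    {"City": "Rome", "Status": 20},
]

joined = natural_join(r1, r2)
smith_cities = sorted(row["City"] for row in joined if row["Sname"] == "Smith")
print(smith_cities)  # ['London', 'Rome']
```

The join pairs Smith with every city whose status is 20, so sharing an attribute is not enough on its own to recover the original information.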

Losslessness

Definition: Lossless decomposition
A decomposition of a relation R into relations R1, ..., Rn is lossless (nonloss) if and only if, for every instance of R and the Ri, the natural join of R1, ..., Rn gives the relation R.

Natural Join very briefly

Teaches
Lecturer   Course
fs         logic
jm         ai
sd         C++

Class
Course   Class
logic    msc
ai       msc
ai       meng2

Teaches JOIN Class
Lecturer   Course   Class
fs         logic    msc
jm         ai       msc
jm         ai       meng2

Another Example of JOIN

Teaches
Lecturer   Course
fs         logic
jm         ai

Class-number
Number   Class
60       msc
100      meng2

Teaches JOIN Class-number
Lecturer   Course   Number   Class
fs         logic    60       msc
fs         logic    100      meng2
jm         ai       60       msc
jm         ai       100      meng2

A Lossy Decomposition

R
A    B    C
a1   b1   c1
a2   b1   c2
a2   b2   c2

Suppose we decomposed R into R1 and R2. Now consider the JOIN of R1 and R2.

R1
A    B
a1   b1
a2   b1
a2   b2

R2
B    C
b1   c1
b1   c2
b2   c2

R1 JOIN R2
A    B    C
a1   b1   c1
a1   b1   c2
a2   b1   c1
a2   b1   c2
a2   b2   c2

versus R
A    B    C
a1   b1   c1
a2   b1   c2
a2   b2   c2

Theorem: sufficient condition for losslessness
Suppose R is a relation scheme and F is a set of functional dependencies on R. Let R1 and R2 be projections of R such that the union of the sets of attributes of R1 and R2 is equal to the set of attributes of R. This decomposition of R is lossless if at least one of the following fds is in F+:

R1 ∩ R2 → R1
R1 ∩ R2 → R2

Example: Lecturer DB

Lecturer   Course   Number   Class
fs         logic    60       msc
fs         logic    100      meng2
jm         ai       60       msc
jm         ai       100      meng2

Class → Number
Only key: (Lecturer, Course, Class)

Example cntd.

What normal form is Lecturer DB in?

Lecturer(Lecturer, Course, Number, Class)

Decompose to:
Degree(Class, Number)
Teaches(Lecturer, Course, Class)

Degree ∩ Teaches = Class
Class → Degree
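Both the lossy example and the theorem can be checked mechanically. The sketch below is a minimal illustration under stated assumptions: relations are lists of dicts, functional dependencies are (left side, right side) pairs of attribute sets, and the helper names (`project`, `natural_join`, `closure`, `lossless_by_theorem`) are hypothetical. For R(A, B, C) no dependencies are given on the slides, so an empty set is assumed.

```python
def project(rel, attrs):
    """Projection onto attrs, removing duplicate rows."""
    seen = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in seen:
            seen.append(row)
    return seen

def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in set(t) & set(u)):
                row = {**t, **u}
                if row not in out:
                    out.append(row)
    return out

def as_set(rel):
    """Order-independent form of a relation, for comparisons."""
    return {tuple(sorted(t.items())) for t in rel}

def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_by_theorem(r1_attrs, r2_attrs, fds):
    """True if (R1 ∩ R2) → R1 or (R1 ∩ R2) → R2 follows from fds."""
    c = closure(set(r1_attrs) & set(r2_attrs), fds)
    return set(r1_attrs) <= c or set(r2_attrs) <= c

# The lossy example: joining the projections yields spurious tuples.
r = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a2", "B": "b1", "C": "c2"},
    {"A": "a2", "B": "b2", "C": "c2"},
]
joined = natural_join(project(r, ["A", "B"]), project(r, ["B", "C"]))
spurious = as_set(joined) - as_set(r)
print(len(joined), len(spurious))  # 5 2

# The Lecturer DB example: Class → Number, so Class → Degree holds.
lecturer_fds = [({"Class"}, {"Number"})]
degree = {"Class", "Number"}
teaches = {"Lecturer", "Course", "Class"}
print(lossless_by_theorem(degree, teaches, lecturer_fds))  # True
```

Note that the theorem is used here only as stated on the slide, as a sufficient condition: a `False` result from `lossless_by_theorem` means the condition is not met, which is weaker than a proof of lossiness.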

So far we have done:
- How do we make sure the decompositions are lossless (equivalence preserving)?
- Are some decompositions better than others?
- What if there are multiple candidate keys?

Dependency Preservation

It is often possible to decompose a relation in different ways. Amongst the lossless decompositions some may be better than others.

Example

NEWS (S#, SNAME, STATUS, CITY)
S# → SNAME, STATUS, CITY

Here is an instance of relation NEWS.

S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   10       Paris
S4   Clark   20       London
S5   Adams   30       Athens

NEWS is in 2NF, but not in 3NF. It can be transformed to 3NF by two alternative decompositions.

A: Supplier (S#, SNAME, CITY)
   City-info (CITY, STATUS)

B: Supplier (S#, SNAME, CITY)
   Status-info (S#, STATUS)

Both decompositions are lossless. All resulting relations are in 3NF.

Instance of A

Supplier
S#   SNAME   CITY
S1   Smith   London
S2   Jones   Paris
S3   Blake   Paris
S4   Clark   London
S5   Adams   Athens

City-info
CITY     STATUS
London   20
Paris    10
Athens   30
Rome     50

Instance of B

Supplier
S#   SNAME   CITY
S1   Smith   London
S2   Jones   Paris
S3   Blake   Paris
S4   Clark   London
S5   Adams   Athens

Status-info
S#   STATUS
S1   20
S2   10
S3   10
S4   20
S5   30

Which one is better? A or B?

Example: Consider the update "Change CITY of supplier S1 from London to Paris."
What does this involve in A?
What does this involve in B?

In A: All we have to do is change the relevant value in relation Supplier.
In B: We have to update both relations Supplier and Status-info to ensure that the functional dependency CITY → STATUS is maintained.

In A

Supplier
S#   SNAME   CITY
S1   Smith   London
S2   Jones   Paris
S3   Blake   Paris
S4   Clark   London
S5   Adams   Athens

City-info
CITY     STATUS
London   20
Paris    10
Athens   30
Rome     50

In Supplier: Change <S1, Smith, London> to <S1, Smith, Paris>.

In B

- In Supplier: Change <S1, Smith, London> to <S1, Smith, Paris>.
- In Supplier find a row with City=Paris, and read its S#.
- In Status-info find the Status of that S#.
- In Status-info change the value of the Status of S1 to this new Status.

Another problem with B is that:

Supplier
S#   SNAME   CITY
S1   Smith   London
S2   Jones   Paris
S3   Blake   Paris
S4   Clark   London
S5   Adams   Athens

Status-info
S#   STATUS
S1   20
S2   10
S3   10
S4   20
S5   30

(Assuming the Entity Integrity Rule) We cannot insert in B the information that a given city has a given status, unless some supplier is located in that city.

In A

A: Supplier (S#, SNAME, CITY)
   City-info (CITY, STATUS)

S# → SNAME
S# → CITY
S# → STATUS

In A, it is the transitive dependency S# → STATUS which is an inter-relational constraint. This constraint is maintained automatically as long as the constraints S# → CITY and CITY → STATUS are maintained in each relation, and these are just primary key constraints in each relation of A.

In B

B: Supplier (S#, SNAME, CITY)
   Status-info (S#, STATUS)

S# → SNAME
S# → CITY
S# → STATUS

The problem with B is that the dependency CITY → STATUS has become an inter-relational constraint.

Formalisation/Generalisation of This Intuition

A relation R with dependencies F is decomposed into relations R1, R2, ..., Rn with dependencies F1, F2, ..., Fn respectively.

Let F' = F1 ∪ F2 ∪ ... ∪ Fn.

In general F' ≠ F. But if F'+ = F+, then to check F we only need to check F'.

Definition: Dependency-Preserving
A decomposition R1, ..., Rn of R is dependency-preserving if and only if F'+ = F+, where F and F' are defined as above.

Example: In the NEWS example A is dependency preserving, but B is not.

Example:

NEWS (S#, SNAME, STATUS, CITY)
S# → SNAME, STATUS, CITY

A: Supplier (S#, SNAME, CITY)
   City-info (CITY, STATUS)

B: Supplier (S#, SNAME, CITY)
   Status-info (S#, STATUS)

In A

F_supplier = {S# → SNAME, CITY}
F_city-info = {CITY → STATUS}

So F' = F_supplier ∪ F_city-info.
So clearly F'+ = F+.

In B

F_supplier = {S# → SNAME, CITY}
F_status-info = {S# → STATUS}

So F' = F_supplier ∪ F_status-info.
CITY → STATUS is in F+ but not in F'+.
So F'+ ≠ F+.

So far we have done:
- How do we make sure the decompositions are lossless (equivalence preserving)?
- Are some decompositions better than others?
- What if there are multiple candidate keys?
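The comparison of A and B can be sketched in Python by testing whether every dependency of F follows from F'. This is a minimal illustration, not a general-purpose tool: the helper names are hypothetical, each Fi is written out by hand rather than computed by projection, and F is assumed to consist of S# → SNAME, STATUS, CITY together with CITY → STATUS (the dependency the slides say is in F+).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserved(f, f_prime):
    """True if every dependency in f is implied by f_prime.

    Since each Fi is drawn from F+, this is enough for F'+ = F+.
    """
    return all(set(rhs) <= closure(lhs, f_prime) for lhs, rhs in f)

# Assumed F for NEWS (CITY → STATUS is taken from the slides' remark on F+).
F = [
    ({"S#"}, {"SNAME", "STATUS", "CITY"}),
    ({"CITY"}, {"STATUS"}),
]

# Decomposition A: F' = F_supplier ∪ F_city-info
F_A = [({"S#"}, {"SNAME", "CITY"}), ({"CITY"}, {"STATUS"})]

# Decomposition B: F' = F_supplier ∪ F_status-info
F_B = [({"S#"}, {"SNAME", "CITY"}), ({"S#"}, {"STATUS"})]

a_preserving = preserved(F, F_A)
b_preserving = preserved(F, F_B)
print(a_preserving, b_preserving)  # True False
```

In B the closure of {CITY} under F' is just {CITY}, which is why CITY → STATUS can no longer be checked inside a single relation.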

Generalising 2NF and 3NF: Boyce-Codd Normal Form (BCNF)

For 2NF, get rid of:
[Diagram over attributes A to F; the dependency arrows were lost in extraction.]

For 3NF, get rid of:
[Diagram; the dependency arrows were lost in extraction.]

What we want:
[Diagram over attributes A to H; the dependency arrows were lost in extraction.]

With 2NF and 3NF we assumed that the relation has one candidate key. Now we generalise to cater for multiple candidate keys. This more general normal form is called the Boyce-Codd Normal Form (BCNF).

[Diagram: candidate keys CK1, CK2 and CK3 and attributes attribute1 to attribute4; the arrows were lost in extraction.]

Definition: Determinant
A determinant is any attribute, or set of attributes, on which some other attribute is fully functionally dependent.

Example: R(A, B, C, D, E)
AB → C
A → B
C → D
DE → A

Here A and C are determinants. There are 2 others. Can you see what they are?

Definition: BCNF
A relation is in Boyce-Codd Normal Form (BCNF) if and only if every determinant is a candidate key.

Any relation can be nonloss decomposed into a collection of BCNF relations.

Example:

Enrols(Student#, Subject, Teacher)
Teacher → Subject
(Student#, Subject) → Teacher

An Instance of the Relation Scheme Enrols

Student#   Subject   Teacher
100        maths     smith
101        maths     jones
102        maths     smith
103        maths     smith
104        physics   brown
101        physics   brown
100        physics   green

- Each student is taught by several teachers.
- Each teacher teaches only one subject.
- Each student takes several subjects and has only one teacher for a given subject.

What are the candidate keys of Enrols?
What normal form is Enrols in?
What problems do you see in the design of Enrols?

Candidate keys of Enrols are
(Teacher, Student#)
(Subject, Student#)

[Diagram: dependencies among Teacher, Student# and Subject; the arrows were lost in extraction.]

Problems with Enrols

- We cannot insert the fact that a teacher teaches a certain subject until at least one student enrols for that subject.
- The fact that a teacher teaches a certain subject is recorded with a lot of redundancy, for every student to whom he teaches that subject.

Solution

Decompose Enrols into
Courses(Teacher, Subject)
Students(Student#, Teacher)

In Enrols: Teacher is a determinant, but not a candidate key.

In Courses(Teacher, Subject): The only dependency is Teacher → Subject. So Teacher is the only determinant. It is also the only candidate key.

Exercise

Teacher → Subject
(Student#, Subject) → Teacher

What are the candidate keys and the determinants of Students?

In Students(Student#, Teacher)

No determinant. So BCNF. WHY??
Only candidate key: (Student#, Teacher)

Exercise

Is the decomposition lossless?
Is the decomposition dependency preserving?

Enrols(Student#, Subject, Teacher)
Teacher → Subject
(Student#, Subject) → Teacher

Courses(Teacher, Subject)
Students(Student#, Teacher)

Exercise

S (S#, Sname, Status, City)
with FDs
S# → Status, City, Sname
Sname → City, Status, S#

Is S in BCNF?

In S:
Determinants: S# and Sname
Candidate keys: S# and Sname
So all determinants are candidate keys. So S is in BCNF.

Exercise

SSP (S#, Sname, P#, Qty)
with FDs
S# → Sname
Sname → S#
S#, P# → Qty
Sname, P# → Qty

Is SSP in BCNF?
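The two exercises on S and SSP can be checked with a small brute-force script. This is a sketch under stated assumptions: candidate keys are found by trying attribute subsets in increasing size, and the BCNF test asks whether the left side of each given dependency is a superkey, which is used here as a stand-in for the slides' wording "every determinant is a candidate key". The function names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole scheme."""
    attrs = set(attrs)
    keys = []
    for k in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), k):
            cand = frozenset(combo)
            # Skip supersets of keys already found (they are not minimal).
            if any(key <= cand for key in keys):
                continue
            if closure(cand, fds) == attrs:
                keys.append(cand)
    return keys

def is_bcnf(attrs, fds):
    """True if every nontrivial given dependency has a superkey on its left."""
    attrs = set(attrs)
    for lhs, rhs in fds:
        nontrivial = not set(rhs) <= set(lhs)
        if nontrivial and closure(lhs, fds) != attrs:
            return False
    return True

s_attrs = {"S#", "Sname", "Status", "City"}
s_fds = [
    ({"S#"}, {"Status", "City", "Sname"}),
    ({"Sname"}, {"City", "Status", "S#"}),
]

ssp_attrs = {"S#", "Sname", "P#", "Qty"}
ssp_fds = [
    ({"S#"}, {"Sname"}),
    ({"Sname"}, {"S#"}),
    ({"S#", "P#"}, {"Qty"}),
    ({"Sname", "P#"}, {"Qty"}),
]

s_keys = candidate_keys(s_attrs, s_fds)
ssp_keys = candidate_keys(ssp_attrs, ssp_fds)
print(is_bcnf(s_attrs, s_fds), is_bcnf(ssp_attrs, ssp_fds))  # True False
```

For SSP the script reports the candidate keys (S#, P#) and (Sname, P#), while S# and Sname on their own determine other attributes without being keys.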

In SSP:
Determinants: S# and Sname
Candidate keys: (S#, P#) and (Sname, P#)
So there are determinants that are not candidate keys. So SSP is not in BCNF.

Decomposing SSP to BCNF relations

SSP
S1(S#, Sname)
S2(S#, P#, Qty)
Lossless? Dependency Preserving?

SSP
R1(S#, Sname)
R2(Sname, P#, Qty)
Lossless? Dependency Preserving?

An Algorithm for BCNF Decomposition

Input: A relation R, the closure, F+, of the set of functional dependencies on R.
Output (result): A set of relations Ri, such that each Ri is in BCNF and the decomposition of R into the Ri is lossless.

begin
  result := { R };
  done := false;
  while (not done) do
    if (there is a scheme Ri in result that is not in BCNF) then
      begin
        let A → B be a nontrivial ffd that holds on Ri, such that A is not a candidate key of Ri;
        result := (result − Ri) ∪ (Ri − B) ∪ (A, B);
      end
    else done := true;
end;

Same algorithm we have been using for 2NF and 3NF

begin
  result := { R };
  done := false;
  while (not done) do
    if (there is a scheme Ri in result that is not in the required normal form) then
      begin
        let A → B be an fd that holds on Ri, that shows Ri is not in the required normal form;
        result := (result − Ri) ∪ (Ri − B) ∪ (A, B);
      end
    else done := true;
end;

Normalisation - Conclusion

Objectives of normalisation:
- Eliminate redundancy
- Avoid update anomalies
- (From 5NF upwards) Simplify the enforcement of certain integrity constraints
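The BCNF decomposition loop above can be sketched in Python as follows. This is one possible reading of the pseudocode, with assumptions labelled: instead of taking F+ as input, it searches attribute subsets of each Ri and uses attribute closure under F to find a violating dependency, and it tests for a superkey rather than a candidate key. The names `find_violation` and `bcnf_decompose` are hypothetical, and the exhaustive search is only practical for small schemes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(ri, fds):
    """Return (A, B) with A → B nontrivial on ri and A not a superkey, else None."""
    ri = frozenset(ri)
    for k in range(1, len(ri)):
        for combo in combinations(sorted(ri), k):
            a = frozenset(combo)
            a_closure = closure(a, fds)
            b = (a_closure & ri) - a      # what A determines inside Ri
            if b and not ri <= a_closure:  # nontrivial, and A is not a superkey
                return a, b
    return None

def bcnf_decompose(r, fds):
    """result := (result − Ri) ∪ (Ri − B) ∪ (A, B), repeated until done."""
    result = [frozenset(r)]
    done = False
    while not done:
        done = True
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                a, b = violation
                result.remove(ri)
                result += [ri - b, a | b]
                done = False
                break  # restart the scan over the updated result
    return result

# Enrols(Student#, Subject, Teacher) from the slides.
enrols_fds = [
    ({"Teacher"}, {"Subject"}),
    ({"Student#", "Subject"}, {"Teacher"}),
]
enrols_result = bcnf_decompose({"Student#", "Subject", "Teacher"}, enrols_fds)

# SSP(S#, Sname, P#, Qty) from the slides.
ssp_fds = [
    ({"S#"}, {"Sname"}),
    ({"Sname"}, {"S#"}),
    ({"S#", "P#"}, {"Qty"}),
    ({"Sname", "P#"}, {"Qty"}),
]
ssp_result = bcnf_decompose({"S#", "Sname", "P#", "Qty"}, ssp_fds)

print(sorted(sorted(x) for x in enrols_result))
print(sorted(sorted(x) for x in ssp_result))
```

On these inputs the sketch yields Students(Student#, Teacher) and Courses(Teacher, Subject) for Enrols, and (S#, Sname) with (S#, P#, Qty) for SSP. Which violating dependency is picked first depends on the search order, so other valid decompositions, such as the one using (Sname, P#, Qty), are equally possible.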

Some Limitations of Normalisation

Full normalisation is not always desirable.

Example:
Customer(Name, Street, City, Postcode)
Postcode → City, Street
So Customer is not in 3NF.

Normalisation often facilitates update, but tends to have an adverse effect on query evaluation. Related data which may have been retrievable from one relation in an unnormalised schema may have to be retrieved from several relations in the normalised form.

Decomposition into normal forms is not always unique. But there is not much guidance on which decomposition to choose.