CSC 261/461 Database Systems Lecture 13. Spring 2018

Similar documents
Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies

CMPT 354: Database System I. Lecture 9. Design Theory

CSC 261/461 Database Systems Lecture 12. Spring 2018

CSC 261/461 Database Systems Lecture 11

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

Database Design and Implementation

Design theory for relational databases

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

CSE 344 AUGUST 6 TH LOSS AND VIEWS

Introduction to Data Management CSE 344

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

SCHEMA NORMALIZATION. CS 564- Fall 2015

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Lectures 6. Lecture 6: Design Theory

CSE 344 MAY 16 TH NORMALIZATION

Schema Refinement and Normal Forms

CSE 544 Principles of Database Management Systems

Introduction to Management CSE 344

CSE 303: Database. Outline. Lecture 10. First Normal Form (1NF) First Normal Form (1NF) 10/1/2016. Chapter 3: Design Theory of Relational Database

Introduction to Data Management CSE 344

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

Schema Refinement. Feb 4, 2010

CSE 344 AUGUST 3 RD NORMALIZATION

Design Theory for Relational Databases

L14: Normalization. CS3200 Database design (sp18 s2) 3/1/2018

Functional Dependency and Algorithmic Decomposition

Introduction to Database Systems CSE 414. Lecture 20: Design Theory

Relational Database Design

UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

Normal Forms. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

Design Theory for Relational Databases

Lossless Joins, Third Normal Form

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

CS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li

DECOMPOSITION & SCHEMA NORMALIZATION

Chapter 3 Design Theory for Relational Databases

Relational Database Design Theory Part II. Announcements (October 12) Review. CPS 116 Introduction to Database Systems

Practice and Applications of Data Management CMPSCI 345. Lecture 15: Functional Dependencies

Normal Forms (ii) ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Design Theory for Relational Databases. Spring 2011 Instructor: Hassan Khosravi

Schema Refinement & Normalization Theory

CS54100: Database Systems

Functional Dependency Theory II. Winter Lecture 21

Database Design and Normalization

CS 186, Fall 2002, Lecture 6 R&G Chapter 15

Information Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due

12/3/2010 REVIEW ALGEBRA. Exam Su 3:30PM - 6:30PM 2010/12/12 Room C9000

Relational Design Theory II. Detecting Anomalies. Normal Forms. Normalization

Chapter 3 Design Theory for Relational Databases

FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

Schema Refinement and Normalization

INF1383 -Bancos de Dados

Schema Refinement and Normal Forms

Relational Database Design

Functional Dependencies and Normalization

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

Schema Refinement and Normal Forms. Chapter 19

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

Databases 2012 Normalization

CS322: Database Systems Normalization

Relational Design Theory

Constraints: Functional Dependencies

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

Introduction to Data Management. Lecture #7 (Relational DB Design Theory II)

Introduction to Data Management. Lecture #6 (Relational DB Design Theory)

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

Schema Refinement and Normal Forms

Schema Refinement and Normal Forms Chapter 19

Review: Keys. What is a Functional Dependency? Why use Functional Dependencies? Functional Dependency Properties

Normal Forms Lossless Join.

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

Schema Refinement and Normal Forms. Why schema refinement?

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

Information Systems (Informationssysteme)

Functional Dependencies & Normalization. Dr. Bassam Hammo

Functional Dependencies and Normalization. Instructor: Mohamed Eltabakh

Chapter 11, Relational Database Design Algorithms and Further Dependencies

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

Relational-Database Design

Lecture #7 (Relational Design Theory, cont d.)

FUNCTIONAL DEPENDENCY THEORY II. CS121: Relational Databases Fall 2018 Lecture 20

Relational Design Theory I. Functional Dependencies: why? Redundancy and Anomalies I. Functional Dependencies

CAS CS 460/660 Introduction to Database Systems. Functional Dependencies and Normal Forms 1.1

CSE 132B Database Systems Applications

Relational Database Design

Constraints: Functional Dependencies

Schema Refinement and Normal Forms

Consider a relation R with attributes ABCDEF GH and functional dependencies S:

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #15: BCNF, 3NF and Normaliza:on

But RECAP. Why is losslessness important? An Instance of Relation NEWS. Suppose we decompose NEWS into: R1(S#, Sname) R2(City, Status)

Normaliza)on and Func)onal Dependencies

A few details using Armstrong s axioms. Supplement to Normalization Lecture Lois Delcambre

Schema Refinement: Other Dependencies and Higher Normal Forms

Database System Concepts, 5th Ed.! Silberschatz, Korth and Sudarshan See for conditions on re-use "

Consider a relation R with attributes ABCDEF GH and functional dependencies S:

Transcription:

CSC 261/461 Database Systems Lecture 13 Spring 2018

BCNF Decomposition Algorithm BCNFDecomp(R): Find X s.t.: X + X and X + [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R1(X È Y) and R2(X È Z) Return BCNFDecomp(R1), BCNFDecomp(R2)

BCNF Decomposition Algorithm BCNFDecomp(R): Find a set of attributes X s.t.: X + X and X + [all attributes] if (not found) then Return R Find a set of attributes X which has non-trivial bad FDs, i.e. is not a superkey, using closures let Y = X + - X, Z = (X + ) C decompose R into R1(X È Y) and R2(X È Z) Return BCNFDecomp(R1), BCNFDecomp(R2)

BCNF Decomposition Algorithm BCNFDecomp(R): Find a set of attributes X s.t.: X + X and X + [all attributes] if (not found) then Return R If no bad FDs found, in BCNF! let Y = X + - X, Z = (X + ) C decompose R into R1(X È Y) and R2(X È Z) Return BCNFDecomp(R1), BCNFDecomp(R2)

BCNF Decomposition Algorithm BCNFDecomp(R): Find a set of attributes X s.t.: X + X and X + [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R1(X È Y) and R2(X È Z) Return BCNFDecomp(R1), BCNFDecomp(R2) Let Y be the attributes that X functionally determines (+ that are not in X) And let Z be the other attributes that it doesn t

BCNF Decomposition Algorithm BCNFDecomp(R): Find a set of attributes X s.t.: X + X and X + [all attributes] Split into one relation (table) with X plus the attributes that X determines (Y) if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R 1 (X È Y) and R 2 (X È Z) Y X Z Return BCNFDecomp(R1), BCNFDecomp(R2) R 1 R 2

BCNF Decomposition Algorithm BCNFDecomp(R): Find a set of attributes X s.t.: X + X and X + [all attributes] And one relation with X plus the attributes it does not determine (Z) if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R 1 (X È Y) and R 2 (X È Z) Y X Z Return BCNFDecomp(R1), BCNFDecomp(R2) R 1 R 2

BCNF Decomposition Algorithm BCNFDecomp(R): Find a set of attributes X s.t.: X + X and X + [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R 1 (X È Y) and R 2 (X È Z) Return BCNFDecomp(R 1 ), BCNFDecomp(R 2 ) Proceed recursively until no more bad FDs!

Another way of representing the same concept BCNFDecomp(R): If Xà A causes BCNF violation: Decompose R into R1= XA R2 = R A (Note: X is present in both R1 and R2) Continue decomposing until no BCNF violation

BCNFDecomp(R): Find a set of attributes X s.t.: X + X and X + [all attributes] if (not found) then Return R Example let Y = X + - X, Z = (X + ) C decompose R into R 1 (X È Y) and R 2 (X È Z) Return BCNFDecomp(R 1 ), BCNFDecomp(R 2 ) BCNFDecomp(R): If Xà A causes BCNF violation: Decompose R into R1= XA R2 = R A (Note: X is present in both R1 and R2) R(A,B,C,D,E) {A} à {B,C} {C} à {D} Continue decomposing until no BCNF violation

Example R(A,B,C,D,E) R(A,B,C,D,E) {A} + = {A,B,C,D} {A,B,C,D,E} {A} à {B,C} {C} à {D} R 1 (A,B,C,D) {C} + = {C,D} {A,B,C,D} R 11 (C,D) R 12 (A,B,C) R 2 (A,E)

DECOMPOSITIONS 12

Recap: Decompose to remove redundancies 1. We saw that redundancies in the data ( bad FDs ) can lead to data anomalies 2. We developed mechanisms to detect and remove redundancies by decomposing tables into BCNF 1. BCNF decomposition is standard practice- very powerful & widely used! 3. However, sometimes decompositions can lead to more subtle unwanted effects When does this happen? 13

Decompositions in General R(A 1,...,A n,b 1,...,B m,c 1,...,C p ) R 1 (A 1,...,A n,b 1,...,B m ) R 2 (A 1,...,A n,c 1,...,C p ) R 1 = the projection of R on A 1,..., A n, B 1,..., B m R 2 = the projection of R on A 1,..., A n, C 1,..., C p 14

Theory of Decomposition Name Price Category Gizmo 19.99 Gadget OneClick 24.99 Camera Gizmo 19.99 Camera Sometimes a decomposition is correct I.e. it is a Lossless decomposition Name Price Name Category Gizmo 19.99 Gizmo Gadget OneClick 24.99 OneClick Camera Gizmo 19.99 Gizmo Camera 15

Lossy Decomposition Name Price Category Gizmo 19.99 Gadget OneClick 24.99 Camera Gizmo 19.99 Camera However sometimes it isn t What s wrong here? Name Gizmo OneClick Gizmo Category Gadget Camera Camera Price Category 19.99 Gadget 24.99 Camera 19.99 Camera 16

Lossless Decompositions R(A 1,...,A n,b 1,...,B m,c 1,...,C p ) R 1 (A 1,...,A n,b 1,...,B m ) R 2 (A 1,...,A n,c 1,...,C p ) A decomposition R to (R1, R2) is lossless if R = R1 Join R2

Lossless Decompositions R(A 1,...,A n,b 1,...,B m,c 1,...,C p ) R 1 (A 1,...,A n,b 1,...,B m ) R 2 (A 1,...,A n,c 1,...,C p ) If {A 1,..., A n } à {B 1,..., B m } Then the decomposition is lossless Note: don t need {A 1,..., A n } à {C 1,..., C p } BCNF decomposition is always lossless. Why? 18

A relation TEACH that is in 3NF but not in BCNF Two FDs exist in the relation TEACH: X à A {student, course} à instructor instructor à course {student, course} is a candidate key for this relation So this relation is in 3NF but not in BCNF A relation NOT in BCNF should be decomposed Slide 14-19

Achieving the BCNF by Decomposition (2) Three possible decompositions for relation TEACH D1: {student, instructor} and {student, course} D2: {course, instructor } and {course, student} ü D3: {instructor, course } and {instructor, student} Slide 14-20

A problem with BCNF Problem: To enforce a FD, must reconstruct original relation on each insert!

A Problem with BCNF Unit Company Product {Unit} à {Company} {Company,Product} à {Unit} Unit Company Unit Product We do a BCNF decomposition on a bad FD: {Unit} + = {Unit, Company} {Unit} à {Company} We lose the FD {Company,Product} à {Unit}!! 22

So Why is that a Problem? Unit Company Galaga99 UW Bingo UW Unit Galaga99 Bingo Product Databases Databases No problem so far. All local FD s are satisfied. {Unit} à {Company} Unit Company Product Galaga99 UW Databases Bingo UW Databases Let s put all the data back into a single table again: Violates the FD {Company,Product} à {Unit}!! 23

The Problem We started with a table R and FDs F We decomposed R into BCNF tables R 1, R 2, with their own FDs F 1, F 2, We insert some tuples into each of the relations which satisfy their local FDs but when reconstruct it violates some FD across tables! Practical Problem: To enforce FD, must reconstruct R on each insert! 24

Possible Solutions Various ways to handle so that decompositions are all lossless / no FDs lost For example 3NF- stop short of full BCNF decompositions. Usually a tradeoff between redundancy / data anomalies and FD preservation BCNF still most common- with additional steps to keep track of lost FDs 25

Summary Constraints allow one to reason about redundancy in the data Normal forms describe how to remove this redundancy by decomposing relations Elegant by representing data appropriately certain errors are essentially impossible For FDs, BCNF is the normal form. A tradeoff for insert performance: 3NF

Another Example From Problem Set 5 Let the relation schema R(A,B,C,D) is given. For each of the following set of FDs do the following: i) indicate all the BCNF violations. Do not forget to consider FD s that are not in the given set. However, it is not required to give violations that have more than one attribute on the right side. ii) Decompose the relations, as necessary, into collections of relations that are in BCNF. 1. FDs AB C, C D, and D A

Find Non-trivial dependencies There are 14 nontrivial dependencies, They are: C A, C D, D A, AB D, AB C, AC D,BC A, BC D, BD A, BD C, CD A, ABC D, ABD C, and BCD A.

Proceed From There One choice is to decompose using the violation C D. Using the above FDs, we get ACD (Because of C D and C A) and BC as decomposed relations. BC is surely in BCNF, since any two-attribute relation is. we discover that ACD is not in BCNF since C is its only key. We must further decompose ACD into AD and CD. Thus, the three relations of the decomposition are BC, AD, and CD.

Two other topics to Study Cover and Minimal Cover Let F = {A C, AC D, E AD, E H}. Let G = {A CD, E AH}. Show that: 1. G covers F 2. F covers G 3. F and G are equivalent

Cover We say that a set of functional dependencies F covers another set of functional dependencies G, if every functional dependency in G can be inferred from F. More formally, F covers G if G+ F+ F is a minimal cover of G if F is the smallest set of functional dependencies that cover G

Example of Cover

Problem Set 5 http://www.cs.rochester.edu/courses/261/spring2018/ps/ps5.pdf Solution: http://www.cs.rochester.edu/courses/261/spring2018/ps/ps5sol.pdf Really important for Midterm!

Acknowledgement Some of the slides in this presentation are taken from the slides provided by the authors. Many of these slides are taken from cs145 course offered by Stanford University. Thanks to YouTube, especially to Dr. Daniel Soper for his useful videos.