Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization


Keys
- A superkey is a set of attributes A1, ..., An such that for any other attribute B, we have A1, ..., An → B
- A key is a minimal superkey, i.e., a set of attributes that is a superkey and for which no proper subset is a superkey

Computing (Super)Keys
- Compute X+ for all sets X
- If X+ = all attributes, then X is a superkey
- List only the minimal X's to get the keys

Example
Product(name, price, category, color)
    name, category → price
    category → color
What is the key?

Example
Product(name, price, category, color)
    name, category → price
    category → color
What is the key?
(name, category)+ = {name, category, price, color}
Hence (name, category) is a key.
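
The closure-based procedure for finding keys can be sketched in a few lines of Python. This is only an illustrative brute-force version: the function names `closure` and `keys`, and the representation of each FD as a pair of sets, are assumptions made for this sketch and do not come from the lecture. It enumerates every attribute subset, so it is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return X+ for the attribute set attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already in the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def keys(all_attrs, fds):
    """Try subsets by increasing size and keep only the minimal superkeys."""
    all_attrs = set(all_attrs)
    found = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            x = set(combo)
            if any(k <= x for k in found):
                continue  # x contains a smaller key, so it is not minimal
            if closure(x, fds) == all_attrs:
                found.append(x)
    return found

# Product(name, price, category, color) with the two FDs from the slide.
product_fds = [({"name", "category"}, {"price"}), ({"category"}, {"color"})]
product_keys = keys({"name", "price", "category", "color"}, product_fds)
print(product_keys)
```

Because subsets are visited smallest first, any superkey that contains an already-found key is skipped, which is what keeps the output minimal.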

Examples of Keys
Enrollment(student, address, course, room, time)
Find the keys:
    student → address
    room, time → course
    student, course → room, time

Eliminating Anomalies
Main idea:
- X → A is OK if X is a (super)key
- X → A is not OK otherwise

Example
SSN → Name, City

Name  SSN          PhoneNumber   City
Fred  123-45-6789  413-555-1234  Amherst
Fred  123-45-6789  413-555-6543  Amherst
Joe   987-65-4321  908-555-2121  Westfield
Joe   987-65-4321  908-555-1234  Westfield

What is the key? {SSN, PhoneNumber}
Hence SSN → Name, City is a bad dependency.

Key or Keys?
Can we have more than one key?
Given R(A,B,C), define FDs such that there are two or more keys.

Key or Keys?
Can we have more than one key?
Given R(A,B,C), define FDs such that there are two or more keys:
    AB → C
    BC → A
or
    A → BC
    B → AC
What are the keys here?
Can you design FDs such that there are three keys?

Boyce-Codd Normal Form (BCNF)
A simple condition for removing anomalies from relations:
A relation R is in BCNF if:
    If A1, ..., An → B is a non-trivial dependency in R, then {A1, ..., An} is a superkey for R
In other words: there are no bad FDs.
Equivalently: for all X, either (X+ = X) or (X+ = all attributes).

BCNF Decomposition Algorithm
repeat
    choose A1, ..., Am → B1, ..., Bn that violates BCNF
    split R into R1(A1, ..., Am, B1, ..., Bn) and R2(A1, ..., Am, [others])
    continue with both R1 and R2
until no more violations

[Diagram: R1 covers the B's and the A's; R2 covers the A's and the others.]

Is there a 2-attribute relation that is not in BCNF?
In practice, we have a better algorithm (coming up).

Example (revisited)
SSN → Name, City

Name  SSN          PhoneNumber   City
Fred  123-45-6789  413-555-1234  Amherst
Fred  123-45-6789  413-555-6543  Amherst
Joe   987-65-4321  908-555-2121  Westfield
Joe   987-65-4321  908-555-1234  Westfield

What is the key? {SSN, PhoneNumber}
Hence SSN → Name, City is a bad dependency.

Example (revisited)
SSN → Name, City

Name  SSN          City
Fred  123-45-6789  Amherst
Joe   987-65-4321  Westfield

SSN          PhoneNumber
123-45-6789  413-555-1234
123-45-6789  413-555-6543
987-65-4321  908-555-2121
987-65-4321  908-555-1234

Let's check anomalies: Redundancy? Update? Delete?

Example Decomposition
Person(name, SSN, age, haircolor, phonenumber)
    FD1: SSN → name, age
    FD2: age → haircolor
Decompose into BCNF:

Example Decomposition
Person(name, SSN, age, haircolor, phonenumber)
    FD1: SSN → name, age
    FD2: age → haircolor
Decompose into BCNF:
What is the key? {SSN, phonenumber}
But how to decompose?
    Person(SSN, name, age)
    Phone(SSN, haircolor, phonenumber)
or
    Person(SSN, name, age, haircolor)
    Phone(SSN, phonenumber)
    (SSN → name, age, haircolor)
or ...

BCNF Decomposition Algorithm
BCNF_Decompose(R)
    find X such that X ≠ X+ ≠ [all attributes]
    if (not found) then R is in BCNF
    let Y = X+ - X
    let Z = [all attributes] - X+
    decompose R into R1(X ∪ Y) and R2(X ∪ Z)
    continue to decompose R1 and R2 recursively

Example BCNF Decomposition
Person(name, SSN, age, haircolor, phonenumber)
    FD1: SSN → name, age
    FD2: age → haircolor
Find X such that X ≠ X+ ≠ [all attributes]

Iteration 1: Person
    SSN+ = SSN, name, age, haircolor
    Decompose into:
        P(SSN, name, age, haircolor)
        Phone(SSN, phonenumber)

Iteration 2: P
    age+ = age, haircolor
    Decompose:
        People(SSN, name, age)
        Hair(age, haircolor)
        Phone(SSN, phonenumber)

What are the keys?
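
The recursive BCNF_Decompose procedure can be sketched in Python as follows. The names `closure` and `bcnf_decompose` are assumptions for this illustration. One choice made here that the slides do not spell out: when checking a sub-relation, the closure is computed with the original FDs and then intersected with that sub-relation's attributes. The sketch also always picks the first violating X in sorted order, so other valid decompositions exist.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return X+ for attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_decompose(rel, fds):
    """Split rel (a set of attributes) until no X with X != X+ != rel remains."""
    rel = frozenset(rel)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            # Closure under the original FDs, restricted to this relation.
            x_plus = closure(x, fds) & rel
            if x_plus != x and x_plus != rel:
                r1 = x_plus               # X together with Y = X+ - X
                r2 = x | (rel - x_plus)   # X together with Z = rel - X+
                return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)
    return [rel]  # no violating X found: rel is in BCNF

person_fds = [({"SSN"}, {"name", "age"}), ({"age"}, {"haircolor"})]
person = {"name", "SSN", "age", "haircolor", "phonenumber"}
result = bcnf_decompose(person, person_fds)
for relation in result:
    print(sorted(relation))
```

On the Person example this yields the same three relations as the two iterations above: one over SSN, name, age, one over age, haircolor, and one over SSN, phonenumber.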

Example
R(A,B,C,D)
    A → B
    B → C

R(A,B,C,D): A+ = ABC ≠ ABCD
    R1(A,B,C): B+ = BC ≠ ABC
        R11(B,C)
        R12(A,B)
    R2(A,D)

What are the keys?
What happens if in R we first pick B+? Or AB+?

Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
    R1(A1, ..., An, B1, ..., Bm)
    R2(A1, ..., An, C1, ..., Cp)
R1 = projection of R on A1, ..., An, B1, ..., Bm
R2 = projection of R on A1, ..., An, C1, ..., Cp

Theory of Decomposition
- Sometimes it is correct:

Name      Price  Category
Gizmo     19.99  Gadget
OneClick  24.99  Camera
Gizmo     19.99  Camera

Name → Price

Name      Price
Gizmo     19.99
OneClick  24.99

Name      Category
Gizmo     Gadget
OneClick  Camera
Gizmo     Camera

Lossless decomposition

Incorrect Decomposition
- Sometimes it is not:

Name      Price  Category
Gizmo     19.99  Gadget
OneClick  24.99  Camera
Gizmo     19.99  Camera

Name → Price

What's incorrect?

Name      Category
Gizmo     Gadget
OneClick  Camera
Gizmo     Camera

Price  Category
19.99  Gadget
24.99  Camera
19.99  Camera

Lossy decomposition
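
One way to see the difference between the two decompositions is to project the three-row table and join the pieces back together. The sketch below is illustrative only: the helpers `project` and `natural_join` are hypothetical names, and rows are modelled as tuples of (attribute, value) pairs so that they can be stored in sets.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}

def natural_join(left, right):
    """Natural join of two sets of rows, matching on all shared attributes."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(tuple(sorted({**ld, **rd}.items())))
    return out

rows = [
    {"Name": "Gizmo", "Price": 19.99, "Category": "Gadget"},
    {"Name": "OneClick", "Price": 24.99, "Category": "Camera"},
    {"Name": "Gizmo", "Price": 19.99, "Category": "Camera"},
]
original = {tuple(sorted(r.items())) for r in rows}

# Split on Name (the left-hand side of Name -> Price): join restores the table.
lossless = natural_join(project(rows, ["Name", "Price"]),
                        project(rows, ["Name", "Category"]))

# Split on Category, which determines nothing: join produces extra rows.
lossy = natural_join(project(rows, ["Name", "Category"]),
                     project(rows, ["Price", "Category"]))

print(lossless == original)      # True
print(len(lossy) - len(original))  # number of spurious rows
```

In the second split the join still contains every original row, but it also pairs each Camera product with each Camera price, so the original relation can no longer be told apart from the spurious combinations.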

Decompositions in General
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
    R1(A1, ..., An, B1, ..., Bm)
    R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An → B1, ..., Bm, then the decomposition is lossless.
Note: we don't need A1, ..., An → C1, ..., Cp
BCNF decomposition is always lossless. WHY?

General Decomposition Goals
1. Elimination of anomalies
2. Recoverability of information
   - Can we get the original relation back?
3. Preservation of dependencies
   - Want to enforce FDs without performing joins

BCNF achieves goals 1 and 2; 3NF achieves goals 2 and 3.

Sometimes we cannot decompose into BCNF without losing the ability to check some FDs in a single relation.

BCNF and Dependencies
R(Unit, Company, Product)
    Unit → Company
    Company, Product → Unit
So, there is a BCNF violation, and we decompose:
    R1(Unit, Company)    Unit → Company
    R2(Unit, Product)    No FDs
In BCNF we lose the FD Company, Product → Unit.
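
A small experiment makes the lost dependency concrete. In the sketch below the data values (U1, U2, Acme, Widget) and the helper `satisfies` are made up for illustration. Each decomposed table passes every check that can be run on it alone, yet the joined result violates Company, Product → Unit.

```python
def satisfies(rows, lhs, rhs):
    """Check whether the FD lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault records the first rhs value seen for this lhs value.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical contents of the two BCNF relations.
unit_company = [{"Unit": "U1", "Company": "Acme"},
                {"Unit": "U2", "Company": "Acme"}]
unit_product = [{"Unit": "U1", "Product": "Widget"},
                {"Unit": "U2", "Product": "Widget"}]

# The only FD that can be checked locally still holds.
local_ok = satisfies(unit_company, ["Unit"], ["Company"])

# Reassemble R(Unit, Company, Product) by joining on Unit.
joined = [{**uc, **up} for uc in unit_company for up in unit_product
          if uc["Unit"] == up["Unit"]]
global_ok = satisfies(joined, ["Company", "Product"], ["Unit"])

print(local_ok, global_ok)
```

Detecting the violation required the join, which is exactly the cost that dependency preservation is meant to avoid.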

3NF Motivation
A relation R is in 3rd normal form if:
    Whenever there is a nontrivial dependency A1, A2, ..., An → B for R, then {A1, A2, ..., An} is a superkey for R, or B is part of a key.

Tradeoffs:
    BCNF: no anomalies, but may lose some FDs
    3NF: keeps all FDs, but may have some anomalies
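
The 3NF condition can be tested in the same brute-force style. The sketch below uses assumed names (`closure`, `candidate_keys`, `is_3nf`, `is_bcnf`) and inspects only the FDs that are explicitly given, which is taken here to be enough for testing a relation against its own FD set. It is run on the Unit/Company/Product relation.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return X+ for attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    found = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            x = set(combo)
            if not any(k <= x for k in found) and closure(x, fds) == all_attrs:
                found.append(x)
    return found

def is_bcnf(all_attrs, fds):
    """Every nontrivial given FD must have a superkey on the left."""
    return all(rhs <= lhs or closure(lhs, fds) == all_attrs for lhs, rhs in fds)

def is_3nf(all_attrs, fds):
    """As BCNF, except an FD is also allowed if its right side is part of a key."""
    prime = set().union(*candidate_keys(all_attrs, fds))
    for lhs, rhs in fds:
        for b in rhs - lhs:  # skip the trivial part of the FD
            if closure(lhs, fds) != all_attrs and b not in prime:
                return False
    return True

attrs = {"Unit", "Company", "Product"}
fds = [({"Unit"}, {"Company"}), ({"Company", "Product"}, {"Unit"})]
print(is_bcnf(attrs, fds), is_3nf(attrs, fds))
```

The relation fails the BCNF test because Unit is not a superkey, but it passes the 3NF test because Company belongs to the key {Company, Product}. That is the tradeoff stated above: 3NF keeps both FDs in a single relation at the cost of some redundancy.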