Introduction to Data Management CSE 344

Similar documents
CSE 344 MAY 16 TH NORMALIZATION

CSE 344 AUGUST 3 RD NORMALIZATION

CSE 544 Principles of Database Management Systems

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Introduction to Management CSE 344

Database Design and Implementation

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization

CSE 303: Database. Outline. Lecture 10. First Normal Form (1NF) First Normal Form (1NF) 10/1/2016. Chapter 3: Design Theory of Relational Database

Introduction to Database Systems CSE 414. Lecture 20: Design Theory

CSC 261/461 Database Systems Lecture 11

CMPT 354: Database System I. Lecture 9. Design Theory

CSE 344 AUGUST 6 TH LOSS AND VIEWS

SCHEMA NORMALIZATION. CS 564- Fall 2015

Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies

Lectures 6. Lecture 6: Design Theory

Schema Refinement. Feb 4, 2010

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

CSC 261/461 Database Systems Lecture 13. Spring 2018

Practice and Applications of Data Management CMPSCI 345. Lecture 15: Functional Dependencies

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Introduction to Data Management CSE 344

CSC 261/461 Database Systems Lecture 12. Spring 2018

DECOMPOSITION & SCHEMA NORMALIZATION

L14: Normalization. CS3200 Database design (sp18 s2) 3/1/2018

Schema Refinement and Normal Forms

Design Theory for Relational Databases. Spring 2011 Instructor: Hassan Khosravi

Chapter 3 Design Theory for Relational Databases

Design Theory for Relational Databases

Relational Design Theory II. Detecting Anomalies. Normal Forms. Normalization

CS322: Database Systems Normalization

Relational Database Design

Chapter 3 Design Theory for Relational Databases

UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

Lossless Joins, Third Normal Form

Relational Design Theory

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Functional Dependency Theory II. Winter Lecture 21

Design Theory for Relational Databases

Databases 2012 Normalization

L13: Normalization. CS3200 Database design (sp18 s2) 2/26/2018

CS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li

CS54100: Database Systems

Functional Dependencies and Normalization

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

Design theory for relational databases

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

Schema Refinement & Normalization Theory

Schema Refinement and Normal Forms

Information Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due

Schema Refinement. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Database Design: Normal Forms as Quality Criteria. Functional Dependencies Normal Forms Design and Normal forms

INF1383 -Bancos de Dados

Normal Forms. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

Relational Database Design

Constraints: Functional Dependencies

FUNCTIONAL DEPENDENCY THEORY II. CS121: Relational Databases Fall 2018 Lecture 20

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

Information Systems (Informationssysteme)

Introduction to Data Management. Lecture #7 (Relational DB Design Theory II)

Chapter 10. Normalization Ext (from E&N and my editing)

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

Schema Refinement and Normal Forms

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

Chapter 11, Relational Database Design Algorithms and Further Dependencies

Normal Forms (ii) ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Schema Refinement and Normal Forms

FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

Schema Refinement and Normal Forms Chapter 19

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

Functional Dependencies and Normalization. Instructor: Mohamed Eltabakh

Functional Dependency and Algorithmic Decomposition

Normal Forms 1. ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Functional Dependencies & Normalization. Dr. Bassam Hammo

CS 186, Fall 2002, Lecture 6 R&G Chapter 15

Database Design and Normalization

Introduction to Data Management. Lecture #6 (Relational DB Design Theory)

Lecture #7 (Relational Design Theory, cont d.)

Constraints: Functional Dependencies

Design Theory: Functional Dependencies and Normal Forms, Part I Instructor: Shel Finkelstein

Schema Refinement and Normal Forms. Chapter 19

Schema Refinement and Normal Forms. Why schema refinement?

Introduction to Data Management. Lecture #6 (Relational Design Theory)

Relational Design: Characteristics of Well-designed DB

CSE 132B Database Systems Applications

Functional Dependencies

Chapter 8: Relational Database Design

Relational Database Design Theory Part II. Announcements (October 12) Review. CPS 116 Introduction to Database Systems

Schema Refinement and Normalization

A few details using Armstrong s axioms. Supplement to Normalization Lecture Lois Delcambre

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #15: BCNF, 3NF and Normaliza:on

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

But RECAP. Why is losslessness important? An Instance of Relation NEWS. Suppose we decompose NEWS into: R1(S#, Sname) R2(City, Status)

Transcription:

Introduction to Data Management CSE 344 Lectures 18: BCNF 1

What makes good schemas? 2

Review: Relation Decomposition Break the relation into two: Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Name SSN City Fred Joe 123-45-6789 Seattle 987-65-4321 Westfield SSN PhoneNumber 123-45-6789 206-555-1234 123-45-6789 206-555-6543 987-65-4321 908-555-2121 Anomalies have gone: No more repeated data Easy to move Fred to Bellevue (how?) Easy to delete all Joe s phone numbers (how?) 3

Review: Functional Dependencies (FDs) Definition If two tuples agree on the attributes A 1, A 2,, A n then they must also agree on the attributes Formally: B 1, B 2,, B m A 1 A n determines B 1..B m A 1, A 2,, A n à B 1, B 2,, B m 4

Review: Functional Dependencies (FDs) Definition A 1,..., A m à B 1,..., B n holds in R if: t, t R, (t.a 1 = t.a 1... t.a m = t.a m à t.b 1 = t.b 1... t.b n = t.b n ) R A 1... A m B 1... B n t t if t, t agree here then t, t agree here 5

Review: An Interesting Observation If all these FDs are true: name à color category à department color, category à price Then this FD also holds: name, category à price If we find out from application domain that a relation satisfies some FDs, it doesn t mean that we found all the FDs that it satisfies! There could be more FDs implied by the ones we have. 6

Review: Closure of a set of Attributes Given a set of attributes A 1,, A n The closure is the set of attributes B, notated {A 1,, A n } +, s.t. A 1,, A n à B Example: 1. name à color 2. category à department 3. color, category à price Closures: name + = {name, color} {name, category} + = {name, category, color, department, price} color + = {color} 7

Closure Algorithm X={A 1,, A n }. Repeat until X doesn t change do: if B 1,, B n à C is a FD and B 1,, B n are all in X then add C to X. Example: 1. name à color 2. category à department 3. color, category à price {name, category } + = { name, category, color, department, price } Hence: name, category à color, department, price 8

Example In class: R(A,B,C,D,E,F) A, B à C A, D à E B à D A, F à B Compute {A, B} + X = {A, B, } Compute {A, F} + X = {A, F, } 9

Example In class: R(A,B,C,D,E,F) A, B à C A, D à E B à D A, F à B Compute {A, B} + X = {A, B, C, D, E } Compute {A, F} + X = {A, F, } 10

Example In class: R(A,B,C,D,E,F) A, B à C A, D à E B à D A, F à B Compute {A, B} + X = {A, B, C, D, E } Compute {A, F} + X = {A, F, B, C, D, E } 11

Example In class: R(A,B,C,D,E,F) A, B à C A, D à E B à D A, F à B Compute {A, B} + X = {A, B, C, D, E } Compute {A, F} + X = {A, F, B, C, D, E } What is a key of 12 R?

Keys A superkey is a set of attributes A 1,..., A n s.t. for any other attribute B in the same relation, we have A 1,..., A n à B A key is a minimal superkey A superkey and for which no subset is a superkey 15

Computing (Super)Keys For all sets X, compute X + If X + = [all attributes], then X is a superkey Try reducing to the minimal X s to get the key 16

Example Product(name, price, category, color) name, category à price category à color What is the key? 17

Example Product(name, price, category, color) name, category à price category à color What is the key? (name, category) + = { name, category, price, color } Hence (name, category) is a key 18

Key or Keys? Can we have more than one key? Given R(A,B,C) define FD s s.t. there are two or more distinct keys 19

Key or Keys? Can we have more than one key? Given R(A,B,C) define FD s s.t. there are two or more distinct keys A à B B à C C à A or ABàC BCàA or AàBC BàAC what are the keys here? 20

Eliminating Anomalies Main idea: X à A is OK if X is a (super)key X à A is not OK otherwise Need to decompose the table, but how? Boyce-Codd Normal Form 21

Boyce-Codd Normal Form Dr. Raymond F. Boyce 22

Turing Awards in Data Management Charles Bachman, 1973 IDS and CODASYL Ted Codd, 1981 Relational model Jim Gray, 1998 Transaction processing Michael Stonebraker, 2014 INGRES and Postgres 23

Boyce-Codd Normal Form There are no bad FDs: Definition. A relation R is in BCNF if: Whenever Xà B is a non-trivial dependency, then X is a superkey. Equivalently: Definition. A relation R is in BCNF if: X in XàB, either X + = X or X + = [all attributes] 24

BCNF Decomposition Algorithm Normalize(R) find X in XàB s.t.: X X + and X + [all attributes] if (not found) then R is in BCNF let Y = X + - X; Z = [all attributes] - X + decompose R into R1(X Y) and R2(X Z) Normalize(R1); Normalize(R2); Y X Z X + 25

Want X in XàB s.t.: X =X + or X + = [all attributes] Example Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Joe 987-65-4321 908-555-1234 Westfield SSN à Name, City The only key is: {SSN, PhoneNumber} Hence SSN à Name, City is a bad dependency Name, City SSN + SSN Phone- Number In other words: SSN+ = SSN, Name, City and is neither SSN nor All Attributes

Want X in XàB s.t.: X =X + or X + = [all attributes] Example BCNF Decomposition Name SSN City Fred 123-45-6789 Seattle Joe 987-65-4321 Westfield SSN à Name, City SSN PhoneNumber 123-45-6789 206-555-1234 123-45-6789 206-555-6543 987-65-4321 908-555-2121 987-65-4321 908-555-1234 Name, City SSN + SSN Phone- Number Let s check anomalies: Redundancy? Update? Delete? 27

Find X in XàB s.t.: X X + and X + [all attributes] Example BCNF Decomposition Person(name, SSN, age, haircolor, phonenumber) SSN à name, age age à haircolor 28

Find X in XàB s.t.: X X + and X + [all attributes] Example BCNF Decomposition Person(name, SSN, age, haircolor, phonenumber) SSN à name, age age à haircolor Iteration 1: Person: SSN+ = SSN, name, age, haircolor Decompose into: P(SSN, name, age, haircolor) Phone(SSN, phonenumber) name, age, haircolor SSN phonenumber 29

Find X in XàB s.t.: X X + and X + [all attributes] Example BCNF Decomposition Person(name, SSN, age, haircolor, phonenumber) SSN à name, age What are age à haircolor the keys? Iteration 1: Person: SSN+ = SSN, name, age, haircolor Decompose into: P(SSN, name, age, haircolor) Phone(SSN, phonenumber) Iteration 2: P: age+ = age, haircolor Decompose: People(SSN, name, age) Hair(age, haircolor) Phone(SSN, phonenumber) 30

Find X in XàB s.t.: X X + and X + [all attributes] Example BCNF Decomposition Person(name, SSN, age, haircolor, phonenumber) SSN à name, age age à haircolor Iteration 1: Person: SSN+ = SSN, name, age, haircolor Decompose into: P(SSN, name, age, haircolor) Phone(SSN, phonenumber) Note the keys! Iteration 2: P: age+ = age, haircolor Decompose: People(SSN, name, age) Hair(age, haircolor) Phone(SSN, phonenumber) 31

R(A,B,C,D) Example: BCNF A à B B à C R(A,B,C,D) 32

R(A,B,C,D) Example: BCNF A à B B à C Recall: find X s.t. X X + [all-attrs] R(A,B,C,D) 33

R(A,B,C,D) Example: BCNF A à B B à C Recall: find X s.t. X X + [all-attrs] R(A,B,C,D) A + = ABC ABCD 34

R(A,B,C,D) Example: BCNF A à B B à C Recall: find X s.t. X X + [all-attrs] R(A,B,C,D) A + = ABC ABCD R 1 (A,B,C) R 2 (A,D) 35

R(A,B,C,D) Example: BCNF A à B B à C Recall: find X s.t. X X + [all-attrs] R(A,B,C,D) A + = ABC ABCD R 1 (A,B,C) B + = BC ABC R 2 (A,D) 36

R(A,B,C,D) Example: BCNF A à B B à C Recall: find X s.t. X X + [all-attrs] R(A,B,C,D) A + = ABC ABCD R 1 (A,B,C) B + = BC ABC R 2 (A,D) R 11 (B,C) R 12 (A,B) What are the keys? What happens if in R we first pick B +? Or AB +? 37

Decompositions in General R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) S 1 (A 1,..., A n, B 1,..., B m ) S 2 (A 1,..., A n, C 1,..., C p ) S 1 = projection of R on A 1,..., A n, B 1,..., B m S 2 = projection of R on A 1,..., A n, C 1,..., C p 38

Lossless Decomposition Name Price Category Gizmo 19.99 Gadget OneClick 24.99 Camera Gizmo 19.99 Camera Name Price Gizmo 19.99 OneClick 24.99 Gizmo 19.99 Name Gizmo OneClick Gizmo Category Gadget Camera Camera 39

Lossy Decomposition What is lossy here? Name Price Category Gizmo 19.99 Gadget OneClick 24.99 Camera Gizmo 19.99 Camera Name Gizmo OneClick Gizmo Category Gadget Camera Camera Price Category 19.99 Gadget 24.99 Camera 19.99 Camera 40

Lossy Decomposition Name Price Category Gizmo 19.99 Gadget OneClick 24.99 Camera Gizmo 19.99 Camera Name Gizmo OneClick Gizmo Category Gadget Camera Camera Price Category 19.99 Gadget 24.99 Camera 19.99 Camera 41

Decomposition in General R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) S 1 (A 1,..., A n, B 1,..., B m ) S 2 (A 1,..., A n, C 1,..., C p ) S 1 = projection of R on A 1,..., A n, B 1,..., B m S 2 = projection of R on A 1,..., A n, C 1,..., C p The decomposition is called lossless if R = S 1 S 2 Let: Fact: If A 1,..., A n à B 1,..., B m then the decomposition is lossless It follows that every BCNF decomposition is lossless 42

Schema Refinements = Normal Forms 1st Normal Form = all tables are flat 2nd Normal Form = obsolete Boyce Codd Normal Form = no bad FDs 3rd and 4th Normal Form = see book BCNF is lossless but can cause loss of ability to check some FDs (see book 3.4.4) 3NF fixes that (is lossless and dependencypreserving), but some tables might not be in BCNF i.e., they may have redundancy anomalies 50