DECOMPOSITION & SCHEMA NORMALIZATION

Similar documents
SCHEMA NORMALIZATION. CS 564- Fall 2015

CSC 261/461 Database Systems Lecture 11

Lossless Joins, Third Normal Form

Database Design and Implementation

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

Introduction to Data Management CSE 344

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization

Information Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due

Relational Database Design

Schema Refinement and Normal Forms

UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Functional Dependency Theory II. Winter Lecture 21

CSC 261/461 Database Systems Lecture 13. Spring 2018

CS322: Database Systems Normalization

CS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li

Normal Forms. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

Relational Design Theory II. Detecting Anomalies. Normal Forms. Normalization

CS 186, Fall 2002, Lecture 6 R&G Chapter 15

CSC 261/461 Database Systems Lecture 12. Spring 2018

CMPT 354: Database System I. Lecture 9. Design Theory

Relational Database Design

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

Chapter 10. Normalization Ext (from E&N and my editing)

Databases Lecture 8. Timothy G. Griffin. Computer Laboratory University of Cambridge, UK. Databases, Lent 2009

CSE 544 Principles of Database Management Systems

Constraints: Functional Dependencies

Relational Design: Characteristics of Well-designed DB

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

Chapter 3 Design Theory for Relational Databases

Databases 2012 Normalization

CSE 344 AUGUST 6 TH LOSS AND VIEWS

Lecture #7 (Relational Design Theory, cont d.)

Schema Refinement and Normal Forms

Schema Refinement and Normal Forms. Chapter 19

Design Theory for Relational Databases

Schema Refinement: Other Dependencies and Higher Normal Forms

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

Introduction to Data Management. Lecture #7 (Relational DB Design Theory II)

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

Normal Forms 1. ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

CSE 303: Database. Outline. Lecture 10. First Normal Form (1NF) First Normal Form (1NF) 10/1/2016. Chapter 3: Design Theory of Relational Database

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

Schema Refinement & Normalization Theory

Constraints: Functional Dependencies

Chapter 11, Relational Database Design Algorithms and Further Dependencies

Design Theory for Relational Databases. Spring 2011 Instructor: Hassan Khosravi

Relational Database Design Theory Part II. Announcements (October 12) Review. CPS 116 Introduction to Database Systems

Schema Refinement and Normal Forms Chapter 19

Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies

FUNCTIONAL DEPENDENCY THEORY II. CS121: Relational Databases Fall 2018 Lecture 20

Schema Refinement and Normalization

Relational Design Theory

Functional Dependencies and Normalization

Schema Refinement and Normal Forms. Why schema refinement?

But RECAP. Why is losslessness important? An Instance of Relation NEWS. Suppose we decompose NEWS into: R1(S#, Sname) R2(City, Status)

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

Normal Forms (ii) ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #15: BCNF, 3NF and Normaliza:on

Review: Keys. What is a Functional Dependency? Why use Functional Dependencies? Functional Dependency Properties

Schema Refinement and Normal Forms

Chapter 8: Relational Database Design

A few details using Armstrong s axioms. Supplement to Normalization Lecture Lois Delcambre

CSE 344 MAY 16 TH NORMALIZATION

CSE 344 AUGUST 3 RD NORMALIZATION

Introduction to Database Systems CSE 414. Lecture 20: Design Theory

Database Design and Normalization

Schema Refinement and Normal Forms

Introduction to Management CSE 344

Information Systems (Informationssysteme)

Design Theory for Relational Databases

INF1383 -Bancos de Dados

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Schema Refinement. Feb 4, 2010

CS54100: Database Systems

Lectures 6. Lecture 6: Design Theory

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Chapter 7: Relational Database Design

L14: Normalization. CS3200 Database design (sp18 s2) 3/1/2018

Chapter 7: Relational Database Design. Chapter 7: Relational Database Design

Schema Refinement. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Functional Dependencies & Normalization. Dr. Bassam Hammo

CSE 132B Database Systems Applications

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

Functional Dependencies and Normalization. Instructor: Mohamed Eltabakh

CAS CS 460/660 Introduction to Database Systems. Functional Dependencies and Normal Forms 1.1

Consider a relation R with attributes ABCDEF GH and functional dependencies S:

Introduction to Data Management CSE 344

Normal Forms Lossless Join.

Functional Dependencies

Design theory for relational databases

Transcription:

DECOMPOSITION & SCHEMA NORMALIZATION CS 564- Spring 2018 ACKs: Dan Suciu, Jignesh Patel, AnHai Doan

WHAT IS THIS LECTURE ABOUT? Bad schemas lead to redundancy To correct bad schemas: decompose relations lossless-join dependency preserving Desired normal forms BCNF 3NF 2

DB DESIGN THEORY Helps us identify the bad schemas and improve them 1. express constraints on the data: functional dependencies (FDs) 2. use the FDs to decompose the relations The process, called normalization, obtains a schema in a normal form that guarantees certain properties examples of normal forms: BCNF, 3NF, 3

SCHEMA DECOMPOSITION 4

WHAT IS A DECOMPOSITION? We decompose a relation R(A 1,, A n ) by creating R 1 (B 1,.., B m ) R 2 (C 1,, C l ) where {B #,, B & } {C #,, C + } = {A #, A. } The instance of R 1 is the projection of R onto B 1,.., B m The instance of R 2 is the projection of R onto C 1,.., C l 5

EXAMPLE: DECOMPOSITION SSN name age phonenumber 934729837 Paris 24 608-374-8422 934729837 Paris 24 603-534-8399 123123645 John 30 608-321-1163 384475687 Arun 20 206-473-8221 SSN name age 934729837 Paris 24 123123645 John 30 384475687 Arun 20 SSN phonenumber 934729837 608-374-8422 934729837 603-534-8399 123123645 608-321-1163 384475687 206-473-8221 6

DECOMPOSITION DESIDERATA What should a good decomposition achieve? 1. minimize redundancy 2. avoid information loss (lossless-join) 3. preserve the FDs (dependencypreserving) 4. ensure good query performance 7

EXAMPLE: INFORMATION LOSS name age phonenumber Paris 24 608-374-8422 John 24 608-321-1163 Arun 20 206-473-8221 Decompose into: R 1 (name, age) R 2 (age, phonenumber) name age Paris 24 John 24 Arun 20 age phonenumber 24 608-374-8422 24 608-321-1163 20 206-473-8221 We can t figure out which phonenumber corresponds to which person! 8

LOSSLESS-JOIN DECOMPOSITION R(A, B, C) R 1 (A, B) R 2 (B, C) R (A, B, C) decompose (projection) recover (natural join) A natural join is a join on the same attribute names A schema decomposition is lossless-join if for any initial instance R, R = R 9

A LOSSLESS-JOIN CRITERION Starting with: a relation R(A) + set F of FDs a decomposition of R into R 1 (A 1 ) and R 2 (A 2 ) we say that a decomposition is lossless-join if and only if at least one of the following FDs is in F + (the closure of F) : 1. A 1 A 2 A 1 2. A 1 A 2 A 2 10

EXAMPLE relation R(A, B, C, D) FD A B, C Lossless-join decomposition into R 1 (A, B, C) and R 2 (A, D) A, B, C A, D = A For R 1 we have indeed A B, C Not lossless-join decomposition into R 1 (A, B, C) and R 2 (D) 11

DEPENDENCY PRESERVING Given R and a set of FDs F, we decompose R into R 1 and R 2. Suppose: R 1 has a set of FDs F 1 R 2 has a set of FDs F 2 F 1 and F 2 are computed from F A decomposition is dependency preserving if by enforcing F 1 over R 1 and F 2 over R 2, we can enforce F over R 12

GOOD EXAMPLE Person(SSN, name, age, candrink) SSN name, age age candrink decomposes into R 1 (SSN, name, age) SSN name, age R 2 (age, candrink) age candrink 13

BAD EXAMPLE R(A, B, C) A B R 1 A a 1 B b R 2 A a 1 C c B, C A a 2 b a 2 c Decomposes into: R 1 (A, B) A B R 2 (A, C) no FDs here!! A B C a 1 b c recover a 2 b c The recovered table violates B, C A 14

NORMAL FORMS A normal form represents a good schema design: 1NF (flat tables/atomic values) 2NF 3NF BCNF 4NF more restrictive 15

BCNF DECOMPOSITION 16

BOYCE-CODD NORMAL FORM (BCNF) A relation R is in BCNF if whenever X B is a non-trivial FD, then X is a superkey in R Equivalent definition: for every attribute set X either X D = X or X D = all attributes 17

BCNF EXAMPLE 1 SSN name age phonenumber 934729837 Paris 24 608-374-8422 934729837 Paris 24 603-534-8399 123123645 John 30 608-321-1163 384475687 Arun 20 206-473-8221 SSN name, age key = {SSN, phonenumber} SSN name, age is a bad FD The above relation is not in BCNF! 18

BCNF EXAMPLE 2 SSN name age 934729837 Paris 24 123123645 John 30 384475687 Arun 20 SSN name, age key = {SSN} The above relation is in BCNF! 19

BCNF EXAMPLE 3 SSN phonenumber 934729837 608-374-8422 934729837 603-534-8399 123123645 608-321-1163 384475687 206-473-8221 key = {SSN, phonenumber} The above relation is in BCNF! Is it possible that a binary relation is not in BCNF? 20

BCNF DECOMPOSITION Find an FD that violates the BCNF condition A #, A M,, A. B #, B M,, B & Decompose R to R 1 and R 2 : R 1 R 2 B s A s remaining attributes Continue until no BCNF violations are left 21

EXAMPLE SSN name age phonenumber 934729837 Paris 24 608-374-8422 934729837 Paris 24 603-534-8399 123123645 John 30 608-321-1163 384475687 Arun 20 206-473-8221 The FD SSN name, age violates BCNF Split into two relations R 1, R 2 as follows: R 1 R 2 name age SSN phonenumber 22

EXAMPLE CONT D R 1 R 2 name age SSN phonenumber SSN name, age SSN name age 934729837 Paris 24 123123645 John 30 384475687 Arun 20 SSN phonenumber 934729837 608-374-8422 934729837 603-534-8399 123123645 608-321-1163 384475687 206-473-8221 23

BCNF DECOMPOSITION PROPERTIES The BCNF decomposition: removes certain types of redundancy is lossless-join is not always dependency preserving 24

BCNF IS LOSSLESS-JOIN Example: R(A, B, C) with A B decomposes into: R 1 (A, B) and R 2 (A, C) The BCNF decomposition always satisfies the lossless-join criterion! 25

BCNF IS NOT DEPENDENCY PRESERVING R(A, B, C) A B B, C A There may not exist any BCNF decomposition that is FD preserving! The BCNF decomposition is: R 1 (A, B) with FD A B R 2 (A, C) with no FDs 26

BCNF EXAMPLE (1) Books (author, gender, booktitle, genre, price) author gender booktitle genre, price What is the candidate key? (author, booktitle) is the only one! Is is in BCNF? No, because the left hand side of both (not trivial) FDs is not a superkey! 27

BCNF EXAMPLE (2) Books (author, gender, booktitle, genre, price) author gender booktitle genre, price Splitting Books using the FD author gender: Author (author, gender) FD: author gender in BCNF! Books2 (authos, booktitle, genre, price) FD: booktitle genre, price not in BCNF! 28

BCNF EXAMPLE (3) Books (author, gender, booktitle, genre, price) author gender booktitle genre, price Splitting Books using the FD author gender: Author (author, gender) FD: author gender in BCNF! Splitting Books2 (author, booktitle, genre, price): BookInfo (booktitle, genre, price) FD: booktitle genre, price in BCNF! BookAuthor (author, booktitle) in BCNF! 29

THIRD NORMAL FORM (3NF) 30

3NF DEFINITION A relation R is in 3NF if whenever X A, one of the following is true: A X (trivial FD) X is a superkey A is part of some key of R (prime attribute) BCNF implies 3NF!! 31

3NF CONT D Example: R(A, B, C) with A, B C and C A is in 3NF. Why? is not in BCNF. Why? Compromise used when BCNF not achievable: aim for BCNF and settle for 3NF Lossless-join and dependency preserving decomposition into a collection of 3NF relations is always possible! 32

3NF ALGORITHM 1. Apply the algorithm for BCNF decomposition until all relations are in 3NF (we can stop earlier than BCNF) 2. Compute a minimal basis F of F 3. For each non-preserved FD X A in F, add a new relation R(X, A) 33

3NF EXAMPLE (1) Start with relation R (A, B, C, D) with FDs: A D A, B C A, D C B C D A, B Step 1: find a BCNF decomposition R1 (B, C) R2 (A, B, D) 34

3NF EXAMPLE (2) Start with relation R (A, B, C, D) with FDs: A D A, B C A, D C B C D A, B Step 2: compute a minimal basis of the original set of FDs: A D B C D A D B 35

3NF EXAMPLE (3) Start with relation R (A, B, C, D) with FDs: A D A, B C A, D C B C D A, B Step 3: add a new relation for any FD in the basis that is not satisfied: all the dependencies in F are satisfied! the resulting decomposition R1, R2 is also BCNF! 36

IS NORMALIZATION ALWAYS GOOD? Example: suppose A and B are always used together, but normalization says they should be in different tables decomposition might produce unacceptable performance loss Example: data warehouses huge historical DBs, rarely updated after creation joins expensive or impractical 37