Database Design and Normalization

Chapter 11 (Week 12)
EE562 Slides and Modified Slides from Database Management Systems, R. Ramakrishnan

1NF

Relation FIRST

S#   Status  City    P#   Qty
S1   20      London  P1   300
S1   20      London  P2   200
S1   20      London  P3   400
S1   20      London  P4   200
S1   20      London  P5   100
S1   20      London  P6   100
S2   10      Paris   P1   300
S2   10      Paris   P2   400
S3   10      Paris   P2   200
S4   20      London  P2   200
S4   20      London  P4   300
S4   20      London  P5   400

Sample tabulation of FIRST

Second Normal Form (2NF)

Transitivity: if A → B and B → C, then A → C.

R1 (SECOND)
FDs: S# → City, City → Status, S# → Status

S#   Status  City
S1   20      London
S2   10      Paris
S3   10      Paris
S4   20      London
S5   30      Athens

Suppose a transitive FD (a dependency between non-key attributes) exists, as with City → Status in SECOND.

R2 (SP)
FD: (S#, P#) → Qty

S#   P#   Qty
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400

Problems with 2NF

Anomalies remain in SECOND on:
- Insertion in SECOND
- Update of SECOND
- Deletion in SECOND

Relation SP has no such problem. It is in 3NF.

Third Normal Form (3NF)

1) Every non-key attribute is fully dependent on the primary key.
2) There is no mutual dependency among non-primary-key attributes.

Decomposition of a Relation Scheme

Suppose that relation R contains attributes A1 ... An. A decomposition of R consists of replacing R by two or more relations such that:
- Each new relation scheme contains a subset of the attributes of R (and no attributes that do not appear in R), and
- Every attribute of R appears as an attribute of one of the new relations.

Intuitively, decomposing R means we will store instances of the relation schemes produced by the decomposition, instead of instances of R.

Problems with Decompositions

There are three potential problems to consider:
(1) Some queries become more expensive, because they now require joins.
(2) Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation (a lossy decomposition).
(3) Checking some dependencies may require joining the instances of the decomposed relations (the decomposition is not dependency preserving).

Tradeoff: these issues must be weighed against redundancy.

Lossless Join Decompositions

Decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r that satisfies F:

$\pi_X(r) \bowtie \pi_Y(r) = r$

It is always true that $r \subseteq \pi_X(r) \bowtie \pi_Y(r)$. In general, the other direction does not hold! If it does, the decomposition is lossless-join.

The definition extends to decomposition into 3 or more relations in a straightforward way.

It is essential that all decompositions used to deal with redundancy be lossless! (Avoids Problem (2).)

More on Lossless Join

The decomposition of R into X and Y is lossless-join w.r.t. F if and only if the closure of F contains:
- X ∩ Y → X, or
- X ∩ Y → Y

In particular, the decomposition of R into UV and R - V is lossless-join if U → V holds over R.

Example of a lossy decomposition. Original instance:

A  B  C
1  2  3
4  5  6
7  2  8

Projection onto AB:

A  B
1  2
4  5
7  2

Projection onto BC:

B  C
2  3
5  6
2  8

Join of the two projections (the last two tuples are not in the original instance):

A  B  C
1  2  3
4  5  6
7  2  8
1  2  8
7  2  3
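The ABC instance on this slide can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the helper names `project` and `natural_join` are hypothetical. It projects the instance onto AB and BC, joins the projections on B, and lists the tuples that were not in the original.

```python
def project(rows, attrs):
    """Project a list of dict rows onto attrs, dropping duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}


def natural_join(left, left_attrs, right, right_attrs):
    """Natural join of two sets of tuples on their shared attribute names."""
    common = [a for a in left_attrs if a in right_attrs]
    out_attrs = list(left_attrs) + [a for a in right_attrs if a not in left_attrs]
    result = set()
    for l in left:
        lrow = dict(zip(left_attrs, l))
        for r in right:
            rrow = dict(zip(right_attrs, r))
            if all(lrow[a] == rrow[a] for a in common):
                merged = {**lrow, **rrow}
                result.add(tuple(merged[a] for a in out_attrs))
    return result


# Instance from the slide.
r = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": 5, "C": 6},
    {"A": 7, "B": 2, "C": 8},
]

original = project(r, ["A", "B", "C"])
ab = project(r, ["A", "B"])
bc = project(r, ["B", "C"])
joined = natural_join(ab, ["A", "B"], bc, ["B", "C"])

# Tuples produced by the join that were never in r.
spurious = joined - original
print(sorted(spurious))
```

The join always contains the original instance; the decomposition is lossy here because B determines neither A nor C in this instance, so the two rows with B = 2 are recombined in every possible way.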

Decomposition from 2NF to 3NF

R(A, B, C)    P.K. (A)    FD: B → C

Decomposition (lossless):
R1(B, C)    P.K. (B)
R2(A, B)    P.K. (A), F.K. (B) references R1
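The FD-based lossless-join test for a binary decomposition (the common attributes must determine all of one side) can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, attributes are assumed to be single letters so that a string such as "AB" stands for the set {A, B}, and the FD list spells out A → B and A → C because A is the primary key of R.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(x, y, fds):
    """True if X ∩ Y → X or X ∩ Y → Y follows from fds."""
    common = set(x) & set(y)
    reachable = closure(common, fds)
    return set(x) <= reachable or set(y) <= reachable


# R(A, B, C) with key A and the FD B -> C.
fds = [("A", "B"), ("A", "C"), ("B", "C")]

print(is_lossless_binary("BC", "AB", fds))  # decomposition used on the slide
print(is_lossless_binary("AC", "BC", fds))  # an alternative split on C
```

For R1(B, C) and R2(A, B) the common attribute is B, and B → C makes B a key of R1, so the test succeeds. Splitting on C instead fails, because C determines nothing else.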

Additional Consideration: Dependency Preservation Issue

FDs: S# → City, City → Status, S# → Status

Solution 1 (lossless):
SC(S#, City): P.K. (S#), F.K. (City)
CS(City, Status): P.K. (City)

Solution 2 (lossless):
SC(S#, City)
SS(S#, Status)

Solution 2 is bad. We cannot insert the status information for a city unless some supplier is located in that city. The FD City → Status is lost: it spans both relations and becomes an inter-relation constraint.

Dependency Preserving Decompositions (Contd.)

Decomposition of R into X and Y is dependency preserving if

$(F_X \cup F_Y)^+ = F^+$

i.e., if we consider only dependencies in the closure $F^+$ that can be checked in X without considering Y, and in Y without considering X, these imply all dependencies in $F^+$.

It is important to consider $F^+$, not F, in this definition. Example: ABC with A → B, B → C, C → A, decomposed into AB and BC. Is this dependency preserving? Is C → A preserved?

Dependency preserving does not imply lossless join, and vice versa.
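The question about C → A can be explored with the textbook iterative test, which avoids materializing the full closure of F: starting from the left-hand side, repeatedly add whatever can be derived inside a single decomposed relation. The Python below is a minimal sketch under the assumption of single-letter attribute names; `is_preserved` is a hypothetical helper name, not something from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(lhs, rhs, fds, schemas):
    """Check whether lhs -> rhs follows from the FDs projected onto schemas."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in schemas:
            s = set(schema)
            # Only attributes inside one schema may be used and gained.
            gained = closure(result & s, fds) & s
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


fds = [("A", "B"), ("B", "C"), ("C", "A")]
schemas = ["AB", "BC"]

preserved = {f"{l}->{r}": is_preserved(l, r, fds, schemas) for l, r in fds}
print(preserved)
```

C → A turns out to be preserved even though no single relation contains both C and A: the closure of F contains C → B (checkable in BC) and B → A (checkable in AB), and together they imply C → A. This is why the definition is stated over the closure rather than over F itself.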

Decomposition into 3NF

To ensure dependency preservation, one idea:
- If X → Y is not preserved, add relation XY.
- The problem is that XY may violate 3NF.

Refinement: instead of the given set of FDs F, use a minimal cover for F.
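A minimal cover can be computed with the usual three steps: split right-hand sides into single attributes, drop extraneous left-hand-side attributes, then drop redundant FDs. The Python sketch below is illustrative only; the function name and the example FD set (A → B, B → C, A → C, AB → C) are assumptions made for the demonstration, and attributes are single letters.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """Return one minimal cover of fds as (frozenset, frozenset) pairs."""
    # Step 1: one attribute on every right-hand side.
    cover = [(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in rhs]

    # Step 2: remove extraneous attributes from left-hand sides.
    for i in range(len(cover)):
        lhs, rhs = cover[i]
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and rhs <= closure(smaller, cover):
                lhs = smaller
                cover[i] = (lhs, rhs)

    # Step 3: remove FDs that follow from the remaining ones.
    i = 0
    while i < len(cover):
        lhs, rhs = cover[i]
        rest = cover[:i] + cover[i + 1:]
        if rhs <= closure(lhs, rest):
            cover = rest
        else:
            i += 1
    return cover


# Hypothetical FD set chosen for illustration.
fds = [("A", "B"), ("B", "C"), ("A", "C"), ("AB", "C")]
cover = minimal_cover(fds)
print([("".join(sorted(l)), "".join(sorted(r))) for l, r in cover])
```

A minimal cover is not unique in general; the result depends on the order in which attributes and FDs are examined. Adding a relation XY for each FD X → Y in such a cover, rather than in the original F, is the refinement the slide refers to.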

Lossy Decomposition

S#   Status
S1   20
S2   10
S3   10
S4   20
S5   30

City    Status
Athens  30
London  20
Paris   20
Rome    50
NY      20

Dependency Preservation Issue

Independent projection: an update can be made to either relation without regard for the other.

Theorem: Projections R1 and R2 of R are independent iff:
- Every FD in R can be logically deduced from those given in R1 and R2, and
- The common attributes of R1 and R2 form a candidate key for at least one of them.

BCNF (Boyce/Codd Normal Form)

3NF, as defined so far, deals with relations that have exactly one candidate key (arrows always go out of a candidate key).

BCNF is needed for relations with multiple candidate keys where:
- The keys are composite
- The keys overlap
- Update anomalies can occur as a result

Multiple Keys with Overlap

(FD diagram over attributes Sname, S#, P#, Qty)

Determinant

Any attribute (or set of attributes) on which some other attribute is fully functionally dependent.

(FD diagram over attributes Qty, S#, P#, Status, City)

S#, City, and (S#, P#) are determinants.

BCNF (Boyce/Codd Normal Form)

Definition: A relation is in BCNF iff every determinant is a candidate key (the only arrows in the whole diagram are arrows out of candidate keys).

Alternate definition: A relation R(A1, A2, ..., An) is in BCNF iff the existence of a non-trivial FD X → Y implies the existence of FDs X → Ai for all i = 1, 2, ..., n.
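The alternate definition translates directly into a check: every non-trivial FD must have a left-hand side whose closure is the whole relation. The Python sketch below is a minimal illustration with hypothetical helper names. The FD set for FIRST (S# → City, City → Status, (S#, P#) → Qty) is an assumption read off the determinants named on the slides, not something the slides list explicitly.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a superkey."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != all_attrs
    ]


# Assumed FDs for FIRST, based on the determinants S#, City, (S#, P#).
first = ("S#", "Status", "City", "P#", "Qty")
first_fds = [
    (("S#",), ("City",)),
    (("City",), ("Status",)),
    (("S#", "P#"), ("Qty",)),
]

sp = ("S#", "P#", "Qty")
sp_fds = [(("S#", "P#"), ("Qty",))]

print(bcnf_violations(first, first_fds))
print(bcnf_violations(sp, sp_fds))
```

Under these assumed FDs, FIRST has two violating determinants (S# and City), while SP has none. Checking only the given FDs is enough to decide whether the relation itself is in BCNF; checking a projected sub-relation would require the FDs implied on that projection.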

Example of FIRST Relation

(FD diagram over attributes Qty, S#, P#, Status, City)

S#, City, and (S#, P#) are determinants. What are the candidate keys? Is this relation in BCNF?

BCNF (Boyce/Codd Normal Form)

Example: SSP(S#, Sname, P#, Qty) (in 3NF)

Candidate keys are (S#, P#) and (Sname, P#). The relation still suffers from update anomalies:

S#   Sname  P#   Qty
S1   Smith  P1   300
S1   Smith  P2   200
S1   Smith  P3   400
S1   Smith  P4   200

BCNF (Boyce/Codd Normal Form)

The original S relation (S#, Sname, Status, City) is in BCNF.

(FD diagram over attributes S#, Status, Sname, City)

Reason: the only determinants are candidate keys (although multiple candidate keys exist).

BCNF (Boyce/Codd Normal Form)

3 cases of overlapping composite candidate keys:

Case A: Two composite candidate keys with an FD.
(FD diagram over attributes Sname, S#, P#, Qty)

Case B: Inter-relational constraints; an attribute is a determinant but not a candidate key.
(FD diagram over attributes S, J, T)

Case C:
(FD diagram over attributes S, J, P)

Case A: BCNF

Case A: Two composite candidate keys with an FD.

(FD diagram over attributes Sname, Qty, S#, P#)

Solution 1: SS(S#, Sname), SP(S#, P#, Qty)
Solution 2: SS(S#, Sname), SP(Sname, P#, Qty)

Case B: BCNF

Case B: Inter-relational constraints; an attribute is a determinant but not a candidate key.

(FD diagram over attributes S, J, T)

SJT(Student, Subject, Teacher) (in 3NF but not in BCNF)

Problem: T is a determinant but not a candidate key.

S      J        T
Smith  Math     White
Jones  Math     White
Jones  Physics  Brown
Smith  Physics  Green

Case B: Decomposing into BCNF

Solution: ST(S, T) and TJ(T, J). The decomposition is lossless, but the projections are not independent (there is an inter-relational constraint), so the two projections cannot be updated independently.

Note: Sometimes decomposing a relation into BCNF and decomposing it into independent relations may conflict.

TJ:

T      J
Green  Physics
Brown  Physics
Kirk   Chem

ST (column values as listed on the slide):

S: Smith, Jones, Smith, Smith
T: White, Brown, Brown, Don, Kirk

Case C: BCNF

Case C: Exam(S, J, P)

(FD diagram over attributes S, J, P)

Candidate keys are the only determinants. No anomalies.

Assumption: ??