Schema Refinement. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Similar documents
Schema Refinement and Normal Forms

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

Schema Refinement and Normal Forms

Schema Refinement and Normal Forms Chapter 19

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

Schema Refinement and Normal Forms

Schema Refinement and Normalization

Schema Refinement and Normal Forms

Schema Refinement & Normalization Theory

Schema Refinement and Normal Forms. Chapter 19

CS 186, Fall 2002, Lecture 6 R&G Chapter 15

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

CS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li

CAS CS 460/660 Introduction to Database Systems. Functional Dependencies and Normal Forms 1.1

Introduction to Data Management. Lecture #6 (Relational DB Design Theory)

Lecture #7 (Relational Design Theory, cont d.)

Introduction to Data Management. Lecture #6 (Relational Design Theory)

Introduction to Data Management. Lecture #7 (Relational DB Design Theory II)

12/3/2010 REVIEW ALGEBRA. Exam Su 3:30PM - 6:30PM 2010/12/12 Room C9000

Normal Forms 1. ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Schema Refinement and Normal Forms. Why schema refinement?

Normal Forms (ii) ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Functional Dependencies

Schema Refinement. Feb 4, 2010

Introduction to Data Management. Lecture #9 (Relational Design Theory, cont.)

MIS Database Systems Schema Refinement and Normal Forms

MIS Database Systems Schema Refinement and Normal Forms

Functional Dependencies, Schema Refinement, and Normalization for Relational Databases. CSC 375, Fall Chapter 19

Relational Database Design

UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

Database Design and Normalization

Constraints: Functional Dependencies

Introduction to Data Management. Lecture #10 (DB Design Wrap-up)

Lecture 15 10/02/15. CMPSC431W: Database Management Systems. Instructor: Yu- San Lin

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

SCHEMA NORMALIZATION. CS 564- Fall 2015

Func8onal Dependencies

Constraints: Functional Dependencies

Functional Dependencies & Normalization. Dr. Bassam Hammo

Functional Dependencies and Normalization

Database Design and Implementation

Database Normaliza/on. Debapriyo Majumdar DBMS Fall 2016 Indian Statistical Institute Kolkata

A few details using Armstrong s axioms. Supplement to Normalization Lecture Lois Delcambre

INF1383 -Bancos de Dados

Design Theory for Relational Databases

CS322: Database Systems Normalization

Chapter 7: Relational Database Design

Chapter 7: Relational Database Design. Chapter 7: Relational Database Design

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

Normaliza)on and Func)onal Dependencies

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

Information Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due

Design Theory: Functional Dependencies and Normal Forms, Part I Instructor: Shel Finkelstein

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

Review: Keys. What is a Functional Dependency? Why use Functional Dependencies? Functional Dependency Properties

CS54100: Database Systems

Relational-Database Design

CSC 261/461 Database Systems Lecture 11

Schema Refinement: Other Dependencies and Higher Normal Forms

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Relational Design: Characteristics of Well-designed DB

Normal Forms. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

Lossless Joins, Third Normal Form

CSE 132B Database Systems Applications

Introduction to Data Management CSE 344

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Introduction to Database Systems CSE 414. Lecture 20: Design Theory

FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

Functional Dependencies. Getting a good DB design Lisa Ball November 2012

Functional Dependencies

Database Design: Normal Forms as Quality Criteria. Functional Dependencies Normal Forms Design and Normal forms

Functional Dependencies

CSC 261/461 Database Systems Lecture 13. Spring 2018

DECOMPOSITION & SCHEMA NORMALIZATION

Chapter 3 Design Theory for Relational Databases

Databases 2012 Normalization

CSE 344 AUGUST 6 TH LOSS AND VIEWS

Chapter 8: Relational Database Design

Functional Dependency and Algorithmic Decomposition

Design Theory for Relational Databases. Spring 2011 Instructor: Hassan Khosravi

Relational Algebra & Calculus

Introduction to Management CSE 344

CSC 261/461 Database Systems Lecture 12. Spring 2018

Functional Dependency Theory II. Winter Lecture 21

Chapter 3 Design Theory for Relational Databases

Functional Dependencies and Normalization. Instructor: Mohamed Eltabakh

Chapter 11, Relational Database Design Algorithms and Further Dependencies

Introduction to Data Management. Lecture #10 (Relational Design Theory, cont.)

Functional Dependencies and Normalization

Relational Design Theory

Database Design and Normalization

Transcription:

Schema Refinement Yanlei Diao UMass Amherst Slides Courtesy of R. Ramakrishnan and J. Gehrke 1

Revisit a Previous Example ssn name Lot Employees rating hourly_wages hours_worked ISA contractid Hourly_Emps Contract_Emps v Consider relation obtained from Hourly_Emps: Hourly_Emps(ssn, name, lot, rating, hrly_wages, hrs_worked) Denote the schema by listing all its attributes: SNLRWH 2

Example (Contd.) S N L R W H 123-22-3666 Attishoo 48 8 10 40 231-31-5368 Smiley 22 8 10 30 131-24-3650 Smethurst 35 5 7 30 434-26-3751 Guldu 35 5 7 32 612-67-4134 Madayan 35 8 10 40 v Rating (R) determines hourly wages (W): Redundant storage Update: Can we change W in just the 1st tuple of rating 8? Insertion: Insert an employee without knowing the hourly wage for his rating? Insert the hourly wage for rating 10 with no employee? Deletion: What if we have deleted all employees with rating 5. 3

Will Two Smaller Tables be Better? S N L R W H 123-22-3666 Attishoo 48 8 10 40 231-31-5368 Smiley 22 8 10 30 131-24-3650 Smethurst 35 5 7 30 434-26-3751 Guldu 35 5 7 32 612-67-4134 Madayan 35 8 10 40 Hourly_Emps2 S N L R H 123-22-3666 Attishoo 48 8 40 231-31-5368 Smiley 22 8 30 131-24-3650 Smethurst 35 5 30 434-26-3751 Guldu 35 5 32 612-67-4134 Madayan 35 8 40 Wages R W 8 10 5 7 4

The Evils of Redundancy v Redundant storage causes several operation anomalies: Insert and delete anomalies v Functional dependencies, a new type of integrity constraint, can be used to identify schemas with such problems. IC s we have seen: domain constraints, key constraints, foreign key constraints, general constraints (checks or assertions) A new type of IC, functional dependencies, which can be checked using SQL checks or assertions. 5

1. Functional Dependencies (FDs) v A functional dependency X à Y holds over relation R if: X and Y are two sets of attributes of R; allowable instance r of R: t1 r, t2 r, π X (t1) = π X (t2) implies π Y (t1) = π Y (t2) E.g., X = {R}, Y = {W}, and R à W v An FD holds for all allowable instances of a schema. v Key constraint is a special from of FD: K is a super key for R means that K à R. K à R does not require K to be minimal! 6

FDs in the Hourly_Emps Example v Hourly_Emps(ssn, name, office, rating, hrly_wages, hrs_worked) Denoted by SNLRWH v Some FDs on Hourly_Emps: ssn is the key: S à SNLRWH rating determines hrly_wages: R à W 7

Reasoning About FDs v Given some FDs, we can usually infer additional FDs: {ssn à did, did à building} implies ssn à building v Given a set of FDs F, closure of F (F + ) is the set of all FDs that are implied by F. All FDs in F + hold over the relation R. 8

Axioms and Rules v Armstrong s Axioms (X, Y, Z are sets of attributes): Reflexivity: If X Y, then Y à X Augmentation: If X à Y, then XZ à YZ for any Z Transitivity: If X à Y and Y à Z, then X à Z v Couple of additional rules (that follow from AA): Union: If X à Y and X à Z, then X à YZ Decomposition: If X à YZ, then X à Y and X à Z v Computing the closure F + using the axioms/rules: Compute for all FD s. Size of closure is exponential in number of attributes! 9

Attribute Closure v What if we just want to check if a given FD X à Y is in F +? v Simple algorithm for attribute closure X + : X + := {X} DO if there is U à V in F, s.t. U X +, then X + = X + V UNTIL no change Check if a given FD X à Y is in F + : Simply check if Y X +. v Does F = {A à B, B à C, C D à E } imply A à E? Is A à E in the closure F +? Equivalently, is E in A +? 10

2. Normal Forms v Role of FDs in detecting redundancy: R(A,B,C) Given A à B: No FDs (except key constraints) hold: Two tuples have the same A value will have the same B value! No redundancy here. v Normal forms: If a reln does not have certain kinds of FDs, certain redundancy related problems are known not to occur. 11

Boyce-Codd Normal Form (BCNF) v Rewrite every FD in the form of X à A, X is a set of attributes, A is a single attribute Use the decomposition rule v Reln R with FDs F is in BCNF if X à A in F + : 1. A X (called a trivial FD), or 2. X is a superkey (i.e., contains a key) for R. v In BCNF, the only non-trivial FDs are key constraints! 12

Boyce-Codd Normal Form (contd.) v Can we infer the value marked by? If X A, then the relation is not in BCNF A reln. in BCNF can t have X A X Y A x y 1 a x y 2? v Relation in BCNF: Every field of every tuple records information that can t be inferred using FD s from other fields. No redundancy can be detected using FDs! 13

Another Example v When is redundancy possible? Reserves( Sailor, Boat, Date, Credit_card) with S à C, C à S Keys are SBD and CBD. It is not in BCNF. For each reservation of sailor S, same (S, C) is stored. 14

3. Decomposing a Relation Scheme v A decomposition of R breaks R into two or more relns s.t. Each new reln contains a subset of the attributes of R. Every attribute of R appears in at least one new reln. v Decompositions should be used only when: R has redundancy related problems (not in BCNF), We can afford the joins in queries later (performance penalty). 18

Example Decomposition v Hourly_Emps (SNLRWH) FDs: S à SNLRWH and R à W. R à W violates BCNF. And it causes repeated (R,W) storage. v To fix this, create a relation RW, remove W from the main schema. (SNLRWH) (SNLRH) and (RW). 19

(1) Lossless Join Decompositions v Decomposition of schema R into R1 and R2 is lossless-join w.r.t. a set of FDs F if reln. instance r that satisfies F: r = π R1 (r) π R2 (r) v It is always true that r π R1 (r) π R2 (r). A bad decomposition can cause r π R1 (r) π R2 (r). A B C 1 2 3 4 5 6 7 2 8 A B 1 2 4 5 7 2 B C 2 3 5 6 2 8 A B C 1 2 3 4 5 6 7 2 8 1 2 8 7 2 3 20

A Simple Test for Lossless Join v Theorem: Decomposition of R into R1 and R2 is losslessjoin wrt F iff the F + contains: R1 R2 à R1 or R1 R2 à R2 (intersection of R1 and R2 is a (super) key of one of them.) v Algorithm for a lossless join is based on the above result: If U à V holds over R and violates a BCNF definition, the decomposition into UV and R - V is lossless-join. 21

(2) Dependency Preserving Decomposition v Contracts(Contractid, Supplierid, projectid, Deptid, Partid, Qty, Value), CSJDPQV, with FDs: C is key. JP à C: a project buys a given part using a single contract. SD à P: a department buys at most one part from a supplier. v What are the keys? Is it in BCNF? Keys: C, JP, SDJ. Not in BCNF due to SDà P v Lossless-join BCNF decomposition: SDP, CSJDQV Problem: Checking JP à C requires an assertion (using join) How to write this assertion in SQL? 22

Dependency Preserving Decomposition v The projection of a FD set F onto a decomposed reln R1, F R1 = F + R1, contains all U à V s.t. (i) U, V are both in R1, (ii) U à V is in closure F +. v Decomposition of R into R1, R2 is dependency preserving if (F R1 UNION F R2 ) + = F + v Important to consider F + (not F!) in this definition: ABC, A à B, B à C, C à A, decomposed into AB and BC. Is this dependency preserving? Is C à A preserved? 23

Algorithm for Decomposition into BCNF v Relation R with FDs F. If X à Y violates BCNF, decompose R into R1 = XY and R2 = R Y. For each Ri, compute F Ri and check if it is in BCNF. If not, pick a FD violating BCNF and keep composing Ri. v Repeated application of this process yields a lossless join decomposition into BCNF relations. But not necessarily dependency-preserving! So need to check whether some FDs are lost. 24

Steps of BCNF Decomposition v Contracts(CSJDPQV), key C, JP à C, SD à P, J à S. 1. Keys. C, JP, DJ. 2. Normal form. Not in BCNF; SD à P and J à S violate BCNF. 3. Decomposition. To deal with SD à P, decompose into SDP, CSJDQV. n SDP is in BCNF. But CSJDQV is not because: 1. Projection of FDs and keys. Projection of FDs: keys C and DJ, J à S. 2. Normal form. Not BCNF; J à S violates BCNF. 3. Decomposition. For J à S, decompose CSJDQV into JS and CJDQV. n JS is in BCNF. So is CJDQV. v If several FDs violate BCNF, the order of dealing with them could lead to very different sets of relations! 25

BCNF and Dependency Preservation v Is a lossless-join BCNF decomposition dependency-preserving? CSJDPQV with key C, JP à C, SD à P and J à S. CSJDPQV decomposed into SDP, JS, and CJDQV. What about the lost FD, JP à C? a) Use assertion. Bad performance on inserts/updates. b) Adding JPC as a new relation to preserve JP à C introduces redundancy across relations. In general, the new relation may not be in BCNF. c) Don t decompose. Deal with insertion/deletion anomalies. DBA should examine the tradeoffs. v There may not exist a lossless join, dependency-preserving decomposition into BCNF. But there is always a lossless join, dependency-preserving decomposition into 3NF (more see the textbook) 26