Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

Similar documents
Schema Refinement. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

Schema Refinement and Normal Forms

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

Schema Refinement and Normal Forms Chapter 19

Normal Forms 1. ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Relational Database Design

Schema Refinement and Normal Forms. Chapter 19

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

Schema Refinement and Normal Forms

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

Schema Refinement and Normal Forms

CAS CS 460/660 Introduction to Database Systems. Functional Dependencies and Normal Forms 1.1

Normal Forms (ii) ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

CS 186, Fall 2002, Lecture 6 R&G Chapter 15

Schema Refinement and Normalization

Schema Refinement and Normal Forms

Schema Refinement & Normalization Theory

Schema Refinement. Feb 4, 2010

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

CS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li

SCHEMA NORMALIZATION. CS 564- Fall 2015

Lecture #7 (Relational Design Theory, cont d.)

Schema Refinement and Normal Forms. Why schema refinement?

Introduction to Data Management. Lecture #6 (Relational DB Design Theory)

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

Introduction to Data Management. Lecture #6 (Relational Design Theory)

Constraints: Functional Dependencies

Introduction to Data Management. Lecture #7 (Relational DB Design Theory II)

Information Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due

Functional Dependencies

INF1383 -Bancos de Dados

12/3/2010 REVIEW ALGEBRA. Exam Su 3:30PM - 6:30PM 2010/12/12 Room C9000

Lossless Joins, Third Normal Form

CSC 261/461 Database Systems Lecture 11

Database Design and Implementation

Functional Dependencies & Normalization. Dr. Bassam Hammo

Constraints: Functional Dependencies

CS322: Database Systems Normalization

Review: Keys. What is a Functional Dependency? Why use Functional Dependencies? Functional Dependency Properties

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

Lecture 15 10/02/15. CMPSC431W: Database Management Systems. Instructor: Yu- San Lin

Functional Dependency and Algorithmic Decomposition

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

CSE 344 AUGUST 6 TH LOSS AND VIEWS

Relational Design: Characteristics of Well-designed DB

FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

Normal Forms. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

CS54100: Database Systems

Introduction to Data Management. Lecture #9 (Relational Design Theory, cont.)

Chapter 10. Normalization Ext (from E&N and my editing)

Func8onal Dependencies

DECOMPOSITION & SCHEMA NORMALIZATION

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

Database Design and Normalization

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

Normaliza)on and Func)onal Dependencies

Databases 2012 Normalization

Chapter 7: Relational Database Design

Chapter 7: Relational Database Design. Chapter 7: Relational Database Design

Design Theory for Relational Databases

Functional Dependencies

CSC 261/461 Database Systems Lecture 13. Spring 2018

Functional Dependencies and Normalization. Instructor: Mohamed Eltabakh

Chapter 8: Relational Database Design

Relational Design Theory

Functional Dependency Theory II. Winter Lecture 21

Information Systems (Informationssysteme)

Design Theory: Functional Dependencies and Normal Forms, Part I Instructor: Shel Finkelstein

Schema Refinement: Other Dependencies and Higher Normal Forms

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Functional Dependencies and Normalization

A few details using Armstrong s axioms. Supplement to Normalization Lecture Lois Delcambre

Lectures 6. Lecture 6: Design Theory

Database Design: Normal Forms as Quality Criteria. Functional Dependencies Normal Forms Design and Normal forms

CSE 132B Database Systems Applications

MIS Database Systems Schema Refinement and Normal Forms

Chapter 3 Design Theory for Relational Databases

CSC 261/461 Database Systems Lecture 12. Spring 2018

Design theory for relational databases

Chapter 3 Design Theory for Relational Databases

Functional Dependencies, Schema Refinement, and Normalization for Relational Databases. CSC 375, Fall Chapter 19

CSIT5300: Advanced Database Systems

Introduction to Database Systems CSE 414. Lecture 20: Design Theory

Introduction to Data Management. Lecture #10 (DB Design Wrap-up)

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Background: Functional Dependencies. æ We are always talking about a relation R, with a æxed schema èset of attributesè and a

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Introduction to Data Management CSE 344

Relational Database Design

But RECAP. Why is losslessness important? An Instance of Relation NEWS. Suppose we decompose NEWS into: R1(S#, Sname) R2(City, Status)

Design Theory for Relational Databases. Spring 2011 Instructor: Hassan Khosravi

CSE 344 MAY 16 TH NORMALIZATION

HKBU: Tutorial 4

Transcription:

Normalization Introduction What problems are caused by redundancy? UVic C SC 370 Dr. Daniel M. German Department of Computer Science What are functional dependencies? What are normal forms? What are the benefits of 3NF and BCNF? What is the process to decompose a relation into its normal forms? July 30, 2003 Version: 1.1.0 11 1 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 2 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca Redundancy Example Redundant storage Update anomalies: all repeated data needs to be updated when one copy is updated Insertion anomalies: It might not be possible to store certain data unless some other, unrelated info is also stored Deletion anomalies: It may not be possible to delete certain info without losing some other, unrelated info ssn name lot rating hourwages hoursworked 123-45-789 Smiley 48 8 10 40 234-45-789 Smethurst 22 8 10 30 451-78-123 Guldu 35 5 7 30 457-89-023 Madayan 35 5 7 32 The hourly rate is determined by the rating (this is an example of a functional dependency) Problems: Redundant Storage: (8, 10) and (5,7) are repeated Update Anomalies: The hourwages in the first tuple could be updated without making a similar update to the second tuple 11 3 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 4 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca

Example... Decomposition Problems (cont): ssn name lot rating hourwages hoursworked 123-45-789 Smiley 48 8 10 40 234-45-789 Smethurst 22 8 10 30 451-78-123 Guldu 35 5 7 30 457-89-023 Madayan 35 5 7 32 Insertion Anomalies: We need to know the hourwage in order to insert a tuple (this could be fixed with a NULL value) Delete Anomalies: If we delete all the tuples with a given (rating, hourwages) we might lose that association Functional dependencies can be used to refine the schema A relation is replaced with smaller relations Decomposition of a relation schema R consists in replacing the relation schema by two or more relation schemas that each contain a subset of the attributes of R. For example: ssn name lot rating hoursworked 123-45-789 Smiley 48 8 40 234-45-789 Smethurst 22 8 30 451-78-123 Guldu 35 5 30 457-89-023 Madayan 35 5 32 rating hourwages 8 10 5 7 11 5 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 6 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca Problems related to Decomposition Functional Dependencies Do we need to decompose? We use normal forms to answer this question What problems arise with a given decomposition? We are interested in the following properties of decompositions: Loss-less join: Can we rebuild the original relation? Dependency preservation: Do we preserve the ICs?? A functional dependency (FD) is a kind of IC that generalizes the concept of a key. Let R be a relation schema, with X and Y be nonempty sets of attributes in R. For an instance r of R, we say that the FD X Y (X functionally determines Y) is satisfied if: t 1,t 2 r,t 1.X = t 2.X = t 1.Y = t 2.Y 11 7 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 8 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca

Example of a Functional Dependency Reasoning about FDs An instance that satisfies AB C: A B C D a1 b1 c1 d1 a1 b1 c1 d2 a1 b2 c2 d1 a2 b1 c3 d1 By looking at an instance, we can tell which FDs do not hold But we cannot tell which FDs will hold for any instance of the relation Given a set of FDs we can usually find additional FDs that also hold. Example: Given a key we can always find a superkey We say that an FD f is implied by a set F of FDs for relation schema R, if r R, g F,g holds = f holds A primary key is a special case of a FD: If X Y holds, where Y is the set of all attributes, then X is a superkey FDs should hold for any instance of a relation 11 9 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 10 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca Closure of a Set of FDs Closure... The set of all FDs implied by a given set F of FDs is called the closure of F, denoted by F + Armstrong s Axioms for FDs. X,Y,Z are sets of attributes over a relation schema R. Reflexivity: Y X = X Y Augmentation: X Y = Z,XZ Y Z Transitivity: X Y Y Z = X Z Armstrong s Axioms are sound and complete Two more rules non-essential rules: Union: X Y X Z = X Y Z Decomposition: X Y Z = X Y X Z 11 11 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 12 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca

Examples Attribute Closure What FDs can be inferred from: A C and B C? We are given a relation Contracts(contractid, supplierid, projectid, deptid, partid, qty, va contractid is the key A project purchases a given part using a single contract A department purchases at most one part from a supplier The attribute closure X + with respect to a set F of FDs is the set of attributes A s.t. X A can be inferred using Armstrong s Axioms. Algorithm to compute the closure: closure = X; repeat until there is no change: { if U V F s.t.u closure, then set closure = closure V } This algorithm can be used to determine if X Y is in F + 11 13 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 14 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca Normal Forms First Normal Form We need to know, for a given schema, if it is a good design If we decompose, is the decomposition good? A relation is in first normal form if every field contains only atomic values (no lists nor sets) If a given relation schema is in a normal form we know some problems cannot arise. We are mainly interested in: First Normal Form (1NF) Second Normal Form (2NF) Third Normal Form (3NF) Boyce-Codd Normal Form (BCNF) 11 15 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 16 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca

Boyce-Codd Normal Form Third Normal Form Let R be a schema F be a set of FDs that hold over R X be a subset of the attributes of R A be an attribute of R R is in Boyce-Codd Normal Form, if for every FD X A, one of the following is true: A X (it is a trivial FD) or X is a superkey Let R be a schema F be a set of FDs that hold over R X be a subset of the attributes of R A be an attribute of R R is in third normal form, if for every FD X A, one of the following is true: A X (it is a trivial FD), or X is a superkey, or A is part of some key for R 11 17 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 18 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca Third Normal Form... Lossless join decomposition In order to test if a relation is in 3NF, we need to find all the keys of the relation Finding all keys in a relation is NP complete! So is finding out if a relation is in 3NF Why do we want to use 3NF? Because any relation can be decomposed into a set of 3NF relations Let R be a schema F be a set of FDs that hold over R a decomposition of R into two schemas with attributes X and Y is a lossless-join decomposition with respect to F if r R that satisfies F : π x (r) π y (r) = r 11 19 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 20 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca

Observation Theorem 3 In general, for any relation r R and set of attributes X, Y such that X Y = attributes(r) r π x (r) π y (r) Let R be a schema F be a set of FDs that hold over R a decomposition of R into two attributes sets R 1 and R 2 is lossless-join iff R 1 R 2 R 1 F +, or R 1 R 2 R 2 F + In other words, the attributes common to R 1 and R 2 should contain a key to either R 1 or R 2 11 21 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 22 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca Corollary 1 Corollary 2 If a X Y holds over R, and X Y =, then the decomposition R into R Y and XY is loss-less join If the relation R is decomposed into R 1 and R 2 is loss-less join, and then R 1 is decomposed into R 11 and R 12 loss-less join, then R = (R 11 R 12 ) R 2 11 23 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 24 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca

Projection of F Dependency Preserving Decomposition Let R be a relation schema, that is decomposed into 2 schemas with attribute sets X, Y Let F be a set of FDs over R The projection F on X is: F X = {A B A B F + A,B X} A decomposition of R with FDs F into schemas with atts X, Y is dependency preserving if (F X F Y ) + = F + In other words, we only need to enforce F X and F Y to guarantee that we are enforcing F + 11 25 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 26 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca Normalization Decomposition into BCNF If a relation is not in BCNF, we can always find a loss-less join decomposition into BCNF relations If a relation is not in 3NF, we can find a loss-less join, dependency preserving decomposition of 3NF relations 1. If R is not in BCNF, then using a FD X A that is a violation of BCNF, decompose into R A and XA 2. If eitherr A or XA is not in BCNF, decompose them further by a recursive application of this algorithm This decomposion is not FD preserving! 11 27 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 28 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca

BCNF and Dependency Preservation Minimal Cover of Set of FDs Sometimes there is no BCNF decomposition that preserves FDs Example: SBD, with FD SB D, and D B Decompose using D B We can t preserve SB D One way to fix this is by adding a relation SBD with a FD SB D, but this defeats the purpose of decomposition A minimal cover for a set of F of FDs is a set G of FDs such that: 1. Every FD is of the form X A for an attribute A 2. F + = G + 3. If we obtain H such that H G by deleting a FD, or an attribute from a FD, then H + G + In other words, every FD is as small as possible, and every FD is required 11 29 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 30 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca Algorithm to obtain minimal cover Dependency Preserving Decomp. into 3NF 1. Put the FDs in standard form (one attribute in the right hand side) 2. Minimize the left side of each FD: For each FD in G, check each attribute in the left side to see if it can be deleted while preserving equivalence with F + 3. Delete redundant FDs: Check each remaining FD in G to see if it can be deleted while preserving equivalence There could be several minimal covers for a given set of FDs. Given a relation R with a set of FDs that is a minimal cover, R 1,R 2,...,R n is a loss-less join decomposition of R, F i denotes the projection of F onto the attributes of R i : Apply the following algorithm to find a Dependency Preserving Decomp. into 3NF: 1. Identify the set N of dependencies in F that is not preserved 2. For each FD X A N, create a relation schema XA and add it to the decomposition 11 31 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 32 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca

Dep. Pres. Decomp. into 3NF... Schema Refinement in Database Design As an optimization, if N contains several FDs of with the same left side X A 1, X A 2,..., X A n, then replace it with a single X A 1...A n to produce a relation XA 1...A n Normalization can eliminate redundancy Conceptual design methodologies, such as ER arrive to an initial design But might not arrive to an optimal design Furthermore, they might not be able to express all FDs 11 33 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca 11 34 Normalization (1.1.0) CSC 370 dmgerman@uvic.ca