UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information


Relational Database Design

Database Design

To generate a set of relation schemas that allows
- storing information without unnecessary redundancy
- retrieving desired information easily

Approach
- design the schema in an appropriate normal form

How to determine whether a schema is in normal form?
- functional dependencies: a collection of constraints
- a key notion in relational database design

Undesirable properties of bad design
1) repetition of information, resulting in wasted space and complicated updates
2) inability to represent certain information: introduction of null values
3) loss of information

To avoid undesirable features
- information repetition and null values suggest decomposing the relation schema
- information loss implies a lossy-join decomposition

Major concern in DB design
- how to specify constraints on the database, and how to obtain a lossless-join decomposition that avoids the undesirable properties above

Loss of Information

Borrow = (B-name, Loan#, C-name, Amount)
Amt = π_{Amount, C-name}(Borrow)
Loan = π_{B-name, Loan#, Amount}(Borrow)
- Amt ⋈ Loan can contain more tuples than the original relation
- π_{B-name}(σ_{C-name = Jones}(Amt ⋈ Loan)) may introduce a wrong answer
- more tuples in Amt ⋈ Loan implies less information: lossy-join decomposition

Reason for such an anomaly
- Amount does not uniquely relate B-name and C-name
- customers may have loans of the same amount, but not necessarily at the same branch
- uniqueness is critical: notion of key

Functional Dependency

A functional dependency is a generalization of the notion of key (uniqueness).

For a relation scheme R(A1, ..., An), let X and Y be subsets of the attributes A1, ..., An. X functionally determines Y (X → Y) if, for any relation r(R) that is a current instance of the schema R, it is not possible to have two tuples in r that agree on all attributes of X and disagree on some attribute of Y.

Properties
- if X is a key, then X → Y for any possible set of attributes Y of R
- FDs can express constraints that cannot be expressed using keys alone
<ex> Lending (B-name, Assets, B-city, Loan#, C-name, Amount)
B-name is not a superkey, since a branch may have many loans to many customers, but we can express the dependency B-name → B-city
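The Loss of Information example above can be reproduced with a small Python sketch. The two Borrow tuples (branch names, loan numbers, the customer Smith) and the helper names project and natural_join are illustrative assumptions, not data from the slides; the point is only that joining the two projections on Amount yields spurious tuples.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each row is a dict."""
    distinct = {tuple((a, row[a]) for a in attrs) for row in rows}
    return [dict(t) for t in distinct]


def natural_join(r1, r2):
    """Natural join: combine rows that agree on all shared attributes."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            shared = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in shared):
                joined.add(tuple(sorted({**t1, **t2}.items())))
    return [dict(t) for t in joined]


# Hypothetical instance: two customers with equal loan amounts at different branches.
borrow = [
    {"B-name": "Downtown", "Loan#": 17, "C-name": "Jones", "Amount": 1000},
    {"B-name": "Redwood", "Loan#": 23, "C-name": "Smith", "Amount": 1000},
]

amt = project(borrow, ["Amount", "C-name"])
loan = project(borrow, ["B-name", "Loan#", "Amount"])
rejoined = natural_join(amt, loan)

# Branches where Jones appears to have a loan, according to the rejoined relation.
jones_branches = {t["B-name"] for t in rejoined if t["C-name"] == "Jones"}

print(len(borrow), len(rejoined))  # 2 original tuples, 4 after the join
print(sorted(jones_branches))      # Redwood is a spurious answer
```

In this assumed instance the query for Jones's branches returns both Downtown and Redwood, although Jones only borrowed at Downtown.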

Functional Dependency

Assign
Pilot  Flight  Date  Time
-------------------------
A      112     3/11  1325
B      301     3/10  0600
B      105     3/11  1325

- Flight functionally determines Time: Flight → Time
- For any given Pilot, Date, and Time, there is only one flight: Pilot, Date, Time → Flight

Formal definition
A set of attributes X functionally determines the set Y if
t1(X) = t2(X) implies t1(Y) = t2(Y)
or, equivalently,
t1(Y) ≠ t2(Y) implies t1(X) ≠ t2(X)
or, equivalently, with X, Y ⊆ R, for every value x,
|π_Y(σ_{X=x}(r))| ≤ 1

Notes on Functional Dependency

An FD is a statement about the universe as we understand it.
- an FD is a semantic integrity constraint, which must hold for all the tuples in the relation
- all relations must satisfy all FDs; otherwise they are not correct

It is not true to say "Since X functionally determines Y, if we know X, we know Y." Or, "If X → Y, then X identifies Y."

Some FDs are trivial
A → A
X → Y if Y ⊆ X

Let R be a schema; then X → R iff X is a superkey of R.

Use of Functional Dependency

Legality test
- check whether the relations are legal under a given set of FDs
- if r is legal under a given set of FDs F, we say r satisfies F

Constraints specification
- express constraints on the set of legal relations
- rules to be used in database design

<ex> r
A   B   C   D
---------------
a1  b1  c1  d1
a1  b2  c1  d2
a2  b2  c2  d2
a2  b3  c2  d3
a3  b3  c2  d4

Checking the tuples: A → C and AB → D are satisfied by r; C → A is not, because c2 appears with both a2 and a3.

Logical Implications of Functional Dependency

Not enough to consider only the given set of FDs
- need to consider all FDs that hold
- for a given set of FDs F, certain other FDs also hold: they are logically implied by F

<ex> R(A, B, C), FD: {A → B, B → C}
A → B and B → C imply A → C
<Proof> If t1(A) = t2(A), then t1(B) = t2(B) by A → B. Since B → C, t1(C) = t2(C). Therefore if t1(A) = t2(A), then t1(C) = t2(C). Hence A → C.

Closure of F (F+)
- the set of all FDs logically implied by F
- if F = F+, F is called a full family of dependencies
- how to compute F+? use inference rules
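The legality test above follows directly from the formal definition: no two tuples may agree on X and differ on Y. A minimal Python sketch, using the instance r from the example; the function name satisfies and the dict-based row representation are assumptions made for illustration.

```python
def satisfies(rows, lhs, rhs):
    """Return True if the relation instance `rows` satisfies lhs -> rhs.

    Each row is a dict from attribute name to value; lhs and rhs are
    lists of attribute names.
    """
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y on first sight of x, otherwise returns the stored value
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on X but disagree on Y
    return True


# The instance r from the example above.
r = [dict(zip("ABCD", t)) for t in [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b2", "c1", "d2"),
    ("a2", "b2", "c2", "d2"),
    ("a2", "b3", "c2", "d3"),
    ("a3", "b3", "c2", "d4"),
]]

print(satisfies(r, ["A"], ["C"]))       # True
print(satisfies(r, ["A", "B"], ["D"]))  # True
print(satisfies(r, ["C"], ["A"]))       # False
```

Note that such a check only shows that one instance is legal; an FD is a constraint on all instances, so passing the check does not establish that the FD holds for the schema.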

Inference Rules

A means of inferring the existence of FDs from the given set

Completeness and soundness
- completeness: given F, the rules allow us to determine all dependencies in F+
- soundness: we cannot generate any FD not in F+

<ex> R = (A, B, C, D), F = {A → B, B → C}
F+ contains A → B, B → C, A → C (among others)
A → D? If we get it by the rules, they are not sound.
If we cannot get A → C by the rules, they are not complete.

Are there complete and sound inference rules to compute F+?

Armstrong's Axioms

(1) reflexivity rule
    if Y ⊆ X, then X → Y holds (trivial dependency)
(2) augmentation rule
    if X → Y, then XZ → YZ
(3) transitivity rule
    if X → Y and Y → Z, then X → Z

Although these three rules are complete, three additional rules make it easier to compute F+ directly
(4) additivity (union) rule
    if X → Y and X → Z, then X → YZ
(5) projectivity (decomposition) rule
    if X → YZ, then X → Y and X → Z
(6) pseudo-transitivity rule
    if X → Y and WY → Z, then XW → Z

Additional Rules

Additional rules can be derived from the original rules

union rule:
1. X → Y      given
2. X → XY     augment 1 with X
3. X → Z      given
4. XY → YZ    augment 3 with Y
5. X → YZ     transitivity 2 & 4

decomposition:
1. X → YZ     given
2. YZ → Z     reflexivity
3. X → Z      transitivity 1 & 2

pseudo-transitivity:
1. X → Y      given
2. XW → YW    augment 1 with W
3. WY → Z     given
4. XW → Z     transitivity 2 & 3

Theorem: Armstrong's axioms are sound and complete

Computing Closure

Let X be a set of attributes; then X+ is the set of all attributes functionally determined by X under a set of FDs F.

<Algorithm>
Input: F and X
Output: X+ (the closure of X with respect to F)
Compute a sequence of sets of attributes X(0), X(1), ... by the following rule:
(1) X(0) = X
(2) X(i+1) = X(i) ∪ Z if Y → Z ∈ F and Y ⊆ X(i)
(3) stop when X(i+1) = X(i)

<ex> F = {A → C, B → C, C → D, DE → C, CE → A}
1. X = AD
   X(0) = AD
   X(1) = AD ∪ C = ACD
   X(2) = ACD = X(1), stop
2. X = BC
   X(0) = BC
   X(1) = BC ∪ CD = BCD
   X(2) = BCD = X(1), stop
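The closure algorithm above translates almost line by line into code. The following Python sketch assumes single-letter attribute names, so that a string such as "DE" stands for the set {D, E}; the names closure and implies are chosen here for illustration.

```python
def closure(attrs, fds):
    """Compute X+ for the attribute set `attrs` under the FDs in `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is an iterable of
    single-character attribute names, e.g. ("DE", "C") for DE -> C.
    """
    result = set(attrs)            # X(0) = X
    changed = True
    while changed:                 # stop when X(i+1) = X(i)
        changed = False
        for lhs, rhs in fds:
            # Y -> Z in F with Y a subset of X(i): add Z
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F logically implies lhs -> rhs iff rhs is contained in lhs+."""
    return set(rhs) <= closure(lhs, fds)


F = [("A", "C"), ("B", "C"), ("C", "D"), ("DE", "C"), ("CE", "A")]
print(sorted(closure("AD", F)))  # ['A', 'C', 'D']
print(sorted(closure("BC", F)))  # ['B', 'C', 'D']

G = [("A", "B"), ("B", "C")]
print(implies(G, "A", "C"))      # True
print(implies(G, "A", "D"))      # False
```

The implies helper shows why attribute closure is useful in practice: membership of a single FD in F+ can be decided without enumerating F+ itself.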

Relation Decomposition

One of the properties of bad design suggests decomposing a relation into smaller relations
- must achieve a lossless-join decomposition (non-additive join)

<ex> R = (A, B, C), F = {A → B}

r
A   B   C
-----------
a1  b1  c1
a2  b1  c2

Decomposition 1:

r1          r2          r1 ⋈ r2
A   B       B   C       A   B   C
-------     -------     -----------
a1  b1      b1  c1      a1  b1  c1
a2  b1      b1  c2      a1  b1  c2
                        a2  b1  c1
                        a2  b1  c2

Decomposition 2:

r1          r2          r1 ⋈ r2
A   B       A   C       A   B   C
-------     -------     -----------
a1  b1      a1  c1      a1  b1  c1
a2  b1      a2  c2      a2  b1  c2

Lossless Join

If R is a relation schema decomposed into R1, ..., Rk, and F is a set of FDs, the decomposition is called a lossless join with respect to F if, for every relation r(R) satisfying F,
r = π_R1(r) ⋈ ... ⋈ π_Rk(r)
that is, r is the natural join of its projections onto the Ri.

Recoverability
- the lossless join property is necessary if the decomposed relation is to be recovered from its decomposition

Testing lossless join
Let R be a schema, F a set of FDs on R, and ρ = (R1, R2) a decomposition of R. Then ρ has a lossless join w.r.t. F iff either
R1 ∩ R2 → R1 (or R1 − R2)
or
R1 ∩ R2 → R2 (or R2 − R1)
where such an FD is in F+.

Lossless Join Decomposition

From the previous example: R = (ABC), F = {A → B}
(1) R1 = (AB), R2 = (AC)
    R1 ∩ R2 = A, R1 − R2 = B
    A → B in F? Yes. Therefore lossless join
(2) R1 = (AB), R2 = (BC)
    R1 ∩ R2 = B, R1 − R2 = A, R2 − R1 = C
    B → A in F? No. In F+? No.
    B → C in F? No. In F+? No.
    Therefore lossy join

<ex> R = (City, Street, Zip), F = {CS → Z, Z → C}
    R1 = (CZ), R2 = (SZ)
    R1 ∩ R2 = Z, R1 − R2 = C
    Z → C in F? Yes.

Dependency Preservation Decomposition

Another desirable property of decomposition
- each FD specified in F either appears directly in one of the relations in the decomposition, or can be inferred from FDs that appear in some relation

Why desirable?
- when updating the DB, the system must check that all the FDs are satisfied
- for efficiency, violation detection should be possible without performing a join operation: each FD needs to be testable by checking one relation

A decomposition preserves a set of FDs F if the union of all FDs in the π_Ri(F) logically implies all FDs in F
F_i = π_Ri(F+)
F' = ∪ F_i
check if (F')+ = F+
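The two-relation lossless join test stated earlier can be sketched in Python with attribute closure: R1 ∩ R2 → R1 is in F+ exactly when R1 is contained in (R1 ∩ R2)+. Single-letter attribute names are assumed (C, S, Z for City, Street, Zip), and the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Test whether the decomposition (r1, r2) has a lossless join w.r.t. fds.

    Lossless iff (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+.
    """
    r1, r2 = set(r1), set(r2)
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


F = [("A", "B")]
print(lossless_binary("AB", "AC", F))  # True: A -> AB holds
print(lossless_binary("AB", "BC", F))  # False: B determines neither A nor C

# City/Street/Zip example with C, S, Z as attribute names.
G = [("CS", "Z"), ("Z", "C")]
print(lossless_binary("CZ", "SZ", G))  # True: Z -> CZ holds
```

This sketch covers only decompositions into two relations, which is the case the test above addresses.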

Testing Dependency Preservation

<ex> R = (City, Street, Zip), F = {CS → Z, Z → C}
R1 = (S, Z), R2 = (C, Z)
(1) lossless join?
    R1 ∩ R2 = Z, R2 − R1 = C, Z → C in F? Yes
(2) dependency preserving?
    R1: only trivial FDs
    R2: Z → C and trivial FDs
    (π_R1(F) ∪ π_R2(F))+ ≠ F+
    They do not imply CS → Z. Hence the decomposition does not preserve dependencies.

Algorithm for testing dependency preservation
- given in the textbook, but not very practical, since it requires computing F+, which takes exponential time

Minimal Redundancy

Another desirable property of decomposition
- a decomposition should contain as little redundant information as possible
- the degree to which redundancy can be eliminated is captured by several normal forms

Normalization process
- first introduced by Codd in 1972
- a series of tests to certify whether or not a relation schema belongs to a certain normal form
- Codd proposed three normal forms: 1NF, 2NF, and 3NF
- a stronger form of 3NF was proposed by Boyce and Codd: BCNF
- 1NF, 2NF, 3NF, and BCNF are all based on FDs
- 4NF is based on multivalued dependencies
- 5NF is based on join dependencies
- domain-key normal form represents an ultimate normal form
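The City/Street/Zip preservation example above can be checked mechanically. The sketch below is not the F+-based algorithm the slides refer to; it uses a well-known closure-based alternative that avoids materializing F+: for each FD X → Y in F, grow X by repeatedly taking closures restricted to each Ri, then check whether Y was reached. Single-letter attribute names and the function names are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(decomposition, fds):
    """Return True if every FD in fds is implied by the projected FDs.

    For each lhs -> rhs, compute the closure of lhs using only what can be
    derived inside individual relations of the decomposition.
    """
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                ri = set(ri)
                # attributes of Ri derivable from the part of result inside Ri
                gained = (closure(result & ri, fds) & ri) - result
                if gained:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False  # this FD cannot be checked within single relations
    return True


# C = City, S = Street, Z = Zip
F = [("CS", "Z"), ("Z", "C")]
print(preserves(["SZ", "CZ"], F))                       # False: CS -> Z is lost
print(preserves(["AB", "BC"], [("A", "B"), ("B", "C")]))  # True
```

Each closure call is polynomial in the size of F, so this check stays practical where enumerating F+ would not.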