Relational Design: Characteristics of Well-designed DB


A well-designed database should:
1. Minimize duplication
2. Represent all the information in the specs
3. Prevent information from being lost

Consider table newfaculty (the result of $\mathit{Faculty} \bowtie \mathit{Teach} \bowtie \mathit{Course}$):

Id     Lname   Off  Bldg    Phone  Salary  Numb  Dept  Lvl  MaxSz
20000  Cotts   103  DuPont  1234   45000   867   EE    5    25
20000  Cotts   103  DuPont  1234   45000   652   CIS   5    25
00333  Garth   423  DuPont  4321   87000   323   EE    3    25
00333  Garth   423  DuPont  4321   87000   413   EE    4    25
55555  Jones   211  Ewing   9876   55000   230   MATH  2    60
20001  Clarke  103  DuPont  1235   96000   120   AST   1    60
20001  Clarke  103  DuPont  1235   96000   450   AST   4    15
...

Each faculty member's name, office, phone, and salary is repeated once for every course that member teaches.

Duplication can be minimized by decomposition. Consider the tables newfac:

Id     Lname   Off  Bldg    Phone  Salary
20000  Cotts   103  DuPont  1234   45000
00333  Garth   423  DuPont  4321   87000
55555  Jones   211  Ewing   9876   55000
20001  Clarke  103  DuPont  1235   96000
...

and newcourse:

Off  Bldg    Numb  Dept  Lvl  MaxSz
103  DuPont  867   EE    5    25
103  DuPont  652   CIS   5    25
423  DuPont  323   EE    3    25
423  DuPont  413   EE    4    25
211  Ewing   230   MATH  2    60
103  DuPont  120   AST   1    60
103  DuPont  450   AST   4    15
...

Now consider their join, $\mathit{newfac} \bowtie \mathit{newcourse}$:

Id     Lname   Off  Bldg    Phone  Salary  Numb  Dept  Lvl  MaxSz
20000  Cotts   103  DuPont  1234   45000   867   EE    5    25
20000  Cotts   103  DuPont  1234   45000   652   CIS   5    25
20000  Cotts   103  DuPont  1234   45000   120   AST   1    60
20000  Cotts   103  DuPont  1234   45000   450   AST   4    15
00333  Garth   423  DuPont  4321   87000   323   EE    3    25
00333  Garth   423  DuPont  4321   87000   413   EE    4    25
55555  Jones   211  Ewing   9876   55000   230   MATH  2    60
20001  Clarke  103  DuPont  1235   96000   120   AST   1    60
20001  Clarke  103  DuPont  1235   96000   450   AST   4    15
20001  Clarke  103  DuPont  1235   96000   867   EE    5    25
20001  Clarke  103  DuPont  1235   96000   652   CIS   5    25
...

Cotts and Clarke share office 103 DuPont, and (Off, Bldg) is not a key of either table, so the join pairs each of them with every course associated with that office. The resulting spurious tuples mean the original newfaculty table can no longer be recovered: this decomposition loses information.
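To make the information loss concrete, here is a small Python sketch (an illustration, not part of the original notes) that recomputes the natural join of newfac and newcourse on (Off, Bldg) and flags every joined tuple whose (Id, Numb) pair never appeared in newfaculty. It uses only the rows listed in the tables, not the elided "..." rows; the tuple layouts are an assumption of this sketch.

```python
# Rows transcribed from the newfac and newcourse tables; the elided "..." rows are omitted.
newfac = [  # (Id, Lname, Off, Bldg, Phone, Salary)
    ("20000", "Cotts", "103", "DuPont", "1234", 45000),
    ("00333", "Garth", "423", "DuPont", "4321", 87000),
    ("55555", "Jones", "211", "Ewing", "9876", 55000),
    ("20001", "Clarke", "103", "DuPont", "1235", 96000),
]
newcourse = [  # (Off, Bldg, Numb, Dept, Lvl, MaxSz)
    ("103", "DuPont", 867, "EE", 5, 25),
    ("103", "DuPont", 652, "CIS", 5, 25),
    ("423", "DuPont", 323, "EE", 3, 25),
    ("423", "DuPont", 413, "EE", 4, 25),
    ("211", "Ewing", 230, "MATH", 2, 60),
    ("103", "DuPont", 120, "AST", 1, 60),
    ("103", "DuPont", 450, "AST", 4, 15),
]
# (Id, Numb) pairs that actually occur in the original newfaculty table.
original_pairs = {("20000", 867), ("20000", 652), ("00333", 323), ("00333", 413),
                  ("55555", 230), ("20001", 120), ("20001", 450)}


def natural_join(fac, course):
    """Nested-loop natural join on the shared attributes (Off, Bldg)."""
    joined = []
    for f in fac:
        for c in course:
            if (f[2], f[3]) == (c[0], c[1]):
                # Keep Off and Bldg once: newfac columns, then Numb, Dept, Lvl, MaxSz.
                joined.append(f + c[2:])
    return joined


joined = natural_join(newfac, newcourse)
# A joined tuple is spurious if its (Id, Numb) pair was not in newfaculty.
spurious = [row for row in joined if (row[0], row[6]) not in original_pairs]
print(len(joined), "joined tuples,", len(spurious), "spurious")
for row in spurious:
    print(row[1], row[7], row[6])
```

On the rows shown, the join has 11 tuples against the 7 original ones; the 4 extras are Cotts with AST 120 and AST 450, and Clarke with EE 867 and CIS 652.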

Relational Design: Functional Dependencies (FDs)

An FD expresses a constraint on the values of a set of attributes imposed by another set of attributes.

Formally: let $X, Y \subseteq R$. $Y$ is functionally dependent on $X$, denoted $X \to Y$, iff for all tuples $t_1, t_2 \in r(R)$, $t_1[Y] = t_2[Y]$ whenever $t_1[X] = t_2[X]$.

Relational Design: Closure of a Set of FDs

Generally, given a set of FDs, there are additional FDs that can be derived from the set.

The closure of a set of FDs $F$, denoted $F^+$, is the set of FDs derivable from $F$.

Armstrong's axioms allow computation of $F^+$:
1. Reflexivity rule: given a set of attributes $X$ and $Y \subseteq X$, then $X \to Y$.
2. Augmentation rule: if $X \to Y$ and $Z$ is a set of attributes, then $XZ \to YZ$ (or: if $X \to Y$, then $XZ \to Y$).
3. Transitivity rule: if $X \to Y$ and $Y \to Z$, then $X \to Z$.

Supplemental rules:
1. Union rule: if $X \to Y$ and $X \to Z$, then $X \to YZ$.
2. Decomposition rule: if $X \to YZ$, then $X \to Y$ and $X \to Z$.
3. Pseudotransitivity rule: if $X \to Y$ and $YW \to Z$, then $XW \to Z$.

Full family of FDs: a set of FDs $F$ is said to be a full family of FDs if $F = F^+$.

Relational Design: Closure of a Set of Attributes

Let $X$ be a set of attributes. The closure of $X$ under $F$, denoted $X^+$, is the set of attributes functionally dependent on $X$, as determined by applying the FDs in $F$ to $X$.

To determine $X^+$ with respect to $F$:

  X+ := X
  repeat
      oldX+ := X+
      for each Y → Z in F
          if Y ⊆ X+ then X+ := X+ ∪ Z
  until oldX+ = X+
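The closure loop translates almost line for line into Python. This is a minimal sketch, assuming FDs are represented as (lhs, rhs) pairs of attribute-name sets; that representation is a choice made for illustration. The example uses the FDs Snum → City and City → Status from the dependency-preservation example later in these notes.

```python
def attribute_closure(x, fds):
    """Compute X+ under F by repeating the closure loop until nothing changes.

    x:   iterable of attribute names
    fds: list of (lhs, rhs) pairs, each a set of attribute names
    """
    closure = set(x)                    # X+ := X
    while True:
        old = set(closure)              # oldX+ := X+
        for lhs, rhs in fds:            # for each Y -> Z in F
            if set(lhs) <= closure:     # if Y is a subset of X+
                closure |= set(rhs)     # X+ := X+ union Z
        if closure == old:              # until oldX+ = X+
            return closure


# FDs from the supplier example: Snum -> City, City -> Status.
F = [({"Snum"}, {"City"}), ({"City"}, {"Status"})]
print(sorted(attribute_closure({"Snum"}, F)))  # ['City', 'Snum', 'Status']
print(sorted(attribute_closure({"City"}, F)))  # ['City', 'Status']
```

Since Status is in $\text{Snum}^+$, the FD $\text{Snum} \to \text{Status}$ is in $F^+$, which is exactly what the transitivity rule predicts.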

$K \subseteq R$ is a superkey of schema $R$ if for all tuples $t_1, t_2 \in r(R)$, $t_1 = t_2$ whenever $t_1[K] = t_2[K]$; i.e., $K \to R$.

Full functional dependency: $Y$ is fully functionally dependent on $X$ in the FD $X \to Y$ if there is no proper subset of $X$ on which $Y$ is dependent; i.e., for every $Z \subset X$, $Z \not\to Y$. $X$ is then said to be irreducible.

$C \subseteq R$ is a candidate key of $R$ if $C \to R$ and $C$ is irreducible.

To find a key $K$ for scheme $R$:

  K := R
  for each a_i ∈ K
      T := (K − a_i)+
      if T = R then K := K − a_i

Relational Design: Equivalence of Sets of FDs

Given sets of FDs $F$ and $G$, $F$ covers $G$ if $G \subseteq F^+$.

$F$ and $G$ are equivalent if $F$ covers $G$ and $G$ covers $F$; i.e., $F^+ = G^+$.

To determine whether $X \to Y \in F^+$, compute $X^+$ with respect to $F$. If $Y \subseteq X^+$, then $X \to Y \in F^+$.

To determine the equivalence of $F$ and $G$:
1. For each $X \to Y \in F$, compute $X^+$ with respect to $G$.
   (a) If $Y \subseteq X^+$, then $X \to Y \in G^+$.
   (b) If this fails for any FD, stop: $G$ does not cover $F$.
2. For each $A \to B \in G$, compute $A^+$ with respect to $F$.
   (a) If $B \subseteq A^+$, then $A \to B \in F^+$.
   (b) If this fails for any FD, stop: $F$ does not cover $G$.
3. If every check succeeds, $F \equiv G$.
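The key-finding loop and the cover/equivalence test both reduce to attribute closures. The Python sketch below assumes FDs are pairs of frozensets; the `fd` helper that builds them from space-separated names is hypothetical and exists only for this illustration. The sets `G` and `H` are made-up variations on the supplier FDs.

```python
def fd(lhs, rhs):
    """Hypothetical helper: fd("A B", "C") builds the FD AB -> C."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(x, fds):
    """X+ under fds (same fixpoint loop as the attribute-closure algorithm)."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_key(schema, fds):
    """Start from K = R and drop a_i whenever (K - a_i)+ still equals R."""
    key = set(schema)
    for a in sorted(schema):  # sorted only to make the result deterministic
        if closure(key - {a}, fds) >= set(schema):
            key.discard(a)
    return key


def covers(f, g):
    """F covers G iff every X -> Y in G satisfies Y subset of X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    return covers(f, g) and covers(g, f)


R = ["Snum", "City", "Status"]
F = [fd("Snum", "City"), fd("City", "Status")]
G = [fd("Snum", "City"), fd("City", "Status"), fd("Snum", "Status")]
H = [fd("Snum", "City"), fd("Snum", "Status")]
print(find_key(R, F))    # {'Snum'}
print(equivalent(F, G))  # True: Snum -> Status is already in F+
print(equivalent(F, H))  # False: City -> Status is not in H+
```

Because an attribute is only ever dropped when the remaining set still determines $R$, and later removals only shrink the set, the key returned by one pass is irreducible, i.e., a candidate key.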

Relational Design: Minimal (Canonical) Covers

Minimal cover of a set of FDs $F$: a minimal set of FDs that is equivalent to $F$, denoted $F_c$. A minimal cover has the following properties:
1. Every FD has a single attribute on its right-hand side.
2. No left-hand side has extraneous attributes; i.e., every left-hand side is irreducible. Attribute $a$ is extraneous in $X$ (for the FD $X \to y$) if $\big(F_c - \{X \to y\}\big) \cup \{(X - a) \to y\} \equiv F$.
3. No FD is redundant; i.e., $X \to y$ is redundant if $F_c - \{X \to y\} \equiv F$.

To compute the minimal cover $G$ of $F$:
1. $G := F$.
2. Replace each FD of the form $X \to a_1 a_2 \ldots a_n$ with $X \to a_1$, $X \to a_2$, ..., $X \to a_n$.
3. For each FD $X \to a \in G$, delete all extraneous attributes from $X$.
4. Delete each redundant FD $X \to a$ from $G$.

Relational Design: Decomposition

A decomposition of a scheme $R$ is a set of subschemas derived from $R$. Formally, let $R$ be a relational scheme; then $\{R_1, R_2, \ldots, R_n\}$ is a decomposition of $R$ if $R_1 \cup R_2 \cup \cdots \cup R_n = R$.

Relational Design: Lossless Join Decomposition

Formally, given
1. a scheme $R$,
2. a relation $r(R)$,
3. a decomposition $D = \{R_1, R_2, \ldots, R_n\}$, and
4. relations $r_1(R_1), r_2(R_2), \ldots, r_n(R_n)$, where $r_i = \pi_{R_i}(r)$,

$D$ is a lossless join decomposition of $R$ if $r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n = r$ for every legal relation $r(R)$.

A decomposition $\{R_1, R_2\}$ of $R$ is lossless if $F^+$ contains either
1. $R_1 \cap R_2 \to (R_1 - R_2)$, or
2. $R_1 \cap R_2 \to (R_2 - R_1)$.
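The four minimal-cover steps and the two-way lossless test can be sketched in Python as below. FDs are assumed to be pairs of frozensets built with a hypothetical `fd` helper. The example set $\{A \to BC,\ B \to C,\ A \to B,\ AB \to C\}$ is made up for illustration; the lossless checks reuse the supplier relation, and the last decomposition, $\{Snum, Status\}$ and $\{City, Status\}$, is a hypothetical lossy one.

```python
def fd(lhs, rhs):
    """Hypothetical helper: fd("A B", "C") builds the FD AB -> C."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(x, fds):
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 2: split X -> a1 a2 ... an into X -> a1, ..., X -> an.
    g = [(lhs, frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 3: a is extraneous in X -> y if y is already in (X - a)+.
    for i, (lhs, rhs) in enumerate(g):
        for a in sorted(lhs):
            if rhs <= closure(lhs - {a}, g):
                lhs = lhs - {a}
                g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # drop duplicate FDs, keeping their order
    # Step 4: X -> y is redundant if y is in X+ computed without that FD.
    for f in list(g):
        rest = [h for h in g if h != f]
        if f[1] <= closure(f[0], rest):
            g = rest
    return g


def lossless_binary(r1, r2, fds):
    """{R1, R2} is lossless if R1 ∩ R2 determines R1 - R2 or R2 - R1."""
    r1, r2 = set(r1), set(r2)
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


F = [fd("A", "B C"), fd("B", "C"), fd("A", "B"), fd("A B", "C")]
for lhs, rhs in minimal_cover(F):
    print(" ".join(sorted(lhs)), "->", " ".join(sorted(rhs)))  # A -> B, then B -> C

supplier_fds = [fd("Snum", "City"), fd("City", "Status")]
print(lossless_binary({"Snum", "City"}, {"City", "Status"}, supplier_fds))    # True
print(lossless_binary({"Snum", "Status"}, {"City", "Status"}, supplier_fds))  # False
```

In this example $AB \to C$ loses the extraneous $A$ (since $C \in B^+$), and $A \to C$ is then removed as redundant because it follows from $A \to B$ and $B \to C$. The order in which FDs are examined can change which equivalent cover comes out.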

Algorithm to determine whether a decomposition is lossless. Given
1. a set of FDs $F$,
2. a schema $R(A_1, A_2, \ldots, A_n)$, and
3. a decomposition $D = \{R_1, R_2, \ldots, R_k\}$,

the steps are:
1. Construct a table with $n$ columns and $k$ rows. Rows correspond to the $k$ subschemas $R_i$; columns correspond to the $n$ attributes $A_j$.
2. In table$[i, j]$, put $a_j$ if $A_j \in R_i$; otherwise, put $b_{ij}$.
3. For each FD $\alpha \to \beta \in F$, look for 2 rows that have matching values for every $A_j \in \alpha$, and set the values in the columns corresponding to the attributes in $\beta$ to be the same in these 2 rows (if one of the values is $a_j$, use $a_j$). The goal is to replace each $b_{ij}$ with $a_j$.
4. Continue until either (a) no more changes can be made, or (b) a row contains $a_1, a_2, \ldots, a_n$.
5. If a row contains $a_1, a_2, \ldots, a_n$, the decomposition is lossless; otherwise, it is lossy.

Relational Design: Dependency Preservation - Motivation

Consider relation R:

Snum  City    Status
s1    London  20
s2    Paris   10
s3    Paris   10
s4    London  20

and the FDs
1. $\text{Snum} \to \text{City}$
2. $\text{City} \to \text{Status}$

Now consider the following decompositions.

1. S1 and S2:

S1:
Snum  City
s1    London
s2    Paris
s3    Paris
s4    London

S2:
City    Status
London  20
Paris   10

2. T1 and T2:

T1:
Snum  City
s1    London
s2    Paris
s3    Paris
s4    London

T2:
Snum  Status
s1    20
s2    10
s3    10
s4    20

Both decompositions are lossless join (LLJ).
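The tableau steps can be run mechanically. In this Python sketch (an illustration under assumed representations), a cell holds `("a", A_j)` for a distinguished symbol and `("b", i, A_j)` otherwise; when an FD forces two rows to agree, the dropped symbol is renamed throughout its column, which is the usual way the chase equates symbols. The `fd` helper is hypothetical, and the third decomposition below is a made-up lossy one.

```python
def fd(lhs, rhs):
    """Hypothetical helper: fd("A B", "C") builds the FD AB -> C."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def chase_is_lossless(schema, decomposition, fds):
    """Tableau test: schema lists A_1..A_n, decomposition lists the sets R_1..R_k."""
    # Steps 1-2: one row per R_i; ("a", A_j) if A_j is in R_i, else ("b", i, A_j).
    table = [{a: ("a", a) if a in ri else ("b", i, a) for a in schema}
             for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        # Step 3: rows that agree on alpha are made to agree on beta.
        for lhs, rhs in fds:
            for r1 in table:
                for r2 in table:
                    if r1 is r2 or any(r1[a] != r2[a] for a in lhs):
                        continue
                    for a in rhs:
                        x, y = r1[a], r2[a]
                        if x == y:
                            continue
                        # Prefer the distinguished symbol a_j; otherwise keep either b.
                        keep, drop = (x, y) if x[0] == "a" else (y, x)
                        for row in table:  # rename the dropped symbol in column a
                            if row[a] == drop:
                                row[a] = keep
                        changed = True
        # Steps 4-5: a row made entirely of a-symbols means the join is lossless.
        if any(all(row[a][0] == "a" for a in schema) for row in table):
            return True
    return False


R = ["Snum", "City", "Status"]
F = [fd("Snum", "City"), fd("City", "Status")]
print(chase_is_lossless(R, [{"Snum", "City"}, {"City", "Status"}], F))    # True  (S1, S2)
print(chase_is_lossless(R, [{"Snum", "City"}, {"Snum", "Status"}], F))    # True  (T1, T2)
print(chase_is_lossless(R, [{"Snum", "Status"}, {"City", "Status"}], F))  # False
```

For S, the rows agree on City, so $\text{City} \to \text{Status}$ turns the S1 row into all $a$'s; for T, they agree on Snum and $\text{Snum} \to \text{City}$ completes the T2 row. In the lossy split, the two rows share only Status, which determines nothing, so no row is ever completed.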

Suppose you wanted to insert the tuple (s5, London, 30) into each decomposition.

For decomposition S, this would require inserting
1. <s5, London> into S1, and
2. <London, 30> into S2.
The insert into S2 would violate FD 2, and this is detectable from S2 alone, since it already contains <London, 20>.

For decomposition T, this would require inserting
1. <s5, London> into T1, and
2. <s5, 30> into T2.
The fact that FD 2 is violated is not obvious from an examination of the individual tables. The only way to determine whether FD 2 is violated in T is to join T1 and T2.

Relational Design: Dependency Preservation

Restriction of a set of FDs. Given
1. a set of FDs $F$,
2. a schema $R$, and
3. a decomposition $D = \{R_1, R_2, \ldots\}$,

the restriction of $F$ to $R_i$, denoted $F_i$, is the set of FDs in $F^+$ that are wholly contained in $R_i$; i.e., $X \to Y \in F_i$ if $XY \subseteq R_i$ and $X \to Y \in F^+$.

Let $F' = \bigcup_{i=1}^{n} F_i$. Generally, $F' \neq F$. But if $F'^{+} = F^+$, checking against $F'$ is equivalent to checking against $F$.

Dependency preservation: a decomposition is dependency preserving if $F'^{+} = F^+$.

Rissanen's theorem: the projections $R_1$ and $R_2$ of $R$ are independent (the decomposition $\{R_1, R_2\}$ is both dependency preserving and lossless) if
1. $(F_1 \cup F_2)^+ = F^+$, and
2. $R_1 \cap R_2$ is a candidate key of $R_1$ or $R_2$.

Relational Design: Dependency Preservation - Algorithms

Given scheme $R$, decomposition $D = \{R_1, R_2, \ldots\}$, and $F$, to determine dependency preservation of $F$:

  compute F+
  F' := ∅
  for each R_i in D
      F_i := restriction of F to R_i
      F' := F' ∪ F_i
  compute F'+
  if F'+ = F+ return TRUE else return FALSE
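Materializing $F^+$ is exponential, so this Python sketch takes a slightly different but equivalent route (a swapped-in formulation, not the notes' exact algorithm). It represents each restriction $F_i$ by the FDs $X \to (X^+ \cap R_i) - X$ for every nonempty $X \subseteq R_i$, which still enumerates subsets, and then, because $F' \subseteq F^+$ always holds, checks $F'^{+} = F^+$ by testing that every FD of $F$ follows from $F'$. The `fd` helper is hypothetical.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Hypothetical helper: fd("A B", "C") builds the FD AB -> C."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(x, fds):
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def restriction(fds, ri):
    """F_i: for every nonempty X in R_i, the FD X -> (X+ ∩ R_i) - X (exponential in |R_i|)."""
    attrs = sorted(ri)
    f_i = []
    for size in range(1, len(attrs) + 1):
        for xs in combinations(attrs, size):
            x = frozenset(xs)
            y = frozenset(closure(x, fds) & set(attrs)) - x
            if y:
                f_i.append((x, y))
    return f_i


def is_dependency_preserving(fds, decomposition):
    f_prime = [f for ri in decomposition for f in restriction(fds, ri)]
    # F' is always a subset of F+, so F'+ = F+ exactly when every FD in F is in F'+.
    return all(rhs <= closure(lhs, f_prime) for lhs, rhs in fds)


F = [fd("Snum", "City"), fd("City", "Status")]
print(is_dependency_preserving(F, [{"Snum", "City"}, {"City", "Status"}]))  # True  (S)
print(is_dependency_preserving(F, [{"Snum", "City"}, {"Snum", "Status"}]))  # False (T)
```

For T, the restrictions contain only $\text{Snum} \to \text{City}$ and $\text{Snum} \to \text{Status}$, from which $\text{City} \to \text{Status}$ cannot be derived; that is exactly why the insert of (s5, London, 30) could only be caught by a join.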

To determine dependency preservation of a single FD $\alpha \to \beta$ in $F$ without computing $F^+$:

  oldresult := ∅
  result := α
  while oldresult ≠ result
      oldresult := result
      for each R_i in D
          I := result ∩ R_i
          C := I+            (closure with respect to F)
          T := C ∩ R_i
          result := result ∪ T
  if β ⊆ result return TRUE else return FALSE

Relational Design: Normalization - Intro

A normal form is a set of constraints on a DB schema.

Normal forms:

Form   Alt name       Restrictiveness  Duplication
1NF                   least            most
2NF
3NF
BCNF   Boyce-Codd NF
4NF
5NF    Project-Join
6NF    Domain-Key     most             least

Except for 1NF, the normal forms are based on dependencies (2NF, 3NF, BCNF: FDs; 4NF: MVDs; 5NF: JDs; 6NF: domain-key constraints).

Relational Design: Normalization - 1NF

A schema R is in 1NF if all attributes are atomic.

Composite attributes: flatten them (as in the ER-to-relational mapping).

Multivalued attributes:
1. Decompose into 2 tables (as in the ER-to-relational mapping): R1 contains the PK plus the MV attribute; R2 contains R minus the MV attribute.
2. Use 1 table: for each key, have one row for each value of the MV attribute.
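The per-FD test needs only closures under $F$ and intersections with each $R_i$, so it runs in polynomial time. Here is a Python sketch of that loop, assuming the same frozenset-pair representation and hypothetical `fd` helper as before, applied to decompositions S and T of the supplier relation.

```python
def fd(lhs, rhs):
    """Hypothetical helper: fd("A B", "C") builds the FD AB -> C."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(x, fds):
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(dep, decomposition, fds):
    """Check alpha -> beta using only closures under F and intersections with each R_i."""
    alpha, beta = dep
    result = set(alpha)
    old = None
    while old != result:
        old = set(result)
        for ri in decomposition:
            t = closure(result & ri, fds) & ri  # T := (result ∩ R_i)+ ∩ R_i
            result |= t
    return beta <= result


def is_dependency_preserving(decomposition, fds):
    return all(preserves(dep, decomposition, fds) for dep in fds)


F = [fd("Snum", "City"), fd("City", "Status")]
S = [{"Snum", "City"}, {"City", "Status"}]
T = [{"Snum", "City"}, {"Snum", "Status"}]
print(is_dependency_preserving(S, F))               # True
print(preserves(fd("City", "Status"), T, F))        # False: FD 2 is lost in T
```

For $\text{City} \to \text{Status}$ under T, result starts as {City}; T1 adds nothing new and T2 shares no attribute with it, so Status is never reached.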

Relational Design: Normalization - 2NF

A non-prime attribute is an attribute that is not part of any candidate key. An attribute is fully dependent on a set of attributes if it is not dependent on any proper subset of those attributes.

A schema R is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on the PK of R. Alternative definition: no non-prime attribute is partially dependent on the PK.

To normalize to 2NF: for every schema R in which an FD $X \to Y \in F^+$ violates 2NF, replace R with the schemas
  i. $R_1 = X \cup Y$
  ii. $R_2 = R - Y$

Relational Design: Normalization - 3NF

Transitive dependency: $Y$ is transitively dependent on the PK $X$ if there is a $Z$ such that
1. $X \to Z$,
2. $Z \to Y$, and
3. $Z \not\to X$ (Z does not determine the PK).
Then $X \to Y$ is a transitive dependency on the PK.

A schema R is in 3NF if it is in 2NF and every non-prime attribute is non-transitively dependent on the PK of R.

To normalize to 3NF: for every table R in which an FD $Z \to Y$ violates 3NF, create 2 tables:
  i. $R_1 = Z \cup Y$
  ii. $R_2 = R - Y$

Codd's definition of 3NF is based on 2NF. However, given schema R and set of FDs F, a 3NF decomposition can be created directly from 1NF by:
1. Find the minimal cover $F_c$ of $F$.
2. For each unique set of attributes $X$ appearing on the left-hand side of FDs $X \to Y_i$ in $F_c$, create a schema consisting of $X \cup \bigcup_{i=1}^{n} Y_i$.
3. Create a schema containing any attributes of R not included in the previous step.
4. If none of the schemas created contains a candidate key, create a schema containing a candidate key.

The resulting DB schema is guaranteed to be
1. lossless join, and
2. dependency preserving,
but it is not unique.
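The four synthesis steps can be sketched in Python as below. The representation, the `fd` helper, and the example $R(A, B, C)$ with $A \to C$ and $B \to C$ are assumptions made for illustration; the minimal-cover and key-finding routines follow the procedures given earlier in these notes. For step 4 the sketch tests whether some schema is a superkey of $R$, since a schema contains a candidate key exactly when it is a superkey.

```python
def fd(lhs, rhs):
    """Hypothetical helper: fd("A B", "C") builds the FD AB -> C."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(x, fds):
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    g = [(lhs, frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]
    for i, (lhs, rhs) in enumerate(g):          # remove extraneous LHS attributes
        for a in sorted(lhs):
            if rhs <= closure(lhs - {a}, g):
                lhs = lhs - {a}
                g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))
    for f in list(g):                           # remove redundant FDs
        rest = [h for h in g if h != f]
        if f[1] <= closure(f[0], rest):
            g = rest
    return g


def find_key(schema, fds):
    key = set(schema)
    for a in sorted(schema):
        if closure(key - {a}, fds) >= set(schema):
            key.discard(a)
    return frozenset(key)


def synthesize_3nf(schema, fds):
    fc = minimal_cover(fds)                              # step 1
    groups = {}                                          # step 2: group F_c by LHS
    for lhs, rhs in fc:
        groups.setdefault(lhs, set(lhs)).update(rhs)
    schemas = [frozenset(s) for s in groups.values()]
    leftover = set(schema).difference(*schemas)          # step 3
    if leftover:
        schemas.append(frozenset(leftover))
    # Step 4: a schema contains a candidate key iff it is a superkey of R.
    if not any(closure(s, fds) >= set(schema) for s in schemas):
        schemas.append(find_key(schema, fds))
    return schemas


for s in synthesize_3nf(["A", "B", "C"], [fd("A", "C"), fd("B", "C")]):
    print(sorted(s))
```

Here step 2 yields AC and BC, neither of which determines B or A respectively, so step 4 adds the key AB. On the supplier relation the same function returns {Snum, City} and {City, Status}, i.e., decomposition S.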

Relational Design: Normalization - General Definitions for 2NF and 3NF

A schema R is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on every CK of R.

A schema R is in 3NF if it is in 2NF and every non-prime attribute is non-transitively dependent on every CK of R.

Alternative 3NF definition: a schema R is in 3NF if, for every non-trivial FD $X \to Y \in F^+$, either
1. $X$ is a superkey of R, or
2. every attribute in $Y - X$ is prime.

Relational Design: Normalization - BCNF

A schema R is in BCNF if every attribute is fully functionally dependent on every CK of R.

Alternative BCNF definition: a schema R is in BCNF if, for every non-trivial FD $X \to Y \in F^+$, $X$ is a superkey of R.

To normalize to BCNF: for every schema R in which an FD $X \to Y \in F^+$ violates BCNF, replace R with the schemas
  i. $R_1 = X \cup Y$
  ii. $R_2 = R - Y$

The resulting DB schema is guaranteed to be lossless join. It is not guaranteed to be
1. dependency preserving, or
2. unique.
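Below is a Python sketch of the BCNF splitting loop, assuming the same representation and hypothetical `fd` helper. To find a violating FD inside a subschema $R_i$, it checks every proper subset $X$ of $R_i$ and tests whether $X^+ \cap R_i$ adds attributes without covering all of $R_i$. This enumeration is exponential and is an implementation choice, not something the notes prescribe. The second example, $R(A, B, C)$ with $AB \to C$ and $C \to B$, is a hypothetical textbook-style case.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Hypothetical helper: fd("A B", "C") builds the FD AB -> C."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(x, fds):
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violation(ri, fds):
    """Find X -> Y in F+ with XY in R_i, Y nonempty and disjoint from X, X not a superkey of R_i."""
    attrs = sorted(ri)
    for size in range(1, len(attrs)):
        for xs in combinations(attrs, size):
            x = frozenset(xs)
            plus = closure(x, fds) & ri
            if plus - x and plus != ri:
                return x, frozenset(plus - x)
    return None


def bcnf_decompose(schema, fds):
    pending, done = [frozenset(schema)], []
    while pending:
        ri = pending.pop()
        violation = bcnf_violation(ri, fds)
        if violation is None:
            done.append(ri)
        else:
            x, y = violation
            pending.append(x | y)   # R1 = X ∪ Y
            pending.append(ri - y)  # R2 = R - Y (still contains X, so the split is lossless)
    return done


supplier = [fd("Snum", "City"), fd("City", "Status")]
for s in bcnf_decompose(["Snum", "City", "Status"], supplier):
    print(sorted(s))

# AB -> C is lost after the split: BCNF is not guaranteed to preserve dependencies.
for s in bcnf_decompose(["A", "B", "C"], [fd("A B", "C"), fd("C", "B")]):
    print(sorted(s))
```

The supplier relation splits on $\text{City} \to \text{Status}$ and ends up as decomposition S. The second example splits on $C \to B$ into AC and BC, and no remaining schema contains $ABC$, so $AB \to C$ can no longer be checked locally. Which violating FD is found first depends on the enumeration order, which is one reason the result is not unique.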