CS54100: Database Systems

Similar documents
Design Theory for Relational Databases

Chapter 3 Design Theory for Relational Databases

Design Theory for Relational Databases

Chapter 3 Design Theory for Relational Databases

UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

Relational Database Design

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

SCHEMA NORMALIZATION. CS 564- Fall 2015

Review: Keys. What is a Functional Dependency? Why use Functional Dependencies? Functional Dependency Properties

Schema Refinement & Normalization Theory

CS322: Database Systems Normalization

INF1383 -Bancos de Dados

Relational Design Theory

CS54100: Database Systems

Constraints: Functional Dependencies

Schema Refinement and Normal Forms

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Relational Database Design

Chapter 8: Relational Database Design

Relational Database Design Theory Part II. Announcements (October 12) Review. CPS 116 Introduction to Database Systems

Lossless Joins, Third Normal Form

Functional Dependencies & Normalization. Dr. Bassam Hammo

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

Database Design and Implementation

Design Theory for Relational Databases. Spring 2011 Instructor: Hassan Khosravi

Schema Refinement. Feb 4, 2010

CSC 261/461 Database Systems Lecture 11

Schema Refinement and Normal Forms. Why schema refinement?

FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

Constraints: Functional Dependencies

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Databases 2012 Normalization

Relational Design: Characteristics of Well-designed DB

Schema Refinement and Normal Forms

Schema Refinement and Normal Forms. Chapter 19

Chapter 7: Relational Database Design

CSC 261/461 Database Systems Lecture 13. Spring 2018

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

Chapter 7: Relational Database Design. Chapter 7: Relational Database Design

Lectures 6. Lecture 6: Design Theory

Design theory for relational databases

Normal Forms. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

CSC 261/461 Database Systems Lecture 12. Spring 2018

Functional Dependencies and Normalization

Information Systems (Informationssysteme)

CSE 132B Database Systems Applications

Schema Refinement and Normal Forms

Database Design: Normal Forms as Quality Criteria. Functional Dependencies Normal Forms Design and Normal forms

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

Schema Refinement: Other Dependencies and Higher Normal Forms

Relational-Database Design

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

Schedule. Today: Jan. 17 (TH) Jan. 24 (TH) Jan. 29 (T) Jan. 22 (T) Read Sections Assignment 2 due. Read Sections Assignment 3 due.

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

Functional Dependency Theory II. Winter Lecture 21

Background: Functional Dependencies. æ We are always talking about a relation R, with a æxed schema èset of attributesè and a

Schema Refinement and Normal Forms Chapter 19

Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies

Functional Dependency and Algorithmic Decomposition

Relational Design Theory I. Functional Dependencies: why? Redundancy and Anomalies I. Functional Dependencies

FUNCTIONAL DEPENDENCY THEORY II. CS121: Relational Databases Fall 2018 Lecture 20

Chapter 10. Normalization Ext (from E&N and my editing)

Relational Design Theory II. Detecting Anomalies. Normal Forms. Normalization

CSE 344 AUGUST 3 RD NORMALIZATION

Schema Refinement and Normalization

CSIT5300: Advanced Database Systems

CSE 344 AUGUST 6 TH LOSS AND VIEWS

CMPT 354: Database System I. Lecture 9. Design Theory

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

CAS CS 460/660 Introduction to Database Systems. Functional Dependencies and Normal Forms 1.1

CSE 303: Database. Outline. Lecture 10. First Normal Form (1NF) First Normal Form (1NF) 10/1/2016. Chapter 3: Design Theory of Relational Database

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

Functional Dependencies

Information Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Problem about anomalies

A few details using Armstrong s axioms. Supplement to Normalization Lecture Lois Delcambre

COSC 430 Advanced Database Topics. Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago

Data Bases Data Mining Foundations of databases: from functional dependencies to normal forms

CSE 344 MAY 16 TH NORMALIZATION

Normal Forms Lossless Join.

Schema Refinement and Normal Forms

CS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization

Database System Concepts, 5th Ed.! Silberschatz, Korth and Sudarshan See for conditions on re-use "

Introduction to Data Management CSE 344

Database Normaliza/on. Debapriyo Majumdar DBMS Fall 2016 Indian Statistical Institute Kolkata

CS 186, Fall 2002, Lecture 6 R&G Chapter 15

Introduction to Database Systems CSE 414. Lecture 20: Design Theory

Schema Refinement. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Introduction to Data Management. Lecture #7 (Relational DB Design Theory II)

Functional Dependencies and Normalization

Transcription:

CS54100: Database Systems Keys and Dependencies 18 January 2012 Prof. Chris Clifton Functional Dependencies X A = assertion about a relation R that whenever two tuples agree on all the attributes of X, then they must also agree on attribute A Why do we care? Knowing functional dependencies provides a formal mechanism to divide up relations (normalization) Saves space Prevents storing data that violates dependencies Fall 2007 Chris Clifton - CS541 2 2012 Chris Clifton 1

Example Drinkers(name, addr, beersliked, manf, favoritebeer) name addr beersliked manf f av oritebeer Janeway Voy ager Bud A.B. WickedAle Janeway Voy ager WickedAle Pete's WickedAle Spock Enterprise Bud A.B. Bud Reasonable FD's to assert: 1. name addr 2. name favoritebeer 3. beersliked manf Shorthand: combine FD's with common left side by concatenating their right sides. Sometimes, several attributes jointly determine another attribute, although neither does by itself. Example: beer bar price Fall 2007 Chris Clifton - CS541 4 2012 Chris Clifton 2

Keys of Relations K is a key for relation R if: 1. K all attributes of R. (Uniqueness) 2. For no proper subset of K is (1) true. (Minimality) If K at least satisfies (1), then K is a superkey. Conventions Pick one key; underline key attributes in the relation schema. X, etc., represent sets of attributes; A etc., represent single attributes. No set formers in FD s, e.g., ABC instead of {A, B, C}. Fall 2007 Chris Clifton - CS541 5 Example Drinkers(name, addr, beersliked, manf, favoritebeer) {name, beersliked} FD s all attributes, as seen. Shows {name, beersliked} is a superkey. name beersliked is false, so name not a superkey. beersliked name also false, so beersliked not a superkey. Thus, {name, beersliked} is a key. No other keys in this example. Neither name nor beersliked is on the right of any observed FD, so they must be part of any superkey. Important point: key in a relation refers to tuples, not the entities they represent. If an entity is represented by several tuples, then entity-key will not be the same as relation-key. 2012 Chris Clifton 3

Example 2 Lastname Firstname Student ID Major Key (2 attributes) Key Superkey Keys are {Lastname, Firstname} and {StudentID} Note: There are alternate keys Fall 2007 Chris Clifton - CS541 7 Who Determines Keys/FD s? We could assert a key K. Then the only FD s asserted are that K A for every attribute A. No surprise: K is then the only key for those FD s, according to the formal definition of key. Or, we could assert some FD s and deduce one or more keys by the formal definition. E/R diagram implies FD s by key declarations and many-one relationship declarations. Rule of thumb: FD s either come from keyness, many-1 relationship, or from physics. E.g., no two courses can meet in the same room at the same time yields room time course. Fall 2007 Chris Clifton - CS541 8 2012 Chris Clifton 4

CS54100: Database Systems Keys and Dependencies 20 January 2012 Prof. Chris Clifton Functional Dependencies (FD s) and Many-One Relationships Consider R(A1,, An) and X is a key then X Y for any attributes Y in A1,, An even if they overlap with X. Why? Suppose R is used to represent a many one relationship: E1 entity set E2 entity set where X key for E1, Y key for E2, Then, X Y holds, And Y X does not hold unless the relationship is oneone. What about many-many relationships? Fall 2007 Chris Clifton - CS541 10 2012 Chris Clifton 5

Inferring FD s And this is important because When we talk about improving relational designs, we often need to ask does this FD hold in this relation? Given FD s X1 A1, X2 A2,, Xn An, does FD Y B necessarily hold in the same relation? Start by assuming two tuples agree in Y. Use given FD s to infer other attributes on which they must agree. If B is among them, then yes, else no. Fall 2007 Chris Clifton - CS541 11 Algorithm Define Y + = closure of Y = set of attributes functionally determined by Y: Basis: Y + :=Y. Induction: If X Y +, and X A is a given FD, then add A to Y +. X A Y+ new Y+ End when Y + cannot be changed. 2012 Chris Clifton 6

A B, BC D. A + = AB. C + =C. (AC) + = ABCD. Example A C B D Fall 2007 Chris Clifton - CS541 13 Given Versus Implied FD s Typically, we state a few FD s that are known to hold for a relation R. Other FD s may follow logically from the given FD s; these are implied FD s. We are free to choose any basis for the FD s of R a set of FD s that imply all the FD s that hold for R. Fall 2007 Chris Clifton - CS541 14 2012 Chris Clifton 7

Finding All Implied FD s Motivation: Suppose we have a relation ABCD with some FD s F. If we decide to decompose ABCD into ABC and AD, what are the FD s for ABC, AD? Example: F = AB C, C D, D A. It looks like just AB C holds in ABC, but in fact C A follows from F and applies to relation ABC. Problem is exponential in worst case. Fall 2007 Chris Clifton - CS541 15 Algorithm For each set of attributes X compute X +. But skip X =, X = all attributes. Add X A for each A in X + X. Drop XY A if X A holds. Consequence: If X + is all attributes, then there is no point in computing closure of supersets of X. Finally, project the FD s by selecting only those FD s that involve only the attributes of the projection. Notice that after we project the discovered FD s onto some relation, the eliminated FD s can be inferred in the projected relation. 2012 Chris Clifton 8

Example F = AB C, C D, D A. What FD s follow? A + = A; B + =B (nothing). C + =ACD (add C A). D + =AD (nothing new). (AB) + =ABCD (add AB D; skip all supersets of AB). (BC) + =ABCD (nothing new; skip all supersets of BC). (BD) + =ABCD (add BD C; skip all supersets of BD). (AC) + =ACD; (AD) + =AD; (CD) + =ACD (nothing new). (ACD) + =ACD (nothing new). All other sets contain AB, BC, or BD, so skip. Thus, the only interesting FD s that follow from F are: C A, AB D, BD C. Set of FD s in ABCGHI: A B A C CG H CG I B H Example 2 Compute (CG) +, (BG) +, (AG) + Fall 2007 Chris Clifton - CS541 18 2012 Chris Clifton 9

Example 3 In ABC with FD s A B, B C, project onto AC. 1. A + = ABC; yields A B, A C. 2. B + = BC; yields B C. 3. AB + = ABC; yields AB C; drop in favor of A C 4. AC + = ABC yields AC B; drop in favor of A B 5. C + = C and BC + = BC; adds nothing. Resulting FD s: A B, A C, B C. Projection onto AC: A C. Fall 2007 Chris Clifton - CS541 19 FDs: Armstrong s Axioms Reflexivity: If {B 1, B 2,, B m } {A 1, A 2,, A n } A 1 A 2 A n B 1 B 2 B m Also called trivial FDs Augmentation: A 1 A 2 A n B 1 B 2 B m A 1 A 2 A n C 1 C 2 C k B 1 B 2 B m C 1 C 2 C k Transitivity: A 1 A 2 A n B 1 B 2 B m and B 1 B 2 B m C 1 C 2 C k A 1 A 2 A n C 1 C 2 C k Fall 2007 Chris Clifton - CS541 21 2012 Chris Clifton 10

CS54100: Database Systems Normalization 23 January 2012 Prof. Chris Clifton Normalization Goal = BCNF = Boyce-Codd Normal Form = all FD s follow from the fact key everything. Formally, R is in BCNF if for every nontrivial FD for R, say X A, then X is a superkey. Nontrivial = right-side attribute not in left side. Why? 1. Guarantees no redundancy due to FD s. 2. Guarantees no update anomalies = one occurrence of a fact is updated, not all. 3. Guarantees no deletion anomalies = valid fact is lost when tuple is deleted. 2012 Chris Clifton 11

Example of Problems Drinkers(name, addr, beersliked, manf, favoritebeer) name addr beersliked manf f av oritebeer Janeway Voy ager Bud A.B. WickedAle Janeway??? WickedAle Pete's??? Spock Enterprise Bud??? Bud FD s: 1. name addr 2. name favoritebeer 3. beersliked manf??? s are redundant, since we can figure them out from the FD s. Update anomalies: If Janeway gets transferred to the Intrepid, will we change addr in each of her tuples? Deletion anomalies: If nobody likes Bud, we lose track of Bud s manufacturer. Why are these problems? Each of the given FD s is a BCNF violation: Key = {name, beersliked} Each of the given FD s has a left side that is a proper subset of the key. Another Example Beers(name, manf, manfaddr). FD s = name manf, manf manfaddr. Only key is name. Manf manfaddr violates BCNF with a left side unrelated to any key. 2012 Chris Clifton 12

Lossless Join Goal: All legal values can be stored in relations Recover originals through join Formally: X, Y is a lossless join decomposition of R w.r.t. F if r R satisfying dependencies in F, π X (r) π Y (r) = r CS54100 Decomposition to Reach BCNF Setting: relation R, given FD s F. Suppose relation R has BCNF violation X B. We need only look among FD s of F for a BCNF violation, not those that follow from F. Proof: If Y A is a BCNF violation and follows from F, then the computation of Y + used at least one FD X B from F. X must be a subset of Y. Thus, if Y is not a superkey, X cannot be a superkey either, and X B is also a BCNF violation. Fall 2007 Chris Clifton - CS541 29 2012 Chris Clifton 13

1. Compute X +. Cannot be all attributes why? 2. Decompose R into X + and (R X + ) X. R X X + 3. Find the FD s for the decomposed relations. Project the FD s from F = calculate all consequents of F that involve only attributes from X + or only from (R X + ) X. Example R = Drinkers(name, addr, beersliked, manf, favoritebeer) F = 1. name addr 2. name favoritebeer 3. beersliked manf Pick BCNF violation name addr. Close the left side: name + = name addr favoritebeer. Decomposed relations: Drinkers1(name, addr, favoritebeer) Drinkers2(name, beersliked, manf) Projected FD s (skipping a lot of work that leads nowhere interesting): For Drinkers1: name addr and name favoritebeer. For Drinkers2: beersliked manf. 2012 Chris Clifton 14

(Repeating) Decomposed relations: Drinkers1(name, addr, favoritebeer) Drinkers2(name, beersliked, manf) Projected FD s: For Drinkers1: name addr and name favoritebeer. For Drinkers2: beersliked manf. BCNF violations? For Drinkers1, name is key and all left sides of FD s are superkeys. For Drinkers2, {name, beersliked} is the key, and beersliked manf violates BCNF. Decompose Drinkers2 First set of decomposed relations: Drinkers1(name, addr, favoritebeer) Drinkers2(name, beersliked, manf) Close beersliked + = beersliked, manf. Decompose Drinkers2 into: Drinkers3(beersLiked, manf) Drinkers4(name, beersliked) Resulting relations are all in BCNF: Drinkers1(name, addr, favoritebeer) Drinkers3(beersLiked, manf) Drinkers4(name, beersliked) 2012 Chris Clifton 15

3NF One FD structure causes problems: If you decompose, you can t check all the FD s only in the decomposed relations. If you don t decompose, you violate BCNF. Abstractly: AB C and C B. Example 1: title city theatre and theatre city. Example 2: street city zip, zip city. Keys: {A, B} and {A, C}, but C B has a left side that is not a superkey. Suggests decomposition into BC and AC. But you can t check the FD AB C in only these relations. Example A = street, B = city, C = zip. street zip 545 Tech Sq. 02138 545 Tech Sq. 02139 Join: city zip Cambridge 02138 Cambridge 02139 city street zip Cambridge 545 Tech Sq. 02138 Cambridge 545 Tech Sq. 02139 zip city street city zip Fall 2007 Chris Clifton - CS541 35 2012 Chris Clifton 16

Elegant Workaround Define the problem away. A relation R is in 3NF iff (if and only if) for every nontrivial FD X A, either: 1. X is a superkey, or 2. A is prime = member of at least one key. Thus, the canonical problem goes away: you don t have to decompose because all attributes are prime. Fall 2007 Chris Clifton - CS541 36 What 3NF Gives You There are two important properties of a decomposition: 1. We should be able to recover from the decomposed relations the data of the original. Recovery involves projection and join, which we shall defer until we ve discussed relational algebra. 2. We should be able to check that the FD s for the original relation are satisfied by checking the projections of those FD s in the decomposed relations. Without proof, we assert that it is always possible to decompose into BCNF and satisfy (1). Also without proof, we can decompose into 3NF and satisfy both (1) and (2). But it is not possible to decompose into BNCF and get both (1) and (2). Street-city-zip is an example of this point. 2012 Chris Clifton 17

3NF Synthesis Given a canonical cover F C for F Schema S = A B Fc If there is no R i S such that AB R i S = S + AB If there is no R i S containing a candidate key for R S = S + (any candidate key for R) CS54100 CS54100: Database Systems Multivalued Dependencies 25 January 2012 Prof. Chris Clifton 2012 Chris Clifton 18

Multivalued Dependencies The multivalued dependency X Y holds in a relation R if whenever we have two tuples of R that agree in all the attributes of X, then we can swap their Y components and get two new tuples that are also in R. X Y others Fall 2007 Chris Clifton - CS541 40 Example Drinkers(name, addr, phones, beersliked) with MVD Name phones. If Drinkers has the two tuples: name addr phones beersliked sue a p1 b1 sue a p2 b2 it must also have the same tuples with phones components swapped: name addr phones beersliked sue a p2 b1 sue a p1 b2 Note: we must check this condition for all pairs of tuples that agree on name, not just one pair. 2012 Chris Clifton 19

MVD Rules 1. Every FD is an MVD. Because if X Y, then swapping Y s between tuples that agree on X doesn t create new tuples. Example, in Drinkers: name addr. 2. Complementation: if X Y, then X Z, where Z is all attributes not in X or Y. Example: since name phones holds in Drinkers, so does name addr beersliked. Splitting Doesn t Hold Sometimes you need to have several attributes on the right of an MVD. For example: Drinkers(name, areacode, phones, beersliked, beermanf) name areacode phones beersliked beermanf Sue 831 555-1111 Bud A.B. Sue 831 555-1111 Wicked Ale Pete s Sue 408 555-9999 Bud A.B. Sue 408 555-9999 Wicked Ale Pete s name areacode phones holds, but neither name areacode nor name phones do. 2012 Chris Clifton 20

4NF Eliminate redundancy due to multiplicative effect of MVD s. Roughly: treat MVD s as FD's for decomposition, but not for finding keys. Formally: R is in Fourth Normal Form if whenever MVD X Y is nontrivial (Y is not a subset of X, and X Y is not all attributes), then X is a superkey. Remember, X Y implies X Y, so 4NF is more stringent than BCNF. Decompose R, using 4NF violation X Y, R X Y into XY and X (R Y). Example Drinkers(name, addr, phones, beersliked) FD: name addr Nontrivial MVD s: name phones and name beersliked. Only key: {name, phones, beersliked} All three dependencies above violate 4NF. Successive decomposition yields 4NF relations: D1(name, addr) D2(name, phones) D3(name, beersliked) 2012 Chris Clifton 21

4NF Decomposition Schema S = R, D+ be the closure of the functional and multivalued dependencies While R i S not in 4NF w.r.t. D+ Choose a nontrivial multivalued dependency A B that holds on R i, where A R i D+, and A B = S = (S R i ) (R i -B) (A,B) CS54100 2012 Chris Clifton 22