Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies

Similar documents
Lectures 6. Lecture 6: Design Theory

CMPT 354: Database System I. Lecture 9. Design Theory

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

CSC 261/461 Database Systems Lecture 13. Spring 2018

Database Design and Implementation

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

CSE 344 MAY 16 TH NORMALIZATION

CSE 344 AUGUST 3 RD NORMALIZATION

CSE 303: Database. Outline. Lecture 10. First Normal Form (1NF) First Normal Form (1NF) 10/1/2016. Chapter 3: Design Theory of Relational Database

CSC 261/461 Database Systems Lecture 11

Introduction to Management CSE 344

CSE 544 Principles of Database Management Systems

L13: Normalization. CS3200 Database design (sp18 s2) 2/26/2018

Introduction to Data Management CSE 344

L14: Normalization. CS3200 Database design (sp18 s2) 3/1/2018

Introduction to Database Systems CSE 414. Lecture 20: Design Theory

Practice and Applications of Data Management CMPSCI 345. Lecture 15: Functional Dependencies

Schema Refinement. Feb 4, 2010

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization

CSC 261/461 Database Systems Lecture 12. Spring 2018

SCHEMA NORMALIZATION. CS 564- Fall 2015

Schema Refinement and Normal Forms

CSE 344 AUGUST 6 TH LOSS AND VIEWS

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

Relational Design Theory

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

Schema Refinement and Normal Forms

CS54100: Database Systems

Relational Database Design

Design Theory for Relational Databases. Spring 2011 Instructor: Hassan Khosravi

Introduction to Data Management CSE 344

Schema Refinement and Normal Forms. Chapter 19

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

DECOMPOSITION & SCHEMA NORMALIZATION

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

Design Theory for Relational Databases

Schema Refinement and Normal Forms Chapter 19

Design Theory for Relational Databases

Normal Forms. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

Schema Refinement & Normalization Theory

Information Systems (Informationssysteme)

FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

Design theory for relational databases

Schema Refinement and Normal Forms

CS 186, Fall 2002, Lecture 6 R&G Chapter 15

Chapter 3 Design Theory for Relational Databases

Chapter 3 Design Theory for Relational Databases

CS322: Database Systems Normalization

Relational Database Design Theory Part II. Announcements (October 12) Review. CPS 116 Introduction to Database Systems

Constraints: Functional Dependencies

Databases 2012 Normalization

Constraints: Functional Dependencies

Functional Dependency and Algorithmic Decomposition

Introduction to Data Management. Lecture #7 (Relational DB Design Theory II)

Relational Database Design

CS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li

Introduction to Data Management. Lecture #6 (Relational DB Design Theory)

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

Schema Refinement and Normalization

CAS CS 460/660 Introduction to Database Systems. Functional Dependencies and Normal Forms 1.1

Lossless Joins, Third Normal Form

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

Relational Design Theory II. Detecting Anomalies. Normal Forms. Normalization

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

Introduction to Data Management. Lecture #6 (Relational Design Theory)

INF1383 -Bancos de Dados

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

Information Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due

Functional Dependencies & Normalization. Dr. Bassam Hammo

Schema Refinement and Normal Forms. Why schema refinement?

Functional Dependency Theory II. Winter Lecture 21

Schema Refinement and Normal Forms

FUNCTIONAL DEPENDENCY THEORY II. CS121: Relational Databases Fall 2018 Lecture 20

Database Design: Normal Forms as Quality Criteria. Functional Dependencies Normal Forms Design and Normal forms

Database Design and Normalization

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #15: BCNF, 3NF and Normaliza:on

12/3/2010 REVIEW ALGEBRA. Exam Su 3:30PM - 6:30PM 2010/12/12 Room C9000

Normal Forms (ii) ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Lecture #7 (Relational Design Theory, cont d.)

Functional Dependencies and Normalization

Functional Dependencies and Normalization. Instructor: Mohamed Eltabakh

Review: Keys. What is a Functional Dependency? Why use Functional Dependencies? Functional Dependency Properties

Relational Design Theory I. Functional Dependencies: why? Redundancy and Anomalies I. Functional Dependencies

Chapter 10. Normalization Ext (from E&N and my editing)

Normaliza)on and Func)onal Dependencies

Chapter 8: Relational Database Design

Chapter 7: Relational Database Design

Relational Design: Characteristics of Well-designed DB

Global Database Design based on Storage Space and Update Time Minimization

Consider a relation R with attributes ABCDEF GH and functional dependencies S:

Schema Refinement. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Transcription:

Design Theory BBM471 Database Management Systems Dr. Fuat Akal akal@hacettepe.edu.tr Design Theory I 2 Today s Lecture 1. Normal forms & functional dependencies 2. Finding functional dependencies 3. Closures, superkeys & keys 1. Normal forms & functional dependencies 3

What you will learn about in this section Design Theory 1. Overview of design theory & normal forms 2. Data anomalies & constraints 3. Functional dependencies Design theory is about how to represent your data to avoid anomalies. It is a mostly mechanical process Tools can carry out routine portions 5 6 Normal Forms 1 st Normal Form (1NF) 1 st Normal Form (1NF) = All tables are flat 2 nd Normal Form = disused Boyce-Codd Normal Form (BCNF) 3 rd Normal Form (3NF) DB designs based on functional dependencies, intended to prevent data anomalies Our focus in this lecture Student Courses Mary {CS145,CS229} Joe {CS145,CS106} Violates 1NF. Student Courses Mary CS145 Mary CS229 Joe CS145 Joe CS106 In 1 st NF 4 th and 5 th Normal Forms = see text books 1NF Constraint: Types must be atomic! 7 8

Constraints Prevent (some) Anomalies in the Data A poorly designed database causes anomalies: Data Anomalies & Constraints Student Course Room Mary CS145 B01 Joe CS145 B01 Sam CS145 B01...... If every course is in only one room, contains redundant information! 9 10 Constraints Prevent (some) Anomalies in the Data Constraints Prevent (some) Anomalies in the Data A poorly designed database causes anomalies: A poorly designed database causes anomalies: Student Course Room Mary CS145 B01 Joe CS145 C12 Sam CS145 B01...... If we update the room number for one tuple, we get inconsistent data = an update anomaly Student Course Room...... If everyone drops the class, we lose what room the class is in! = a delete anomaly 11 12

Constraints Prevent (some) Anomalies in the Data Constraints Prevent (some) Anomalies in the Data CS229 C12 A poorly designed database causes anomalies: Student Course Room Mary CS145 B01 Joe CS145 B01 Sam CS145 B01...... Similarly, we can t reserve a room without students = an insert anomaly Student Mary Joe Sam.... Course CS145 CS145 CS145 Course CS145 CS229 Room B01 C12 Is this form better? Redundancy? Update anomaly? Delete anomaly? Insert anomaly? Today: develop theory to understand why this design may be better and how to find this decomposition 13 14 Functional Dependency Functional Dependencies Def: Let A,B be sets of attributes We write A à B or say A functionally determines B if, for any tuples t 1 and t 2 : t 1 [A] = t 2 [A] implies t 1 [B] = t 2 [B] and we call A à B a functional dependency In another words: A->B means that whenever two tuples agree on A then they agree on B. 15 16

A Picture Of FDs Defn (again): Given attribute sets A={A 1,,A m } and B = {B 1, B n } in R, A Picture Of FDs Defn (again): Given attribute sets A={A 1,,A m } and B = {B 1, B n } in R, A 1 A m B 1 B n t i A 1 A m B 1 B n The functional dependency Aà B on R holds if for any t i,t j in R: t j 17 18 A Picture Of FDs Defn (again): Given attribute sets A={A 1,,A m } and B = {B 1, B n } in R, A Picture Of FDs Defn (again): Given attribute sets A={A 1,,A m } and B = {B 1, B n } in R, t i A 1 A m B 1 B n The functional dependency Aà B on R holds if for any t i,t j in R: t i A 1 A m B 1 B n The functional dependency Aà B on R holds if for any t i,t j in R: t j t i [A 1 ] = t j [A 1 ] AND t i [A 2 ]=t j [A 2 ] AND AND t i [A m ] = t j [A m ] t j if t i [A 1 ] = t j [A 1 ] AND t i [A 2 ]=t j [A 2 ] AND AND t i [A m ] = t j [A m ] If t i, t j agree here.. If t i,t j agree here.. they also agree here! then t i [B 1 ] = t j [B 1 ] AND t i [B 2 ]=t j [B 2 ] AND AND t i [B n ] = t j [B n ] 19 20

FDs for Relational Schema Design Functional Dependencies as Constraints High-level idea: why do we care about FDs? 1. Start with some relational schema 2. Model its functional dependencies (FDs) 3. Use these to design a better schema One which minimizes the possibility of anomalies A functional dependency is a form of constraint Holds on some instances not others. Part of the schema, helps define a valid instance. Recall: an instance of a schema is a multiset of tuples conforming to that schema, i.e. a table Student Course Room Mary CS145 B01 Joe CS145 B01 Sam CS145 B01...... Note: The FD {Course} -> {Room} holds on this instance 21 22 Functional Dependencies as Constraints More Examples Note that: You can check if an FD is violated by examining a single instance; However, you cannot prove that an FD is part of the schema by examining a single instance. This would require checking every valid instance Student Course Room Mary CS145 B01 Joe CS145 B01 Sam CS145 B01...... However, cannot prove that the FD {Course} -> {Room} is part of the schema An FD is a constraint which holds, or does not hold on an instance: EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer 23 24

More Examples More Examples EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer {Position} à {Phone} but not {Phone} à {Position} 25 26 ACTIVITY A B C D E 1 2 4 3 6 3 2 5 1 8 1 4 4 5 7 1 2 4 3 6 3 2 5 1 8 Find at least three FDs which are violated on this instance: { } à { } { } à { } { } à { } 2. Finding functional dependencies

What you will learn about in this section 1. Good vs. Bad FDs: Intuition Good vs. Bad FDs We can start to develop a notion of good vs. bad FDs: 2. Finding FDs 3. Closures EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer Intuitively: EmpID -> Name, Phone, Position is good FD Minimal redundancy, less possibility of anomalies 29 30 Good vs. Bad FDs We can start to develop a notion of good vs. bad FDs: EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer Intuitively: EmpID -> Name, Phone, Position is good FD But Position -> Phone is a bad FD Redundancy! Possibility of data anomalies Good vs. Bad FDs 31 Student Course Room Mary CS145 B01 Joe CS145 B01 Sam CS145 B01...... Returning to our original example can you see how the bad FD {Course} -> {Room} could lead to an: Update Anomaly Insert Anomaly Delete Anomaly Given a set of FDs (from user) our goal is to: 1. Find all FDs, and 2. Eliminate the Bad Ones". 32

FDs for Relational Schema Design Finding Functional Dependencies High-level idea: why do we care about FDs? 1. Start with some relational schema There can be a very large number of FDs How to find them all efficiently? 2. Find out its functional dependencies (FDs) 3. Use these to design a better schema One which minimizes possibility of anomalies This part can be tricky! We can t necessarily show that any FD will hold on all instances How to do this? We will start with this problem: Given a set of FDs, F, what other FDs must hold? 33 34 Finding Functional Dependencies Finding Functional Dependencies Equivalent to asking: Given a set of FDs, F = {f 1, f n }, does an FD g hold? Inference problem: How do we decide? Example: Products Name Color Category Dep Price Gizmo Green Gadget Toys 49 Widget Black Gadget Toys 59 Gizmo Green Whatsit Garden 99 Provided FDs: 1. {Name} à {Color} 2. {Category} à {Department} 3. {Color, Category} à {Price} Given the provided FDs, we can see that {Name, Category} à {Price} must also hold on any instance Which / how many other FDs do?!? 35 36

Finding Functional Dependencies Equivalent to asking: Given a set of FDs, F = {f 1, f n }, does an FD g hold? 1. Split/Combine A 1 A m B 1 B n Inference problem: How do we decide? Answer: Three simple rules called Armstrong s Rules. 1. Split/Combine, 2. Reduction (Trivial), and 3. Transitivity ideas by picture A 1,, A m à B 1,,B n 37 38 1. Split/Combine 1. Split/Combine A 1 A m B 1 B n A 1 A m B 1 B n A 1,, A m à B 1,,B n is equivalent to the following n FDs A 1,,A m à B i for i=1,,n 39 And vice-versa, A 1,,A m à B i for i=1,,n is equivalent to A 1,, A m à B 1,,B n 40

Reduction/Trivial 3. Transitive Closure A 1 A m A 1 A m B 1 B n C 1 C k A 1,,A m à A j for any j=1,,m A 1,, A m à B 1,,B n and B 1,,B n à C 1,,C k 41 42 3. Transitive Closure A 1 A m B 1 B n C 1 C k Finding Functional Dependencies Example: A 1,, A m à B 1,,B n and B 1,,B n à C 1,,C k Products Name Color Category Dept Price Gizmo Green Gadget Toys 49 Widget Black Gadget Toys 59 Gizmo Green Whatsit Garden 99 Provided FDs: 1. {Name} à {Color} 2. {Category} à {Dept} 3. {Color, Category} à {Price} implies Which / how many other FDs hold? A 1,,A m à C 1,,C k 43 44

Finding Functional Dependencies Finding Functional Dependencies Example: Example: Inferred FDs: Inferred FD Rule used 4. {Name, Category} -> {Name}? 5. {Name, Category} -> {Color}? 6. {Name, Category} -> {Category}? 7. {Name, Category -> {Color, Category}? 8. {Name, Category} -> {Price}? Provided FDs: 1. {Name} à {Color} 2. {Category} à {Dept.} 3. {Color, Category} à {Price} Inferred FDs: Inferred FD Rule used 4. {Name, Category} -> {Name} Trivial 5. {Name, Category} -> {Color} Transitive (4 -> 1) 6. {Name, Category} -> {Category} Trivial 7. {Name, Category -> {Color, Category} Split/combine (5 + 6) 8. {Name, Category} -> {Price} Transitive (7 -> 3) Provided FDs: 1. {Name} à {Color} 2. {Category} à {Dept.} 3. {Color, Category} à {Price} Which / how many other FDs hold? Can we find an algorithmic way to do this? 45 46 Closure of a set of Attributes Given a set of attributes A 1,, A n and a set of FDs F: Then the closure, {A 1,, A n } + is the set of attributes B s.t. {A 1,, A n } à B Closures Example: F = Example Closures: {name} à {color} {category} à {dept} {color, category} à {price} {name} + = {name, color} {name, category} + = {name, category, color, dept, price} {color} + = {color} 47 48

Closure Algorithm Closure Algorithm Start with X = {A 1,, A n } and set of FDs F. Repeat until X doesn t change; do: if {B 1,, B n } à C is entailed by F and {B 1,, B n } X F = Start with X = {A 1,, A n }, FDs F. Repeat until X doesn t change; do: if {B 1,, B n } à C is in F and {B 1,, B n } X: then add C to X. Return X as X + {name} à {color} {name, category} + = {name, category} then add C to X. {category} à {dept} Return X as X + {color, category} à {price} 49 50 Closure Algorithm Closure Algorithm Start with X = {A 1,, A n }, FDs F. Repeat until X doesn t change; do: if {B 1,, B n } à C is in F and {B 1,, B n } X: then add C to X. Return X as X + {name, category} + = {name, category} {name, category} + = {name, category, color} Start with X = {A 1,, A n }, FDs F. Repeat until X doesn t change; do: if {B 1,, B n } à C is in F and {B 1,, B n } X: then add C to X. Return X as X + {name, category} + = {name, category} {name, category} + = {name, category, color} F = {name} à {color} F = {name} à {color} {name, category} + = {name, category, color, dept} {category} à {dept} {category} à {dept} {color, category} à {price} {color, category} à {price} 51 52

F = Closure Algorithm Start with X = {A 1,, A n }, FDs F. Repeat until X doesn t change; do: if {B 1,, B n } à C is in F and {B 1,, B n } X: then add C to X. Return X as X + {name} à {color} {category} à {dept} {color, category} à {price} {name, category} + = {name, category} {name, category} + = {name, category, color} {name, category} + = {name, category, color, dept} {name, category} + = {name, category, color, dept, price} Example R(A,B,C,D,E,F) {A,B} à {C} {A,D} à {E} {B} à {D} {A,F} à {B} Compute {A,B} + = {A, B, } Compute {A, F} + = {A, F, } 53 Example Example R(A,B,C,D,E,F) {A,B} à {C} {A,D} à {E} {B} à {D} {A,F} à {B} R(A,B,C,D,E,F) {A,B} à {C} {A,D} à {E} {B} à {D} {A,F} à {B} Compute {A,B} + = {A, B, C, D } Compute {A, F} + = {A, F, B } Compute {A, B} + = {A, B, C, D, E} Compute {A, F} + = {A, B, C, D, E, F}

What you will learn about in this section 1. Closures Pt. II 3. Closures, Superkeys & Keys 2. Superkeys & Keys 58 Why Do We Need the Closure? With closure we can find all FD s easily To check if X A 1. Compute X + 2. Check if A Î X + Note here that X is a set of attributes, but A is a single attribute. Why does considering FDs of this form suffice? Recall the Split/combine rule: X à A 1,, X à A n implies X à {A 1,, A n } Using Closure to Infer ALL FDs Step 1: Compute X +, for every set of attributes X: {A} + = {A} {B} + = {B,D} {C} + = {C} {D} + = {D} {A,B} + = {A,B,C,D} {A,C} + = {A,C} {A,D} + = {A,B,C,D} {A,B,C} + = {A,B,D} + = {A,C,D} + = {A,B,C,D} {B,C,D} + = {B,C,D} {A,B,C,D} + = {A,B,C,D} Example: Given F = {A,B} à C {A,D} à B {B} à D No need to compute thesewhy? 59 60

Using Closure to Infer ALL FDs Step 1: Compute X +, for every set of attributes X: {A} + = {A}, {B} + = {B,D}, {C} + = {C}, {D} + = {D}, {A,B} + = {A,B,C,D}, {A,C} + = {A,C}, {A,D} + = {A,B,C,D}, {A,B,C} + = {A,B,D} + = {A,C,D} + = {A,B,C,D}, {B,C,D} + = {B,C,D}, {A,B,C,D} + = {A,B,C,D} Example: Given F = {A,B} à C {A,D} à B {B} à D Using Closure to Infer ALL FDs Step 1: Compute X +, for every set of attributes X: {A} + = {A}, {B} + = {B,D}, {C} + = {C}, {D} + = {D}, {A,B} + = {A,B,C,D}, {A,C} + = {A,C}, {A,D} + = {A,B,C,D}, {A,B,C} + = {A,B,D} + = {A,C,D} + = {A,B,C,D}, {B,C,D} + = {B,C,D}, {A,B,C,D} + = {A,B,C,D} Example: Given F = {A,B} à C {A,D} à B {B} à D Step 2: Enumerate all FDs X à Y, s.t. Y Í X + and X Ç Y = Æ: {A,B} à {C,D}, {A,D} à {B,C}, {A,B,C} à {D}, {A,B,D} à {C}, {A,C,D} à {B} Step 2: Enumerate all FDs X à Y, s.t. Y Í X + and X Ç Y = Æ: {A,B} à {C,D}, {A,D} à {B,C}, {A,B,C} à {D}, {A,B,D} à {C}, {A,C,D} à {B} Y is in the closure of X 61 62 Using Closure to Infer ALL FDs Step 1: Compute X +, for every set of attributes X: {A} + = {A}, {B} + = {B,D}, {C} + = {C}, {D} + = {D}, {A,B} + = {A,B,C,D}, {A,C} + = {A,C}, {A,D} + = {A,B,C,D}, {A,B,C} + = {A,B,D} + = {A,C,D} + = {A,B,C,D}, {B,C,D} + = {B,C,D}, {A,B,C,D} + = {A,B,C,D} Example: Given F = Step 2: Enumerate all FDs X à Y, s.t. Y Í X + and X Ç Y = Æ: {A,B} à {C,D}, {A,D} à {B,C}, {A,B,C} à {D}, {A,B,D} à {C}, {A,C,D} à {B} {A,B} à C {A,D} à B {B} à D The FD X à Y is non-trivial Superkeys and Keys 63 64

Keys and Superkeys A superkey is a set of attributes A 1,, A n s.t. for any other attribute B in R, we have {A 1,, A n } à B i.e. all attributes are functionally determined by a superkey. Note: superkey is short for super set of keys. Finding Keys and Superkeys For each set of attributes X 1. Compute X + 2. If X + = set of all attributes then X is a superkey A key is a minimal superkey Meaning that no subset of a key is also a superkey 3. If X is minimal, then it is a key Do we need to check all sets of attributes? Which sets? 65 66 Example of Finding Keys Product(name, price, category, color) Example of Keys Product(name, price, category, color) {name, category} à price {category} à color {name, category} à price {category} à color What is a key? {name, category} + = {name, price, category, color} = the set of all attributes this is a superkey this is a key, since neither name nor category alone is a superkey 67 68

Today s Lecture 1. Boyce-Codd Normal Form Design Theory II 2. Decompositions & 3NF 3. MVDs 69 70 What you will learn about in this section 1. Conceptual Design 1. Boyce-Codd Normal Form 2. Boyce-Codd Normal Form 3. The BCNF Decomposition Algorithm 72

Back to Conceptual Design Now that we know how to find FDs, it s a straight-forward process: Conceptual Design 1. Search for bad FDs 2. If there are any, then keep decomposing the table into sub-tables until no more bad FDs 3. When done, the database schema is normalized 73 74 Boyce-Codd Normal Form (BCNF) Main idea is that we define good and bad FDs as follows: X à A is a good FD if X is a (super)key In other words, if A is the set of all attributes Boyce-Codd Normal Form (BCNF) Why does this definition of good and bad FDs make sense? If X is not a (super)key, it functionally determines some of the attributes X à A is a bad FD otherwise We will try to eliminate the bad FDs! Recall: this means there is redundancy And redundancy like this can lead to data anomalies! EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer 75 76

Boyce-Codd Normal Form Example BCNF is a simple condition for removing anomalies from relations: A relation R is in BCNF if: if {A 1,..., A n } à B is a non-trivial FD in R then {A 1,..., A n } is a superkey for R Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Joe 987-65-4321 908-555-1234 Westfield {SSN} à {Name,City} This FD is bad because it is not a superkey Equivalently: sets of attributes X, either (X + = X) or (X + = all attributes) In other words: there are no bad FDs Not in BCNF What is the key? {SSN, PhoneNumber} 77 78 Example BCNF Decomposition Algorithm Name SSN City Fred 123-45-6789 Seattle Joe 987-65-4321 Madison SSN PhoneNumber 123-45-6789 206-555-1234 123-45-6789 206-555-6543 987-65-4321 908-555-2121 987-65-4321 908-555-1234 Now in BCNF! {SSN} à {Name,City} This FD is now good because it is the key Let s check anomalies: Redundancy? Update? Delete? BCNFDecomp(R): Find X s.t.: X + X and X + [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R1(X È Y) and R2(X È Z) Return BCNFDecomp(R1), BCNFDecomp(R2) 79 80

BCNF Decomposition Algorithm BCNF Decomposition Algorithm BCNFDecomp(R): Find a set of attributes X s.t.: X + X and X + [all attributes] if (not found) then Return R Find a set of attributes X which has non-trivial bad FDs, i.e. is not a superkey, using closures BCNFDecomp(R): Find a set of attributes X s.t.: X + X and X + [all attributes] if (not found) then Return R If no bad FDs found, in BCNF! let Y = X + - X, Z = (X + ) C decompose R into R1(X È Y) and R2(X È Z) let Y = X + - X, Z = (X + ) C decompose R into R1(X È Y) and R2(X È Z) Return BCNFDecomp(R1), BCNFDecomp(R2) Return BCNFDecomp(R1), BCNFDecomp(R2) 81 82 BCNF Decomposition Algorithm BCNF Decomposition Algorithm BCNFDecomp(R): Find a set of attributes X s.t.: X + X and X + [all attributes] BCNFDecomp(R): Find a set of attributes X s.t.: X + X and X + [all attributes] Split into one relation (table) with X plus the attributes that X determines (Y) if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R1(X È Y) and R2(X È Z) Return BCNFDecomp(R1), BCNFDecomp(R2) Let Y be the attributes that X functionally determines (+ that are not in X) And let Z be the other attributes that it doesn t if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R 1 (X È Y) and R 2 (X È Z) Return BCNFDecomp(R1), BCNFDecomp(R2) Y X R 1 R 2 Z 83 84

BCNF Decomposition Algorithm BCNFDecomp(R): Find a set of attributes X s.t.: X + X and X + [all attributes] And one relation with X plus the attributes it does not determine (Z) BCNF Decomposition Algorithm BCNFDecomp(R): Find a set of attributes X s.t.: X + X and X + [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R 1 (X È Y) and R 2 (X È Z) Return BCNFDecomp(R1), BCNFDecomp(R2) Y X R 1 R 2 Z if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R 1 (X È Y) and R 2 (X È Z) Return BCNFDecomp(R 1 ), BCNFDecomp(R 2 ) Proceed recursively until no more bad FDs! 85 86 Example Example R(A,B,C,D,E) BCNFDecomp(R): Find a set of attributes X s.t.: X + X and X + [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R 1 (X È Y) and R 2 (X È Z) R(A,B,C,D,E) {A} à {B,C} {C} à {D} R 1 (A,B,C,D) {C} + = {C,D} {A,B,C,D} R(A,B,C,D,E) {A} + = {A,B,C,D} {A,B,C,D,E} {A} à {B,C} {C} à {D} Return BCNFDecomp(R 1 ), BCNFDecomp(R 2 ) R 11 (C,D) R 12 (A,B,C) R 2 (A,E) 87 88

Recap: Decompose to remove redundancies 1. We saw that redundancies in the data ( bad FDs ) can lead to data anomalies 2. Decompositions 2. We developed mechanisms to detect and remove redundancies by decomposing tables into BCNF 1. BCNF decomposition is standard practice- very powerful & widely used! 3. However, sometimes decompositions can lead to more subtle unwanted effects When does this happen? 90 Decompositions in General R(A 1,...,A n,b 1,...,B m,c 1,...,C p ) R 1 (A 1,...,A n,b 1,...,B m ) R 2 (A 1,...,A n,c 1,...,C p ) Theory of Decomposition Name Price Category Gizmo 19.99 Gadget OneClick 24.99 Camera Gizmo 19.99 Camera Sometimes a decomposition is correct i.e. it is a Lossless decomposition R 1 = the projection of R on A 1,..., A n, B 1,..., B m R 2 = the projection of R on A 1,..., A n, C 1,..., C p Name Price Gizmo 19.99 OneClick 24.99 Gizmo 19.99 Name Gizmo OneClick Gizmo Category Gadget Camera Camera 91 92

Lossy Decomposition Name Price Category Gizmo 19.99 Gadget OneClick 24.99 Camera Gizmo 19.99 Camera However sometimes it isn t What s wrong here? Lossless Decompositions R(A 1,...,A n,b 1,...,B m,c 1,...,C p ) R 1 (A 1,...,A n,b 1,...,B m ) R 2 (A 1,...,A n,c 1,...,C p ) Name Gizmo OneClick Gizmo Category Gadget Camera Camera Price Category 19.99 Gadget 24.99 Camera 19.99 Camera What (set) relationship holds between R1 Join R2 and R if lossless? Hint: Which tuples of R will be present? It s lossless if we have equality! 93 94 Lossless Decompositions R(A 1,...,A n,b 1,...,B m,c 1,...,C p ) Lossless Decompositions R(A 1,...,A n,b 1,...,B m,c 1,...,C p ) R 1 (A 1,...,A n,b 1,...,B m ) R 2 (A 1,...,A n,c 1,...,C p ) R 1 (A 1,...,A n,b 1,...,B m ) R 2 (A 1,...,A n,c 1,...,C p ) A decomposition R to (R1, R2) is lossless if R = R1 Join R2 If {A 1,..., A n } à {B 1,..., B m } Then the decomposition is lossless Note: don t need {A 1,..., A n } à {C 1,..., C p } BCNF decomposition is always lossless. Why? 95 96

A problem with BCNF A Problem with BCNF Problem: To enforce a FD, must reconstruct original relation on each insert! Note: This is historically inaccurate, but it makes it easier to explain Unit Unit Company Product Company Unit Product {Unit} à {Company} {Company,Product} à {Unit} We do a BCNF decomposition on a bad FD: {Unit} + = {Unit, Company} {Unit} à {Company} We lose the FD {Company,Product} à {Unit}!! 97 98 So Why is that a Problem? The Problem Unit Company Unit Galaga99 UW Galaga99 Bingo UW Bingo {Unit} à {Company} Product Databases Databases No problem so far. All local FD s are satisfied. We started with a table R and FDs F We decomposed R into BCNF tables R 1, R 2, with their own FDs F 1, F 2, Unit Company Product Galaga99 UW Databases Bingo UW Databases Let s put all the data back into a single table again: Violates the FD {Company,Product} à {Unit}!! We insert some tuples into each of the relations which satisfy their local FDs but when reconstruct it violates some FD across tables! 99 Practical Problem: To enforce FD, must reconstruct R on each insert! 100

Possible Solutions Various ways to handle so that decompositions are all lossless / no FDs lost For example 3NF- stop short of full BCNF decompositions. Usually a tradeoff between redundancy / data anomalies and FD preservation 3. MVDs BCNF still most common- with additional steps to keep track of lost FDs 101 What you will learn about in this section 1. MVDs Multi-Value Dependencies (MVDs) A multi-value dependency (MVD) is another type of dependency that could hold in our data, which is not captured by FDs Formal definition: Given a relation R having attribute set A, and two sets of attributes %, ' ( The multi-value dependency (MVD) ) ' holds on R if for any tuples +,, + - / s.t. +, ) = + - [)], there exists a tuple t 3 s.t.: t 1 [X] = t 2 [X] = t 3 [X] t 1 [Y] = t 3 [Y] t 2 [A\Y] = t 3 [A\Y] Where A \ B means elements of set A not in set B 103 104

Multi-Value Dependencies (MVDs) One less formal, literal way to phrase the definition of an MVD: The MVD ) ' holds on R if for any pair of tuples with the same X values, the swapped pair of tuples with the same X values, but the other permutations of Y and A\Y values, is also in R Multi-Value Dependencies (MVDs) Another way to understand MVDs, in terms of conditional independence: The MVD ) ' holds on R if given X, Y is conditionally independent of A \ Y and vice versa Ex: X = {x}, Y = {y}: x y z 1 0 1 1 1 0 For ) ' to hold must have x y z 1 0 1 1 1 0 1 0 0 1 1 1 Note the connection to a local crossproduct Here, given x = 1, we know for ex. that: y = 0 à z = 1 I.e. z is conditionally dependent on y given x x y z 1 0 1 1 1 0 Here, this is not the case! I.e. z is conditionally independent of y given x x y z 1 0 1 1 1 0 1 0 0 1 1 1 105 106 MVDs: Movie Theatre Example MVDs: Movie Theatre Example Movie_theater film_name snack Star Trek: The Wrath of Kahn Kale Chips Star Trek: The Wrath of Kahn Burrito Are there any functional dependencies that might hold here? Movie_theater film_name snack Star Trek: The Wrath of Kahn Kale Chips Star Trek: The Wrath of Kahn Burrito For a given movie theatre Kale Chips Burrito No Kale Chips Burrito Rains 218 Star Wars: The Boba Fett Prequel Ramen Rains 218 Star Wars: The Boba Fett Prequel Ramen Rains 218 Star Wars: The Boba Fett Prequel Plain Pasta Rains 218 Star Wars: The Boba Fett Prequel Plain Pasta And yet it seems like there is some pattern / dependency 107 108

MVDs: Movie Theatre Example MVDs: Movie Theatre Example Movie_theater film_name snack Star Trek: The Wrath of Kahn Kale Chips For a given movie theatre Movie_theater film_name snack Star Trek: The Wrath of Kahn Kale Chips For a given movie theatre Star Trek: The Wrath of Kahn Burrito Kale Chips Given a set of movies and snacks Star Trek: The Wrath of Kahn Burrito Kale Chips Given a set of movies and snacks Burrito Rains 218 Star Wars: The Boba Fett Prequel Ramen Rains 218 Star Wars: The Boba Fett Prequel Plain Pasta Burrito Rains 218 Star Wars: The Boba Fett Prequel Ramen Rains 218 Star Wars: The Boba Fett Prequel Plain Pasta Any movie / snack combination is possible! 109 110 MVDs: Movie Theatre Example MVDs: Movie Theatre Example t 1 Movie_theater (A) film_name (B) Snack (C) Star Trek: The Wrath of Kahn Kale Chips Star Trek: The Wrath of Kahn Burrito Kale Chips More formally, we write {A} {B} if for any tuples t 1,t 2 s.t. t 1 [A] = t 2 [A] t 1 t 3 Movie_theater (A) film_name (B) Snack (C) Star Trek: The Wrath of Kahn Kale Chips Star Trek: The Wrath of Kahn Burrito Kale Chips More formally, we write {A} {B} if for any tuples t 1,t 2 s.t. t 1 [A] = t 2 [A] there is a tuple t 3 s.t. t 3 [A] = t 1 [A] t 2 Burrito t 2 Burrito Rains 218 Star Wars: The Boba Fett Prequel Ramen Rains 218 Star Wars: The Boba Fett Prequel Ramen Rains 218 Star Wars: The Boba Fett Prequel Plain Pasta Rains 218 Star Wars: The Boba Fett Prequel Plain Pasta 111 112

MVDs: Movie Theatre Example MVDs: Movie Theatre Example t 1 t 3 t 2 Movie_theater (A) film_name (B) Snack (C) Star Trek: The Wrath of Kahn Kale Chips Star Trek: The Wrath of Kahn Burrito Kale Chips Burrito More formally, we write {A} {B} if for any tuples t 1,t 2 s.t. t 1 [A] = t 2 [A] there is a tuple t 3 s.t. t 3 [A] = t 1 [A] t 3 [B] = t 1 [B] t 1 t 3 t 2 Movie_theater (A) film_name (B) Snack (C) Star Trek: The Wrath of Kahn Kale Chips Star Trek: The Wrath of Kahn Burrito Kale Chips Burrito More formally, we write {A} {B} if for any tuples t 1,t 2 s.t. t 1 [A] = t 2 [A] there is a tuple t 3 s.t. t 3 [A] = t 1 [A] t 3 [B] = t 1 [B] and t 3 [R\B] = t 2 [R\B] Rains 218 Star Wars: The Boba Fett Prequel Ramen Rains 218 Star Wars: The Boba Fett Prequel Plain Pasta Rains 218 Star Wars: The Boba Fett Prequel Ramen Rains 218 Star Wars: The Boba Fett Prequel Plain Pasta Where R\B is R minus B i.e. the attributes of R not in B. 113 114 MVDs: Movie Theatre Example MVDs: Movie Theatre Example t 2 t 3 t 1 Movie_theater (A) film_name (B) Snack (C) Star Trek: The Wrath of Kahn Kale Chips Star Trek: The Wrath of Kahn Burrito Kale Chips Burrito Rains 218 Star Wars: The Boba Fett Prequel Ramen Rains 218 Star Wars: The Boba Fett Prequel Plain Pasta Note this also works! Remember, an MVD holds over a relation or an instance, so defn. must hold for every applicable pair t 2 t 3 t 1 Movie_theater (A) film_name (B) Snack (C) Star Trek: The Wrath of Kahn Kale Chips Star Trek: The Wrath of Kahn Burrito Kale Chips Burrito Rains 218 Star Wars: The Boba Fett Prequel Ramen Rains 218 Star Wars: The Boba Fett Prequel Plain Pasta This expresses a sort of dependency (= data redundancy) that we can t express with FDs *Actually, it expresses conditional independence (between film and snack given movie theatre)! 115 116

Connection to FDs If A à B does A B? Comments on MVDs MVDs have rules too! Experts: Axiomatizable 4 th Normal Form is non-trivial MVD Hint: sort of like multiplying by one For AI nerds: MVD is conditional independence in graphical models! 117 118 Summary Constraints allow one to reason about redundancy in the data Normal forms describe how to remove this redundancy by decomposing relations Elegant by representing data appropriately certain errors are essentially impossible For FDs, BCNF is the normal form. A tradeoff for insert performance: 3NF Acknowledgements The course material used for this lecture is mostly taken and/or adopted from the course materials of the CS145 Introduction to Databases lecture given by Christopher Ré at Stanford University (http://web.stanford.edu/class/cs145/). 119 120