CSE 303: Database. Outline. Lecture 10. First Normal Form (1NF) First Normal Form (1NF) 10/1/2016. Chapter 3: Design Theory of Relational Database

Similar documents
10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Introduction to Management CSE 344

CSE 344 MAY 16 TH NORMALIZATION

CSE 344 AUGUST 3 RD NORMALIZATION

Database Design and Implementation

Practice and Applications of Data Management CMPSCI 345. Lecture 15: Functional Dependencies

Schema Refinement. Feb 4, 2010

Introduction to Data Management CSE 344

CSE 544 Principles of Database Management Systems

Introduction to Database Systems CSE 414. Lecture 20: Design Theory

CMPT 354: Database System I. Lecture 9. Design Theory

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization

Lectures 6. Lecture 6: Design Theory

Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

Design Theory for Relational Databases. Spring 2011 Instructor: Hassan Khosravi

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Relational Design Theory

CSC 261/461 Database Systems Lecture 11

L14: Normalization. CS3200 Database design (sp18 s2) 3/1/2018

SCHEMA NORMALIZATION. CS 564- Fall 2015

CSC 261/461 Database Systems Lecture 13. Spring 2018

CSC 261/461 Database Systems Lecture 12. Spring 2018

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Schema Refinement and Normal Forms

L13: Normalization. CS3200 Database design (sp18 s2) 2/26/2018

Design Theory for Relational Databases

Design theory for relational databases

CSE 344 AUGUST 6 TH LOSS AND VIEWS

UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

Problem about anomalies

Chapter 3 Design Theory for Relational Databases

Relational Database Design

Design Theory for Relational Databases

CS54100: Database Systems

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

Normal Forms. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

DECOMPOSITION & SCHEMA NORMALIZATION

MIS Database Systems Schema Refinement and Normal Forms

Introduction to Data Management CSE 344

Database Design: Normal Forms as Quality Criteria. Functional Dependencies Normal Forms Design and Normal forms

FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

Chapter 3 Design Theory for Relational Databases

Relational Design Theory II. Detecting Anomalies. Normal Forms. Normalization

MIS Database Systems Schema Refinement and Normal Forms

Schema Refinement & Normalization Theory

Functional Dependencies & Normalization. Dr. Bassam Hammo

Functional Dependencies

Functional Dependencies and Normalization. Instructor: Mohamed Eltabakh

INF1383 -Bancos de Dados

Relational Database Design

Functional Dependency Theory II. Winter Lecture 21

Relational Design Theory I. Functional Dependencies: why? Redundancy and Anomalies I. Functional Dependencies

FUNCTIONAL DEPENDENCY THEORY II. CS121: Relational Databases Fall 2018 Lecture 20

CS322: Database Systems Normalization

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

Schema Refinement and Normal Forms. Chapter 19

Introduction to Data Management. Lecture #6 (Relational DB Design Theory)

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

Constraints: Functional Dependencies

Information Systems (Informationssysteme)

CSE 132B Database Systems Applications

Lossless Joins, Third Normal Form

Information Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due

Constraints: Functional Dependencies

Schema Refinement and Normal Forms

Databases 2012 Normalization

Chapter 7: Relational Database Design

Schema Refinement and Normal Forms

Chapter 8: Relational Database Design

Review: Keys. What is a Functional Dependency? Why use Functional Dependencies? Functional Dependency Properties

Chapter 7: Relational Database Design. Chapter 7: Relational Database Design

Schema Refinement and Normal Forms Chapter 19

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #15: BCNF, 3NF and Normaliza:on

A few details using Armstrong s axioms. Supplement to Normalization Lecture Lois Delcambre

Functional Dependencies and Normalization

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

12/3/2010 REVIEW ALGEBRA. Exam Su 3:30PM - 6:30PM 2010/12/12 Room C9000

Schema Refinement and Normal Forms. Why schema refinement?

Normal Forms (ii) ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

Design Theory: Functional Dependencies and Normal Forms, Part I Instructor: Shel Finkelstein

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

CS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

Introduction to Data Management. Lecture #6 (Relational Design Theory)

Relational-Database Design

Exam 1 Solutions Spring 2016

Schema Refinement. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

COSC 430 Advanced Database Topics. Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago

Functional Dependency and Algorithmic Decomposition

Functional Dependencies

Transcription:

CSE 303: Database Lecture 10 Chapter 3: Design Theory of Relational Database Outline 1st Normal Form = all tables attributes are atomic 2nd Normal Form = obsolete Boyce Codd Normal Form = will study 3rd Normal Form = see book 1 2 First Normal Form (1NF) A database schema is in First Normal Form (1NF) if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain. First Normal Form (1NF) Customer Customer ID First Name Surname Telephone Number 123 Robert Ingram 555-861-2025 456 Jane Wright 555-403-1659 789 Maria Fernandez 555-808-9633 3 4 1

Not in First Normal Form (1NF) Customer Now in First Normal Form (1NF) Customer Customer ID First Name Surname Telephone Number 123 Robert Ingram 555-861-2025 456 Jane Wright 555-403-1659 555-776-4100 789 Maria Fernandez 555-808-9633 Customer ID First Name Surname Telephone Number 123 Robert Ingram 555-861-2025 456 Jane Wright 555-403-1659 456 Jane Wright 555-776-4100 789 Maria Fernandez 555-808-9633 5 6 Now in First Normal Form (1NF) Customer ID First Name Surname 123 Robert Ingram 456 Jane Wright 789 Maria Fernandez First Normal Form (1NF) A database schema is in First Normal Form if all tables attributes contain only atomic values. Student Name GPA Name GPA Courses Alice 3.8 Bob 3.7 Carol 3.9 Math Customer ID Telephone Number Alice 3.8 DB 123 555-861-2025 OS Student Course 456 555-403-1659 Alice Math Course DB Bob 3.7 Carol Math Math 456 555-776-4100 OS Alice DB DB 789 555-808-9633 May need Bob DB OS Math to add keys Carol 3.9 OS Alice OS 7 8 Carol OS Student Takes Course 2

Relational Schema Design Data Anomalies Conceptual Model: name Product buys Person price name ssn When a database is poorly designed we get anomalies: Redundancy: data is repeated Relational Model (in 1NF) plus FD s Normalization: Eliminates anomalies Update anomalies: need to change in several places Delete anomalies: may lose data when we don t want 9 10 Relational Schema Design Recall set attributes (persons with several phones): Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield One person may have multiple phones, but lives in only one city Anomalies: Redundancy = repeated data Update anomalies = Fred moves to Bellevue Deletion anomalies = Joe deletes his phone number: what is his city? 11 Relation Decomposition Break the relation into two: Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Name SSN City Fred 123-45-6789 Seattle Joe 987-65-4321 Westfield SSN Anomalies are gone: No more repeated data Easy to move Fred to Bellevue (how?) Easy to delete all Joe s phone numbers (how?) PhoneNumber 123-45-6789 206-555-1234 123-45-6789 206-555-6543 987-65-4321 908-555-2121 12 3

Relational Schema Design (or Logical Design) Main idea: Start with some relational schema Find out its functional dependencies Use them to design a better relational schema Functional Dependencies A form of constraint hence, part of the schema Finding them is part of the database design Also used in normalizing the relations 13 14 Definition: Functional Dependencies If two tuples agree on the attributes When Does an FD Hold Definition: A 1,..., A m B 1,..., B n holds in R if: "t, t R, (t.a 1 =t.a 1... t.a m =t.a m t.b 1 =t.b 1... t.b n =t.b n ) A 1, A 2,, A n then they must also agree on the attributes R A 1... A m B 1... B m Formally: B 1, B 2,, B m t t A 1, A 2,, A n B 1, B 2,, B m if t, t agree here then t, t agree here 15 16 4

: Movie table : Movie table Title Year Length Genre StudioName StarName Star Wars 1977 124 SciFi Fox Carrie Fisher Star Wars 1977 124 SciFi Fox Mark Hamill Star Wars 1977 124 SciFi Fox Harrison Ford Gone With the Wind 1939 231 Drama MGM Vivien Leigh Wayne s World 1992 95 Comedy Paramount Dana Carvey Wayne s World 1992 95 Comedy Paramount Mike Meyers Title Year Length Genre StudioName StarName Star Wars 1977 124 SciFi Fox Carrie Fisher Star Wars 1977 124 SciFi Fox Mark Hamill Star Wars 1977 124 SciFi Fox Harrison Ford Gone With the Wind 1939 231 Drama MGM Vivien Leigh Wayne s World 1992 95 Comedy Paramount Dana Carvey Wayne s World 1992 95 Comedy Paramount Mike Meyers title, year length title, year genre title, year length, genre, studioname title, year studioname How about? title, year starname 17 18 s An FD holds, or does not hold on an instance: EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer EmpID Name, Phone, Position but not Name EmpID EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer Position Phone or Name Phone 19 20 5

EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer Inferring other dependencies from a set of FDs name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Black Toys 99 Gizmo Stationary Green Office-supp. 59 but not Phone Position name color category department color, category price name, category price 21 22 Armstrong s Rules (1/3) Armstrong s Rules (2/3) A 1, A 2,, A n B 1, B 2,, B m Is equivalent to Splitting rule and Combing rule A 1, A 2,, A n A i where i = 1, 2,..., n Trivial Rule A 1, A 2,, A n B 1 A 1, A 2,, A n B 2..... A 1, A 2,, A n B m Why? A 1 A m A1... Am B1... Bm 23 24 6

Armstrong s Rules (3/3) Transitive Closure Rule A 1 A m B 1 B m C 1... C p If A 1, A 2,, A n B 1, B 2,, B m and B 1, B 2,, B m C 1, C 2,, C p then A 1, A 2,, A n C 1, C 2,, C p Why? 25 26 Inferring other dependencies from a set of FDs name category color department price Gizmo Gadget Green Toys 49 (continued) Start from the following FDs: 1. name color 2. category department 3. color, category price Infer the following FDs: name, category price THIS IS TOO HARD! Let s see an easier way. Tweaker Gadget Black Toys 99 Gizmo Stationary Green Office-supp. 59 name color category department color, category price name, category price 27 Inferred FD 4. name, category name 5. name, category color 6. name, category category 7. name, category color, category 8. name, category price Which Rule did we apply? Trivial Transitive 1, 4 Trivial Split/combine Transitive 7, 3 28 7

Closure of a set of Attributes Closures name category color department price Given a set of attributes A 1,, A n and a set of FDs The closure, {A 1,, A n } + = the set of attributes B s.t. A 1,, A n B Gizmo Gadget Green Toys 49 Tweaker Gadget Black Toys 99 Gizmo Stationary Green Office-supp. 59 : name color category department color, category price Closures: name + = {name, color} {name, category} + = {name, category, color, department, price} color + = {color} 29 30 X={A1,, An}. Closure Algorithm Repeat until X doesn t change do: if B 1,, B n C is a FD and B 1,, B n are all in X then add C to X. : name color category department color, category price {name, category} + = { name, category, color, department, price } More s In class: R(A,B,C,D,E,F) B, C A, D D E C, F B Compute {A,B} + X = {A, B, } Compute {A, F} + X = {A, F, } Hence: name, category color, department, price 31 32 8

More s In class: R(A,B,C,D,E,F) B, C A, D D E C, F B Application of Closures Does a new FD logically follows from a set of FD Inferring All FDs that logically follows from a set of FD Compute {A,B} + X = {A, B, C, D, E} Compute {A, F} + X = {A, F} Finding all keys and superkeys 33 34 (1) Does a new FD logically follows from a set of FDs? R(A 1, A 2,.., A m ) Set of FDs A new FD R(A,B,C,D) B, C D C, D A A, D B A, D C To check if X A (new FD) Using given set of FDs Compute X + i.e., {left side} + Check if A X + 1. Compute {A,D} + 2. If it contains C then the new FD logically follows 35 36 9

R(A,B,C,D) B, C D C, D A A, D B 1. Compute {B, D} + = {B, D} B, D A 2. A is not a member of the set, hence it doesn t logically follow 37 : Using Closure to Infer ALL FDs A, D B B D Step 1: Compute X +, for every X {A,B,C,D}: A+ = A, B+ = BD, C+ = C, D+ = D AB+ =ABCD, AC+=AC, AD+=ABCD, BC+=BCD, BD+=BD, CD+=CD ABC+ = ABD+ = ACD + = ABCD (no need to compute why?) BCD + = BCD, ABCD+ = ABCD Step 2: Enumerate all FD s X Y, s.t. Y X + and X Y = : 38 AB CD, AD BC, BC D, ABC D, ABD C, ACD B Another Enrollment(student, major, course, room, time) student major major, course room course time What else can we infer? [in class, or at home] Application of Closure: Finding Keys A superkey is a set of attributes A 1,..., A n s.t. for any other attribute B, we have A 1,..., A n B A key is a minimal superkey i.e. set of attributes which is a superkey and for which no subset is a superkey 39 40 10

How many Superkeys? How many Superkeys? Suppose R is a relation with attributes A 1, A 2,.A n. As a function of n, tell how many superkeys R has, if: a) The only key is A 1 b) The only keys are A 1 and A 2 Suppose R is a relation with attributes A 1, A 2,.A n. As a function of n, tell how many superkeys R has, if: a) The only key is A 1 2 n-1 b) The only keys are A 1 and A 2 3.2 n-2 41 42 Computing (Super)Keys 1 Compute X + for all sets X If X + = all attributes, then X is a key List only the minimal X s R(A,B,C,D) B, C D C, D A A, D B What are all the keys? What are all the superkeys that are not keys? 43 44 11

R(A,B,C,D) B, C D C, D A A, D B Keys: AB, AD, BC, CD R(A,B,C,D) B, C D C, D A A, D B Superkeys that are not Keys: ABC, ABD, BCD, ACD, ABCD A + = A, B + = B, C + = C, D + = D AB + =ABCD, AC What + =AC, AD are all + =ABCD, the keys? BC + =ABCD, BD + =BD, CD + =ABCD ABC + = ABD + = What ACD are + = ABCD all the superkeys (no need to that compute are not keys? why?) BCD + = ABCD, ABCD + = ABCD A + = A, B + = B, C + = C, D + = D AB + =ABCD, AC What + =AC, AD are all + =ABCD, the keys? BC + =ABCD, BD + =BD, CD + =ABCD ABC + = ABD + = What ACD are + = ABCD all the superkeys (no need to that compute are not keys? why?) BCD + = ABCD, ABCD + = ABCD 45 46 2 2 A, D B B D Keys: AB, AD A, D B B D Superkeys that are not Keys: ABC, ABD, ACD, ABCD A+ = A, B+ = BD, C+ = C, D+ = D AB+ =ABCD, AC+=AC, AD+=ABCD, What are all the keys? BC+=BCD, BD+=BD, CD+=CD ABC+ = ABD+ What = ACD are + = all ABCD the superkeys (no need that to compute are not keys? why?) BCD + = BCD, ABCD+ = ABCD A+ = A, B+ = BD, C+ = C, D+ = D AB+ =ABCD, AC+=AC, AD+=ABCD, What are all the keys? BC+=BCD, BD+=BD, CD+=CD ABC+ = ABD+ What = ACD are + = all ABCD the superkeys (no need that to compute are not keys? why?) BCD + = BCD, ABCD+ = ABCD 47 48 12

Product(name, price, category, color) name, category price category color Product(name, price, category, color) name, category price category color What is the key? What is the key? (name, category) + = name, category, price, color Hence (name, category) is a key 49 50 s of Keys Enrollment(student, address, course, room, time) student address room, time course student, course room, time Key or Keys? Can we have more than one key? Given R(A,B,C) define FD s s.t. there are two keys (find keys at home) AB C BC A or A BC B AC what are the keys here? Can you design FDs such that there are three keys? 51 52 13

Eliminating Anomalies Main idea: X A is OK if X is a (super)key X A is not OK otherwise Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Joe 987-65-4321 908-555-1234 Westfield SSN Name, City What is the key? {SSN, PhoneNumber} Hence SSN Name, City is a bad dependency 53 54 Boyce-Codd Normal Form (BCNF) A simple condition for removing anomalies from relations: All two-attribute relations are in BCNF R(A,B) Case 1: Suppose there is no dependency Nothing is being violated, so fine A relation R is in BCNF if: If A 1,..., A n B is a non-trivial dependency in R, then {A 1,..., A n } is a superkey for R In other words: there are no bad FDs, that is the left side of every nontrivial FD must be a super key. Equivalently: " X, either (X + = X) or (X + = all attributes) Case 2: Suppose A B is the only dependency A + = AB, so left side of A B is a key Case 3: Suppose B A is the only dependency B + = AB, so left side of B A is a key Case 4: Suppose A B and B A are two dependencies A + = AB, B + = AB, so left side of B A is a key and left side of A B is also a key 55 56 14

BCNF Decomposition Algorithm repeat choose A 1,, A m B 1,, B n that violates BCNF split R into R 1 (A 1,, A m, B 1,, B n ) and R 2 (A 1,, A m, [others]) continue with both R 1 and R 2 until no more violations Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Joe 987-65-4321 908-555-1234 Westfield B s R 1 A s Others R 2 In practice, we have a better algorithm (coming up) 57 SSN Name, City What is the key? {SSN, PhoneNumber} use SSN Name, City to split R1(SSN, Name, City) R2(SSN, PhoneNumber) 58 Name SSN City Fred 123-45-6789 Seattle Joe 987-65-4321 Westfield SSN PhoneNumber 123-45-6789 206-555-1234 123-45-6789 206-555-6543 987-65-4321 908-555-2121 987-65-4321 908-555-1234 SSN Name, City Let s check anomalies: Redundancy? Update? Delete? BCNF Decomposition Algorithm BCNF_Decompose(R) find X s.t.: X X + [all attributes] if (not found) then R is in BCNF let Y = X + - X let Z = [all attributes] - X + decompose R into R1(X Y) and R2(X Z) continue to decompose recursively R1 and R2 59 60 15

R(A,B,C,D) R 11 (B,C) Find X s.t.: X X + [all attributes] R 1 (A,B,C) B + = BC ABC R(A,B,C,D) A + = ABC ABCD R 12 (A,B) R 2 (A,D) A B B C What are the keys? 2: BCNF Decomposition Person(name, SSN, age, haircolor, phonenumber) SSN name, age age haircolor Iteration 1: Person SSN + = SSN, name, age, haircolor Decompose into: P(SSN, name, age, haircolor) Phone(SSN, phonenumber) Iteration 2: P SSN + = SSN, name, age, haircolor age+ = age, haircolor Decompose: People(SSN, name, age) Hair(age, haircolor) Phone(SSN, phonenumber) Find X s.t.: X X + [all attributes] What are the keys? 61 62 Decompositions in General R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) R 1 (A 1,..., A n, B 1,..., B m ) R 2 (A 1,..., A n, C 1,..., C p ) R 1 = projection of R on A 1,..., A n, B 1,..., B m R 2 = projection of R on A 1,..., A n, C 1,..., C p 63 16