CSE 344 AUGUST 3 RD NORMALIZATION

Similar documents
CSE 344 MAY 16 TH NORMALIZATION

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Introduction to Management CSE 344

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Introduction to Data Management CSE 344

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

CSE 544 Principles of Database Management Systems

Introduction to Database Systems CSE 414. Lecture 20: Design Theory

CSE 303: Database. Outline. Lecture 10. First Normal Form (1NF) First Normal Form (1NF) 10/1/2016. Chapter 3: Design Theory of Relational Database

Database Design and Implementation

Practice and Applications of Data Management CMPSCI 345. Lecture 15: Functional Dependencies

Schema Refinement. Feb 4, 2010

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization

Lectures 6. Lecture 6: Design Theory

CMPT 354: Database System I. Lecture 9. Design Theory

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies

CSC 261/461 Database Systems Lecture 11

SCHEMA NORMALIZATION. CS 564- Fall 2015

L14: Normalization. CS3200 Database design (sp18 s2) 3/1/2018

CSE 344 AUGUST 6 TH LOSS AND VIEWS

CSC 261/461 Database Systems Lecture 12. Spring 2018

CSC 261/461 Database Systems Lecture 13. Spring 2018

L13: Normalization. CS3200 Database design (sp18 s2) 2/26/2018

Design theory for relational databases

Design Theory for Relational Databases

Design Theory for Relational Databases. Spring 2011 Instructor: Hassan Khosravi

Chapter 3 Design Theory for Relational Databases

Schema Refinement and Normal Forms

Relational Design Theory II. Detecting Anomalies. Normal Forms. Normalization

Introduction to Data Management CSE 344

Design Theory for Relational Databases

CS54100: Database Systems

Relational Design Theory

Chapter 3 Design Theory for Relational Databases

UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

Schema Refinement & Normalization Theory

Functional Dependencies & Normalization. Dr. Bassam Hammo

Databases 2012 Normalization

Relational Design Theory I. Functional Dependencies: why? Redundancy and Anomalies I. Functional Dependencies

Relational Database Design

Introduction to Data Management. Lecture #6 (Relational DB Design Theory)

DECOMPOSITION & SCHEMA NORMALIZATION

CS322: Database Systems Normalization

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Schema Refinement and Normal Forms

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

Normal Forms. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

Design Theory: Functional Dependencies and Normal Forms, Part I Instructor: Shel Finkelstein

Schema Refinement and Normal Forms

Functional Dependencies and Normalization

INF1383 -Bancos de Dados

Database Design: Normal Forms as Quality Criteria. Functional Dependencies Normal Forms Design and Normal forms

Functional Dependencies and Normalization. Instructor: Mohamed Eltabakh

Relational Database Design

CSIT5300: Advanced Database Systems

Introduction to Data Management. Lecture #6 (Relational Design Theory)

Schema Refinement and Normal Forms Chapter 19

Functional Dependency Theory II. Winter Lecture 21

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

Functional Dependency and Algorithmic Decomposition

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

CS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li

Introduction to Data Management. Lecture #7 (Relational DB Design Theory II)

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

FUNCTIONAL DEPENDENCY THEORY II. CS121: Relational Databases Fall 2018 Lecture 20

Schema Refinement and Normal Forms. Chapter 19

Normal Forms (ii) ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Constraints: Functional Dependencies

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

Information Systems (Informationssysteme)

Schema Refinement. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Chapter 11, Relational Database Design Algorithms and Further Dependencies

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

Lossless Joins, Third Normal Form

Chapter 10. Normalization Ext (from E&N and my editing)

Schema Refinement and Normalization

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #15: BCNF, 3NF and Normaliza:on

Chapter 8: Relational Database Design

12/3/2010 REVIEW ALGEBRA. Exam Su 3:30PM - 6:30PM 2010/12/12 Room C9000

COSC 430 Advanced Database Topics. Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago

Functional Dependencies

Schema Refinement and Normal Forms

Constraints: Functional Dependencies

Normal Forms 1. ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Information Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due

Chapter 7: Relational Database Design

CAS CS 460/660 Introduction to Database Systems. Functional Dependencies and Normal Forms 1.1

LOGICAL DATABASE DESIGN Part #1/2

Functional Dependencies

Relational Database Design Theory Part II. Announcements (October 12) Review. CPS 116 Introduction to Database Systems

Transcription:

CSE 344 AUGUST 3 RD NORMALIZATION

ADMINISTRIVIA WQ6 due Monday DB design HW7 due next Wednesday DB design normalization

DATABASE DESIGN PROCESS Conceptual Model: name product makes company price name address Relational Model: Tables + constraints And also functional dep. Normalization: Eliminates anomalies Conceptual Schema Physical storage details Physical Schema

RELATIONAL SCHEMA DESIGN Name StudentID PhoneNumber City Fred 123456789 206-555-1234 Seattle Fred 123456789 206-555-6543 Seattle Joe 987654321 908-555-2121 Westfield One person may have multiple phones, but lives in only one city Primary key is thus (StudentID, PhoneNumber) What is the problem with this schema?

RELATIONAL SCHEMA DESIGN Name StudentID PhoneNumber City Fred 123456789 206-555-1234 Seattle Fred 123456789 206-555-6543 Seattle Joe 987654321 908-555-2121 Westfield Anomalies: Redundancy Update anomalies Deletion anomalies = repeat data = what if Fred moves to Bellevue? = what if Joe deletes his phone number?

RELATION DECOMPOSITION Break the relation into two: Name StudentID PhoneNumber City Fred 123456789 206-555-1234 Seattle Fred 123456789 206-555-6543 Seattle Joe 987654321 908-555-2121 Westfield Name StudentID City Fred 123456789 Seattle Joe 987654321 Westfield Anomalies have gone: No more repeated data Easy to move Fred to Bellevue (how?) Easy to delete all Joe s phone numbers (how?) StudentID PhoneNumber 123456789 206-555-1234 123456789 206-555-6543 987654321 908-555-2121

RELATIONAL SCHEMA DESIGN (OR LOGICAL DESIGN) How do we do this systematically? Start with some relational schema Find out its functional dependencies (FDs) Use FDs to normalize the relational schema

FUNCTIONAL DEPENDENCIES (FDS) Definition If two tuples agree on the attributes A 1, A 2,, A n then they must also agree on the attributes B 1, B 2,, B m Formally: A 1 A n determines B 1..B m A 1, A 2,, A n à B 1, B 2,, B m

FUNCTIONAL DEPENDENCIES (FDS) Definition A 1,..., A m à B 1,..., B n holds in R if: t, t R, (t.a 1 = t.a 1... t.a m = t.a m à t.b 1 = t.b 1... t.b n = t.b n ) R A 1... A m B 1... B n t t if t, t agree here then t, t agree here

FUNCTIONAL DEPENDENCIES (FDS) Definition A 1,..., A m à B 1,..., B n holds in R if: t, t R, (t.a 1 = t.a 1... t.a m = t.a m à t.b 1 = t.b 1... t.b n = t.b n ) Logically equivalent: t, t R, (t.a 1 = t.a 1... t.a m = t.a m ) (t.b 1 = t.b 1... t.b n = t.b n )

EXAMPLE An FD holds, or does not hold on an instance: EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer EmpID à Name, Phone, Position Position à Phone but not Phone à Position

EXAMPLE EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 ß Salesrep E1111 Smith 9876 ß Salesrep E9999 Mary 1234 Lawyer Position à Phone

EXAMPLE EmpID Name Phone Position E0045 Smith 1234 à Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 à Lawyer But not Phone à Position

EXAMPLE name à color category à department color, category à price name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Green Toys 99 Do all the FDs hold on this instance?

EXAMPLE name à color category à department color, category à price name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Green Toys 49 Gizmo Stationary Green Office-supp. 59 What about this one?

BUZZWORDS FD holds or does not hold on an instance If we can be sure that every instance of R will be one in which a given FD is true, then we say that R satisfies the FD If we say that R satisfies an FD, we are stating a constraint on R

WHY BOTHER WITH FDS? Name StudentID PhoneNumber City Fred 123456789 206-555-1234 Seattle Fred 123456789 206-555-6543 Seattle Joe 987654321 908-555-2121 Westfield Anomalies: Redundancy Update anomalies Deletion anomalies FD: StudentID -> Name, City = repeat data = what if Fred moves to Bellevue? = what if Joe deletes his phone number?

AN INTERESTING OBSERVATION If all these FDs are true: name à color category à department color, category à price Then this FD also holds: name, category à price If we find out from application domain that a relation satisfies some FDs, it doesn t mean that we found all the FDs that it satisfies! There could be more FDs implied by the ones we have.

CLOSURE OF A SET OF ATTRIBUTES Given a set of attributes A 1,, A n The closure is the set of attributes B, notated {A 1,, A n } +, s.t. A 1,, A n à B Example: Closures: name + 1. name à color 2. category à department 3. color, category à price = {name, color} {name, category} + = {name, category, color, department, price} color + = {color}

CLOSURE ALGORITHM X={A1,, An}. Repeat until X doesn t change do: if B 1,, B n à C is a FD and B 1,, B n are all in X then add C to X. Example: 1. name à color 2. category à department 3. color, category à price {name, category} + = { name, category, color, department, price} Hence: name, category à color, department, price

EXAMPLE In class: R(A,B,C,D,E,F) A, B à C A, D à E B à D A, F à B Compute {A,B} + X = {A, B, } Compute {A, F} + X = {A, F, }

EXAMPLE In class: R(A,B,C,D,E,F) A, B à C A, D à E B à D A, F à B Compute {A,B} + X = {A, B, C, D, E } Compute {A, F} + X = {A, F, }

EXAMPLE In class: R(A,B,C,D,E,F) A, B à C A, D à E B à D A, F à B Compute {A,B} + X = {A, B, C, D, E } Compute {A, F} + X = {A, F, B, C, D, E }

EXAMPLE In class: R(A,B,C,D,E,F) A, B à C A, D à E B à D A, F à B Compute {A,B} + X = {A, B, C, D, E } Compute {A, F} + X = {A, F, B, C, D, E } What is the key of R?

PRACTICE AT HOME Find all FD s implied by: A, B à C A, D à B B à D

PRACTICE AT HOME Find all FD s implied by: A, B à C A, D à B B à D Step 1: Compute X +, for every X: A + = A, B + = BD, C + = C, D + = D AB + =ABCD, AC + =AC, AD + =ABCD, BC + =BCD, BD + =BD, CD + =CD ABC + = ABD + = ACD + = ABCD (no need to compute why?) BCD + = BCD, ABCD + = ABCD Step 2: Enumerate all FD s X à Y, s.t. X + = X Y (with X Y = ): AB à CD, ADàBC, ABC à D, ABD à C, ACD à B

KEYS A superkey is a set of attributes A 1,..., A n s.t. for any other attribute B, we have A 1,..., A n à B I.e., for which closure = all attributes A key is a minimal superkey A superkey and for which no subset is a superkey (A key already determines everything, so any superset of key does too.)

COMPUTING (SUPER)KEYS For all sets X, compute X + If X + = [all attributes], then X is a superkey Try reducing to the minimal X s to get the key

EXAMPLE Product(name, price, category, color) name, category à price category à color What is the key?

EXAMPLE Product(name, price, category, color) name, category à price category à color What is the key? (name, category) + = { name, category, price, color } Hence (name, category) is a key

KEY OR KEYS? Can we have more than one key? Given R(A,B,C) define FD s s.t. there are two or more distinct keys

KEY OR KEYS? Can we have more than one key? Given R(A,B,C) define FD s s.t. there are two or more distinct keys A à B B à C C à A or ABàC BCàA or AàBC BàAC what are the keys here?

ELIMINATING ANOMALIES Main idea: X à A is OK if X is a (super)key X à A is not OK otherwise Need to decompose the table, but how? Boyce-Codd Normal Form

BOYCE-CODD NORMAL FORM There are no bad FDs: Definition. A relation R is in BCNF if: Whenever Xà B is a non-trivial dependency, then X is a superkey. Equivalently: Definition. A relation R is in BCNF if: " X, either X + = X or X + = [all attributes]

BCNF DECOMPOSITION ALGORITHM Normalize(R) find X s.t.: X X + and X + [all attributes] if (not found) then R is in BCNF let Y = X + - X; Z = [all attributes] - X + decompose R into R1(X Y) and R2(X Z) Normalize(R1); Normalize(R2); Y X Z X +

EXAMPLE Name StudentID PhoneNumber City Fred 123456789 206-555-1234 Seattle Fred 123456789 206-555-6543 Seattle Joe 987654321 908-555-2121 Westfield Joe 987654321 908-555-1234 Westfield StudentID à Name, City The only key is: {StudentID, PhoneNumber} Hence StudentID à Name, City is a bad dependency Name, City SID + SID Phone- Number In other words: StudentID+ = StudentID, Name, City and is neither StudentID nor All Attributes

EXAMPLE BCNF DECOMPOSITION Name StudentID City Fred 123456789 Seattle Joe 987654321 Westfield StudentID à Name, City StudentID PhoneNumber 123456789 206-555-1234 123456789 206-555-6543 987654321 908-555-2121 987654321 908-555-1234 Name, City SID + SID Let s check anomalies: Redundancy? Update? Delete? Phone- Number

Find X s.t.: X X + and X + [all attributes] EXAMPLE BCNF DECOMPOSITION Person(name, SSN, age, haircolor, phonenumber) SSN à name, age age à haircolor

Find X s.t.: X X + and X + [all attributes] EXAMPLE BCNF DECOMPOSITION Person(name, SSN, age, haircolor, phonenumber) SSN à name, age age à haircolor Iteration 1: Person: SSN+ = SSN, name, age, haircolor Decompose into: P(SSN, name, age, haircolor) Phone(SSN, phonenumber) name, SSN age, haircolor phonenumber

Find X s.t.: X X + and X + [all attributes] EXAMPLE BCNF DECOMPOSITION Person(name, SSN, age, haircolor, phonenumber) SSN à name, age age à haircolor Iteration 1: Person: SSN+ = SSN, name, age, haircolor Decompose into: P(SSN, name, age, haircolor) Phone(SSN, phonenumber) What are the keys? Iteration 2: P: age+ = age, haircolor Decompose: People(SSN, name, age) Hair(age, haircolor) Phone(SSN, phonenumber)

Find X s.t.: X X + and X + [all attributes] EXAMPLE BCNF DECOMPOSITION Person(name, SSN, age, haircolor, phonenumber) SSN à name, age age à haircolor Note the keys! Iteration 1: Person: SSN+ = SSN, name, age, haircolor Decompose into: P(SSN, name, age, haircolor) Phone(SSN, phonenumber) Iteration 2: P: age+ = age, haircolor Decompose: People(SSN, name, age) Hair(age, haircolor) Phone(SSN, phonenumber)

R(A,B,C,D) EXAMPLE: BCNF A à B B à C R(A,B,C,D)

R(A,B,C,D) EXAMPLE: BCNF Recall: find X s.t. X X + [all-attrs] R(A,B,C,D) A à B B à C

R(A,B,C,D) EXAMPLE: BCNF A à B B à C R(A,B,C,D) A + = ABC ABCD

R(A,B,C,D) EXAMPLE: BCNF A à B B à C R(A,B,C,D) A + = ABC ABCD R 1 (A,B,C) R 2 (A,D)

R(A,B,C,D) EXAMPLE: BCNF A à B B à C R(A,B,C,D) A + = ABC ABCD R 1 (A,B,C) B + = BC ABC R 2 (A,D)

R(A,B,C,D) EXAMPLE: BCNF A à B B à C R(A,B,C,D) A + = ABC ABCD R 1 (A,B,C) B + = BC ABC R 2 (A,D) R 11 (B,C) R 12 (A,B) What are the keys? What happens if in R we first pick B +? Or AB +?