CSE 344 MAY 16 TH NORMALIZATION

Similar documents
CSE 344 AUGUST 3 RD NORMALIZATION

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Introduction to Management CSE 344

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Introduction to Data Management CSE 344

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

CSE 544 Principles of Database Management Systems

Introduction to Database Systems CSE 414. Lecture 20: Design Theory

CSE 303: Database. Outline. Lecture 10. First Normal Form (1NF) First Normal Form (1NF) 10/1/2016. Chapter 3: Design Theory of Relational Database

Database Design and Implementation

Practice and Applications of Data Management CMPSCI 345. Lecture 15: Functional Dependencies

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization

Schema Refinement. Feb 4, 2010

Lectures 6. Lecture 6: Design Theory

CMPT 354: Database System I. Lecture 9. Design Theory

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies

CSC 261/461 Database Systems Lecture 11

L14: Normalization. CS3200 Database design (sp18 s2) 3/1/2018

SCHEMA NORMALIZATION. CS 564- Fall 2015

CSE 344 AUGUST 6 TH LOSS AND VIEWS

CSC 261/461 Database Systems Lecture 12. Spring 2018

CSC 261/461 Database Systems Lecture 13. Spring 2018

Design theory for relational databases

L13: Normalization. CS3200 Database design (sp18 s2) 2/26/2018

Schema Refinement and Normal Forms

Design Theory for Relational Databases

Design Theory for Relational Databases. Spring 2011 Instructor: Hassan Khosravi

Chapter 3 Design Theory for Relational Databases

Introduction to Data Management CSE 344

Relational Design Theory

Relational Design Theory II. Detecting Anomalies. Normal Forms. Normalization

UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

Chapter 3 Design Theory for Relational Databases

Design Theory for Relational Databases

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

CS54100: Database Systems

DECOMPOSITION & SCHEMA NORMALIZATION

Relational Database Design

Schema Refinement & Normalization Theory

CS322: Database Systems Normalization

Schema Refinement and Normal Forms

FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

Functional Dependencies & Normalization. Dr. Bassam Hammo

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

Introduction to Data Management. Lecture #6 (Relational DB Design Theory)

Databases 2012 Normalization

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

Normal Forms. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

Schema Refinement and Normal Forms

Relational Database Design

Relational Design Theory I. Functional Dependencies: why? Redundancy and Anomalies I. Functional Dependencies

Database Design: Normal Forms as Quality Criteria. Functional Dependencies Normal Forms Design and Normal forms

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

Functional Dependencies and Normalization

INF1383 -Bancos de Dados

Information Systems (Informationssysteme)

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

Schema Refinement and Normal Forms Chapter 19

Functional Dependency Theory II. Winter Lecture 21

Design Theory: Functional Dependencies and Normal Forms, Part I Instructor: Shel Finkelstein

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

Functional Dependency and Algorithmic Decomposition

Functional Dependencies and Normalization. Instructor: Mohamed Eltabakh

Lossless Joins, Third Normal Form

Schema Refinement and Normal Forms. Chapter 19

Introduction to Data Management. Lecture #6 (Relational Design Theory)

Normal Forms (ii) ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Introduction to Data Management. Lecture #7 (Relational DB Design Theory II)

Constraints: Functional Dependencies

12/3/2010 REVIEW ALGEBRA. Exam Su 3:30PM - 6:30PM 2010/12/12 Room C9000

Schema Refinement. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Chapter 11, Relational Database Design Algorithms and Further Dependencies

Schema Refinement and Normalization

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

Information Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due

CS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li

FUNCTIONAL DEPENDENCY THEORY II. CS121: Relational Databases Fall 2018 Lecture 20

CSIT5300: Advanced Database Systems

Functional Dependencies

Relational Database Design Theory Part II. Announcements (October 12) Review. CPS 116 Introduction to Database Systems

Functional Dependencies

COSC 430 Advanced Database Topics. Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago

Schema Refinement and Normal Forms

Constraints: Functional Dependencies

Normal Forms 1. ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Chapter 10. Normalization Ext (from E&N and my editing)

Desirable properties of decompositions 1. Decomposition of relational schemes. Desirable properties of decompositions 3

LOGICAL DATABASE DESIGN Part #1/2

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

Chapter 8: Relational Database Design

CAS CS 460/660 Introduction to Database Systems. Functional Dependencies and Normal Forms 1.1

Comp 5311 Database Management Systems. 5. Functional Dependencies Exercises

Transcription:

CSE 344 MAY 16 TH NORMALIZATION

ADMINISTRIVIA HW6 Due Tonight Prioritize local runs OQ6 Out Today HW7 Out Today E/R + Normalization Exams In my office; Regrades through me

DATABASE DESIGN PROCESS Conceptual Model: name product makes company price name address Relational Model: Tables + constraints And also functional dep. Normalization: Eliminates anomalies Conceptual Schema Physical storage details Physical Schema

RELATIONAL SCHEMA DESIGN Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield One person may have multiple phones, but lives in only one city Primary key is thus (SSN, PhoneNumber) What is the problem with this schema?

RELATIONAL SCHEMA DESIGN Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Anomalies: Redundancy = repeat data Update anomalies = what if Fred moves to Bellevue? Deletion anomalies = what if Joe deletes his phone number?

RELATION DECOMPOSITION Break the relation into two: Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Name SSN City Fred 123-45-6789 Seattle Joe 987-65-4321 Westfield SSN Anomalies have gone: No more repeated data Easy to move Fred to Bellevue (how?) Easy to delete all Joe s phone numbers (how?) PhoneNumber 123-45-6789 206-555-1234 123-45-6789 206-555-6543 987-65-4321 908-555-2121

RELATIONAL SCHEMA DESIGN (OR LOGICAL DESIGN) How do we do this systematically? Start with some relational schema Find out its functional dependencies (FDs) Use FDs to normalize the relational schema

FUNCTIONAL DEPENDENCIES (FDS) Definition If two tuples agree on the attributes A 1, A 2,, A n then they must also agree on the attributes B 1, B 2,, B m Formally: A 1 A n determines B 1..B m A 1, A 2,, A n à B 1, B 2,, B m

FUNCTIONAL DEPENDENCIES (FDS) Definition A 1,..., A m à B 1,..., B n holds in R if: t, t R, (t.a 1 = t.a 1... t.a m = t.a m à t.b 1 = t.b 1... t.b n = t.b n ) R A 1... A m B 1... B n t t if t, t agree here then t, t agree here

EXAMPLE An FD holds, or does not hold on an instance: EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer EmpID à Name, Phone, Position Position à Phone but not Phone à Position

EXAMPLE EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 ß Salesrep E1111 Smith 9876 ß Salesrep E9999 Mary 1234 Lawyer Position à Phone

EXAMPLE EmpID Name Phone Position E0045 Smith 1234 à Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 à Lawyer But not Phone à Position

EXAMPLE name à color category à department color, category à price name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Green Toys 99 Do all the FDs hold on this instance?

EXAMPLE name à color category à department color, category à price name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Green Toys 49 Gizmo Stationary Green Office-supp. 59 What about this one?

BUZZWORDS FD holds or does not hold on an instance If we can be sure that every instance of R will be one in which a given FD is true, then we say that R satisfies the FD If we say that R satisfies an FD, we are stating a constraint on R

WHY BOTHER WITH FDS? Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Anomalies: Redundancy = repeat data Update anomalies = what if Fred moves to Bellevue? Deletion anomalies = what if Joe deletes his phone number?

AN INTERESTING OBSERVATION If all these FDs are true: name à color category à department color, category à price Then this FD also holds: name, category à price If we find out from application domain that a relation satisfies some FDs, it doesn t mean that we found all the FDs that it satisfies! There could be more FDs implied by the ones we have.

CLOSURE OF A SET OF ATTRIBUTES Given a set of attributes A 1,, A n The closure is the set of attributes B, notated {A 1,, A n } +, s.t. A 1,, A n à B Example: 1. name à color 2. category à department 3. color, category à price Closures: name + = {name, color} {name, category} + = {name, category, color, department, price} color + = {color}

CLOSURE ALGORITHM X={A1,, An}. Repeat until X doesn t change do: if B 1,, B n à C is a FD and B 1,, B n are all in X then add C to X. Example: 1. name à color 2. category à department 3. color, category à price {name, category} + = { name, category, color, department, price } Hence: name, category à color, department, price

EXAMPLE In class: R(A,B,C,D,E,F) A, B à C A, D à E B à D A, F à B Compute {A,B} + X = {A, B, } Compute {A, F} + X = {A, F, }

EXAMPLE In class: R(A,B,C,D,E,F) A, B à C A, D à E B à D A, F à B Compute {A,B} + X = {A, B, C, D, E } Compute {A, F} + X = {A, F, }

EXAMPLE In class: R(A,B,C,D,E,F) A, B à C A, D à E B à D A, F à B Compute {A,B} + X = {A, B, C, D, E } Compute {A, F} + X = {A, F, B, C, D, E }

EXAMPLE In class: R(A,B,C,D,E,F) A, B à C A, D à E B à D A, F à B Compute {A,B} + X = {A, B, C, D, E } Compute {A, F} + X = {A, F, B, C, D, E } What is the key of R?

PRACTICE AT HOME Find all FD s implied by: A, B à C A, D à B B à D

PRACTICE AT HOME Find all FD s implied by: A, B à C A, D à B B à D Step 1: Compute X +, for every X: A + = A, B + = BD, C + = C, D + = D AB + =ABCD, AC + =AC, AD + =ABCD, BC + =BCD, BD + =BD, CD + =CD ABC + = ABD + = ACD + = ABCD (no need to compute why?) BCD + = BCD, ABCD + = ABCD Step 2: Enumerate all FD s X à Y, s.t. Y X + and X Y = : AB à CD, ADàBC, ABC à D, ABD à C, ACD à B

KEYS A superkey is a set of attributes A 1,..., A n s.t. for any other attribute B, we have A 1,..., A n à B A key is a minimal superkey A superkey and for which no subset is a superkey

COMPUTING (SUPER)KEYS For all sets X, compute X + If X + = [all attributes], then X is a superkey Try reducing to the minimal X s to get the key

EXAMPLE Product(name, price, category, color) name, category à price category à color What is the key?

EXAMPLE Product(name, price, category, color) name, category à price category à color What is the key? (name, category) + = { name, category, price, color } Hence (name, category) is a key

KEY OR KEYS? Can we have more than one key? Given R(A,B,C) define FD s s.t. there are two or more distinct keys

KEY OR KEYS? Can we have more than one key? Given R(A,B,C) define FD s s.t. there are two or more distinct keys A à B B à C C à A or ABàC BCàA or AàBC BàAC what are the keys here?

ELIMINATING ANOMALIES Main idea: X à A is OK if X is a (super)key X à A is not OK otherwise Need to decompose the table, but how? Boyce-Codd Normal Form

BOYCE-CODD NORMAL FORM There are no bad FDs: Definition. A relation R is in BCNF if: Whenever Xà B is a non-trivial dependency, then X is a superkey. Equivalently: Definition. A relation R is in BCNF if: " X, either X + = X or X + = [all attributes]

BCNF DECOMPOSITION ALGORITHM Normalize(R) find X s.t.: X X + and X + [all attributes] if (not found) then R is in BCNF let Y = X + - X; Z = [all attributes] - X + decompose R into R1(X Y) and R2(X Z) Normalize(R1); Normalize(R2); Y X Z X +

EXAMPLE Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Joe 987-65-4321 908-555-1234 Westfield SSN à Name, City The only key is: {SSN, PhoneNumber} Hence SSN à Name, City is a bad dependency Name, City SSN + SSN Phone- Number In other words: SSN+ = SSN, Name, City and is neither SSN nor All Attributes

EXAMPLE BCNF DECOMPOSITION Name SSN City Fred 123-45-6789 Seattle Joe 987-65-4321 Westfield SSN à Name, City SSN PhoneNumber 123-45-6789 206-555-1234 123-45-6789 206-555-6543 987-65-4321 908-555-2121 987-65-4321 908-555-1234 Name, City SSN + Let s check anomalies: Redundancy? Update? Delete? SSN Phone- Number

Find X s.t.: X X + and X + [all attributes] EXAMPLE BCNF DECOMPOSITION Person(name, SSN, age, haircolor, phonenumber) SSN à name, age age à haircolor

Find X s.t.: X X + and X + [all attributes] EXAMPLE BCNF DECOMPOSITION Person(name, SSN, age, haircolor, phonenumber) SSN à name, age age à haircolor Iteration 1: Person: SSN+ = SSN, name, age, haircolor Decompose into: P(SSN, name, age, haircolor) Phone(SSN, phonenumber) name, age, haircolor SSN phonenumber

Find X s.t.: X X + and X + [all attributes] EXAMPLE BCNF DECOMPOSITION Person(name, SSN, age, haircolor, phonenumber) SSN à name, age age à haircolor Iteration 1: Person: SSN+ = SSN, name, age, haircolor Decompose into: P(SSN, name, age, haircolor) Phone(SSN, phonenumber) What are the keys? Iteration 2: P: age+ = age, haircolor Decompose: People(SSN, name, age) Hair(age, haircolor) Phone(SSN, phonenumber)

Find X s.t.: X X + and X + [all attributes] EXAMPLE BCNF DECOMPOSITION Person(name, SSN, age, haircolor, phonenumber) SSN à name, age age à haircolor Note the keys! Iteration 1: Person: SSN+ = SSN, name, age, haircolor Decompose into: P(SSN, name, age, haircolor) Phone(SSN, phonenumber) Iteration 2: P: age+ = age, haircolor Decompose: People(SSN, name, age) Hair(age, haircolor) Phone(SSN, phonenumber)

R(A,B,C,D) EXAMPLE: BCNF A à B B à C R(A,B,C,D)

R(A,B,C,D) EXAMPLE: BCNF Recall: find X s.t. X X + [all-attrs] R(A,B,C,D) A à B B à C

R(A,B,C,D) EXAMPLE: BCNF A à B B à C R(A,B,C,D) A + = ABC ABCD

R(A,B,C,D) EXAMPLE: BCNF A à B B à C R(A,B,C,D) A + = ABC ABCD R 1 (A,B,C) R 2 (A,D)

R(A,B,C,D) EXAMPLE: BCNF A à B B à C R(A,B,C,D) A + = ABC ABCD R 1 (A,B,C) B + = BC ABC R 2 (A,D)

R(A,B,C,D) EXAMPLE: BCNF A à B B à C R(A,B,C,D) A + = ABC ABCD R 1 (A,B,C) B + = BC ABC R 2 (A,D) R 11 (B,C) R 12 (A,B) What are the keys? What happens if in R we first pick B +? Or AB +?

DECOMPOSITIONS IN GENERAL R(A 1,..., A n, B 1,..., B m, C 1,..., C p ) S 1 (A 1,..., A n, B 1,..., B m ) S 2 (A 1,..., A n, C 1,..., C p ) S 1 = projection of R on A 1,..., A n, B 1,..., B m S 2 = projection of R on A 1,..., A n, C 1,..., C p