SCHEMA NORMALIZATION. CS 564 - Fall 2015


HOW TO BUILD A DB APPLICATION
- Pick an application.
- Figure out what to model (ER model). Output: ER diagram.
- Transform the ER diagram into a relational schema.
- Refine the relational schema (normalization).
Now we are ready to implement the schema and load the data!

MOTIVATING EXAMPLE
  SSN        name   age  phonenumber
  934729837  Paris  24   608-374-8422
  934729837  Paris  24   603-534-8399
  123123645  John   30   608-321-1163
  384475687  Arun   20   206-473-8221
What is the primary key? (SSN, phonenumber)
What is the problem with this schema?

MOTIVATING EXAMPLE
  SSN        name   age  phonenumber
  934729837  Paris  24   608-374-8422
  934729837  Paris  24   603-534-8399
  123123645  John   30   608-321-1163
  384475687  Arun   20   206-473-8221
Problems:
- redundant storage
- update: change the age of Paris?
- insert: what if a person has no phone number?
- delete: what if Arun deletes his phone number?

SOLUTION: DECOMPOSITION
Original table:
  SSN        name   age  phonenumber
  934729837  Paris  24   608-374-8422
  934729837  Paris  24   603-534-8399
  123123645  John   30   608-321-1163
  384475687  Arun   20   206-473-8221
Decomposed into:
  SSN        name   age
  934729837  Paris  24
  123123645  John   30
  384475687  Arun   20
and:
  SSN        phonenumber
  934729837  608-374-8422
  934729837  603-534-8399
  123123645  608-321-1163
  384475687  206-473-8221

GENERAL SOLUTION
- Identify bad schemas using functional dependencies (FDs).
- Decompose the tables (normalize) using the specified FDs.
- Decomposition should be used judiciously:
    - normal forms (BCNF, 3NF) guarantee against some forms of redundancy
    - does decomposition cause any problems? (lossless join, dependency preservation)

FUNCTIONAL DEPENDENCIES

FD: DEFINITION
- FDs are a form of constraint.
- They generalize the concept of keys.
If two tuples agree on the attributes A1, A2, ..., An, then they must agree on the attributes B1, B2, ..., Bm.
Formally: A1, A2, ..., An → B1, B2, ..., Bm

FD: EXAMPLE 1
  SSN        name   age  phonenumber
  934729837  Paris  24   608-374-8422
  934729837  Paris  24   603-534-8399
  123123645  John   30   608-321-1163
  384475687  Arun   20   206-473-8221
- SSN → name, age (holds)
- SSN, age → name (holds)
- SSN → phonenumber (does not hold: Paris has two phone numbers)

FD: EXAMPLE 2
  studentid  semester  courseno  section  instructor
  124434     4         CS 564    1        Paris
  546364     4         CS 564    2        Arun
  999492     6         CS 764    1        Anhai
  183349     6         CS 784    1        Jeff
- courseno, section → instructor
- studentid → semester

HOW DO WE INFER FDS?
What FDs are valid for a relational schema?
- Think from an application point of view.
- An FD is an inherent property of an application, not something we can infer from a set of tuples.
Given a table with a set of tuples:
- we can confirm that an FD seems to be valid
- we can infer that an FD is definitely invalid
- we can never prove that an FD is valid

EXAMPLE 3
  name     category    color  department       price
  Gizmo    Gadget      Green  Toys             49
  Tweaker  Gadget      Black  Toys             99
  Gizmo    Stationery  Green  Office-supplies  59
To confirm whether name → department:
- erase all other columns
- check that the relationship name-department is many-one!
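The many-one check can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original slides: the helper name fd_holds and the list-of-dicts representation of the table are assumptions. A result of False proves the FD is invalid; a result of True only says the FD is consistent with these particular tuples.

```python
def fd_holds(rows, lhs, rhs):
    """Return False if two rows agree on lhs but differ on rhs, else True."""
    seen = {}  # lhs values -> rhs values first seen with them
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # the relationship lhs-rhs is not many-one
    return True


columns = ["name", "category", "color", "department", "price"]
products = [dict(zip(columns, r)) for r in [
    ("Gizmo", "Gadget", "Green", "Toys", 49),
    ("Tweaker", "Gadget", "Black", "Toys", 99),
    ("Gizmo", "Stationery", "Green", "Office-supplies", 59),
]]

print(fd_holds(products, ["name"], ["department"]))  # False
print(fd_holds(products, ["name"], ["color"]))       # True
```

Here name → department fails because Gizmo appears with both Toys and Office-supplies.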

WHY FDS?
- Keys are special cases of FDs.
- FDs provide more integrity constraints for the application.
- Having FDs will help us detect that a table is bad, and how to decompose the table.

MORE ON FDS
If the following FDs hold:
- A → B
- B → C
then the following FD also holds:
- A → C
This means that there are more FDs that can be found! How? Armstrong's axioms.

ARMSTRONG'S AXIOMS: 1
Reflexivity
For any subset X ⊆ {A1, ..., An}: A1, A2, ..., An → X
Examples:
- A, B → B
- A, B, C → A, B
- A, B, C → A, B, C

ARMSTRONG'S AXIOMS: 2
Augmentation
For any attribute sets X, Y, Z: if X → Y then X, Z → Y, Z
Examples:
- A → B implies A, C → B, C
- A, B → C implies A, B, C → C

ARMSTRONG'S AXIOMS: 3
Transitivity
For any attribute sets X, Y, Z: if X → Y and Y → Z then X → Z
Examples:
- A → B and B → C imply A → C
- A → C, D and C, D → E imply A → E

APPLYING ARMSTRONG'S AXIOMS
Product(name, category, color, department, price)
- name → color
- category → department
- color, category → price
Inferred FDs:
- name, category → price: (1) Augmentation, (2) Transitivity
- name, category → color: (1) Reflexivity, (2) Transitivity

APPLYING ARMSTRONG'S AXIOMS
FD Closure: if F is a set of FDs, the closure F+ is the set of all FDs logically implied by F.
Armstrong's axioms are:
- sound: any FD generated by an axiom belongs in F+
- complete: repeated application of the axioms will generate all FDs in F+

CLOSURE OF ATTRIBUTE SETS
Attribute Closure: if X is an attribute set, the closure X+ is the set of all attributes B such that X → B.
In other words, X+ includes all attributes that are functionally determined by X.

EXAMPLE
Product(name, category, color, department, price)
- name → color
- category → department
- color, category → price
Attribute Closure:
- {name}+ = {name, color}
- {name, category}+ = {name, color, category, department, price}

WHY IS CLOSURE NEEDED?
- Does X → Y hold? Check if Y ⊆ X+.
- Compute the closure F+ of FDs:
    - for each subset of attributes X, compute X+
    - for each subset of attributes Y ⊆ X+, output the FD X → Y

CLOSURE ALGORITHM
Let X = {A1, A2, ..., An}
Repeat until X doesn't change:
- if B1, B2, ..., Bm → C is an FD and B1, B2, ..., Bm are all in X, then add C to X

EXAMPLE
R(A, B, C, D, E, F)
- A, B → C
- A, D → E
- B → D
- A, F → B
Compute:
- {A, B}+ = {A, B, C, D, E}
- {A, F}+ = {A, F, B, D, E, C}
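The closure algorithm is short enough to run directly. The following is a minimal Python sketch, not code from the course: the function name closure and the encoding of each FD as a (left-hand side, right-hand side) pair of strings are assumptions, and writing attribute sets as strings only works because every attribute in this example is a single letter.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:                      # repeat until X doesn't change
        changed = False
        for lhs, rhs in fds:
            # if the whole left-hand side is in X, add the right-hand side
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)


# FDs of R(A, B, C, D, E, F) from the example
F = [("AB", "C"), ("AD", "E"), ("B", "D"), ("AF", "B")]

print(sorted(closure("AB", F)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure("AF", F)))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

Since {A, F}+ contains every attribute of R, {A, F} is a superkey.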

KEYS & SUPERKEYS
Relation R
- superkey: a set of attributes A1, A2, ..., An such that for any other attribute B: A1, A2, ..., An → B
- key: a minimal superkey, i.e. none of its proper subsets functionally determines all attributes of R

COMPUTING KEYS & SUPERKEYS
- Compute X+ for all sets of attributes X.
- If X+ = all attributes, then X is a superkey.
- If, in addition, no proper subset of X is a superkey, then X is also a key.

EXAMPLE
Product(name, category, price, color)
- name → color
- color, category → price
Superkeys:
- {name, category}
- {name, category, price}
- {name, category, color}
- {name, category, price, color}
Keys:
- {name, category}
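The procedure for computing keys and superkeys can be sketched as a brute-force search over all attribute subsets. This is an illustrative Python sketch under stated assumptions: the names closure, superkeys and keys, and the encoding of FDs as pairs of sets, are not from the slides. The search is exponential in the number of attributes, so it is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)


def superkeys(attrs, fds):
    """Every non-empty attribute set X whose closure is all of attrs."""
    attrs = frozenset(attrs)
    return [frozenset(c)
            for size in range(1, len(attrs) + 1)
            for c in combinations(sorted(attrs), size)
            if closure(c, fds) == attrs]


def keys(attrs, fds):
    """Superkeys that have no proper subset that is also a superkey."""
    sks = superkeys(attrs, fds)
    return [k for k in sks if not any(other < k for other in sks)]


product = {"name", "category", "price", "color"}
product_fds = [({"name"}, {"color"}), ({"color", "category"}, {"price"})]

print(len(superkeys(product, product_fds)))           # 4
print([sorted(k) for k in keys(product, product_fds)])  # [['category', 'name']]
```

The same functions applied to R(A, B, C) with A, B → C and A, C → B return two keys, {A, B} and {A, C}.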

MANY KEYS?
Q: Is it possible to have many keys in a relation R?
YES!! Take relation R(A, B, C) with FDs
- A, B → C
- A, C → B
Both {A, B} and {A, C} are keys.

RECAP
- FDs and (super)keys
- Reasoning with FDs:
    - given a set of FDs, infer all implied FDs
    - given a set of attributes X, infer all attributes that are functionally determined by X
Next we will look at how to use them to detect that a table is bad.

BCNF DECOMPOSITION

BOYCE-CODD NORMAL FORM (BCNF)
A relation R is in BCNF if whenever X → B is a non-trivial FD, then X is a superkey in R.
Equivalent definition: for every attribute set X, either X+ = X or X+ = all attributes.

BCNF EXAMPLE 1
  SSN        name   age  phonenumber
  934729837  Paris  24   608-374-8422
  934729837  Paris  24   603-534-8399
  123123645  John   30   608-321-1163
  384475687  Arun   20   206-473-8221
SSN → name, age
key = {SSN, phonenumber}
SSN → name, age is a bad dependency.
The above relation is not in BCNF!

BCNF EXAMPLE 2
  SSN        name   age
  934729837  Paris  24
  123123645  John   30
  384475687  Arun   20
SSN → name, age
key = {SSN}
The above relation is in BCNF!

BCNF EXAMPLE 3
  SSN        phonenumber
  934729837  608-374-8422
  934729837  603-534-8399
  123123645  608-321-1163
  384475687  206-473-8221
key = {SSN, phonenumber}
The above relation is in BCNF!
Q: can we have a binary relation that is not in BCNF?

BCNF DECOMPOSITION
- Find an FD that violates the BCNF condition: A1, A2, ..., An → B1, B2, ..., Bm
- Decompose R into R1 and R2:
    - R1 contains the A's and the B's
    - R2 contains the A's and the remaining attributes
- Continue until no BCNF violations are left.

DECOMPOSITION EXAMPLE
  SSN        name   age  phonenumber
  934729837  Paris  24   608-374-8422
  934729837  Paris  24   603-534-8399
  123123645  John   30   608-321-1163
  384475687  Arun   20   206-473-8221
The FD SSN → name, age violates BCNF.
Split into two relations R1, R2 as follows:
- R1(SSN, name, age)
- R2(SSN, phonenumber)

DECOMPOSITION EXAMPLE
R1(SSN, name, age) with SSN → name, age:
  SSN        name   age
  934729837  Paris  24
  123123645  John   30
  384475687  Arun   20
R2(SSN, phonenumber):
  SSN        phonenumber
  934729837  608-374-8422
  934729837  603-534-8399
  123123645  608-321-1163
  384475687  206-473-8221

BCNF EXAMPLE
Person(SSN, name, age, candrink, phonenumber)
- SSN → name, age
- age → candrink
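One way to work through the Person example is to run the decomposition mechanically. The Python sketch below is an assumption-laden illustration, not the course's reference implementation: the names closure, find_violation and bcnf_decompose are hypothetical, violations are found by brute force using the "X+ = X or X+ = all attributes" test restricted to the current relation, and R1 is taken to be the whole closure X+ (a common variant of putting only the A's and B's of one FD into R1). The result can depend on which violation is picked first.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)


def find_violation(rel, fds):
    """Return (X, X+ within rel) for some X violating BCNF in rel, or None."""
    names = sorted(rel)
    for size in range(1, len(names)):
        for combo in combinations(names, size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & rel   # closure projected onto rel
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None


def bcnf_decompose(rel, fds):
    """Split rel recursively until no BCNF violation is left."""
    rel = frozenset(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, x_plus = violation
    r1 = x_plus                  # X together with everything it determines
    r2 = x | (rel - x_plus)      # X together with the remaining attributes
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


person = {"SSN", "name", "age", "candrink", "phonenumber"}
person_fds = [({"SSN"}, {"name", "age"}), ({"age"}, {"candrink"})]

for relation in bcnf_decompose(person, person_fds):
    print(sorted(relation))
```

Under these assumptions the sketch ends with three relations: (age, candrink), (SSN, name, age) and (SSN, phonenumber).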

DECOMPOSITION PROPERTIES

DECOMPOSITION IN GENERAL
Let R(A1, ..., An). To decompose, create:
- R1(B1, ..., Bm)
- R2(C1, ..., Cl)
where {B1, ..., Bm} ∪ {C1, ..., Cl} = {A1, ..., An}
Then:
- R1 is the projection of R onto B1, ..., Bm
- R2 is the projection of R onto C1, ..., Cl

PROPERTIES
1. minimize redundancy
2. avoid information loss
3. preserve functional dependencies
4. ensure good query performance

INFORMATION LOSS EXAMPLE
  name   age  phonenumber
  Paris  24   608-374-8422
  John   24   608-321-1163
  Arun   20   206-473-8221
Decompose into R1(name, age) and R2(age, phonenumber):
  name   age
  Paris  24
  John   24
  Arun   20
and:
  age  phonenumber
  24   608-374-8422
  24   608-321-1163
  20   206-473-8221
Can we put it back together?

LOSSLESS-JOIN DECOMPOSITION
R(A, B, C)  --decompose-->  R1(A, B), R2(B, C)  --recover-->  R'(A, B, C)
The decomposition is lossless-join if R' is the same as R.
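The information-loss example can be replayed in code by projecting the table, joining the projections, and comparing with the original. This Python sketch is illustrative only: the helpers project, natural_join and is_lossless_on are hypothetical names, rows are modeled as dicts, and the check applies to one table instance (a lossless-join decomposition must pass it for every legal instance).

```python
def project(rows, attrs):
    """Projection onto attrs, with duplicate rows eliminated."""
    seen = {tuple((a, row[a]) for a in attrs) for row in rows}
    return [dict(pairs) for pairs in seen]


def natural_join(left, right):
    """Combine rows that agree on all attributes the two sides share."""
    joined = []
    for left_row in left:
        for right_row in right:
            common = left_row.keys() & right_row.keys()
            if all(left_row[a] == right_row[a] for a in common):
                merged = {**left_row, **right_row}
                if merged not in joined:
                    joined.append(merged)
    return joined


def is_lossless_on(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly rows."""
    recovered = natural_join(project(rows, attrs1), project(rows, attrs2))
    as_set = lambda rs: {frozenset(r.items()) for r in rs}
    return as_set(recovered) == as_set(rows)


people = [
    {"name": "Paris", "age": 24, "phonenumber": "608-374-8422"},
    {"name": "John", "age": 24, "phonenumber": "608-321-1163"},
    {"name": "Arun", "age": 20, "phonenumber": "206-473-8221"},
]

print(is_lossless_on(people, ["name", "age"], ["age", "phonenumber"]))  # False
```

Joining R1(name, age) with R2(age, phonenumber) yields five tuples instead of three, because Paris and John share the age 24 and each gets paired with both phone numbers.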

FD PRESERVING
Given a relation R and a set of FDs F, decompose R into R1 and R2.
Suppose:
- R1 has a set of FDs F1
- R2 has a set of FDs F2
- F1 and F2 are computed from F
The decomposition is dependency preserving if by enforcing F1 over R1 and F2 over R2, we can enforce F over R.

GOOD EXAMPLE
Person(SSN, name, age, candrink)
- SSN → name, age
- age → candrink
Decomposes into:
- R1(SSN, name, age) with SSN → name, age
- R2(age, candrink) with age → candrink

BAD EXAMPLE
R(A, B, C)
- A → B
- B, C → A
Decomposes into:
- R1(A, B) with A → B
- R2(A, C) with no FDs here!!
R1:
  A   B
  a1  b
  a2  b
R2:
  A   C
  a1  c
  a2  c
Recovered table:
  A   B  C
  a1  b  c
  a2  b  c
The recovered table violates B, C → A.

DECOMPOSITION
When decomposing a relation R, we want to achieve good properties.
These properties can be conflicting.
BCNF decomposition achieves some of these:
- removes certain types of redundancy
- is lossless-join
- is not always dependency preserving

WHY IS BCNF LOSSLESS-JOIN?
Example: R(A, B, C) with A → B decomposes into R1(A, B) and R2(A, C).
- Suppose tuple (a, b, c) is in the recovered R'.
- Then (a, b) is in R1 and (a, c) is in R2.
- Since (a, c) is in R2, some tuple (a, b', c) is in R; since (a, b) is in R1, some tuple of R has A = a and B = b.
- Since A → B, it must be that b = b'.
- So (a, b, c) is also in R!

WHY IS BCNF NOT FD PRESERVING?
R(A, B, C)
- A → B
- B, C → A
The BCNF decomposition is:
- R1(A, B) with FD A → B
- R2(A, C) with no FDs
There may not exist any BCNF decomposition that is FD preserving!
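Whether a given FD survives a decomposition can be tested without materializing the projected FD sets. The Python sketch below uses the standard textbook test (grow the left-hand side using closures restricted to each decomposed relation); it is not taken from these slides, and the names closure and is_preserved as well as the string encoding of single-letter attributes are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)


def is_preserved(fd, decomposition, fds):
    """True if fd can be enforced using only FDs local to each relation."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for rel in decomposition:
            rel = set(rel)
            # what the attributes known so far determine inside this relation
            gained = closure(result & rel, fds) & rel
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


F = [("A", "B"), ("BC", "A")]      # FDs of R(A, B, C)
parts = ["AB", "AC"]               # R1(A, B), R2(A, C)

print(is_preserved(("A", "B"), parts, F))   # True
print(is_preserved(("BC", "A"), parts, F))  # False
```

B, C → A is lost because no single relation of the decomposition contains both B and C.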

NORMAL FORMS
BCNF is what we call a normal form.
Other normal forms exist, listed from least to most restrictive:
- 1NF: flat tables (atomic values)
- 2NF
- 3NF
- BCNF
- 4NF

THIRD NORMAL FORM (3NF)

3NF DEFINITION
A relation R is in 3NF if whenever X → A, one of the following is true:
- A ∈ X (trivial FD)
- X is a superkey
- A is part of some key of R (prime attribute)
BCNF implies 3NF.

3NF CONTINUED
Example: R(A, B, C) with A, B → C and C → A
- is in 3NF. Why?
- is not in BCNF. Why?
Compromise used when BCNF is not achievable: aim for BCNF and settle for 3NF.
A lossless-join and dependency-preserving decomposition of R into a collection of 3NF relations is always possible!
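The two "Why?" questions can be checked mechanically. The sketch below is a hedged Python illustration: the names closure, candidate_keys, is_bcnf and is_3nf are hypothetical, keys are found by brute force, and both tests only inspect the given FDs, which is enough when checking the original relation against its own FD set.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    attrs = frozenset(attrs)
    sks = [frozenset(c)
           for size in range(1, len(attrs) + 1)
           for c in combinations(sorted(attrs), size)
           if closure(c, fds) == attrs]
    return [k for k in sks if not any(other < k for other in sks)]


def is_bcnf(attrs, fds):
    """Every non-trivial FD must have a superkey on the left."""
    attrs = frozenset(attrs)
    return all(set(rhs) <= set(lhs) or closure(lhs, fds) == attrs
               for lhs, rhs in fds)


def is_3nf(attrs, fds):
    """Like BCNF, but prime attributes are allowed on the right."""
    attrs = frozenset(attrs)
    prime = set().union(*candidate_keys(attrs, fds))
    return all(closure(lhs, fds) == attrs or (set(rhs) - set(lhs)) <= prime
               for lhs, rhs in fds)


F = [("AB", "C"), ("C", "A")]   # R(A, B, C)

print(is_3nf("ABC", F))   # True
print(is_bcnf("ABC", F))  # False
```

The keys are {A, B} and {B, C}, so every attribute is prime and C → A is acceptable for 3NF; it still breaks BCNF because C alone is not a superkey.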

3NF DECOMPOSITION
- The algorithm for BCNF decomposition can be used to get a lossless-join decomposition into 3NF.
- We can typically stop earlier!
- To ensure dependency preservation as well, instead of the given set of FDs F, we use a minimal (or canonical) cover for F.

MINIMAL COVER FOR FDS
A minimal cover G for a set of FDs F satisfies:
- G+ = F+
- if we modify G by deleting an FD or by deleting attributes from an FD in G, the closure changes
- the LHS of each FD is unique
The minimal cover gives a lossless-join and dependency-preserving decomposition algorithm!

EXAMPLE
Given the FDs:
- A → B
- A, B, C, D → E
- E, F → G, H
- A, C, D, F → E, G
The minimal cover is the following:
- A → B
- A, C, D → E
- E, F → G, H
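The cover in this example can be reproduced with the usual three-step procedure: split right-hand sides, drop extraneous left-hand-side attributes, then drop redundant FDs. The Python sketch below is one possible implementation, not the course's: the names closure and minimal_cover are assumptions, attributes are single letters so that sets can be written as strings, and a final step merges FDs that share a left-hand side so that each LHS is unique. A set of FDs can have more than one minimal cover, so the output depends on the processing order.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)


def minimal_cover(fds):
    """Return a dict mapping each left-hand side to its right-hand side."""
    # Step 1: one attribute per right-hand side
    g = []
    for lhs, rhs in fds:
        for attr in sorted(rhs):
            fd = (frozenset(lhs), frozenset([attr]))
            if fd not in g:
                g.append(fd)
    # Step 2: remove attributes that are not needed on a left-hand side
    for i in range(len(g)):
        lhs, rhs = g[i]
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if reduced and rhs <= closure(reduced, g):
                lhs = reduced
                g[i] = (lhs, rhs)
    # Step 3: remove FDs that follow from the remaining ones
    g = list(dict.fromkeys(g))          # drop exact duplicates, keep order
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if fd[1] <= closure(fd[0], rest):
            g = rest
    # Step 4: merge FDs with the same left-hand side
    merged = {}
    for lhs, rhs in g:
        merged[lhs] = merged.get(lhs, frozenset()) | rhs
    return merged


F = [("A", "B"), ("ABCD", "E"), ("EF", "GH"), ("ACDF", "EG")]

for lhs, rhs in minimal_cover(F).items():
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

For the FDs above this yields A → B, A, C, D → E and E, F → G, H, matching the cover on the slide.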

3NF ALGORITHM
1. Find a lossless-join 3NF decomposition (that might violate some FDs).
2. Compute a minimal cover F.
3. Find the FDs in F that are not preserved.
4. For each non-preserved FD X → A, add a new relation R(X, A).

IS NORMALIZATION ALWAYS GOOD?
- Example: suppose A and B are always used together, but normalization says they should be in different tables.
    - decomposition might produce an unacceptable performance loss
- Example: data warehouses
    - huge historical DBs, rarely updated after creation
    - joins are expensive or impractical
Everyday DBs: aim for BCNF, settle for 3NF!

RECAP
- Bad schemas lead to redundancy: redundant storage and update, insert, and delete anomalies.
- To correct bad schemas, decompose relations:
    - the decomposition must be lossless-join
    - we would like dependency-preserving decompositions
- Desired normal forms:
    - BCNF: only superkey FDs
    - 3NF: superkey FDs + dependencies with prime attributes on the RHS