Introduction to Database Systems CSE 414. Lecture 20: Design Theory

Similar documents
CSE 344 MAY 16 TH NORMALIZATION

CSE 344 AUGUST 3 RD NORMALIZATION

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

CSE 544 Principles of Database Management Systems

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Introduction to Management CSE 344

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

Introduction to Data Management CSE 344

CSE 303: Database. Outline. Lecture 10. First Normal Form (1NF) First Normal Form (1NF) 10/1/2016. Chapter 3: Design Theory of Relational Database

Database Design and Implementation

Schema Refinement. Feb 4, 2010

Practice and Applications of Data Management CMPSCI 345. Lecture 15: Functional Dependencies

Lectures 6. Lecture 6: Design Theory

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

CMPT 354: Database System I. Lecture 9. Design Theory

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies

CSC 261/461 Database Systems Lecture 11

L14: Normalization. CS3200 Database design (sp18 s2) 3/1/2018

SCHEMA NORMALIZATION. CS 564- Fall 2015

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization

CSC 261/461 Database Systems Lecture 12. Spring 2018

CSE 344 AUGUST 6 TH LOSS AND VIEWS

L13: Normalization. CS3200 Database design (sp18 s2) 2/26/2018

CSC 261/461 Database Systems Lecture 13. Spring 2018

Schema Refinement and Normal Forms

Design Theory for Relational Databases

Design Theory for Relational Databases. Spring 2011 Instructor: Hassan Khosravi

Design theory for relational databases

CS322: Database Systems Normalization

UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

Design Theory for Relational Databases

Chapter 3 Design Theory for Relational Databases

DECOMPOSITION & SCHEMA NORMALIZATION

CS54100: Database Systems

Schema Refinement and Normal Forms. Chapter 19

FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

Introduction to Data Management CSE 344

Schema Refinement and Normal Forms

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

INF1383 -Bancos de Dados

Functional Dependency Theory II. Winter Lecture 21

Relational Database Design

Schema Refinement and Normal Forms

Functional Dependencies & Normalization. Dr. Bassam Hammo

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

Schema Refinement. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

Relational Design Theory

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

Schema Refinement and Normal Forms Chapter 19

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

Chapter 3 Design Theory for Relational Databases

Chapter 8: Relational Database Design

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

Schema Refinement & Normalization Theory

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Chapter 7: Relational Database Design

COSC 430 Advanced Database Topics. Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago

Databases 2012 Normalization

Functional Dependencies and Normalization

Chapter 7: Relational Database Design. Chapter 7: Relational Database Design

Normal Forms. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

Relational Design Theory II. Detecting Anomalies. Normal Forms. Normalization

Database Design: Normal Forms as Quality Criteria. Functional Dependencies Normal Forms Design and Normal forms

Relational Database Design

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

Constraints: Functional Dependencies

CAS CS 460/660 Introduction to Database Systems. Functional Dependencies and Normal Forms 1.1

Information Systems (Informationssysteme)

Introduction to Data Management. Lecture #6 (Relational DB Design Theory)

Schema Refinement and Normalization

Design Theory: Functional Dependencies and Normal Forms, Part I Instructor: Shel Finkelstein

FUNCTIONAL DEPENDENCY THEORY II. CS121: Relational Databases Fall 2018 Lecture 20

CSIT5300: Advanced Database Systems

Lossless Joins, Third Normal Form

CS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li

Schema Refinement and Normal Forms

Introduction to Data Management. Lecture #6 (Relational Design Theory)

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

Relational Design Theory I. Functional Dependencies: why? Redundancy and Anomalies I. Functional Dependencies

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

Functional Dependency and Algorithmic Decomposition

Normal Forms (ii) ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Schema Refinement and Normal Forms. Why schema refinement?

Introduction to Data Management. Lecture #7 (Relational DB Design Theory II)

Exam 1 Solutions Spring 2016

Relational-Database Design

Constraints: Functional Dependencies

Functional Dependencies

Database Design and Normalization

Desirable properties of decompositions 1. Decomposition of relational schemes. Desirable properties of decompositions 3

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #15: BCNF, 3NF and Normaliza:on

Chapter 11, Relational Database Design Algorithms and Further Dependencies

Functional Dependencies and Normalization. Instructor: Mohamed Eltabakh

CSE 132B Database Systems Applications

Transcription:

Introduction to Database Systems CSE 414 Lecture 20: Design Theory CSE 414 - Spring 2018 1

Class Overview Unit 1: Intro Unit 2: Relational Data Models and Query Languages Unit 3: Non-relational data Unit 4: RDMBS internals and query optimization Unit 5: Parallel query processing Unit 6: DBMS usability, conceptual design E/R diagrams Schema normalization Unit 7: Transactions 2

Database Design Process Conceptual Model: name product makes company price name address Relational Model: Tables + constraints And also functional dep. Normalization: Eliminates anomalies Conceptual Schema Physical storage details Physical Schema

Entity / Relationship Diagrams Entity set = a class An entity = an object Product Attribute city Relationship makes CSE 414 - Spring 2018 4

Arrows in Multiway Relationships Q: What does the arrow mean? Product date Purchase Store Person A: Any person buys a given product from at most one store AND every store sells to every person at most one product CSE 414 - Spring 2018 5

N-N Relationships to Relations date prod-id cust-id date name Orders Shipment Shipping-Co Orders(prod-ID,cust-ID, date) Shipment(prod-ID,cust-ID, name, date) Shipping-Co(name, address) address prod-id cust-id name date Gizmo55 Joe12 UPS 4/10/2011 Gizmo55 Joe12 FEDEX 4/9/2011

N-1 Relationships to Relations date prod-id cust-id date name Orders Shipment Shipping-Co Orders(prod-ID,cust-ID, date1, name, date2) Shipping-Co(name, address) address Remember: no separate relations for many-one relationship 7

Subclasses to Relations name category price Product isa isa Product Sw.Product Name Price Category Gizmo 99 gadget Camera 49 photo Toy 39 gadget Name platforms Gizmo unix Software Product platforms Educational Product Other ways to convert are possible Age Group Ed.Product Name Gizmo Toy Age Group toddler retired CSE 414 - Spring 2018 8

Modeling Union Types with Subclasses Solution 2: better, more laborious Owner isa Person ownedby isa Company FurniturePiece CSE 414 - Spring 2018 9

Weak Entity Sets Entity sets are weak as their key comes from other classes to which they are related. Team affiliation University sport number name Team(sport, number, universityname) University(name) CSE 414 - Spring 2018 10

Referential Integrity Constraints Product makes Company Each product made by at most one company. Some products made by no company Product makes Company Each product made by exactly one company. CSE 414 - Spring 2018 11

Other Constraints Product <100 makes Company Q: What does this mean? A: A Company entity cannot be connected by relationship to more than 99 Product entities CSE 414 - Spring 2018 12

Constraints in SQL Constraints in SQL: Keys, foreign keys Attribute-level constraints Tuple-level constraints Global constraints: assertions simplest Most complex The more complex the constraint, the harder it is to check and to enforce CSE 414 - Spring 2018 13

What happens when data changes? SQL has three policies for maintaining referential integrity: NO ACTION reject violating modifications (default) CASCADE after delete/update do delete/update SET NULL set foreign-key field to NULL SET DEFAULT set foreign-key field to default value need to be declared with column, e.g., CREATE TABLE Product (pid INT DEFAULT 42) CSE 414 - Spring 2018 14

What makes good schemas? CSE 414 - Spring 2018 15

Relational Schema Design Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield One person may have multiple phones, but lives in only one city Primary key is thus (SSN, PhoneNumber) What is the problem with this schema? CSE 414 - Spring 2018 16

Relational Schema Design Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Anomalies: Redundancy = repeat data Update anomalies = what if Fred moves to Bellevue? Deletion anomalies = what if Joe deletes his phone number? CSE 414 - Spring 2018 17

Relation Decomposition Break the relation into two: Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Name SSN City Fred 123-45-6789 Seattle Joe 987-65-4321 Westfield SSN PhoneNumber 123-45-6789 206-555-1234 123-45-6789 206-555-6543 987-65-4321 908-555-2121 Anomalies have gone: No more repeated data Easy to move Fred to Bellevue (how?) Easy to delete all Joe s phone numbers (how?) 18

Relational Schema Design (or Logical Design) How do we do this systematically? Start with some relational schema Find out its functional dependencies (FDs) Use FDs to normalize the relational schema CSE 414 - Spring 2018 19

Functional Dependencies (FDs) Definition If two tuples agree on the attributes A 1, A 2,, A n then they must also agree on the attributes Formally: B 1, B 2,, B m A 1 A n determines B 1..B m A 1, A 2,, A n à B 1, B 2,, B m CSE 414 - Spring 2018 20

Functional Dependencies (FDs) Definition A 1,..., A m à B 1,..., B n holds in R if: t, t R, (t.a 1 = t.a 1... t.a m = t.a m à t.b 1 = t.b 1... t.b n = t.b n ) R A 1... A m B 1... B n t t if t, t agree here then t, t agree here 21

Example An FD holds, or does not hold on an instance: EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer EmpID à Name, Phone, Position Position à Phone but not Phone à Position CSE 414 - Spring 2018 22

Example EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 ß Salesrep E1111 Smith 9876 ß Salesrep E9999 Mary 1234 Lawyer Position à Phone CSE 414 - Spring 2018 23

Example EmpID Name Phone Position E0045 Smith 1234 à Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 à Lawyer But not Phone à Position CSE 414 - Spring 2018 24

Example name à color category à department color, category à price name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Green Toys 99 Do all the FDs hold on this instance? CSE 414 - Spring 2018 25

Example name à color category à department color, category à price name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Green Toys 49 Gizmo Stationary Green Office-supp. 59 What about this one? CSE 414 - Spring 2018 26

Buzzwords FD holds or does not hold on an instance If we can be sure that every instance of R will be one in which a given FD is true, then we say that R satisfies the FD If we say that R satisfies an FD, we are stating a constraint on R CSE 414 - Spring 2018 27

Why bother with FDs? Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Anomalies: Redundancy = repeat data Update anomalies = what if Fred moves to Bellevue? Deletion anomalies = what if Joe deletes his phone number? CSE 414 - Spring 2018 28

An Interesting Observation If all these FDs are true: name à color category à department color, category à price Then this FD also holds: name, category à price If we find out from application domain that a relation satisfies some FDs, it doesn t mean that we found all the FDs that it satisfies! There could be more FDs implied by the ones we have. CSE 414 - Spring 2018 29

Closure of a set of Attributes Given a set of attributes A 1,, A n The closure is the set of attributes B, notated {A 1,, A n } +, s.t. A 1,, A n à B Example: 1. name à color 2. category à department 3. color, category à price Closures: name + = {name, color} {name, category} + = {name, category, color, department, price} color + = {color} CSE 414 - Spring 2018 30

Closure Algorithm X={A1,, An}. Repeat until X doesn t change do: if B 1,, B n à C is a FD and B 1,, B n are all in X then add C to X. Example: 1. name à color 2. category à department 3. color, category à price {name, category} + = { name, category, color, department, price } Hence: name, category à color, department, price CSE 414 - Spring 2018 31

Example In class: R(A,B,C,D,E,F) A, B à C A, D à E B à D A, F à B Compute {A,B} + X = {A, B, } Compute {A, F} + X = {A, F, } CSE 414 - Spring 2018 32

Example In class: R(A,B,C,D,E,F) A, B à C A, D à E B à D A, F à B Compute {A,B} + X = {A, B, C, D, E } Compute {A, F} + X = {A, F, } CSE 414 - Spring 2018 33

Example In class: R(A,B,C,D,E,F) A, B à C A, D à E B à D A, F à B Compute {A,B} + X = {A, B, C, D, E } Compute {A, F} + X = {A, F, B, C, D, E } CSE 414 - Spring 2018 34

Example In class: R(A,B,C,D,E,F) A, B à C A, D à E B à D A, F à B Compute {A,B} + X = {A, B, C, D, E } Compute {A, F} + X = {A, F, B, C, D, E } What is the key of R? CSE 414 - Spring 2018 35

Find all FD s implied by: A, B à C A, D à B B Practice at Home à D Step 1: Compute X +, for every X: A + = A, B + = BD, C + = C, D + = D AB + =ABCD, AC + =AC, AD + =ABCD, BC + =BCD, BD + =BD, CD + =CD ABC + = ABD + = ACD + = ABCD (no need to compute why?) BCD + = BCD, ABCD + = ABCD Step 2: Enumerate all FD s X à Y, s.t. Y X + and X Y = : AB à CD, ADàBC, ABC à D, ABD à C, ACD à B 36

Keys A superkey is a set of attributes A 1,..., A n s.t. for any other attribute B, we have A 1,..., A n à B A key is a minimal superkey A superkey and for which no subset is a superkey CSE 414 - Spring 2018 37

Computing (Super)Keys For all sets X, compute X + If X + = [all attributes], then X is a superkey Try reducing to the minimal X s to get the key CSE 414 - Spring 2018 38

Example Product(name, price, category, color) name, category à price category à color What is the key? CSE 414 - Spring 2018 39

Example Product(name, price, category, color) name, category à price category à color What is the key? (name, category) + = { name, category, price, color } Hence (name, category) is a key CSE 414 - Spring 2018 40

Key or Keys? Can we have more than one key? Given R(A,B,C) define FD s s.t. there are two or more distinct keys CSE 414 - Spring 2018 41

Key or Keys? Can we have more than one key? Given R(A,B,C) define FD s s.t. there are two or more distinct keys A à B B à C C à A or ABàC BCàA or AàBC BàAC what are the keys here? CSE 414 - Spring 2018 42

Eliminating Anomalies Main idea: X à A is OK if X is a (super)key X à A is not OK otherwise Need to decompose the table, but how? Boyce-Codd Normal Form CSE 414 - Spring 2018 43

Boyce-Codd Normal Form Dr. Raymond F. Boyce CSE 414 - Spring 2018 44

CSE 414 - Spring 2018 45

Boyce-Codd Normal Form There are no bad FDs: Definition. A relation R is in BCNF if: Whenever Xà B is a non-trivial dependency, then X is a superkey. Equivalently: Definition. A relation R is in BCNF if: " X, either X + = X or X + = [all attributes] CSE 414 - Spring 2018 46

BCNF Decomposition Algorithm Normalize(R) find X s.t.: X X + and X + [all attributes] if (not found) then R is in BCNF let Y = X + - X; Z = [all attributes] - X + decompose R into R1(X Y) and R2(X Z) Normalize(R1); Normalize(R2); Y X Z X + CSE 414 - Spring 2018 47

Example Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Joe 987-65-4321 908-555-1234 Westfield SSN à Name, City The only key is: {SSN, PhoneNumber} Hence SSN à Name, City is a bad dependency Name, City SSN + SSN Phone- Number In other words: SSN+ = SSN, Name, City and is neither SSN nor All Attributes

Example BCNF Decomposition Name SSN City Fred 123-45-6789 Seattle Joe 987-65-4321 Westfield SSN à Name, City SSN PhoneNumber 123-45-6789 206-555-1234 123-45-6789 206-555-6543 987-65-4321 908-555-2121 987-65-4321 908-555-1234 Name, City SSN + SSN Phone- Number Let s check anomalies: Redundancy? Update? Delete? CSE 414 - Spring 2018 49