Introduction to Management CSE 344

Similar documents
11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

CSE 344 MAY 16 TH NORMALIZATION

CSE 344 AUGUST 3 RD NORMALIZATION

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

CSE 303: Database. Outline. Lecture 10. First Normal Form (1NF) First Normal Form (1NF) 10/1/2016. Chapter 3: Design Theory of Relational Database

Introduction to Data Management CSE 344

CSE 544 Principles of Database Management Systems

Introduction to Database Systems CSE 414. Lecture 20: Design Theory

Database Design and Implementation

Schema Refinement. Feb 4, 2010

Practice and Applications of Data Management CMPSCI 345. Lecture 15: Functional Dependencies

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization

CMPT 354: Database System I. Lecture 9. Design Theory

Lectures 6. Lecture 6: Design Theory

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

CSC 261/461 Database Systems Lecture 11

Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies

CSC 261/461 Database Systems Lecture 12. Spring 2018

SCHEMA NORMALIZATION. CS 564- Fall 2015

L14: Normalization. CS3200 Database design (sp18 s2) 3/1/2018

CSC 261/461 Database Systems Lecture 13. Spring 2018

CSE 344 AUGUST 6 TH LOSS AND VIEWS

Introduction to Data Management CSE 344

Schema Refinement and Normal Forms

L13: Normalization. CS3200 Database design (sp18 s2) 2/26/2018

Design theory for relational databases

Design Theory for Relational Databases. Spring 2011 Instructor: Hassan Khosravi

Design Theory for Relational Databases

Chapter 3 Design Theory for Relational Databases

Design Theory for Relational Databases

UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

Relational Design Theory II. Detecting Anomalies. Normal Forms. Normalization

Chapter 3 Design Theory for Relational Databases

Design Theory: Functional Dependencies and Normal Forms, Part I Instructor: Shel Finkelstein

Relational Design Theory

Schema Refinement & Normalization Theory

DECOMPOSITION & SCHEMA NORMALIZATION

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

CS54100: Database Systems

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

Introduction to Data Management. Lecture #6 (Relational DB Design Theory)

FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

Schema Refinement and Normal Forms

CS322: Database Systems Normalization

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

Functional Dependencies & Normalization. Dr. Bassam Hammo

Normal Forms. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

INF1383 -Bancos de Dados

Schema Refinement and Normal Forms. Chapter 19

Introduction to Data Management. Lecture #7 (Relational DB Design Theory II)

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

Schema Refinement and Normal Forms

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

Databases 2012 Normalization

Schema Refinement. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Relational Database Design

Functional Dependency Theory II. Winter Lecture 21

Schema Refinement and Normal Forms Chapter 19

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

Relational Database Design

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

Information Systems (Informationssysteme)

Database Design: Normal Forms as Quality Criteria. Functional Dependencies Normal Forms Design and Normal forms

Functional Dependency and Algorithmic Decomposition

Functional Dependencies and Normalization

Relational Design Theory I. Functional Dependencies: why? Redundancy and Anomalies I. Functional Dependencies

CS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li

Schema Refinement and Normalization

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

12/3/2010 REVIEW ALGEBRA. Exam Su 3:30PM - 6:30PM 2010/12/12 Room C9000

Lecture 7: Karnaugh Map, Don t Cares

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

Lossless Joins, Third Normal Form

Normal Forms (ii) ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Functional Dependencies

Schema Refinement and Normal Forms

FUNCTIONAL DEPENDENCY THEORY II. CS121: Relational Databases Fall 2018 Lecture 20

Functional Dependencies and Normalization. Instructor: Mohamed Eltabakh

Introduction to Data Management. Lecture #6 (Relational Design Theory)

Relational Database Design Theory Part II. Announcements (October 12) Review. CPS 116 Introduction to Database Systems

CS 186, Fall 2002, Lecture 6 R&G Chapter 15

Chapter 11, Relational Database Design Algorithms and Further Dependencies

Lecture #7 (Relational Design Theory, cont d.)

CSE 132B Database Systems Applications

Announcements Wednesday, September 20

Database Design and Normalization

Information Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due

COSC 430 Advanced Database Topics. Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago

A few details using Armstrong s axioms. Supplement to Normalization Lecture Lois Delcambre

Comp 5311 Database Management Systems. 5. Functional Dependencies Exercises

CAS CS 460/660 Introduction to Database Systems. Functional Dependencies and Normal Forms 1.1

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

Chapter 8: Relational Database Design

Announcements Monday, September 17

Introduction to Data Management. Lecture #9 (Relational Design Theory, cont.)

Transcription:

Introduction to Management CSE 344 Lectures 17: Design Theory 1

Announcements No class/office hour on Monday Midterm on Wednesday (Feb 19) in class HW5 due next Thursday (Feb 20) No WQ next week (WQ6 due on Feb 25) But try to finish early! (Lecture 15-17) 2

Midterm All material up to and including Lecture 15 SQL, basic evaluation + indexes, RA, datalog-withnegation, RC, XML/XPath/XQuery, E/R diagram Open books, open notes, no electronic devices Don t waste paper printing stuff. Normally, you shouldn t need any notes during the exam. My suggestion is to print, say, 5-6 selected slides from the lecture notes that you had trouble with, and to print your own homework, just in case you forget some cool solution you used there. Make sure you understand all the concepts! 3

Relational Schema Design Conceptual Model: name Product buys Person DONE price name ssn Relational Model: plus FD s DONE Normalization: Eliminates anomalies Today 4

Relational Schema Design Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield One person may have multiple phones, but lives in only one city Primary key is thus (SSN, PhoneNumber) What is the problem with this schema? 5

Relational Schema Design Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Anomalies: Redundancy = repeat data Update anomalies = what if Fred moves to Bellevue? Deletion anomalies = what if Joe deletes his phone number? Can you suggest a solution? 6

Relation Decomposition Break the relation into two: Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Name SSN City Fred Joe 123-45-6789 Seattle 987-65-4321 Westfield SSN PhoneNumber 123-45-6789 206-555-1234 123-45-6789 206-555-6543 987-65-4321 908-555-2121 Anomalies have gone: No more repeated data Easy to move Fred to Bellevue (how?) Easy to delete all Joe s phone numbers (how?) 7

Relational Schema Design (or Logical Design) How do we do this systematically? Start with some relational schema Find out its functional dependencies (FDs) Use FDs to normalize the relational schema 8

Functional Dependencies (FDs) Definition If two tuples agree on the attributes A 1, A 2,, A n then they must also agree on the attributes Formally: B 1, B 2,, B m A 1 A n determines B 1..B m A 1, A 2,, A n B 1, B 2,, B m 9

Functional Dependencies (FDs) Definition A 1,..., A m B 1,..., B n holds in R if: t, t R, (t.a 1 = t.a 1... t.a m = t.a m t.b 1 = t.b 1... t.b n = t.b n ) R A 1... A m B 1... B n t t if t, t agree here then t, t agree here 10

Example An FD holds, or does not hold on an instance: EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer EmpID Name, Phone, Position Position Phone but not Phone Position 11

Example EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer Position Phone 12

Example EmpID Name Phone Position E0045 Smith 1234 Clerk E3542 Mike 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer But not Phone Position 13

Example name color category department color, category price name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Green Toys 99 Do all the FDs hold on this instance? 14

Example name color category department color, category price name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Green Toys 99 Do all the FDs hold on this instance? No. color,category -> price does not hold 15

Example name color category department color, category price name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Black Toys 99 Gizmo Stationary Green Office-supp. 59 What about this one? 16

Example name color category department color, category price name category color department price Gizmo Gadget Green Toys 49 Tweaker Gadget Black Toys 99 Gizmo Stationary Green Office-supp. 59 What about this one? Yes 17

Terminology FD holds or does not hold on an instance If we can be sure that every instance of R will be one in which a given FD is true, then we say that R satisfies the FD If we say that R satisfies an FD F, we are stating a constraint on R 18

An Interesting Observation If all these FDs are true: name color category department color, category price Then this FD also holds: name, category price If we find out from application domain that a relation satisfies some FDs, it doesn t mean that we found all the FDs that it satisfies! There could be more FDs implied by the ones we have. 19

Closure of a set of Attributes Given a set of attributes A 1,, A n (under FD set F) The closure, {A 1,, A n } + = the set of attributes B that (any reln that satisfies F) satisfies A 1,, A n B Example: 1. name color 2. category department 3. color, category price Closures: name + = {name, color} {name, category} + = {name, category, color, department, price} color + = {color} 20

Closure Algorithm X={A1,, An}. Repeat until X doesn t change do: if B 1,, B n C is a FD and B 1,, B n are all in X then add C to X. Example: 1. name color 2. category department 3. color, category price {name, category} + = {name, category,..} 21

Closure Algorithm X={A1,, An}. Repeat until X doesn t change do: if B 1,, B n C is a FD and B 1,, B n are all in X then add C to X. Example: 1. name color 2. category department 3. color, category price {name, category} + = { name, category, color, department, price } Hence: name, category color, department, price 22

In class: R(A,B,C,D,E,F) Example A, B C A, D E B D A, F B Compute {A,B} + Compute {A, F} + X = {A, B, } X = {A, F, } 23

In class: R(A,B,C,D,E,F) Example A, B C A, D E B D A, F B Compute {A,B} + X = {A, B, C, D, E } Compute {A, F} + X = {A, F, } 24

In class: R(A,B,C,D,E,F) Example A, B C A, D E B D A, F B Compute {A,B} + X = {A, B, C, D, E } Compute {A, F} + X = {A, F, B, C, D, E } 25

In class: R(A,B,C,D,E,F) Example A, B C A, D E B D A, F B Compute {A,B} + X = {A, B, C, D, E } Compute {A, F} + X = {A, F, B, C, D, E } Can you see a key of R? 26

Find all FD s implied by: Practice at Home A, B C A, D B B D Step 1: Compute X +, for every X: Step 2: Enumerate all FD s X Y, s.t. Y X + and X Y = : 27

Find all FD s implied by: Practice at Home A, B C A, D B B D Step 1: Compute X +, for every X: A+ = A, B+ = BD, C+ = C, D+ = D AB+ =ABCD, AC+=AC, AD+=ABCD, BC+=BCD, BD+=BD, CD+=CD ABC+ = ABD+ = ACD + = ABCD (no need to compute why?) BCD + = BCD, ABCD+ = ABCD 28

Find all FD s implied by: Practice at Home A, B C A, D B B D Step 1: Compute X +, for every X: A+ = A, B+ = BD, C+ = C, D+ = D AB+ =ABCD, AC+=AC, AD+=ABCD, BC+=BCD, BD+=BD, CD+=CD ABC+ = ABD+ = ACD + = ABCD (no need to compute why?) BCD + = BCD, ABCD+ = ABCD Step 2: Enumerate all FD s X Y, s.t. Y X + and X Y = : AB CD, AD BC, ABC D, ABD C, ACD B 29

Keys A superkey is a set of attributes A 1,..., A n s.t. for any other attribute B, we have A 1,..., A n B A key is a minimal superkey A superkey and for which no subset is a superkey 30

Computing (Super)Keys For all sets X, compute X + If X + = [all attributes], then X is a superkey Try only the minimal X s to get the keys No subset of X is a superkey 31

Example Product(name, price, category, color) name, category price category color What is the key? 32

Example Product(name, price, category, color) name, category price category color What is the key? (name, category) + = { name, category, price, color } Hence (name, category) is a key 33

Key or Keys? Can we have more than one key? Given R(A,B,C) define FD s s.t. there are two or more keys Try in class.... 34

Key or Keys? Can we have more than one key? Given R(A,B,C) define FD s s.t. there are two or more keys A B B C C A or AB C BC A or A BC B AC what are the keys here? 35

Eliminating Anomalies Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Joe 987-65-4321 908-555-1234 Westfield SSN Name, City What is the key? Suggest a rule for decomposing the table to eliminate anomalies 36

Eliminating Anomalies Main idea: X A is OK if X is a (super)key X A is not OK otherwise Need to decompose the table, but how? 37

Boyce-Codd Normal Form There are no bad FDs: Definition. A relation R is in BCNF if: Whenever X B is a non-trivial dependency, then X is a superkey. Equivalently: Definition. A relation R is in BCNF if: X, either X + = X or X + = [all attributes] Trivial FD 38

BCNF Decomposition Algorithm Normalize(R) find X s.t.: X X + [all attributes] if (not found) then R is in BCNF let Y = X + - X; Z = [all attributes] - X + decompose R into R1(X Y) and R2(X Z) Normalize(R1); Normalize(R2); Y X Z X + 39

Example Name SSN PhoneNumber City Fred 123-45-6789 206-555-1234 Seattle Fred 123-45-6789 206-555-6543 Seattle Joe 987-65-4321 908-555-2121 Westfield Joe 987-65-4321 908-555-1234 Westfield SSN Name, City The only key is: {SSN, PhoneNumber} Hence SSN Name, City is a bad dependency Name, City SSN + In other words: SSN+ = Name, City and is neither SSN nor All Attributes SSN Phone- Number 40

Example BCNF Decomposition Name SSN City Fred Joe SSN 123-45-6789 Seattle 987-65-4321 Westfield PhoneNumber 123-45-6789 206-555-1234 123-45-6789 206-555-6543 987-65-4321 908-555-2121 987-65-4321 908-555-1234 SSN Name, City Name, City SSN + SSN Phone- Number Let s check anomalies: Redundancy? Update Fred s city? Delete Joe s all ph? 41

Find X s.t.: X X + [all attributes] Example BCNF Decomposition Person(name, SSN, age, haircolor, phonenumber) SSN name, age age haircolor 42

Find X s.t.: X X + [all attributes] Example BCNF Decomposition Person(name, SSN, age, haircolor, phonenumber, attr) SSN name, age age haircolor Iteration 1: Person: SSN+ = SSN, name, age, haircolor Decompose into: P(SSN, name, age, haircolor) Phone(SSN, phonenumber, attr) name, SSN age, haircolor phonenumber 43

Example BCNF Decomposition Person(name, SSN, age, haircolor, phonenumber) SSN name, age age haircolor Find X s.t.: X X + [all attributes] Iteration 1: Person: SSN+ = SSN, name, age, haircolor Decompose into: P(SSN, name, age, haircolor) Phone(SSN, phonenumber) What are the keys? Iteration 2: P: age+ = age, haircolor Decompose: People(SSN, name, age) Hair(age, haircolor) Phone(SSN, phonenumber) 44

Example BCNF Decomposition Person(name, SSN, age, haircolor, phonenumber) SSN name, age age haircolor Find X s.t.: X X + [all attributes] Iteration 1: Person: SSN+ = SSN, name, age, haircolor Decompose into: P(SSN, name, age, haircolor) Phone(SSN, phonenumber) Note the keys! Iteration 2: P: age+ = age, haircolor Decompose: People(SSN, name, age) Hair(age, haircolor) Phone(SSN, phonenumber) 45

R(A,B,C,D) Practice at Home A B B C R(A,B,C,D) A + = ABC ABCD 46

R(A,B,C,D) Practice at Home A B B C R(A,B,C,D) A + = ABC ABCD R 1 (A,B,C) B + = BC ABC R 2 (A,D) R 11 (B,C) R 12 (A,B) What are the keys? What happens if in R we first pick B +? Or AB +? 47