Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1



Background
- We started with schema design: the ER model and its translation into a relational schema.
- Then we studied relational query languages: relational algebra, relational calculus, SQL.
- Now we are back to schema design: we will finish the semester by considering how to refine an existing schema.

An Example
- Consider the relation Hourly_Emps(ssn, name, lot, rating, hrly_wages, hrs_worked), called SNLRWH for short.
- What if we know that an employee's rating (R) determines their hrly_wages (W)?

S            N          L   R  W   H
123-22-3666  Attishoo   48  8  10  40
231-31-5368  Smiley     22  8  10  30
131-24-3650  Smethurst  35  5  7   30
434-26-3751  Guldu      35  5  7   32
612-67-4134  Madayan    35  8  10  40

- The information in W is repeated (needlessly)!

Redundancy
- When part of a database can be derived from other parts, redundancy exists.
- Example: the hrly_wages of Smiley is the same as the hrly_wages of Attishoo because they have the same rating, and we know that rating determines hrly_wages.
- This redundancy exists because of underlying integrity constraints called functional dependencies, the topic of this lecture: rating -> hrly_wages

Is Redundancy Good or Bad?
- Bad: in this example, it seems like it would be cleaner to have a lookup table relating rating and hourly wages, instead of awkwardly embedding such info in this table.
- Good: materialized views, and entire data warehouses, are redundant! This is useful because it saves expensive (re)computation.
- In both cases, it's important to understand redundancy, its sources, and how to manage it.

Problems Which Can Arise From Redundancy
- Update anomaly: what if we change W in just the 1st tuple of SNLRWH? The table would then record two different wages for rating 8.
- Insertion anomaly: what if we want to insert an employee and don't know the hourly wage for his rating? Or use the wrong wage?
- Deletion anomaly: if we delete all employees with rating 5, we lose the information about the wage for rating 5!
- Sub-optimal design: it would be a cleaner design to have a reference table relating ratings to wages.
- Extra storage (although storage is cheap).

S            N          L   R  W   H
123-22-3666  Attishoo   48  8  10  40
231-31-5368  Smiley     22  8  10  30
131-24-3650  Smethurst  35  5  7   30
434-26-3751  Guldu      35  5  7   32
612-67-4134  Madayan    35  8  10  40

Overview
- We need to understand functional dependencies and how they can lead to redundancy in a schema design:
  - update/insert/delete anomalies
  - intuitively awkward designs
- Schema refinement manages functional dependencies.
- Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).
- Goal: decompose into a normal form with less redundancy. Several are possible, e.g., 3NF, BCNF, ...
- Decomposition should be used judiciously; new issues can result:
  - violating desirable properties of decompositions (lossless join and dependency preservation)
  - over-normalization

Functional Dependencies (FDs)
- Functional dependencies are a type of integrity constraint: the value of one set of attributes depends on the value of another set of attributes in the same relation.
- Example: Emps(ssn, name, lot, rank, salary)
  - rank -> salary (i.e., rank determines the value of salary)
  - rank -> lot

ssn        name            lot  rank        salary
123456789  Will Smith      A1   manager     150000
123456679  Bob Brown       A1   manager     150000
123456780  Ned Wang        A2   programmer  70000
092873817  Joyce Braddock  A1   analyst     85000
73681721   Edward Madison  A2?  programmer  70000?

Functional Dependencies (FDs)
- A functional dependency (FD) has the form X -> Y, where X and Y are two sets of attributes.
  - Examples: Rating -> Hrly_wage, AB -> C
  - X, Y, Z denote attribute sets; A, B, C denote single attributes.
- The FD X -> Y is satisfied by relation instance r if, for all pairs of tuples t1 and t2 ∈ r: (t1[X] = t2[X]) ⇒ (t1[Y] = t2[Y])
  - i.e., given any two tuples in r, if the X values agree, then the Y values must also agree.
- The FD X -> Y is NOT satisfied by r if there exists a pair of tuples t1 and t2 in r such that t1[X] = t2[X] but t1[Y] ≠ t2[Y]
  - i.e., we can find two tuples in r such that the X values agree, but the Y values don't.
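The pairwise definition above can be checked mechanically on a given instance. The following is a minimal Python sketch, not part of the original slides; the function name `satisfies` and the encoding of tuples as dictionaries keyed by the one-letter attribute names are assumptions made for illustration. It uses the Hourly_Emps instance shown earlier.

```python
from itertools import combinations

def satisfies(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    for t1, t2 in combinations(rows, 2):
        agree_on_lhs = all(t1[a] == t2[a] for a in lhs)
        differ_on_rhs = any(t1[a] != t2[a] for a in rhs)
        if agree_on_lhs and differ_on_rhs:
            return False  # found a violating pair of tuples
    return True

# Hourly_Emps instance from the slides (attributes S, N, L, R, W, H).
COLS = "SNLRWH"
hourly_emps = [dict(zip(COLS, t)) for t in [
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
]]

print(satisfies(hourly_emps, "R", "W"))  # True: rating determines hrly_wages here
print(satisfies(hourly_emps, "L", "R"))  # False: lot 35 occurs with ratings 5 and 8
```

Such a check can only show that an instance violates or happens to satisfy an FD; whether the FD holds over the relation still has to come from application semantics.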

Functional Dependencies (FDs)
- An FD holds over relation R if, for every (allowable) instance r of R, r satisfies the FD.
- An FD, as an integrity constraint, is a statement about all allowable relation instances.
  - Given some instance r1 of R, we can check whether it violates some FD f.
  - But we cannot infer that f holds by looking at an instance!
- FDs must be determined from application semantics (e.g., "It's company policy that rating determines salary"), as for all integrity constraints.

Decomposition Example: Hourly_Emps
- Hourly_Emps(ssn, name, lot, rating, hrly_wages, hrs_worked), denoted SNLRWH
- Two FDs on Hourly_Emps:
  - ssn is the key: S -> SNLRWH
  - rating determines hrly_wages: R -> W
- Decomposition and its effect on these FDs:
  - SNLRWH is replaced by SNLRH and RW.
  - R -> W is isolated in a separate relation RW.
  - SNLRH has the FD S -> SNLRH.

Instance-Level Decomposition

SNLRWH:
S            N          L   R  W   H
123-22-3666  Attishoo   48  8  10  40
231-31-5368  Smiley     22  8  10  30
131-24-3650  Smethurst  35  5  7   30
434-26-3751  Guldu      35  5  7   32
612-67-4134  Madayan    35  8  10  40

RW:
R  W
8  10
5  7

SNLRH:
S            N          L   R  H
123-22-3666  Attishoo   48  8  40
231-31-5368  Smiley     22  8  30
131-24-3650  Smethurst  35  5  30
434-26-3751  Guldu      35  5  32
612-67-4134  Madayan    35  8  40
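To see that this particular decomposition loses no information, one can project the instance onto SNLRH and RW and join the two projections back on R. The sketch below is an illustration only; the helper names `project` and `natural_join` are hypothetical, and tuples are encoded as dictionaries keyed by the one-letter attribute names.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples (set semantics)."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result

def natural_join(left, right):
    """Join two lists of dict-tuples on all attributes they have in common."""
    joined = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                joined.append({**l, **r})
    return joined

COLS = "SNLRWH"
hourly_emps = [dict(zip(COLS, t)) for t in [
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
]]

snlrh = project(hourly_emps, "SNLRH")
rw = project(hourly_emps, "RW")
print(rw)  # [{'R': 8, 'W': 10}, {'R': 5, 'W': 7}]

# Joining the two pieces on R reproduces exactly the original tuples.
rejoined = natural_join(snlrh, rw)
by_ssn = lambda t: t["S"]
print(sorted(rejoined, key=by_ssn) == sorted(hourly_emps, key=by_ssn))  # True
```

Each wage now appears once per rating in RW instead of once per employee.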

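Decomposition at the ER Diagram Level
[Figure: the entity set Hourly_Emps with attributes ssn, name, lot, rank, hrly_wages, hrs_worked, annotated with R -> W.]
[Figure: a decomposed E-R design in which Hourly_Emps (ssn, name, lot, hrs_worked) is connected through the relationship has-rank to the entity set Ranks (rank, hrly_wages), annotated with R -> W.]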

FD Examples

A  B  C
1  1  2
1  1  3
2  1  3
2  1  2

- How many possible FDs are there in total on this relation instance? 49: 7 non-empty subsets of {A,B,C} on the left x 7 non-empty subsets of {A,B,C} on the right.
- Possible FDs with A as the left side:

FD        Satisfied by this relation instance?
A -> A    Yes
A -> B    Yes
A -> C    No
A -> AB   Yes
A -> AC   No
A -> BC   No
A -> ABC  No

FD Examples (cont.)

A  B  C
1  1  2
1  1  3
2  1  3
2  1  2

FD       Satisfied by this relation instance?
C -> B   Yes
C -> AB  No
B -> C   No
B -> B   Yes
AC -> B  Yes
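The counting argument and the Yes/No answers in these two examples can be reproduced by brute force. The following Python sketch is an illustration under assumptions: attribute sets are written as strings of one-letter names, and the helper names `subsets` and `satisfies` are hypothetical.

```python
from itertools import combinations

ATTRS = "ABC"
ROWS = [dict(zip(ATTRS, t)) for t in [(1, 1, 2), (1, 1, 3), (2, 1, 3), (2, 1, 2)]]

def subsets(attrs):
    """All non-empty subsets of attrs, each written as a string such as 'AB'."""
    return ["".join(c) for n in range(1, len(attrs) + 1)
            for c in combinations(attrs, n)]

def satisfies(rows, lhs, rhs):
    """True if no pair of rows agrees on lhs while differing on rhs."""
    return not any(
        all(t1[a] == t2[a] for a in lhs) and any(t1[a] != t2[a] for a in rhs)
        for t1, t2 in combinations(rows, 2)
    )

# Every (left side, right side) pair of non-empty attribute sets: 7 x 7 of them.
candidates = [(x, y) for x in subsets(ATTRS) for y in subsets(ATTRS)]
print(len(candidates))  # 49

# The FDs discussed on the slides, with the same answers.
for lhs, rhs in [("A", "B"), ("A", "C"), ("C", "B"), ("C", "AB"), ("AC", "B")]:
    print(f"{lhs} -> {rhs}:", "Yes" if satisfies(ROWS, lhs, rhs) else "No")

# All candidate FDs that this one instance happens to satisfy.
held = [(x, y) for x, y in candidates if satisfies(ROWS, x, y)]
```

The list `held` describes this instance only; it says nothing about which FDs hold over the relation in general.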

Relationship between FDs and Keys
- Let R be the set of all attributes in relation R, and let K be a key.
- X -> R means X is a (super)key of R.
- K -> R is always an FD.
- For example, given relation R(A, B, C), A -> ABC means that A is a key.
- Keys have a lot in common with FDs!
- Big insight: the LHS (left-hand side) of every FD would like to be a key in its own relation.

Reasoning About FDs
- Given some FDs, we can usually infer additional FDs:
  - Employees(ssn, name, lot, did, since), with ssn -> name, lot, did, since
  - ssn -> did, did -> lot implies ssn -> lot
  - A -> BC implies A -> B
- An FD f is logically implied by a set of FDs F if f holds whenever all FDs in F hold.
- F+ = closure of F is the set of all FDs that are implied by F.

Reasoning about FDs
- How do we get all the FDs that are logically implied by a given set of FDs?
- Armstrong's Axioms (X, Y, Z are sets of attributes):
  - Reflexivity: If X ⊇ Y, then X -> Y
  - Augmentation: If X -> Y, then XZ -> YZ for any Z
  - Transitivity: If X -> Y and Y -> Z, then X -> Z

Armstrong's axioms
- Armstrong's axioms are sound and complete inference rules for FDs!
- Sound: the set of new FDs derived using the axioms contains only (no more than) those logically implied by the base set of FDs, i.e., they don't derive any non-inferable FDs.
- Complete: all FDs logically implied by the base set of FDs can be derived using the axioms, i.e., they derive all the inferable FDs.

Example of using Armstrong's Axioms
- A couple of additional rules (that follow from Armstrong's axioms):
  - Union: If X -> Y and X -> Z, then X -> YZ
  - Decomposition: If X -> YZ, then X -> Y and X -> Z
- Derive these using Armstrong's axioms!
- Union:
  - By augmentation: XX -> XY (from X -> Y) and XY -> YZ (from X -> Z)
  - By transitivity: XX -> YZ, or X -> YZ
- Decomposition:
  - By reflexivity: YZ -> Y
  - By transitivity: X -> YZ and YZ -> Y produce X -> Y (X -> Z follows in the same way from YZ -> Z)

Reasoning About FDs
- Given Contracts(contractid, supplierid, projectid, deptid, partid, qty, value), denoted CSJDPQV:
  - C is the key: C -> CSJDPQV
  - A project purchases a given part on at most one contract: JP -> C
  - A dept purchases at most one part from a supplier: SD -> P
- Derivations:
  - {JP -> C, C -> CSJDPQV} implies JP -> CSJDPQV (transitivity)
  - {SD -> P} implies SDJ -> JP (augmentation)
  - {SDJ -> JP, JP -> CSJDPQV} implies SDJ -> CSJDPQV (transitivity)
- Three new derived FDs!

Reasoning About FDs
- Can we derive the FD A -> E from F = {A -> B, B -> C, CD -> E}? I.e., is A -> E in F+ (the closure of F)?
- Computing the closure of a set of FDs can be expensive: the cardinality of the closure can be exponential in the number of attributes!
- Typically, we just want to check whether an FD X -> Y is in the closure of a set of FDs F.
- Good news: we don't have to actually compute F+ to do this!

Testing For Membership in F+
1) Given a set of FDs F and a new FD X -> Y: is X -> Y in F+?
2) To test this, we first compute the attribute closure of X (denoted X+) w.r.t. F:
   - X+ = the set of all attributes A such that X -> A is in F+
   - There is a linear time algorithm to compute this. (We learn it on the next slide.)
3) Then, check if Y is in X+. If Y is in X+, then X -> Y is in F+.

Computing X+
- Inputs: F (a set of FDs) and an FD X -> Y
- Output: Result = X+
- Method:
  - Step 1: Result := X
  - Step 2: If R -> S ∈ F and R ⊆ Result, then Result := S ∪ Result
  - Repeat step 2 until Result cannot be changed, then output Result.
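The two steps above translate almost directly into code. This is a minimal Python sketch under assumptions: FDs are written as (left, right) pairs of strings of one-letter attribute names, and the names `closure` and `implies` are hypothetical. It is the simple repeat-until-no-change loop from the slide, not a tuned linear-time version. The usage lines answer the earlier question of whether A -> E follows from F = {A -> B, B -> C, CD -> E}.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)                      # Step 1: Result := X
    changed = True
    while changed:                           # repeat Step 2 until nothing changes
        changed = False
        for lhs, rhs in fds:
            # Step 2: if R -> S is in F and R is contained in Result, add S.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """Membership test: X -> Y is in F+ exactly when Y is contained in X+."""
    return set(rhs) <= closure(lhs, fds)

F = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(closure("A", F)))   # ['A', 'B', 'C']
print(implies(F, "A", "E"))      # False: D is never reached, so CD -> E cannot fire
print(implies(F, "AD", "E"))     # True
```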

Example Using X+
- F = {A -> B, AC -> D, AB -> C}. Is A -> BCD an FD in F+?
- Compute: A+ = ABCD, and ABCD ⊇ BCD; therefore YES!
- Note: we can find all candidate keys in relation R with this algorithm!
  - Start with each X in R, where X is a single attribute.
  - If X+ = all attributes in R, X is a candidate key for R.
  - Grow X to 2, 3, ... attributes in length until done, skipping any X that contains a key already found (due to minimality).
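The candidate-key search described above can be sketched as follows. This is an illustration only: the function names are hypothetical, FDs are (left, right) pairs of strings of one-letter attributes, and the relation for the first example is assumed to be R(A, B, C, D). The search tries attribute sets by increasing size and skips supersets of keys already found, so it is exponential in the number of attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            x = set(combo)
            if any(key <= x for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(x, fds) == set(attrs):
                keys.append(x)
    return keys

F = [("A", "B"), ("AC", "D"), ("AB", "C")]
print(candidate_keys("ABCD", F))  # [{'A'}]

# The Contracts relation from the earlier slide.
contracts = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]
print(["".join(sorted(k)) for k in candidate_keys("CSJDPQV", contracts)])
```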

Computing F+ (Using X+)
- Given F = {A -> B, B -> C}, compute F+ (with attributes A, B, C).

      A  B  C  AB  AC  BC  ABC   Attribute closure
A     x  x  x  x   x   x   x     A+ = ABC
B        x  x          x         B+ = BC
C           x                    C+ = C
AB    x  x  x  x   x   x   x     AB+ = ABC
AC    x  x  x  x   x   x   x     AC+ = ABC
BC       x  x          x         BC+ = BC
ABC   x  x  x  x   x   x   x     ABC+ = ABC

- An entry with x means the FD (the row) -> (the column) is in F+.
- An entry gets x when (the column) is in (the row)+.
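The table can be generated by computing the closure of every non-empty attribute set and listing the right sides it contains. The Python sketch below is an illustration with hypothetical names (`closure`, `fd_table`), using strings of one-letter attributes for attribute sets.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

ATTRS = "ABC"
F = [("A", "B"), ("B", "C")]

# All non-empty subsets of ATTRS, smallest first: A, B, C, AB, AC, BC, ABC.
SUBSETS = ["".join(c) for n in range(1, len(ATTRS) + 1)
           for c in combinations(ATTRS, n)]

def fd_table(fds):
    """Map each left side X to the right sides Y such that X -> Y is in F+."""
    return {x: [y for y in SUBSETS if set(y) <= closure(x, fds)] for x in SUBSETS}

table = fd_table(F)
for x, ys in table.items():
    print(f"{x:<3} -> {', '.join(ys)}")
```

Each printed row lists the columns that would be marked in that row of the table; F+ here includes the trivial FDs such as A -> A.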

Check if two sets of FDs are equivalent
- Two sets F and G of FDs are equivalent if they logically imply the same set of FDs, i.e., if F+ = G+.
- For example, F = {A -> B, A -> C} is equivalent to G = {A -> BC}.
- How to test? Two steps:
  - Every FD in F is in G+
  - Every FD in G is in F+
- These two steps can use the algorithm for X+ (many times).

Example
- Given F = {BD -> E, C -> A, E -> C, A -> E} and G = {C -> E, BD -> C}. Are G and F equivalent?
1. For each FD X -> Y in G, does X -> Y ∈ F+ hold?
   - C+ w.r.t. F = {A, C, E}: C -> E ∈ F+
   - (BD)+ w.r.t. F = {B, D, E, C, A}: BD -> C ∈ F+
2. For each FD W -> Z in F, does W -> Z ∈ G+ hold?
   - (BD)+ w.r.t. G = {B, D, C, E}: BD -> E ∈ G+
   - C+ w.r.t. G = {C, E}: C -> A ∉ G+
- Therefore F and G are not equivalent!
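The two-step test can be written as a pair of coverage checks built on the attribute closure. The following Python sketch is a hedged illustration: the names `closure`, `covers`, and `equivalent` are hypothetical, and FDs are (left, right) pairs of strings of one-letter attributes.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, g):
    """True if every FD in g is in the closure of f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    """Two FD sets are equivalent when each one covers the other."""
    return covers(f, g) and covers(g, f)

F = [("BD", "E"), ("C", "A"), ("E", "C"), ("A", "E")]
G = [("C", "E"), ("BD", "C")]

print(covers(F, G))      # True: both FDs of G follow from F
print(covers(G, F))      # False: C -> A does not follow from G
print(equivalent(F, G))  # False

# The example from the previous slide: {A -> B, A -> C} versus {A -> BC}.
print(equivalent([("A", "B"), ("A", "C")], [("A", "BC")]))  # True
```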

Summary
- We need to understand functional dependencies and how they can lead to schema design issues:
  - update/insert/delete anomalies due to redundancy
  - intuitively awkward designs
- Next week: schema refinement can alleviate such problems.
- Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).
- Goal: decompose into a normal form. Several are possible, e.g., 3NF, BCNF, ...
- Decomposition should be used judiciously; problems can result:
  - violating desirable properties of decompositions (lossless join and dependency preservation)
  - over-normalization

Extra Slides

Refining an ER Diagram
- The 1st diagram translated: Employees(S,N,L,D,S), Departments(D,M,B)
  - Lots are associated with workers.
- Suppose all workers in a dept are assigned the same lot: D -> L
- Can adjust this to: Employees2(S,N,D,S), Departments2(D,M,B,L)
[Figure, before: Employees (ssn, name, lot) connected through Works_In (since) to Departments (did, dname, budget).]
[Figure, after: Employees (ssn, name) connected through Works_In (since) to Departments (did, dname, budget, lot).]

Equivalences of Sets of FDs
- Two sets of FDs G and F are equivalent if G+ = F+: every FD in G can be inferred from F, and vice versa.

Testing Equivalence of FD Sets
1. For each FD X -> Y in G, check whether X -> Y ∈ F+ holds:
   - Compute the closure X+ w.r.t. F.
   - If Y ⊆ X+, then X -> Y ∈ F+.
2. For each FD W -> Z in F, check whether W -> Z ∈ G+ holds:
   - Compute the closure W+ w.r.t. G.
   - If Z ⊆ W+, then W -> Z ∈ G+.