FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

Similar documents
FUNCTIONAL DEPENDENCY THEORY II. CS121: Relational Databases Fall 2018 Lecture 20

Functional Dependency Theory II. Winter Lecture 21

Chapter 7: Relational Database Design

CSE 132B Database Systems Applications

Chapter 7: Relational Database Design. Chapter 7: Relational Database Design

Chapter 8: Relational Database Design

Relational Database Design

Review: Keys. What is a Functional Dependency? Why use Functional Dependencies? Functional Dependency Properties

Relational Database Design

UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

CSIT5300: Advanced Database Systems

SCHEMA NORMALIZATION. CS 564- Fall 2015

Database System Concepts, 5th Ed.! Silberschatz, Korth and Sudarshan See for conditions on re-use "

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

Schema Refinement and Normal Forms Chapter 19

Database Design and Implementation

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

Schema Refinement and Normal Forms

Schema Refinement and Normal Forms

Schema Refinement & Normalization Theory

Schema Refinement and Normal Forms

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Example (Contd.

The Evils of Redundancy. Schema Refinement and Normal Forms. Example: Constraints on Entity Set. Functional Dependencies (FDs) Refining an ER Diagram

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) CIS 330, Spring 2004 Lecture 11 March 2, 2004

Schema Refinement and Normal Forms. The Evils of Redundancy. Functional Dependencies (FDs) [R&G] Chapter 19

The Evils of Redundancy. Schema Refinement and Normal Forms. Functional Dependencies (FDs) Example: Constraints on Entity Set. Example (Contd.

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

CS54100: Database Systems

Lecture 6 Relational Database Design

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

Relational Design: Characteristics of Well-designed DB

Shuigeng Zhou. April 6/13, 2016 School of Computer Science Fudan University

INF1383 -Bancos de Dados

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

Relational-Database Design

Functional Dependencies. Getting a good DB design Lisa Ball November 2012

Schema Refinement and Normal Forms. Chapter 19

Schema Refinement. Feb 4, 2010

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

CSC 261/461 Database Systems Lecture 13. Spring 2018

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

Constraints: Functional Dependencies

Information Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due

CSC 261/461 Database Systems Lecture 11

Design Theory for Relational Databases

Functional Dependencies

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Functional Dependencies and Normalization

Lectures 6. Lecture 6: Design Theory

COSC 430 Advanced Database Topics. Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago

Schema Refinement and Normal Forms

Schema Refinement and Normal Forms. Why schema refinement?

Schema Refinement and Normalization

CS 186, Fall 2002, Lecture 6 R&G Chapter 15

CMPT 354: Database System I. Lecture 9. Design Theory

Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies

CS322: Database Systems Normalization

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

CAS CS 460/660 Introduction to Database Systems. Functional Dependencies and Normal Forms 1.1

CSE 303: Database. Outline. Lecture 10. First Normal Form (1NF) First Normal Form (1NF) 10/1/2016. Chapter 3: Design Theory of Relational Database

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Normal Forms. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

Constraints: Functional Dependencies

DECOMPOSITION & SCHEMA NORMALIZATION

Design theory for relational databases

Functional Dependency and Algorithmic Decomposition

Design Theory for Relational Databases

Lossless Joins, Third Normal Form

CSC 261/461 Database Systems Lecture 12. Spring 2018

Functional Dependencies & Normalization. Dr. Bassam Hammo

LOGICAL DATABASE DESIGN Part #1/2

Functional Dependencies

CS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li

CSE 344 MAY 16 TH NORMALIZATION

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Introduction to Database Systems CSE 414. Lecture 20: Design Theory

Practice and Applications of Data Management CMPSCI 345. Lecture 16: Schema Design and Normalization

Normal Forms Lossless Join.

Database Design: Normal Forms as Quality Criteria. Functional Dependencies Normal Forms Design and Normal forms

CSE 344 AUGUST 3 RD NORMALIZATION

Databases 2012 Normalization

Chapter 3 Design Theory for Relational Databases

Schema Refinement. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Introduction to Data Management. Lecture #6 (Relational Design Theory)

Introduction to Management CSE 344

A few details using Armstrong s axioms. Supplement to Normalization Lecture Lois Delcambre

Normal Forms (ii) ICS 321 Fall Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Relational Design Theory

Functional. Dependencies. Functional Dependency. Definition. Motivation: Definition 11/12/2013

Relational Design Theory I. Functional Dependencies: why? Redundancy and Anomalies I. Functional Dependencies

HKBU: Tutorial 4

Introduction to Data Management. Lecture #6 (Relational DB Design Theory)

Database Normaliza/on. Debapriyo Majumdar DBMS Fall 2016 Indian Statistical Institute Kolkata

Introduction to Data Management CSE 344

Comp 5311 Database Management Systems. 5. Functional Dependencies Exercises

Background: Functional Dependencies. æ We are always talking about a relation R, with a æxed schema èset of attributesè and a

Normaliza)on and Func)onal Dependencies

Relational Design Theory II. Detecting Anomalies. Normal Forms. Normalization

Transcription:

FUNCTIONAL DEPENDENCY THEORY CS121: Relational Databases Fall 2017 Lecture 19

Last Lecture 2 Normal forms specify good schema patterns First normal form (1NF): All attributes must be atomic Easy in relational model, harder/less desirable in SQL Boyce-Codd normal form (BCNF): Eliminates redundancy using functional dependencies Given a relation schema R and a set of dependencies F For all functional dependencies a b in F +, where a b Í R, at least one of these conditions must hold: n a b is a trivial dependency n a is a superkey for R

Last Lecture (2) 3 Can convert a schema into BCNF If R is a schema not in BCNF: There is at least one nontrivial functional dependency a b Î F + such that a is not a superkey for R Replace R with two schemas: (a b) (R (b a)) May need to repeat this decomposition process until all schemas are in BCNF

Functional Dependency Theory 4 Important to be able to reason about functional dependencies! Main question: What functional dependencies are implied by a set F of functional dependencies? Other useful questions: Which attributes are functionally determined by a particular attribute-set? What minimal set of functional dependencies must actually be enforced in a database? Is a particular schema decomposition lossless? Does a decomposition preserve dependencies?

Rules of Inference 5 Given a set F of functional dependencies Actual dependencies listed in F may be insufficient for normalizing a schema Must consider all dependencies logically implied by F For a relation schema R A functional dependency f on R is logically implied by F on R if every relation instance r(r) that satisfies F also satisfies f Example: Relation schema R(A, B, C, G, H, I) Dependencies: A B, A C, CG H, CG I, B H Logically implies: A H, CG HI, AG I

Rules of Inference (2) 6 Axioms are rules of inference for dependencies This group is called Armstrong s axioms Greek letters a, b, g, represent attribute sets Reflexivity rule: If a is a set of attributes and b Í a, then a b holds. Augmentation rule: If a b holds, and g is a set of attributes, then ga gb holds. Transitivity rule: If a b holds, and b g holds, then a g holds.

Computing Closure of F 7 Can use Armstrong s axioms to compute F + from F F is a set of functional dependencies F + = F repeat for each functional dependency f in F + apply reflexivity and augmentation rules to f add resulting functional dependencies to F + for each pair of functional dependencies f 1, f 2 in F + if f 1 and f 2 can be combined using transitivity add resulting functional dependency to F + until F + stops changing

Armstrong s Axioms 8 Axioms are sound They don t generate any incorrect functional dependencies Axioms are complete Given a set of functional dependencies F, repeated application generates all F + F + could be very large LHS and RHS of a dependency are subsets of R A set of size n has 2 n subsets 2 n 2 n = 2 2n possible functional dependencies in R!

More Rules of Inference 9 Additional rules can be proven from Armstrong s axioms These make it easier to generate F + Union rule: If a b holds, and a g holds, then a bg holds. Decomposition rule: If a bg holds, then a b holds and a g holds. Pseudotransitivity rule: If a b holds, and gb d holds, then ag d holds.

Attribute-Set Closure 10 How to tell if an attribute-set a is a superkey? If a R then a is a superkey. What attributes are functionally determined by an attribute-set a? Given: Attribute-set a Set of functional dependencies F The set of all attributes functionally determined by a under F is called the closure of a under F Written as a +

Attribute-Set Closure (2) 11 It s easy to compute the closure of attribute-set a! Algorithm is very simple Inputs: attribute-set a set of functional dependencies F a + = a repeat for each functional dependency b g in F if b Í a + then a + = a + g until a + stops changing

Attribute-Set Closure (3) 12 Can easily test if a is a superkey Compute a + If R Í a + then a is a superkey of R Can also use to identify functional dependencies a b holds if b Í a + n Find closure of a under F; if it contains b then a b holds! Can compute F + with attribute-set closure too: n For each g Í R, find closure g + under F n We know that g g + n For each subset S Í g +, add functional dependency g S

Attribute-Set Closure Example 13 Relation schema R(A, B, C, G, H, I) Dependencies: A B, A C, CG H, CG I, B H Is AG a superkey of R? Compute (AG) + Start with a + = AG A B, A C cause a + = ABCG CG H, CG I cause a + = ABCGHI AG is a superkey of R!

Attribute-Set Closure Example (2) 14 Relation schema R(A, B, C, G, H, I) Dependencies: A B, A C, CG H, CG I, B H Is AG a candidate key of R? A candidate key is a minimal superkey Compute attribute-set closure of all proper subsets of superkey; if we get R then it s not a candidate key Compute the attribute-set closures under F A + = ABCH G + = G AG is indeed a candidate key!

BCNF Revisited 15 BCNF algorithm states, if R i is a schema not in BCNF: There is at least one nontrivial functional dependency a b such that a is not a superkey for R i Two points: a b Î F +, not just in F For R i, only care about func. deps. where a b Î R i How do we tell if R i is not in BCNF? Can use attribute-set closure under F to find if there is a dependency in F + that affects R i For each proper subset a Ì R i, compute a + under F If a + doesn t contain R i, but a + does contain any attributes in R i a, then R i is not in BCNF

BCNF Revisited (2) 16 If a + doesn t contain R i, but a + does contain any attributes in R i a, then R i is not in BCNF If a + doesn t contain R i, what do we know about a with respect to R i? a is not a superkey of R i If a + contains attributes in R i a : Let b = R i Ç (a + a) We know there is some non-trivial functional dependency a b that holds on R i Since a b holds on R i, but a is not a candidate key of R i, we know that R i cannot be in BCNF.

BCNF Example 17 Start with schema R(A, B, C, D, E), and F = { A B, BC D } Is R in BCNF? Obviously not. Using A B, decompose into R 1 (A, B) and R 2 (A, C, D, E) Are we done? Pseudotransitivity rule says that if a b and gb d, then ag d AC D also holds on R 2, so R 2 is not in BCNF! Or, compute {AC} + = ABCD. Again, R 2 is not in BCNF.

Database Constraints 18 Enforcing database constraints can easily become very expensive Especially CHECK constraints! Best to define database schema such that constraint enforcement is efficient Ideally, enforcing a functional dependency involves only one relation Then, can specify a key constraint instead of a multitable CHECK constraint!

Example: Personal Bankers 19 Bank sets a requirement on employees: Each employee can work at only one branch emp_id branch_name Bank wants to give customers a personal banker at each branch At each branch, a customer has only one personal banker (A customer could have personal bankers at multiple branches.) cust_id, branch_name emp_id

Personal Bankers 20 E-R diagram: works_in employee emp_id emp_name branch branch_name branch_city assets cust_ banker_ branch customer cust_id cust_name Relationship-set schemas: type works_in(emp_id, branch_name) cust_banker_branch(cust_id, branch_name, emp_id, type)

Personal Bankers (2) 21 Schemas: works_in(emp_id, branch_name) cust_banker_branch(cust_id, branch_name, emp_id, type) Is this schema in BCNF? emp_id branch_name cust_banker_branch isn t in BCNF n emp_id isn t a candidate key on cust_banker_branch cust_banker_branch repeats branch_name unnecessarily, since emp_id branch_name Decompose into two BCNF schemas: works_in already has (emp_id, branch_name) (a b) Create cust_banker(cust_id, emp_id, type) (R (b a))

Personal Bankers (3) 22 New BCNF schemas: works_in(emp_id, branch_name) cust_banker(cust_id, emp_id, type) A customer can have one personal banker at each branch, so both cust_id and emp_id must be in the primary key Any problems with this new BCNF version? Now we can t easily constrain that each customer has only one personal banker at each branch! Could still create a complicated CHECK constraint involving multiple tables

Preserving Dependencies 23 The BCNF decomposition doesn t preserve this dependency: cust_id, branch_name emp_id Can t enforce this dependency within a single table In general, BCNF decompositions are not dependency-preserving Some functional dependencies are not enforceable within a single table Can t enforce them with a simple key constraint, so they are more expensive Solution: Third Normal Form

Third Normal Form 24 Slightly weaker than Boyce-Codd normal form Preserves more functional dependencies Also allows more repeated information! Given: Relation schema R Set of functional dependencies F R is in 3NF with respect to F if: For all functional dependencies a b in F +, where a Í R and b Í R, at least one of the following holds: n a b is a trivial dependency n a is a superkey for R n Each attribute A in b a is contained in a candidate key for R

Third Normal Form (2) 25 New condition: Each attribute A in b a is contained in a candidate key for R A general constraint: Doesn t require a single candidate key to contain all attributes in b a Just requires that each attribute in b a appears in some candidate key in R possibly even different candidate keys!

Personal Banker Example 26 Our non-bcnf personal banker schemas again: works_in(emp_id, branch_name) cust_banker_branch(cust_id, branch_name, emp_id, type) Is this schema in 3NF? emp_id branch_name cust_id, branch_name emp_id works_in is in 3NF (emp_id is the primary key) What about cust_banker_branch? Both dependencies hold on cust_banker_branch n emp_id branch_name, but emp_id isn t the primary key n cust_id, branch_name emp_id ; is emp_id part of any candidate key on cust_banker_branch?

Personal Banker Example (2) 27 Look carefully at the functional dependencies: Primary key of cust_banker_branch is (cust_id, branch_name) n { cust_id, branch_name } cust_banker_branch (all attributes) (constraint arises from the E-R diagram & schema translation) n (Also specified this constraint: cust_id, branch_name emp_id) We also know that emp_id branch_name Pseudotransitivity rule: if a b and gb d, then ag d n { emp_id } { branch_name } n { cust_id, branch_name } cust_banker_branch n Therefore, { emp_id, cust_id } cust_banker_branch also holds! (cust_id, emp_id) is a candidate key of cust_banker_branch So cust_banker_branch is in fact in 3NF (And we need to enforce this second candidate key too )

Canonical Cover 28 Given a relation schema, and a set of functional dependencies F Database needs to enforce F on all relations Invalid changes should be rolled back F could contain a lot of functional dependencies Dependencies might even logically imply each other Want a minimal version of F, that still represents all constraints imposed by F Should be more efficient to enforce minimal version

Canonical Cover (2) 29 A canonical cover F c for F is a set of functional dependencies such that: F logically implies all dependencies in F c F c logically implies all dependencies in F Can t infer any functional dependency in F c from other dependencies in F c No functional dependency in F c contains an extraneous attribute Left side of all functional dependencies in F c are unique n There are no two dependencies a 1 b 1 and a 2 b 2 in F c such that a 1 = a 2

Extraneous Attributes 30 Given a set of functional dependencies F An attribute in a functional dependency is extraneous if it can be removed from F without affecting closure of F Formally: given F, and a b If A Î a, and F logically implies (F {a b}) {(a A) b}, then A is extraneous If A Î b, and (F {a b}) {a (b A)} logically implies F, then A is extraneous n i.e. generate a new set of functional dependencies F' by replacing a b with a (b A) n See if F' logically implies F

Testing Extraneous Attributes 31 Given relation schema R, and a set F of functional dependencies that hold on R Attribute A in a b If A Î a (i.e. A is on left side of the dependency), then let g = a {A} See if g b can be inferred from F Compute g + under F If b Ê g +, then A is extraneous in a

Testing Extraneous Attributes (2) 32 Given relation schema R, and a set F of functional dependencies that hold on R Attribute A in a b If A Î b (on right side of the dependency), then try the altered set F' F' = (F {a b}) {a (b A)} See if a A can be inferred from F' Compute a + under F' If a + includes A, then A is extraneous in b

Computing Canonical Cover 33 A simple way to compute the canonical cover of F F c = F repeat apply union rule to replace dependencies in F c of form a 1 b 1 and a 1 b 2 with a 1 b 1 b 2 find a functional dependency a b in F c with an extraneous attribute /* Use F c for the extraneous attribute test, not F!!! */ if an extraneous attribute is found, delete it from a b until F c stops changing

Canonical Cover Example 34 Functional dependencies F on schema (A, B, C) F = { A BC, B C, A B, AB C } Find F c Apply union rule to A BC and A B Left with: { A BC, B C, AB C } A is extraneous in AB C B C is logically implied by F (obvious) Left with: { A BC, B C } C is extraneous in A BC Logically implied by A B, B C F c = { A B, B C }

Another Example Functional dependencies F on schema (A, B, C, D) F = { A B, BC D, AC D } Find F c In this case, it may look like F c = F However, can infer AC D from A B, BC D (pseudotransitivity), so AC D is extraneous in F Therefore, F c = { A B, BC D } Alternately, can argue that D is extraneous in AC D With F' = { A B, BC D }, we see that {AC} + = ACD, so D is extraneous in AC D (If you eliminate the entire RHS of a functional dependency, it goes away)

Canonical Covers 36 A set of functional dependencies can have multiple canonical covers! Example: F = { A BC, B AC, C AB } Has several canonical covers: n F c = { A B, B C, C A } n F c = { A B, B AC, C B } n F c = { A C, C B, B A } n F c = { A C, B C, C AB } n F c = { A BC, B A, C A }