Functional Dependencies. Chapter 3

Similar documents
UVA UVA UVA UVA. Database Design. Relational Database Design. Functional Dependency. Loss of Information

Relational Database Design

SCHEMA NORMALIZATION. CS 564- Fall 2015

CS54100: Database Systems

Relational Database Design Theory Part II. Announcements (October 12) Review. CPS 116 Introduction to Database Systems

Design Theory for Relational Databases. Spring 2011 Instructor: Hassan Khosravi

Database Design and Implementation

CSC 261/461 Database Systems Lecture 10 (part 2) Spring 2018

Lectures 6. Lecture 6: Design Theory

Schema Refinement. Feb 4, 2010

Constraints: Functional Dependencies

Design Theory for Relational Databases

Functional Dependencies and Normalization. Instructor: Mohamed Eltabakh

CSE 544 Principles of Database Management Systems

DESIGN THEORY FOR RELATIONAL DATABASES. csc343, Introduction to Databases Renée J. Miller and Fatemeh Nargesian and Sina Meraji Winter 2018

Introduction to Data Management CSE 344

Data Dependencies in the Presence of Difference

Introduction to Data Management. Lecture #6 (Relational Design Theory)

Schema Refinement & Normalization Theory

Constraints: Functional Dependencies

Functional Dependencies. Applied Databases. Not all designs are equally good! An example of the bad design

Database Design: Normal Forms as Quality Criteria. Functional Dependencies Normal Forms Design and Normal forms

Design Theory: Functional Dependencies and Normal Forms, Part I Instructor: Shel Finkelstein

Information Systems for Engineers. Exercise 8. ETH Zurich, Fall Semester Hand-out Due

CSE 303: Database. Outline. Lecture 10. First Normal Form (1NF) First Normal Form (1NF) 10/1/2016. Chapter 3: Design Theory of Relational Database

Review: Keys. What is a Functional Dependency? Why use Functional Dependencies? Functional Dependency Properties

COSC 430 Advanced Database Topics. Lecture 2: Relational Theory Haibo Zhang Computer Science, University of Otago

CS 464/564 Introduction to Database Management System Instructor: Abdullah Mueen

CSIT5300: Advanced Database Systems

CSE 344 AUGUST 3 RD NORMALIZATION

Chapter 8: Relational Database Design

Schema Refinement and Normal Forms. Why schema refinement?

CSE 344 MAY 16 TH NORMALIZATION

Functional. Dependencies. Functional Dependency. Definition. Motivation: Definition 11/12/2013

Introduction to Database Systems CSE 414. Lecture 20: Design Theory

CSC 261/461 Database Systems Lecture 8. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #15: BCNF, 3NF and Normaliza:on

Relational Design: Characteristics of Well-designed DB

L13: Normalization. CS3200 Database design (sp18 s2) 2/26/2018

Relational-Database Design

Chapter 3 Design Theory for Relational Databases

Design Theory for Relational Databases

Relational Design Theory I. Functional Dependencies: why? Redundancy and Anomalies I. Functional Dependencies

10/12/10. Outline. Schema Refinements = Normal Forms. First Normal Form (1NF) Data Anomalies. Relational Schema Design

Functional Dependencies. Getting a good DB design Lisa Ball November 2012

Design theory for relational databases

CSE 344 AUGUST 6 TH LOSS AND VIEWS

Functional Dependencies & Normalization. Dr. Bassam Hammo

CSC 261/461 Database Systems Lecture 12. Spring 2018

Exam 1 Solutions Spring 2016

CSC 261/461 Database Systems Lecture 11

Sets. Alice E. Fischer. CSCI 1166 Discrete Mathematics for Computing Spring, Outline Sets An Algebra on Sets Summary

11/6/11. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

INF1383 -Bancos de Dados

Schema Refinement and Normal Forms

Chapter 7: Relational Database Design

Introduction. Normalization. Example. Redundancy. What problems are caused by redundancy? What are functional dependencies?

Practice and Applications of Data Management CMPSCI 345. Lecture 15: Functional Dependencies

11/1/12. Relational Schema Design. Relational Schema Design. Relational Schema Design. Relational Schema Design (or Logical Design)

Relational Design Theory

CSE 132B Database Systems Applications

Information Systems (Informationssysteme)

Functional Dependencies

Schema Refinement & Normalization Theory: Functional Dependencies INFS-614 INFS614, GMU 1

Design Theory. Design Theory I. 1. Normal forms & functional dependencies. Today s Lecture. 1. Normal forms & functional dependencies

Chapter 7: Relational Database Design. Chapter 7: Relational Database Design

Functional Dependency and Algorithmic Decomposition

CSC 261/461 Database Systems Lecture 13. Spring 2018

Chapter 3 Design Theory for Relational Databases

CS122A: Introduction to Data Management. Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li

FUNCTIONAL DEPENDENCY THEORY. CS121: Relational Databases Fall 2017 Lecture 19

Schema Refinement: Other Dependencies and Higher Normal Forms

Schema Refinement. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Functional Dependencies

Data Bases Data Mining Foundations of databases: from functional dependencies to normal forms

Schema Refinement and Normal Forms. Case Study: The Internet Shop. Redundant Storage! Yanlei Diao UMass Amherst November 1 & 6, 2007

A few details using Armstrong s axioms. Supplement to Normalization Lecture Lois Delcambre

LOGICAL DATABASE DESIGN Part #1/2

CMPS Advanced Database Systems. Dr. Chengwei Lei CEECS California State University, Bakersfield

The Evils of Redundancy. Schema Refinement and Normalization. Functional Dependencies (FDs) Example: Constraints on Entity Set. Refining an ER Diagram

Detecting Functional Dependencies Felix Naumann

Functional Dependencies and Normalization

Relational Database Design

Databases Lecture 8. Timothy G. Griffin. Computer Laboratory University of Cambridge, UK. Databases, Lent 2009

MA554 Assessment 1 Cosets and Lagrange s theorem

Schema Refinement and Normal Forms. The Evils of Redundancy. Schema Refinement. Yanlei Diao UMass Amherst April 10, 2007

Comp 5311 Database Management Systems. 5. Functional Dependencies Exercises

Functional Dependencies and Normalization

Schema Refinement and Normal Forms

Database System Concepts, 5th Ed.! Silberschatz, Korth and Sudarshan See for conditions on re-use "

CS322: Database Systems Normalization

Schema Refinement and Normal Forms. Chapter 19

Schema Refinement and Normal Forms

Introduction to Data Management. Lecture #6 (Relational DB Design Theory)

Introduction to Management CSE 344

Normalization. October 5, Chapter 19. CS445 Pacific University 1 10/05/17

Lossless Joins, Third Normal Form

HKBU: Tutorial 4

Schema Refinement and Normal Forms

L14: Normalization. CS3200 Database design (sp18 s2) 3/1/2018

Transcription:

Functional Dependencies Chapter 3 1

Halfway done! So far: Relational algebra (theory) SQL (practice) E/R modeling (theory) HTML/Flask/Python (practice) Midterm (super fun!) Next up: Database design theory (builds on E/R modeling, more theory) Data structures (practice) Query optimization, transactions (theory + practice!) Non-relational DB models (NoSQL 2

More DB design theory E/R diagrams can still leave you with redundancies in your schemas. Redundancy: Storing something twice in the DB when you only need to store it once. 3

4

5

Often, our first attempts at DB schemas can be improved, especially by eliminating redundancy. Functional dependencies help us do this. 6

What is a FD? Statement of the form: If two tuples of relation R agree on attributes A 1 A n, then they must agree on tuples B 1 B m. We say A 1 through A n functionally determine B 1 through B m. i.e., if you have two rows in a table, and the two rows all have the same values for A 1 A n, then the rows must have the same values for B 1 B m. 8

What is a FD? Write as X -> Y X and Y are sets of attributes from a relation. Read: "X functionally determines Y" Intuitive definitions: "If you know X, you can determine Y." "For each X, there can be only one Y. Guidelines: Often Y is a single attribute, though it doesn t have to be. 9

What is a FD? If every instance of a relation will make a FD true, then the relation satisfies the FD. Determined from real-world knowledge, not DB. An FD is a constraint on a single relational schema (one table). It must hold on every instance of the relation. Therefore, you cannot deduce an FD from a relational instance. 11

StudentR# StudentName CRN ProfName Title Grade 101 Harry Potter 1 Snape Potions A 101 Harry Potter 2 McGonagall Transfiguration B 101 Harry Potter 3 Trelawney Divination C 102 Hermione Granger 1 Snape Potions B 102 Hermione Granger 2 McGonagall Transfiguration A 103 Ronald Weasley 1 Snape Potions B What are the FDs? 15

StudentR# StudentName CRN ProfName Title Grade 101 Harry Potter 1 Snape Potions A 101 Harry Potter 2 McGonagall Transfiguration B 101 Harry Potter 3 Trelawney Divination C 102 Hermione Granger 1 Snape Potions B 102 Hermione Granger 2 McGonagall Transfiguration A 103 Ronald Weasley 1 Snape Potions B StudentR# -> StudentName CRN -> ProfName CRN -> Title StudentR# CRN -> Grade 16

StudentR# StudentName CRN ProfName Title Grade 101 Harry Potter 1 Snape Potions A 101 Harry Potter 2 McGonagall Transfiguration B 101 Harry Potter 3 Trelawney Divination C 102 Hermione Granger 1 Snape Potions B 102 Hermione Granger 2 McGonagall Transfiguration A 103 Ronald Weasley 1 Snape Potions B Is StudentR# -> StudentName a FD? Is CRN -> Grade a FD? 17

Where do FDs come from? "Key-ness" of attributes Domain and application constraints Real world constraints 18

Definition of Keys FDs allow us to formally define keys A set of attributes {A 1, A 2,, A n } is a key for relation R if it satisfies: Uniqueness: {A 1, A 2,, A n } functionally determine all the other attributes of R Minimality: no proper subset of {A 1, A 2,, A n } functionally determines all other attributes of R. 19

StudentR# StudentName CRN ProfName Title Grade 101 Harry Potter 1 Snape Potions A 101 Harry Potter 2 McGonagall Transfiguration B 101 Harry Potter 3 Trelawney Divination C 102 Hermione Granger 1 Snape Potions B 102 Hermione Granger 2 McGonagall Transfiguration A 103 Ronald Weasley 1 Snape Potions B What are the keys? 20

Two things you already know and one thing you don't: A relation can have more than one key. Usually one key is known as the primary key. FDs have nothing to do with primary keys, just keys. 22

Another way to think about FDs For a FD A 1 A 2 A n -> B 1 B 2 B m : Equivalent to the set of FDs A 1 A 2 A n -> B 1 A 1 A 2 A n -> B 2 A 1 A 2 A n -> B m : (etc, through) You should be able to imagine a function f(a 1, A 2,,A n ) that computes a unique B 1 (or B 2 ). 23

Superkeys 24

Superkeys A superkey (superset of a key) is a set of attributes that contains a key. In other words, a superkey satisfies the uniqueness part of the key definition, but may not satisfy the minimality part. 25

StudentR# StudentName CRN ProfName Title Grade 101 Harry Potter 1 Snape Potions A 101 Harry Potter 2 McGonagall Transfiguration B 101 Harry Potter 3 Trelawney Divination C 102 Hermione Granger 1 Snape Potions B 102 Hermione Granger 2 McGonagall Transfiguration A 103 Ronald Weasley 1 Snape Potions B What are the keys and superkeys? 27

With a partner Consider a relation about people in the USA, including name, SSN, street address, city, state, zip code, area code, and 7-digit phone number. What FDs would you expect to hold? What are the keys for this relation? Hints: Can an area code straddle two states? Can a zip code straddle two area codes? 28

Rules for Manipulating FDs Learn how to reason about FDs Define rules for deriving new FDs from a given set of FDs Example: R(A, B, C) satisfies FDs A->B, B->C. What others does it satisfy? A -> C What is the key for R? A (because A->B and A->C) 29

Review FD: X -> Y: for each X, there is only one Y. Knowing the value of X tells you the value of Y. Superkey: a set of attributes that functionally determines all of the other attributes of a relation. Key: a superkey that is also minimal (can't remove any attributes from it and still functionally determine all the other attributes). 30

Equivalence of FDs Why? To derive new FDs from a set of FDs An FD F follows from a set of FDs T if every relation instance that satisfies all the FDs in T also satisfies F A à C follows from T = {AàB, BàC} Two sets of FDs S and T are equivalent if each FD in S follows from T and each FD in T follows from S S = {AàB, BàC, AàC} and T = {AàB, BàC} are equivalent 31

Splitting and Combining FDs The set of FDs A1 A2 A3 An à B1 A1 A2 A3 An à B2 is equivalent to the FD A1 A2 A3 An à B1 B2 B3 Bm This equivalence implies two rules: Splitting rule Combining rule These rules work because all the FDs in S and T have identical left hand sides 32

Splitting and Combining FDs Can we split and combine left hand sides of FDs? Consider a relation Flights(airline, flightnum, source, dest) What are FDs? Does the FD flightnum -> source follow from airline flightnum -> source? No! 33

Triviality of FDs A FD A1 A2 An à B1 B2 Bm is Trivial if the B s are a subset of the A s Non-trivial if at least one B is not among the A s Completely non-trivial if none of the B s are among the A s Most real-world FDs are expressed completely non-trivially. 34

Triviality of FDs What good are trivial and non-trivial FDs? Trivial dependencies are always true They help simplify reasoning about FDs Trivial dependency rule: The FD A1 A2 An à B1 B2 Bm is equivalent to the FD A1 A2 An à C1 C2...Ck, where the C s are those B s that are not A s, i.e. Example: Suppose this FD holds: SSN -> birthday SSN Then this FD also holds: SSN -> birthday 35

Find a trivial FD: StudentR# StudentName CRN ProfName Title Grade 101 Harry Potter 1 Snape Potions A 101 Harry Potter 2 McGonagall Transfiguration B 101 Harry Potter 3 Trelawney Divination C 102 Hermione Granger 1 Snape Potions B 102 Hermione Granger 2 McGonagall Transfiguration A 103 Ronald Weasley 1 Snape Potions B 36

Review A set of FDs S follows from another set of FDs T iff all the FDs in S are implied by those in T. (e.g., through the splitting/combining rule, transitivity, etc) Two sets of FDs are equivalent if each set follows from the other. 37

Closure of a set of attributes Suppose you have a set of attributes {A1,, An} and a set of FDs S. The closure of {A1,, An} under S is the set of attributes B such that every relation in S also satisfies A1 An -> B. Intuitive def'n: B is the largest set of attributes that we can deduce from knowing A1,, An. Closure of {A1, An} denoted by {A1, An} + 38

Closure of Attributes: Algorithm 1. Use the splitting rule so that each FD in S has one attribute on the right (always possible). 2. Set X = {A1, A2, An} 3. Find a FD B1 B2 Bk à C in S such that {B1 B2 Bk} X but C X 4. Add C to X 5. Repeat the last two steps until you can t find C Why is the algorithm correct? Read 3.2.5 in textbook 39

Closure of Attributes: Example Suppose a relation R(A, B, C, D, E, F) has FDs: AB à C, BC à AD, D à E, CF à B Find the closures of: {A, B} {B, C, F} {A, F} under the FDs above. 40

Note about closure The closure of a set of attributes will be different for differing sets of FDs. R(A, B, C, D); with FDs A -> B and C -> D. What is {A, B} +? R(A, B, C); with FDs A -> BC and C -> D What is {A, B} +? Takeaway: Closure of a set of attributes is meaningless without a set of FDs. 41

Why compute closures? Can test whether any FD follows from a set of other FDs. Say we know a set of FDs S, and we want to check if a "new" FD A1 An -> B follows from S. Simply check if B is in {A1, A2,, An} + under S. To prove the correctness of rules for manipulating FDs. Can compute keys algorithmically. 49

Algorithm for computing keys Recall a superkey is a set of attributes that functionally determines all the other attributes. The closure of a set of attributes A1 An under a set of FDs gives you all the other attributes in R that can be functionally determined from knowing A1 An. Let s figure out the connection between superkeys and attribute closure. 52

Connection between closure and keys Suppose we have a relation R(A1, A2,, An). A superkey is a set of attributes that functionally determines all the other attributes of a relation. Set X is a superkey for R iff? X + = {A1, A2,, An} A key is a superkey that is also minimal (can t leave any attributes out of it): Set X is a minimal superkey (a key) iff? For any attribute A in X, (X-{A}) + {A1, A2,, An} 53

(Brute-force) algorithm for computing keys Given: A relation R (A1, A2,, An) The set of all FDs S that hold in R Find: Compute all the keys of R 1. For every subset K of {A1, A2,, An} compute its closure 2. If K + = {A1, A2, An} and for every attribute A, (K {A}) + is not {A1, A2, An}, then output K as a key 54

Students and Profs Suppose we have one single relation with attributes: R# Student Name ProfID (ID of professor teaching a class with the student) ProfName AdvisorID AdvisorName 55

Armstrong s Axioms We can use closures of attributes to determine if any FD follows from a given set of FDs Armstrong's axioms: complete set of inference rules from which it is possible to derive every FD that follows from a given set. Not the right W. W. Armstrong. This is Warwick Windridge Armstrong, an Australian cricketer. He did not invent these axioms. They were originated by William Ward Armstrong, who is Canadian, and does not have a picture on Wikipedia. 56

Armstrong s Axioms X, Y, and Z are sets of attributes. Reflexivity Y X X Y E.g. ssn name à ssn (always gives you a trivial FD) Augmentation X Y XW YW E.g. ssn à name can give you ssn grade à name grade 57

Armstrong's Axioms X, Y, and Z are sets of attributes. Transitivity X Y Y Z " # $ X Z e.g. if ssn à address and address à tax-rate then ssn à tax-rate 58

Note on notation Relation Schema: R(A1, A2, A3): parentheses surround attributes, attributes separated by commas. Set of attributes: {A1, A2, A3}: curly braces surround attributes, attributes separated by commas FD: A1 A2 à A3: no parentheses or curly braces, attributes separated by spaces, arrows separates left hand side and right hand side Set of FDs: {A1 A2 à A3, A2 à A1}: curly braces surround FDs, FDs separated by commas 64

Computing Closures of FDs Many times we are given a set of FDs and are interested in learning if there is a simpler set of FDs that has all the same implications that the original set does. To compute the closure of a set of FDs, repeatedly apply Armstrong s Axioms until you cannot find any new FDs. 67

Examples of Computing Closures of FDs (Let us include only completely non-trivial FDs in these examples, with a single attribute on the right) F = {AàB, BàC} {F} + =?? 68

Examples of Computing Closures of FDs (Let us include only completely non-trivial FDs in these examples, with a single attribute on the right) F = {AàB, BàC} {F} + = {AàB, BàC, AàC, ACàB, ABàC} 69

Examples of Computing Closures of FDs (Let us include only completely non-trivial FDs in these examples, with a single attribute on the right) F = {ABàC, BCàA, ACàB} {F} + =?? 70

Examples of Computing Closures of FDs (Let us include only completely non-trivial FDs in these examples, with a single attribute on the right) F = {ABàC, BCàA, ACàB} {F} + = {ABàC, BCàA, ACàB} 71

Examples of Computing Closures of FDs (Let us include only completely non-trivial FDs in these examples, with a single attribute on the right) F = {AàB, BàC, CàD} {F} + =?? 72

Examples of Computing Closures of FDs (Let us include only completely non-trivial FDs in these examples, with a single attribute on the right) F = {AàB, BàC, CàD} {F} + = {AàB, BàC, CàD, AàC, AàD, BàD, } 73

Closures of Attributes vs Closure of FDs Closure of attributes: Takes a set of attributes A and a set of FDs S. Produces a set of attributes (all the attribs that can be functionally determined from A, given S). Used for computing keys, checking if an FD follows from a set of FDs. Closure of a set of FDs: Takes a set of FDs. Produces a set of FDs (all the FDs that follow from S). Can be used for verifying a minimal basis, but also can verify by using closure of attributes. 75

Basis Set of FDs In linear algebra, a basis is the smallest set of linearly independent vectors such that you can build any other vector out of the basis vectors. 77

Basis Set of FDs In databases, a (minimal) basis for a set of FDs S is the smallest set of FDs that is equivalent to S. That is, all of the FDs in S follow from the basis set of FDs. 78

Minimal basis Given a set of FDs S, a minimal basis for S is another set of FDs B where: All the FDs in B have singleton right sides. If any FD is removed from B, the result is no longer a basis. If we remove any attribute from the left side of any FD in B, the result is no longer a basis. Like in linear algebra, there can be multiple minimal bases for a set of FDs, though unlike in linear algebra, two minimal bases for a set of FDs may be different sizes. 79

Example of Minimal Basis R(A, B, C) is a relation such that each attribute functionally determines the other two attributes What are the FDs that hold in R and what are the minimal bases? (Assume only one attribute on the right-hand side, only non-trivial FDs) 80

Example of Minimal Basis R(A, B, C) is a relation such that each attribute functionally determines the other two attributes What are the FDs that hold in R and what are the minimal bases? (Assume only one attribute on the right-hand side, only non-trivial FDs) FDs: AàB, AàC, BàA, BàC, CàA, CàB, ABàC, BCàA, ACàB Minimal Bases: {AàB, BàA, BàC, CàB}, {AàB, BàC, CàA}, etc. 83