Relational Algebra and Calculus


Linda Wu (CMPT 354 2004-2), Chapter 4

Topics
- Formal query languages
- Preliminaries
- Relational algebra
- Relational calculus
- Expressive power of algebra and calculus

Relational Query Languages
- The relational model supports simple, powerful query languages
  - Allow manipulation and retrieval of data from a database
  - Allow for much optimization
  - Strong formal foundation based on logic
- Query languages ≠ programming languages
  - Query languages are not expected to be Turing complete
  - Query languages are not intended to be used for complex calculations
  - Query languages support easy, efficient access to large data sets

Formal Relational Query Languages
- Two mathematical query languages form the basis for real languages (e.g., SQL) and for implementation
- Relational algebra
  - Describes a step-by-step procedure for computing the desired answer
  - Operational; useful for representing execution plans
- Relational calculus
  - Describes the desired answer, rather than how to compute it
  - Non-operational, declarative

Preliminaries
- A query is applied to relation instances, and the result of a query is also a relation instance
  - The schemas of the input relations for a query are fixed
  - The schema of the result of a given query is also fixed
- Positional vs. named-field notation
  - Positional notation is easier for formal definitions; named-field notation is more readable
  - Both are used in SQL

Relational Algebra
- Selection
- Projection
- Set operations
- Renaming
- Joins
- Division

Operators
- Basic operators
  - Selection (σ): selects a subset of rows from a relation
  - Projection (π): deletes unwanted columns from a relation
  - Cross-product (×): combines two relations
  - Set-difference (−): tuples in relation 1 but not in relation 2
  - Union (∪): tuples in relation 1 or in relation 2
- Additional operators
  - Intersection (∩), join (⋈), division (/), renaming (ρ)
  - Not essential, but very useful

Operators (Cont.)
- Each operator accepts one or two relation instances as arguments and returns a relation instance as its result
- Algebra expression
  - Composed of operators
  - Describes a procedure for computing the desired answer
  - Used by relational systems to represent query evaluation plans
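
The basic operators can be sketched in a few lines of Python. This is only an illustration, not how a real system implements them: a relation is assumed to be a pair (schema, rows), the function names are made up for the sketch, and the sample rows are hypothetical. Intersection is derived from set-difference to show why it is not essential.

```python
# A relation is modeled as (schema, rows): a tuple of field names and a set of tuples.
# Function names and sample data are assumptions made for this sketch.

def select(rel, pred):
    """sigma: keep the rows that satisfy pred (pred receives a field-name -> value dict)."""
    schema, rows = rel
    return schema, {r for r in rows if pred(dict(zip(schema, r)))}

def project(rel, fields):
    """pi: keep only the listed fields; using a set removes duplicates."""
    schema, rows = rel
    idx = [schema.index(f) for f in fields]
    return tuple(fields), {tuple(r[i] for i in idx) for r in rows}

def cross(r1, r2):
    """Cross-product: pair every row of r1 with every row of r2."""
    return r1[0] + r2[0], {a + b for a in r1[1] for b in r2[1]}

def union(r1, r2):
    """Union of two union-compatible relations; the schema of the first is kept."""
    return r1[0], r1[1] | r2[1]

def difference(r1, r2):
    """Set-difference: rows of r1 that are not in r2."""
    return r1[0], r1[1] - r2[1]

def intersection(r1, r2):
    """Not a basic operator: R ∩ S = R − (R − S)."""
    return difference(r1, difference(r1, r2))

# Hypothetical sample relations.
sailors = (("sid", "sname", "rating"), {(1, "ann", 9), (2, "bob", 5), (3, "cy", 8)})
old = (("sid", "sname", "rating"), {(2, "bob", 5), (3, "cy", 8), (4, "di", 10)})

# Operators compose because every result is again a relation.
high_names = project(select(sailors, lambda t: t["rating"] > 7), ["sname"])
both = intersection(sailors, old)
print(high_names, both)
```

Because each function returns a relation of the same shape it accepts, expressions nest freely, which is the property the slides rely on when they describe algebra expressions as evaluation plans.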

Example Instances
- Boats (bid: integer, bname: string, color: string)
- Sailors (sid: integer, sname: string, rating: integer, age: real)
- Reserves (sid: integer, bid: integer, day: date)
[Tables: instance S1 of Sailors (rows for Dustin, Lubber and Rusty), instance S2 of Sailors (rows for Yuppy, Lubber, guppy and Rusty), and instance R1 of Reserves (two rows).]

Projection π
- Deletes attributes that are not in the projection list
- The schema of the result contains exactly the fields in the projection list, with the same names that they had in the single input relation
- The projection operator has to eliminate duplicates!
[Tables: results of π_{sname, rating}(S2) and π_{age}(S2).]

Selection σ
- Selects rows that satisfy the selection condition
- No duplicates in the result
- The schema of the result is identical to the schema of the single input relation
- The result relation can be the input for another relational algebra operation (operator composition)
[Tables: results of σ_{rating > 8}(S2) and π_{sname, rating}(σ_{rating > 8}(S2)).]

Selection σ (Cont.)
- Selection condition
  - A Boolean combination (∧, ∨) of terms
  - A term has the form: attribute op constant, or attribute1 op attribute2
    * op is one of: <, ≤, =, ≠, ≥, >
- Example terms: (rating ≥ 8), (age < 50), (sid1 = sid2), (bid1 ≠ bid2)
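
A small Python sketch of projection and selection on a Sailors instance. The rows of S2 below are assumed for illustration (the instance tables did not survive in this transcript), and modeling a relation as a set of tuples is a choice of the sketch. It shows duplicate elimination under projection and the composition π_{sname, rating}(σ_{rating > 8}(S2)).

```python
# Hypothetical instance of Sailors (sid, sname, rating, age); the values are
# assumed for illustration, not recovered from the slides.
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}

# pi_{age}(S2): three sailors share age 35.0, but the result is a set,
# so the duplicate values collapse to one.
ages = {age for (_sid, _sname, _rating, age) in S2}

# sigma_{rating > 8}(S2): same schema as S2, fewer rows.
selected = {t for t in S2 if t[2] > 8}

# pi_{sname, rating}(sigma_{rating > 8}(S2)): the selection result feeds the projection.
composed = {(sname, rating) for (_sid, sname, rating, _age) in selected}

print(sorted(ages))
print(sorted(composed))
```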

Union, Intersection, Set-Difference
- These 3 operators take 2 input relations, which must be union-compatible:
  - They have the same number of fields
  - Corresponding fields have the same types
- Result schema: that of the first relation
[Tables: results of S1 ∪ S2, S1 ∩ S2 and S1 − S2.]

Cross-Product
- R × S = {⟨r, s⟩ | r ∈ R, s ∈ S}
- Each row of R is paired with each row of S
- The result schema has one field per field of R and S, with field names inherited if possible
  - The result fields have the same domains as the corresponding fields in R and S
- Naming conflict: R and S contain field(s) with the same name
  - The corresponding fields in R × S are unnamed and referred to only by position
  - E.g., both S1 and R1 have a field sid

Cross-Product (Cont.)
[Table: result of S1 × R1, with the two sid columns unnamed, shown as (sid).]

Renaming ρ
- ρ(R(F), E)
  - E: a relational algebra expression
  - R: a new relation
  - F: a list of fields that are renamed
- Takes E and returns an instance of R
  - R has the same tuples as the result of E
  - R has the same schema as E, but some fields are renamed
- Examples:
  - ρ(C(1 → sid1, 5 → sid2), S1 × R1)
  - ρ(C, S1 × R1)
  - ρ((1 → sid1, 5 → sid2), S1 × R1)
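
The naming conflict in S1 × R1 and its repair by ρ(C(1 → sid1, 5 → sid2), S1 × R1) can be sketched as follows. This is a minimal illustration under assumptions: relations are (schema, rows) pairs, conflicting field names are represented by None, and the sample rows are hypothetical.

```python
# Relations are (schema, rows) pairs; sample rows are assumed for illustration.
S1 = (("sid", "sname", "rating", "age"),
      {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)})
R1 = (("sid", "bid", "day"),
      {(22, 101, "10/10/96"), (58, 103, "11/12/96")})

def cross(r1, r2):
    """Cross-product; fields whose names clash become unnamed (None)."""
    (s1, rows1), (s2, rows2) = r1, r2
    clash = set(s1) & set(s2)
    schema = tuple(None if f in clash else f for f in s1 + s2)
    return schema, {a + b for a in rows1 for b in rows2}

def rename(rel, positions):
    """rho: rename fields by 1-based position; the tuples are unchanged."""
    schema, rows = rel
    new_schema = list(schema)
    for pos, name in positions.items():
        new_schema[pos - 1] = name
    return tuple(new_schema), rows

product = cross(S1, R1)
C = rename(product, {1: "sid1", 5: "sid2"})
print(product[0])
print(C[0])
```

After renaming, every field of C has a name again, so later operators can refer to sid1 and sid2 instead of positions 1 and 5.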

Joins
- One of the most useful operations in relational algebra
- The most common way to combine information from two or more relations
- Defined as a cross-product followed by selections and projections
- Usually has a much smaller result than the cross-product
- Condition join, equijoin, natural join, etc.

Joins (Cont.)
- Condition join: R ⋈_c S = σ_c(R × S)
  - c: the join condition; it may refer to the attributes of both R and S
  - The result schema is the same as that of the cross-product
  - The result usually has fewer tuples than the cross-product, so it might be possible to compute it more efficiently
- Example: S1 ⋈_{S1.sid < R1.sid} R1
[Table: result of S1 ⋈_{S1.sid < R1.sid} R1.]

Joins (Cont.)
- Equijoin: a special case of condition join where the condition c contains only equalities
  - Each equality has the form R.name1 = S.name2
  - The result schema is similar to that of the cross-product, but contains only one copy of the fields for which equality is specified
- Natural join: equijoin on all common fields
- Example: S1 ⋈ R1, or, S1 ⋈_{S1.sid = R1.sid} R1
[Table: result of S1 ⋈ R1.]

Division
- Not a primitive operator, but useful for expressing queries like: Find sailors who have reserved all boats
- Let A have 2 fields, x and y; let B have only field y
  - A/B = {⟨x⟩ | ∀⟨y⟩ ∈ B (⟨x, y⟩ ∈ A)}
  - i.e., A/B contains all x tuples (sailors) such that for every y tuple (boat) in B, there is an xy tuple in A (reserves), or,
  - if the set of y values (boats) associated with an x value (sailor) in A contains all y values in B, then the x value is in A/B
- In general, x and y can be any lists of fields; y is the list of fields in B, and x ∪ y is the list of fields of A
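
The join variants can be sketched directly from their definitions. The sketch below assumes rows are Python dicts keyed by field name and uses hypothetical sample rows; the nested loops mirror the definition "cross-product followed by selection" and say nothing about how a real system would evaluate a join.

```python
# Rows are dicts keyed by field name; the sample rows are assumed for illustration.
S1 = [{"sid": 22, "sname": "dustin"},
      {"sid": 31, "sname": "lubber"},
      {"sid": 58, "sname": "rusty"}]
R1 = [{"sid": 22, "bid": 101},
      {"sid": 58, "bid": 103}]

def condition_join(r, s, cond):
    """R join_c S = sigma_c(R x S): pair every row with every row, keep pairs passing cond."""
    return [(a, b) for a in r for b in s if cond(a, b)]

def natural_join(r, s):
    """Equijoin on all common fields, keeping one copy of each common field."""
    out = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                out.append({**a, **b})
    return out

# S1 join_{S1.sid < R1.sid} R1
less_than = condition_join(S1, R1, lambda a, b: a["sid"] < b["sid"])

# S1 natural-join R1 (sid is the only common field)
natural = natural_join(S1, R1)

print(len(S1) * len(R1), len(less_than), len(natural))
```

With these rows the cross-product has 6 pairs, while both joins keep only 2, which illustrates why a join result is usually smaller than the cross-product.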

Division (Cont.)
[Tables: relation A (S#, P#), divisors B1, B2 and B3 (P#), and the quotients A/B1, A/B2 and A/B3 (S#).]

Division (Cont.)
- Division is not an essential operation; just a useful shorthand
  - (Also true of joins, but joins are so common that systems implement joins specially)
- Expressing division using basic operators
  - Idea: for A/B, compute all x values that are not disqualified by some y value in B
  - An x value is disqualified if, by attaching a y value from B, we obtain an xy tuple that is not in A
  - Disqualified x values: π_x((π_x(A) × B) − A)
  - A/B: π_x(A) − all disqualified tuples

Examples
- Find the names of sailors who have reserved boat #3
  - Solution 1: π_{sname}((σ_{bid = 3} Reserves) ⋈ Sailors)
  - Solution 2: ρ(Temp1, σ_{bid = 3} Reserves)
                ρ(Temp2, Temp1 ⋈ Sailors)
                π_{sname}(Temp2)
  - Solution 3: π_{sname}(σ_{bid = 3}(Reserves ⋈ Sailors))

Examples (Cont.)
- Find the names of sailors who have reserved a red boat
  - Information about boat color is only available in Boats, so an extra join with Boats is needed
  - Solution 1: π_{sname}((σ_{color = 'red'} Boats) ⋈ Reserves ⋈ Sailors)
  - Solution 2 (more efficient): π_{sname}(π_{sid}((π_{bid} σ_{color = 'red'} Boats) ⋈ Reserves) ⋈ Sailors)
  - A query optimizer can find the second solution, given the first one!
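
Division can be checked against its rewriting in basic operators with a short Python sketch. The relation A and the divisors below are hypothetical (the example tables did not survive in this transcript); A is assumed to be a set of (x, y) pairs and B a set of y values.

```python
def divide_direct(a, b):
    """A/B by definition: x values paired in A with every y value of B."""
    xs = {x for x, _ in a}
    return {x for x in xs if all((x, y) in a for y in b)}

def divide_algebra(a, b):
    """A/B using only projection, cross-product and set-difference."""
    xs = {x for x, _ in a}                           # pi_x(A)
    candidates = {(x, y) for x in xs for y in b}     # pi_x(A) x B
    disqualified = {x for x, _ in candidates - a}    # pi_x((pi_x(A) x B) - A)
    return xs - disqualified                         # pi_x(A) - disqualified

# Hypothetical data: (supplier, part) pairs.
A = {("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
     ("s2", "p1"), ("s2", "p2"),
     ("s3", "p2"),
     ("s4", "p2"), ("s4", "p4")}
B1 = {"p2"}
B2 = {"p2", "p4"}
B3 = {"p1", "p2", "p4"}

for b in (B1, B2, B3):
    print(sorted(b), sorted(divide_direct(A, b)), sorted(divide_algebra(A, b)))
```

Both functions agree on every divisor here, matching the claim that division is only a shorthand for an expression over the basic operators.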

Examples (Cont.)
- Find the names of sailors who have reserved a red or a green boat
  - Identify all red or green boats, then find sailors who have reserved one of these boats
    ρ(Tempboats, (σ_{color = 'red' ∨ color = 'green'} Boats))
    π_{sname}(Tempboats ⋈ Reserves ⋈ Sailors)
  - Tempboats can also be defined using union
  - What if ∨ is replaced by ∧ in this query?

Examples (Cont.)
- Find the names of sailors who have reserved a red and a green boat
  - Identify sailors who have reserved red boats and sailors who have reserved green boats, then find the intersection
  - Note that sid is a key for Sailors
    ρ(Tempred, π_{sid}((σ_{color = 'red'} Boats) ⋈ Reserves))
    ρ(Tempgreen, π_{sid}((σ_{color = 'green'} Boats) ⋈ Reserves))
    π_{sname}((Tempred ∩ Tempgreen) ⋈ Sailors)

Examples (Cont.)
- Find the names of sailors who have reserved all boats
  - Uses division; the schemas of the input relations to the division (/) must be carefully chosen
    ρ(Tempsids, (π_{sid, bid} Reserves) / (π_{bid} Boats))
    π_{sname}(Tempsids ⋈ Sailors)
- Find the names of sailors who have reserved all Interlake boats
    ... / π_{bid}(σ_{bname = 'Interlake'} Boats)

Summary
- The relational model has rigorously defined query languages that are simple and powerful
- Relational algebra is more operational; useful as an internal representation for query evaluation plans
- There might be several ways of expressing a given query; a query optimizer should choose the most efficient version
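
The difference between the "red or green" and "red and green" queries can be made concrete with a small sketch. The schemas are simplified and the rows are hypothetical; the helper names are made up for the illustration. It also answers the question about replacing ∨ by ∧ in the selection: no single boat is both red and green, so that selection is empty.

```python
# Simplified, hypothetical instances: Boats(bid, color), Reserves(sid, bid), Sailors(sid, sname).
boats = {(101, "blue"), (102, "red"), (103, "green"), (104, "red")}
reserves = {(22, 101), (22, 102), (22, 103), (31, 102), (64, 103)}
sailors = {(22, "dustin"), (31, "lubber"), (64, "horatio")}

def sids_reserving(color):
    """pi_sid((sigma_{color = c} Boats) join Reserves)."""
    bids = {bid for bid, c in boats if c == color}
    return {sid for sid, bid in reserves if bid in bids}

def names(sids):
    """pi_sname(sids join Sailors)."""
    return {sname for sid, sname in sailors if sid in sids}

# Red OR green: a union of the two sid sets (equivalently, one selection with a disjunction).
red_or_green = names(sids_reserving("red") | sids_reserving("green"))

# Red AND green: an intersection of the two sid sets.
red_and_green = names(sids_reserving("red") & sids_reserving("green"))

# Replacing the disjunction by a conjunction inside the selection on Boats selects nothing.
naive_and = {bid for bid, c in boats if c == "red" and c == "green"}

print(sorted(red_or_green), sorted(red_and_green), naive_and)
```

Intersecting on sid rather than sname matters because sid is a key for Sailors; two different sailors could share a name.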

Relational Calculus
- Domain relational calculus
- Tuple relational calculus

Relational Calculus
- Two variants of relational calculus
  - Tuple relational calculus (TRC): influenced SQL
  - Domain relational calculus (DRC): influenced QBE
- Calculus has variables, constants, comparison operators, logical connectives and quantifiers
  - TRC: variables range over tuples
  - DRC: variables range over domain elements (= field values)
  - Both TRC and DRC are simple subsets of first-order logic
- Calculus expressions are called formulas
- An answer tuple is an assignment of constants to variables that makes the formula evaluate to true

Domain Relational Calculus
- A DRC query has the form {⟨x1, x2, ..., xn⟩ | p(⟨x1, x2, ..., xn⟩)}
  - The answer to the query includes all tuples ⟨x1, x2, ..., xn⟩ that make the formula p(⟨x1, x2, ..., xn⟩) true
- A DRC formula is recursively defined, starting with simple atomic formulas and building bigger and better formulas using the logical connectives
- Example: find all sailors with a rating above 7
  {⟨I, N, T, A⟩ | ⟨I, N, T, A⟩ ∈ Sailors ∧ T > 7}

Domain Relational Calculus (Cont.)
- DRC atomic formula
  - ⟨x1, x2, ..., xn⟩ ∈ Rname, or, X op Y, or, X op constant
    * Rname is a relation name; X, Y are domain variables; op is one of <, >, =, ≤, ≥, ≠
- DRC formula
  - an atomic formula, or,
  - ¬p, p ∧ q, p ∨ q, where p and q are formulas, or,
  - ∃X (p(X)), where variable X is free in p(X), or,
  - ∀X (p(X)), where variable X is free in p(X)

Domain Relational Calculus (Cont.)
- Free and bound variables
  - ∃ and ∀ are quantifiers
  - The use of ∃X and ∀X is said to bind X
  - A variable that is not bound is free
- An important restriction on the definition of a DRC query {⟨x1, x2, ..., xn⟩ | p(⟨x1, x2, ..., xn⟩)}
  - The variables x1, x2, ..., xn that appear to the left of | must be the only free variables in the formula p(...)

DRC Query Examples
- Find all sailors with a rating above 7
  {⟨I, N, T, A⟩ | ⟨I, N, T, A⟩ ∈ Sailors ∧ T > 7}
  - The condition ⟨I, N, T, A⟩ ∈ Sailors ensures that the domain variables I, N, T and A are bound to fields of the same Sailors tuple
  - | should be read as "such that"
  - The term ⟨I, N, T, A⟩ to the left of | says that every tuple ⟨I, N, T, A⟩ that satisfies T > 7 is in the answer

DRC Query Examples (Cont.)
- Find the names of sailors rated > 7 who have reserved boat #3
  - ∃Ir, Br, D (...): a shorthand for ∃Ir (∃Br (∃D (...)))
  - ∃: used to find a tuple in Reserves that joins with the Sailors tuple under consideration
  {⟨N⟩ | ∃I, T, A (⟨I, N, T, A⟩ ∈ Sailors ∧ T > 7 ∧ ∃Ir, Br, D (⟨Ir, Br, D⟩ ∈ Reserves ∧ Ir = I ∧ Br = 3))}
  {⟨N⟩ | ∃I, T, A (⟨I, N, T, A⟩ ∈ Sailors ∧ T > 7 ∧ ∃⟨Ir, Br, D⟩ ∈ Reserves (Ir = I ∧ Br = 3))}

DRC Query Examples (Cont.)
- Find sailors rated > 7 who have reserved a red boat
  - The parentheses control the scope of each quantifier's binding
  {⟨I, N, T, A⟩ | ⟨I, N, T, A⟩ ∈ Sailors ∧ T > 7 ∧ ∃Ir, Br, D (⟨Ir, Br, D⟩ ∈ Reserves ∧ Ir = I ∧ ∃B, BN, C (⟨B, BN, C⟩ ∈ Boats ∧ B = Br ∧ C = 'red'))}
  {⟨I, N, T, A⟩ | ⟨I, N, T, A⟩ ∈ Sailors ∧ T > 7 ∧ ∃Br, D, BN (⟨I, Br, D⟩ ∈ Reserves ∧ ⟨Br, BN, 'red'⟩ ∈ Boats)}
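
A DRC query over finite relations reads almost literally as a Python set comprehension, with `any` in the role of ∃ and loop variables in the role of domain variables. The sketch below evaluates the "rated > 7 and reserved boat #3" query on hypothetical rows; the data and the placeholder day values are assumptions made for the illustration.

```python
# Hypothetical instances: Sailors(sid, sname, rating, age), Reserves(sid, bid, day).
sailors = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
reserves = {(22, 1, "day-a"), (31, 1, "day-b"), (58, 3, "day-c")}

# {<N> | exists I, T, A (<I, N, T, A> in Sailors and T > 7 and
#        exists Ir, Br, D (<Ir, Br, D> in Reserves and Ir = I and Br = 3))}
answer = {
    n
    for (i, n, t, a) in sailors                                   # I, T, A are bound; N is free
    if t > 7
    and any(ir == i and br == 3 for (ir, br, d) in reserves)      # exists Ir, Br, D
}

print(answer)
```

Only N survives into the result, matching the restriction that the variables to the left of | are the only free variables of the formula.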

DRC Query Examples (Cont.)
- Find the names of sailors who have reserved all boats (solution 1)
  - Find all sailors ⟨I, N, T, A⟩ such that: for each 3-tuple ⟨B, BN, C⟩, either it is not a tuple in Boats, or there is a tuple in Reserves showing that sailor I has reserved B
  {⟨N⟩ | ∃I, T, A (⟨I, N, T, A⟩ ∈ Sailors ∧ ∀B, BN, C (¬(⟨B, BN, C⟩ ∈ Boats) ∨ (∃⟨Ir, Br, D⟩ ∈ Reserves (Ir = I ∧ Br = B))))}

DRC Query Examples (Cont.)
- Find the names of sailors who have reserved all boats (solution 2)
  - Simpler notation, same query (much clearer!)
  {⟨N⟩ | ∃I, T, A (⟨I, N, T, A⟩ ∈ Sailors ∧ ∀⟨B, BN, C⟩ ∈ Boats (∃⟨Ir, Br, D⟩ ∈ Reserves (Ir = I ∧ Br = B)))}
- To find the names of sailors who have reserved all red boats:
  {⟨N⟩ | ∃I, T, A (⟨I, N, T, A⟩ ∈ Sailors ∧ ∀⟨B, BN, C⟩ ∈ Boats (C ≠ 'red' ∨ ∃⟨Ir, Br, D⟩ ∈ Reserves (Ir = I ∧ Br = B)))}

Tuple Relational Calculus
- A TRC query has the form {T | p(T)}
  - T is a tuple variable that takes on tuples of a relation as values
  - p(T) is a formula describing T
- The answer to the query is the set of all tuples t that make p(T) true when T = t
- A TRC formula is recursively defined
- Example: find all sailors with a rating above 7
  {S | S ∈ Sailors ∧ S.rating > 7}

Tuple Relational Calculus (Cont.)
- TRC atomic formula
  - R ∈ Rname, or, R.a op S.b, or, R.a op constant
    * Rname is a relation name; R, S are tuple variables; a is an attribute of R, b is an attribute of S; op is one of <, >, =, ≤, ≥, ≠
- TRC formula
  - an atomic formula, or,
  - ¬p, p ∧ q, p ∨ q, where p and q are formulas, or,
  - ∃R (p(R)), where R is a tuple variable, or,
  - ∀R (p(R)), where R is a tuple variable
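
The universally quantified queries map onto Python's `all`, with the bounded form ∀⟨B, BN, C⟩ ∈ Boats becoming a loop over Boats. The sketch below uses simplified, hypothetical instances (Reserves without a day field, Sailors without rating and age), and `reserved` is a helper introduced only for the illustration.

```python
# Simplified, hypothetical instances: Boats(bid, bname, color), Reserves(sid, bid), Sailors(sid, sname).
boats = {(1, "Interlake", "red"), (2, "Clipper", "green"), (3, "Marine", "red")}
reserves = {(22, 1), (22, 2), (22, 3), (31, 1), (31, 3), (58, 2)}
sailors = {(22, "dustin"), (31, "lubber"), (58, "rusty")}

def reserved(i, b):
    """exists <Ir, Br> in Reserves (Ir = I and Br = B)."""
    return any(ir == i and br == b for (ir, br) in reserves)

# forall <B, BN, C> in Boats (sailor I has reserved B)
all_boats = {n for (i, n) in sailors
             if all(reserved(i, b) for (b, bn, c) in boats)}

# forall <B, BN, C> in Boats (C != 'red' or sailor I has reserved B)
all_red_boats = {n for (i, n) in sailors
                 if all(c != "red" or reserved(i, b) for (b, bn, c) in boats)}

print(sorted(all_boats), sorted(all_red_boats))
```

The disjunction `c != "red" or reserved(i, b)` is the implication "if the boat is red, then the sailor reserved it" written with ∨, exactly as in the calculus formula.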

TRC Query Examples
- Find the names and ages of sailors with a rating above 7
  {P | ∃S ∈ Sailors (S.rating > 7 ∧ P.name = S.sname ∧ P.age = S.age)}
  - P is a tuple variable with two fields: name and age
  - P.name = S.sname and P.age = S.age give values to the fields of an answer tuple P
    * If a variable R does not appear in an atomic formula of the form R ∈ Rname, the type of R is a tuple whose fields include all (and only) fields of R that appear in the formula

TRC Query Examples (Cont.)
- Find the names of sailors who have reserved all boats
  {P | ∃S ∈ Sailors ∀B ∈ Boats (∃R ∈ Reserves (S.sid = R.sid ∧ R.bid = B.bid ∧ P.sname = S.sname))}
- Find sailors who have reserved all red boats
  {S | S ∈ Sailors ∧ ∀B ∈ Boats (B.color ≠ 'red' ∨ (∃R ∈ Reserves (S.sid = R.sid ∧ R.bid = B.bid)))}

Expressive Power of Algebra and Calculus
- Unsafe query: a syntactically correct calculus query that has an infinite number of answers
  - E.g., {S | ¬(S ∈ Sailors)}
- Every query that can be expressed in relational algebra can be expressed as a safe query in DRC / TRC; the converse is also true
- Relational completeness
  - A query language (e.g., SQL) is relationally complete if it can express every query that is expressible in relational algebra
  - In addition, commercial query languages can express some queries that cannot be expressed in relational algebra

Summary
- Relational calculus is non-operational: users define queries in terms of what they want, not in terms of how to compute it (declarativeness)
- Algebra and safe calculus have the same expressive power, leading to the notion of relational completeness
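
For a single query, the equivalence of calculus and algebra can be illustrated by evaluating both forms on the same data. The sketch below assumes hypothetical Sailors rows and uses namedtuples as tuple variables; it compares the TRC query for names and ages of sailors rated above 7 with the algebra expression π_{sname, age}(σ_{rating > 7}(Sailors)). It is one example, not a proof of the general result.

```python
from collections import namedtuple

# Tuple types; the field names follow the Sailors schema, the rows are assumed.
Sailor = namedtuple("Sailor", "sid sname rating age")
Answer = namedtuple("Answer", "name age")

sailors = {
    Sailor(22, "dustin", 7, 45.0),
    Sailor(31, "lubber", 8, 55.5),
    Sailor(58, "rusty", 10, 35.0),
}

# TRC: {P | exists S in Sailors (S.rating > 7 and P.name = S.sname and P.age = S.age)}
calculus = {Answer(name=s.sname, age=s.age) for s in sailors if s.rating > 7}

# Algebra: pi_{sname, age}(sigma_{rating > 7}(Sailors)), computed step by step.
selected = {s for s in sailors if s.rating > 7}       # sigma_{rating > 7}
algebra = {(s.sname, s.age) for s in selected}        # pi_{sname, age}

# namedtuples compare equal to plain tuples with the same values.
print(calculus == algebra)
```

The unsafe query {S | ¬(S ∈ Sailors)} has no such counterpart: its answer ranges over every tuple that is not in Sailors, so it cannot be enumerated from a finite instance.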