Databases 2011
Christian S. Jensen
Computer Science, Aarhus University
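What is an Algebra?
An algebra consists of
  values
  operators
  rules
Closure: every operation yields a value of the same kind, so operations can be composed
Examples
  integers with $+$, $\times$
  sets with $\cup$, $\cap$, $\setminus$
  matrices with $+$, $\times$
  functions with $\circ$, ${}^{-1}$
  relations with query operators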
Mathematical Relations
An n-ary relation on a set $S$ is a subset of $S^n$
Examples
  $\le$ is a binary relation on $\mathbb{R}$, a subset of $\mathbb{R} \times \mathbb{R}$:
    $\{(1.2, 3.4), (34, 117.363), (-53, 0.1234), \ldots\}$
  divides is a binary relation on $\mathbb{N}$, a subset of $\mathbb{N} \times \mathbb{N}$:
    $\{(2, 4), (3, 9), (3, 12), (17, 34), (1237, 21029), \ldots\}$
  negative is a binary relation on $\mathbb{Z}$, a subset of $\mathbb{Z} \times \mathbb{Z}$:
    $\{(3, -3), (-17, 17), (0, 0), (2, -2), (-2, 2), (87, -87), \ldots\}$
  sum is a ternary relation on $\mathbb{N}$, a subset of $\mathbb{N} \times \mathbb{N} \times \mathbb{N}$:
    $\{(3, 5, 8), (23, 14, 37), (0, 123, 123), (42, 87, 129), \ldots\}$
  married to is a binary relation on people:
    {(Hillary, Bill), (Bill, Hillary), (Angelina, Brad), ...}
Tables as Relations
A database relation on a data set $D$ consists of
  a schema of attribute names $(a_1, a_2, \ldots, a_n)$
  a finite n-ary relation on $D$, i.e., a finite subset of $D^n$
A relation is like a table where
  all columns have the same generic type
  no duplicate rows are allowed
  no other constraints are imposed
We implicitly allow permutations of the attributes
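A minimal Python sketch of this definition, assuming a hypothetical `Relation` class (not part of these notes) that stores a tuple of attribute names and a frozenset of row tuples. The frozenset enforces the no-duplicates rule; attribute permutations are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical model of a database relation: attribute names plus a finite set of rows."""
    schema: tuple
    rows: frozenset

    def __post_init__(self):
        # Attribute names must be distinct.
        if len(set(self.schema)) != len(self.schema):
            raise ValueError(f"duplicate attribute name in {self.schema!r}")
        # Every row must have exactly one value per attribute.
        for row in self.rows:
            if len(row) != len(self.schema):
                raise ValueError(f"row {row!r} does not fit schema {self.schema!r}")


def make_relation(schema, rows):
    # Converting to a frozenset drops duplicate rows, since a relation is a set.
    return Relation(tuple(schema), frozenset(tuple(r) for r in rows))


completed = make_relation(
    ("student", "task"),
    [("Fred", "Database1"), ("Fred", "Database1"), ("Sara", "Database2")],
)
print(len(completed.rows))  # 2: the duplicate row was collapsed
```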
Relational Operators
Database relations form an algebra with the operators
  union: $\cup$
  intersection: $\cap$
  difference: $\setminus$
  projection: $\pi$
  renaming: $\rho$
  selection: $\sigma$
  Cartesian product: $\times$
  natural join: $\bowtie$
These provide an abstract model of database queries
Union, Intersection, Difference
The arguments must have the same schema, and the result has that schema as well
$R \cup S$, $R \cap S$, $R \setminus S$
They compute the usual set operations on the sets of rows
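A minimal sketch of the three set operators, assuming a hypothetical representation of a relation as a pair `(schema, rows)`: a tuple of attribute names and a frozenset of row tuples. Schemas are compared literally, so permuted schemas are rejected rather than reordered.

```python
# Hypothetical representation used in this sketch: a relation is a pair
# (schema, rows), with schema a tuple of attribute names and rows a frozenset of tuples.

def _require_same_schema(r, s):
    # Union, intersection and difference require identical schemas.
    if r[0] != s[0]:
        raise ValueError(f"schemas differ: {r[0]} vs {s[0]}")


def union(r, s):
    _require_same_schema(r, s)
    return (r[0], r[1] | s[1])


def intersection(r, s):
    _require_same_schema(r, s)
    return (r[0], r[1] & s[1])


def difference(r, s):
    _require_same_schema(r, s)
    return (r[0], r[1] - s[1])


R = (("a", "b"), frozenset({(1, "x"), (2, "y")}))
S = (("a", "b"), frozenset({(2, "y"), (3, "z")}))
print(sorted(union(R, S)[1]))         # [(1, 'x'), (2, 'y'), (3, 'z')]
print(sorted(intersection(R, S)[1]))  # [(2, 'y')]
print(sorted(difference(R, S)[1]))    # [(1, 'x')]
```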
Projection $\pi_{a_1,\ldots,a_n}(R)$
Assume the schema of $R$ is $(a_1,\ldots,a_n,b_1,\ldots,b_m)$
The schema of the result is $(a_1,\ldots,a_n)$
The result relation is $\{(d_1,\ldots,d_n) \mid \exists d_{n+1},\ldots,d_{n+m} : (d_1,\ldots,d_{n+m}) \in R\}$
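A small sketch of projection on the hypothetical `(schema, rows)` pair representation. The definition above lists the kept attributes first; since attribute permutations are allowed, the sketch instead looks up each kept attribute's position. Note how rows that coincide after projection collapse into one.

```python
def project(attrs, r):
    """pi_attrs(r) for a relation r = (schema, rows); attrs may sit anywhere in the schema."""
    schema, rows = r
    positions = [schema.index(a) for a in attrs]  # raises ValueError for an unknown attribute
    # Building a frozenset removes rows that become identical after projection.
    return (tuple(attrs), frozenset(tuple(row[i] for i in positions) for row in rows))


completed = (
    ("student", "task"),
    frozenset({("Fred", "Database1"), ("Fred", "Database2"), ("Sara", "Database1")}),
)
print(sorted(project(["student"], completed)[1]))  # [('Fred',), ('Sara',)]
```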
Renaming $\rho_{a \to b}(R)$
The name $a$ must occur as some $a_i$ in the schema of $R$
The name $b$ must not occur in the schema of $R$
Schema of the result: $(a_1,\ldots,a_{i-1},b,a_{i+1},\ldots,a_n)$
The relation part is unchanged
$\rho_{a \to b,\, c \to d,\, e \to f}(R) = \rho_{a \to b}(\rho_{c \to d}(\rho_{e \to f}(R)))$
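A sketch of renaming on the hypothetical `(schema, rows)` pair representation, enforcing both side conditions. The multi-pair form is implemented literally as the nested composition above, so the last pair is applied first. The `meetings` instance is made up for illustration.

```python
def rename(old, new, r):
    """rho_{old -> new}(r) for r = (schema, rows), with the two side conditions checked."""
    schema, rows = r
    if old not in schema:
        raise ValueError(f"{old!r} does not occur in {schema}")
    if new in schema:
        raise ValueError(f"{new!r} already occurs in {schema}")
    # Only the attribute name changes; the set of rows is returned as is.
    return (tuple(new if a == old else a for a in schema), rows)


def rename_many(pairs, r):
    # rho_{a->b, c->d, e->f}(R) = rho_{a->b}(rho_{c->d}(rho_{e->f}(R))):
    # the last pair is applied first, so walk the list backwards.
    for old, new in reversed(pairs):
        r = rename(old, new, r)
    return r


meetings = (("meetid", "owner", "what"), frozenset({(1, "ann", "planning")}))
print(rename_many([("owner", "userid"), ("what", "title")], meetings)[0])
# ('meetid', 'userid', 'title')
```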
Selection $\sigma_C(R)$
$C$ is a condition on the attributes of $R$
The resulting schema is unchanged
The relation part is $\{r \mid r \in R \land C(r)\}$
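A sketch of selection on the hypothetical `(schema, rows)` pair representation, where the condition $C$ is assumed to be a Python predicate over a row given as a dict from attribute names to values. The `participants` instance and its status codes are hypothetical.

```python
def select(cond, r):
    """sigma_cond(r): keep the rows of r = (schema, rows) for which cond holds."""
    schema, rows = r
    # Present each row to the condition as {attribute name: value}.
    return (schema, frozenset(row for row in rows if cond(dict(zip(schema, row)))))


participants = (
    ("meetid", "pid", "status"),
    frozenset({(1, "ann", "a"), (1, "bob", "r"), (2, "bob", "a")}),
)
accepted = select(lambda t: t["status"] == "a", participants)
print(sorted(accepted[1]))  # [(1, 'ann', 'a'), (2, 'bob', 'a')]
```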
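Cartesian Product $R \times S$
Assume
  $R$ has schema $(a_1,\ldots,a_m)$
  $S$ has schema $(b_1,\ldots,b_n)$
  the two schemas are disjoint (rename attributes first otherwise)
The new schema is $(a_1,\ldots,a_m,b_1,\ldots,b_n)$
The relation part is $\{(c_1,\ldots,c_{m+n}) \mid (c_1,\ldots,c_m) \in R \land (c_{m+1},\ldots,c_{m+n}) \in S\}$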
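A sketch of the Cartesian product on the hypothetical `(schema, rows)` pair representation: every row of $R$ is concatenated with every row of $S$, so the result has $|R| \cdot |S|$ rows. Overlapping attribute names are rejected.

```python
def product(r, s):
    """R x S for relations r = (schema, rows) and s = (schema, rows) with disjoint schemas."""
    (r_schema, r_rows), (s_schema, s_rows) = r, s
    if set(r_schema) & set(s_schema):
        raise ValueError("schemas overlap; rename attributes first")
    # Concatenate each row of r with each row of s.
    return (r_schema + s_schema, frozenset(u + t for u in r_rows for t in s_rows))


R = (("a",), frozenset({(1,), (2,)}))
S = (("b",), frozenset({("x",), ("y",), ("z",)}))
P = product(R, S)
print(P[0], len(P[1]))  # ('a', 'b') 6
```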
Natural Join $R \bowtie S$
Assume
  $R$ has schema $(a_1,\ldots,a_k,c_1,\ldots,c_n)$
  $S$ has schema $(c_1,\ldots,c_n,b_1,\ldots,b_m)$
  $\{a_i\} \cap \{b_j\} = \emptyset$
The new schema is $(a_1,\ldots,a_k,c_1,\ldots,c_n,b_1,\ldots,b_m)$
The relation part is $\{(d_1,\ldots,d_k,e_1,\ldots,e_n,f_1,\ldots,f_m) \mid (d_1,\ldots,d_k,e_1,\ldots,e_n) \in R \land (e_1,\ldots,e_n,f_1,\ldots,f_m) \in S\}$
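A sketch of the natural join on the hypothetical `(schema, rows)` pair representation. It uses a hash join, which is one possible evaluation strategy and not something the definition prescribes: rows of $S$ are grouped by their values on the shared attributes $c_1,\ldots,c_n$. With no shared attributes, every key is the empty tuple and the join degenerates to the Cartesian product. The `points` relation is hypothetical.

```python
def natural_join(r, s):
    """Natural join of r = (schema, rows) and s = (schema, rows) on their shared attribute names."""
    (r_schema, r_rows), (s_schema, s_rows) = r, s
    common = [c for c in r_schema if c in s_schema]
    s_only = [b for b in s_schema if b not in r_schema]
    r_key = [r_schema.index(c) for c in common]
    s_key = [s_schema.index(c) for c in common]
    s_rest = [s_schema.index(b) for b in s_only]

    # Group the rows of s by their values on the common attributes (a hash join).
    buckets = {}
    for t in s_rows:
        buckets.setdefault(tuple(t[i] for i in s_key), []).append(t)

    result = set()
    for u in r_rows:
        for t in buckets.get(tuple(u[i] for i in r_key), []):
            # Keep all of u, then append only the attributes that s adds.
            result.add(u + tuple(t[i] for i in s_rest))
    return (r_schema + tuple(s_only), frozenset(result))


completed = (("student", "task"), frozenset({("Fred", "Database1"), ("Sara", "Compiler1")}))
points = (("task", "points"), frozenset({("Database1", 10), ("Database2", 20)}))
print(natural_join(completed, points))
# (('student', 'task', 'points'), frozenset({('Fred', 'Database1', 10)}))
```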
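Derived Operators
$R \cap S = R \bowtie S = R \setminus (R \setminus S)$ when the schemas are identical
$R \times S = R \bowtie S$ when the schemas are disjoint
$R \bowtie_{\Theta} S = \sigma_{\Theta}(R \times S)$, the theta join
SELECT DISTINCT $X_1, \ldots, X_k$ FROM $R_1, \ldots, R_n$ WHERE $C$ $= \pi_{X_1,\ldots,X_k}(\sigma_C(R_1 \times \cdots \times R_n))$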
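A sketch of the last two derived operators on the hypothetical `(schema, rows)` pair representation: the theta join is literally a selection over a product, and a `SELECT DISTINCT` block is translated into $\pi(\sigma(\times))$. The `emp` and `dept` relations are made up; the duplicate-free result mirrors `DISTINCT`.

```python
from functools import reduce


def product(r, s):
    (rs, r_rows), (ss, s_rows) = r, s
    if set(rs) & set(ss):
        raise ValueError("rename overlapping attributes first")
    return (rs + ss, frozenset(u + t for u in r_rows for t in s_rows))


def select(cond, r):
    schema, rows = r
    return (schema, frozenset(t for t in rows if cond(dict(zip(schema, t)))))


def project(attrs, r):
    schema, rows = r
    idx = [schema.index(a) for a in attrs]
    return (tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in rows))


def theta_join(theta, r, s):
    # R join_Theta S = sigma_Theta(R x S)
    return select(theta, product(r, s))


def select_distinct(columns, tables, where):
    # SELECT DISTINCT columns FROM R1, ..., Rn WHERE C  =  pi_columns(sigma_C(R1 x ... x Rn))
    return project(columns, select(where, reduce(product, tables)))


emp = (("ename", "edept"), frozenset({("ann", 1), ("bob", 1), ("cy", 2)}))
dept = (("dno", "dname"), frozenset({(1, "db"), (2, "pl")}))


def same_dept(t):
    return t["edept"] == t["dno"]


print(len(theta_join(same_dept, emp, dept)[1]))                       # 3
print(sorted(select_distinct(["dname"], [emp, dept], same_dept)[1]))  # [('db',), ('pl',)]
```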
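Query Trees
In which meetings do the owners participate?
$\pi_{what,meetid}(\sigma_{status='a'}(\rho_{owner \to userid}(Meetings) \bowtie \rho_{pid \to userid}(Participants)))$
As a query tree (each node above its children):
  $\pi_{what,meetid}$
    $\sigma_{status='a'}$
      $\bowtie$
        $\rho_{owner \to userid}$
          Meetings
        $\rho_{pid \to userid}$
          Participants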
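The query tree can be evaluated bottom-up as nested function calls. This sketch assumes hypothetical schemas `Meetings(meetid, owner, what)` and `Participants(meetid, pid, status)` and made-up rows; after the two renamings, the natural join matches on both `meetid` and `userid`. The nested-loop join here is just the simplest correct strategy.

```python
def rename(old, new, r):
    schema, rows = r
    return (tuple(new if a == old else a for a in schema), rows)


def natural_join(r, s):
    (rs, r_rows), (ss, s_rows) = r, s
    common = [a for a in rs if a in ss]
    extra = [a for a in ss if a not in rs]
    out = set()
    for u in r_rows:
        du = dict(zip(rs, u))
        for t in s_rows:
            dt = dict(zip(ss, t))
            if all(du[c] == dt[c] for c in common):  # nested-loop join on shared names
                out.add(u + tuple(dt[b] for b in extra))
    return (rs + tuple(extra), frozenset(out))


def select(cond, r):
    schema, rows = r
    return (schema, frozenset(t for t in rows if cond(dict(zip(schema, t)))))


def project(attrs, r):
    schema, rows = r
    idx = [schema.index(a) for a in attrs]
    return (tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in rows))


# Hypothetical instances; the real schemas of Meetings and Participants are assumptions.
meetings = (("meetid", "owner", "what"),
            frozenset({(1, "ann", "planning"), (2, "bob", "review"), (3, "cy", "lunch")}))
participants = (("meetid", "pid", "status"),
                frozenset({(1, "ann", "a"), (1, "bob", "a"), (2, "bob", "d"),
                           (3, "cy", "a"), (3, "ann", "a")}))

# Each nested call is one node of the query tree.
answer = project(["what", "meetid"],
                 select(lambda t: t["status"] == "a",
                        natural_join(rename("owner", "userid", meetings),
                                     rename("pid", "userid", participants))))
print(sorted(answer[1]))  # [('lunch', 3), ('planning', 1)]
```

Meeting 2 drops out because its owner's participation has status `d`, not `a`.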
Limitations
The relational algebra cannot answer all queries
Flights
  from        to
  Copenhagen  Madrid
  Rome        London
  Madrid      Athens
  Athens      Rome
  ...         ...
Which cities can be reached from Copenhagen in one or more flights?
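Transitive Closure
The transitive closure of a binary relation $R$ is
$R^+ = \{(x_1, x_k) \mid k \ge 2 \land \exists x_2, \ldots, x_{k-1} : \forall i \in \{1, \ldots, k-1\} : (x_i, x_{i+1}) \in R\}$
No single relational algebra expression computes $R^+$ for all instances of $R$
No SQL query can compute it either, unless SQL is extended with recursion (as in the WITH RECURSIVE queries of SQL:1999) or a special closure operator is added; some DBMSs do support this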
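Outside the algebra, $R^+$ can be computed by naive fixpoint iteration: repeatedly join the closure found so far with $R$ until nothing new appears. Each round is expressible in the algebra, but the number of rounds depends on the data, which is why no fixed expression suffices. A sketch on plain sets of pairs, using the four flights listed above:

```python
def transitive_closure(r):
    """R+ for a finite binary relation r given as a set of pairs (naive fixpoint iteration)."""
    closure = set(r)
    while True:
        # One more step: extend every known path (x, y) with an edge (y, z) from r.
        new_pairs = {(x, z) for (x, y) in closure for (y2, z) in r if y == y2}
        if new_pairs <= closure:
            return closure  # fixpoint reached: no new pairs
        closure |= new_pairs


flights = {("Copenhagen", "Madrid"), ("Rome", "London"),
           ("Madrid", "Athens"), ("Athens", "Rome")}
reachable = sorted(y for (x, y) in transitive_closure(flights) if x == "Copenhagen")
print(reachable)  # ['Athens', 'London', 'Madrid', 'Rome']
```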
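Algebraic Laws (1/3)
$x \cup x = x$ (idempotence)
$x \cup y = y \cup x$ (commutativity)
$x \cap x = x$ (idempotence)
$x \cap y = y \cap x$ (commutativity)
$x \cup (y \cup z) = (x \cup y) \cup z$ (associativity)
$x \cap (y \cap z) = (x \cap y) \cap z$ (associativity)
$x \cap (y \cup z) = (x \cap y) \cup (x \cap z)$ (distributivity)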
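Algebraic Laws (2/3)
$\sigma_C(x \cup y) = \sigma_C(x) \cup \sigma_C(y)$ (distributivity)
$\sigma_C(x \setminus y) = \sigma_C(x) \setminus \sigma_C(y) = \sigma_C(x) \setminus y$ (distributivity)
$\sigma_C(x \cap y) = \sigma_C(x) \cap \sigma_C(y)$ (distributivity)
$\sigma_C(x \bowtie y) = \sigma_C(x) \bowtie \sigma_C(y)$, if $C$ mentions only attributes common to $x$ and $y$ (distributivity)
$\sigma_C(x) = \sigma_C(\sigma_C(x))$ (idempotence)
$\sigma_C(\sigma_D(x)) = \sigma_D(\sigma_C(x))$ (commutativity)
$\sigma_{C \land D}(x) = \sigma_C(\sigma_D(x)) = \sigma_C(x) \cap \sigma_D(x)$ (splitting)
$\sigma_{C \lor D}(x) = \sigma_C(x) \cup \sigma_D(x)$ (splitting)
$\sigma_{\neg C}(x) = x \setminus \sigma_C(x)$ (splitting)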
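The selection laws can be spot-checked (not proved) on random finite instances. This sketch models rows as pairs in a Python set and conditions as predicates; `C` and `D` are arbitrary hypothetical conditions, and the join law is not exercised.

```python
import random


def select(cond, rows):
    # sigma_cond on a set of rows
    return {r for r in rows if cond(r)}


def C(r):  # hypothetical condition: first attribute is even
    return r[0] % 2 == 0


def D(r):  # hypothetical condition: second attribute exceeds 5
    return r[1] > 5


rng = random.Random(0)
checked = 0
for _ in range(200):
    x = {(rng.randrange(10), rng.randrange(10)) for _ in range(15)}
    y = {(rng.randrange(10), rng.randrange(10)) for _ in range(15)}
    assert select(C, x | y) == select(C, x) | select(C, y)
    assert select(C, x - y) == select(C, x) - select(C, y) == select(C, x) - y
    assert select(C, x & y) == select(C, x) & select(C, y)
    assert select(C, x) == select(C, select(C, x))
    assert select(C, select(D, x)) == select(D, select(C, x))
    assert select(lambda r: C(r) and D(r), x) == select(C, select(D, x)) == select(C, x) & select(D, x)
    assert select(lambda r: C(r) or D(r), x) == select(C, x) | select(D, x)
    assert select(lambda r: not C(r), x) == x - select(C, x)
    checked += 1
print(f"all laws held on {checked} random instances")
```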
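Algebraic Laws (3/3)
$\pi_a(x \cup y) = \pi_a(x) \cup \pi_a(y)$ (distributivity; does not hold for $\cap$ and $\setminus$)
$\rho_{a \to b}(x \cup y) = \rho_{a \to b}(x) \cup \rho_{a \to b}(y)$ (distributivity)
$\rho_{a \to b}(x \setminus y) = \rho_{a \to b}(x) \setminus \rho_{a \to b}(y)$ (distributivity)
$\rho_{b \to c}(\rho_{a \to b}(x)) = \rho_{a \to c}(x)$ (cancellation)
$\rho_{a \to b}(\rho_{c \to d}(x)) = \rho_{c \to d}(\rho_{a \to b}(x))$ for distinct $a, b, c, d$ (commutativity)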
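A small counterexample for why projection does not distribute over $\cap$ and $\setminus$: two rows that differ only in a projected-away attribute. The relations `x` and `y` with schema `(k, v)` are hypothetical, stored as `(schema, rows)` pairs.

```python
def project(attrs, r):
    schema, rows = r
    idx = [schema.index(a) for a in attrs]
    return (tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in rows))


x = (("k", "v"), frozenset({(1, "a")}))
y = (("k", "v"), frozenset({(1, "b")}))

# Intersection: projecting after intersecting loses the row, projecting first keeps it.
lhs_cap = project(["k"], (x[0], x[1] & y[1]))[1]
rhs_cap = project(["k"], x)[1] & project(["k"], y)[1]
print(lhs_cap, rhs_cap)  # frozenset() frozenset({(1,)})

# Difference: the opposite mismatch.
lhs_minus = project(["k"], (x[0], x[1] - y[1]))[1]
rhs_minus = project(["k"], x)[1] - project(["k"], y)[1]
print(lhs_minus, rhs_minus)  # frozenset({(1,)}) frozenset()

# Union is fine: both sides are {(1,)}.
assert project(["k"], (x[0], x[1] | y[1]))[1] == project(["k"], x)[1] | project(["k"], y)[1]
```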
Zero and Unit
Define $0$ as the empty relation (one for each schema)
Define $1$ as follows:
  the schema is empty
  the relation contains the single empty row
$0 \cup x = x \cup 0 = x$
$0 \times x = x \times 0 = 0$
$1 \times x = x \times 1 = x$
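A sketch of $0$ and $1$ on the hypothetical `(schema, rows)` pair representation. Concatenating the single empty row `()` onto each row leaves it unchanged, so $1$ is the identity of $\times$; any product with an empty relation is empty.

```python
def product(r, s):
    (rs, r_rows), (ss, s_rows) = r, s
    if set(rs) & set(ss):
        raise ValueError("schemas must be disjoint")
    return (rs + ss, frozenset(u + t for u in r_rows for t in s_rows))


def zero(schema):
    # The empty relation over the given schema.
    return (tuple(schema), frozenset())


ONE = ((), frozenset({()}))  # empty schema, exactly one (empty) row

x = (("a", "b"), frozenset({(1, 2), (3, 4)}))

assert product(ONE, x) == x and product(x, ONE) == x   # 1 x x = x x 1 = x
assert product(zero(()), x) == zero(("a", "b"))        # 0 x x = 0
assert product(x, zero(("c",))) == zero(("a", "b", "c"))
assert (x[0], zero(x[0])[1] | x[1]) == x                # 0 u x = x (same schema)
print("zero and unit laws hold on this instance")
```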
Division
Division Example
Completed
  student  task
  Fred     Database1
  Fred     Database2
  Fred     Compiler1
  Eugene   Database1
  Eugene   Compiler1
  Eugene   Compiler2
  Sara     Database1
  Sara     Database2
  John     Usability1
ddb
  task
  Database1
  Database2
Completed $\div$ ddb
  student
  Fred
  Sara
Those students that have completed all the ddb tasks
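A sketch of this example using the standard textbook formulation $R \div S = \pi_A(R) \setminus \pi_A((\pi_A(R) \times S) \setminus R)$, specialised to a binary $R$ and a unary $S$ stored as plain Python sets of tuples. This is an assumed formulation, not necessarily the one intended for the Division slide; the result is returned as a set of names rather than 1-tuples.

```python
def divide(r, s):
    """r(student, task) / s(task): students paired in r with every task in s."""
    candidates = {a for (a, _) in r}                        # pi_A(R)
    required = {(a, b) for a in candidates for (b,) in s}   # pi_A(R) x S
    missing = required - r                                  # pairs that some candidate lacks
    return candidates - {a for (a, _) in missing}           # drop every such candidate


completed = {
    ("Fred", "Database1"), ("Fred", "Database2"), ("Fred", "Compiler1"),
    ("Eugene", "Database1"), ("Eugene", "Compiler1"), ("Eugene", "Compiler2"),
    ("Sara", "Database1"), ("Sara", "Database2"), ("John", "Usability1"),
}
ddb = {("Database1",), ("Database2",)}
print(sorted(divide(completed, ddb)))  # ['Fred', 'Sara']
```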
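Algebraic Query Optimization
Rewritings may improve efficiency:
$(A \bowtie B) \bowtie C$ versus $A \bowtie (B \bowtie C)$
  with $10^6$ rows in $A$, $10^6$ rows in $B$, and $10$ rows in $C$, the intermediate result $A \bowtie B$ may have up to $10^{12}$ rows, while $B \bowtie C$ has only $10$ rows in this example
$\sigma_C(A \bowtie B)$ versus $\sigma_C(A) \bowtie \sigma_C(B)$
Whether a rewriting pays off depends on the predicates (their selectivities) and the specific instances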
Rules of Thumb
Push selections down the expression tree
Push projections down the expression tree
Order joins based on size estimates
In general, search for a good expression tree
  use heuristics
  use statistics: table sizes, numbers of distinct values per attribute, histograms, etc.
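A small illustration of pushing selections down, with hypothetical unary relations $A(a)$ and $B(b)$ of 300 rows each and the condition $a < 10 \land b < 10$ over $A \times B$. Both plans give the same answer, but the first materialises the whole product while the second only builds two 10-row inputs.

```python
A = [(a,) for a in range(300)]  # hypothetical relation A(a)
B = [(b,) for b in range(300)]  # hypothetical relation B(b)

# Plan 1: sigma_{a<10 and b<10}(A x B), materialising the full product first.
prod = [x + y for x in A for y in B]
plan1 = {t for t in prod if t[0] < 10 and t[1] < 10}

# Plan 2: push each conjunct down to the input it mentions, then take the product.
sa = [x for x in A if x[0] < 10]
sb = [y for y in B if y[0] < 10]
plan2 = {x + y for x in sa for y in sb}

assert plan1 == plan2
print(len(prod), len(sa) + len(sb), len(plan1))  # 90000 20 100
```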
Bag Algebra
Allows relations to contain duplicate rows
Sets are replaced by bags (multisets)
The bag versions of $\cup$, $\cap$, and $\setminus$ count copies
The bag versions of $\pi$, $\sigma$, and $\times$ keep duplicates
A better match with real-life SQL than sets
Still does not account for the ordering of the tuples
  SQL offers some support for ordering
  Tuples in a relation are stored on disk in some order
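Python's `collections.Counter` works as a bag that maps each row to its number of copies. In this sketch, bag union adds counts (like SQL `UNION ALL`), intersection takes the minimum, and difference subtracts, never going below zero. Some texts define bag union as the maximum instead; `Counter`'s `|` operator would give that variant. The rows are made up for illustration.

```python
from collections import Counter


def bag_union(x, y):
    return x + y  # copies add up


def bag_intersection(x, y):
    return x & y  # minimum of the two counts


def bag_difference(x, y):
    return x - y  # counts subtract; rows whose count drops to 0 disappear


def bag_project(positions, x):
    # Projection keeps duplicates: rows that coincide after projection accumulate their counts.
    out = Counter()
    for row, n in x.items():
        out[tuple(row[i] for i in positions)] += n
    return out


x = Counter({("Fred", "Database1"): 1, ("Fred", "Database2"): 1, ("Sara", "Database1"): 1})
y = Counter({("Fred", "Database1"): 1})
print(bag_union(x, y)[("Fred", "Database1")])  # 2
print(bag_project([0], x))  # Counter({('Fred',): 2, ('Sara',): 1})
```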
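Algebraic Laws for Bags
Fewer algebraic laws are valid for the bag algebra
Counterexamples
  $x \cap (y \cup z) = (x \cap y) \cup (x \cap z)$
  $\sigma_{C \lor D}(x) = \sigma_C(x) \cup \sigma_D(x)$
  both fail for $x = y = z = \{(42)\}$ with schema $(a)$ and $C = D = \mathit{true}$: each left-hand side contains one copy of $(42)$, each right-hand side two, since bag union adds the copies
Beware when optimizing bag queries!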
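The counterexample can be replayed with `collections.Counter` as the bag type, assuming bag union adds counts (`+`) and bag intersection takes minima (`&`). The bag `x = y = z` holds one copy of the row `(42,)` over schema `(a)`.

```python
from collections import Counter


def select(cond, bag):
    # Bag selection keeps every copy of each row that satisfies cond.
    return Counter({row: n for row, n in bag.items() if cond(row)})


def C(row):
    return True


def D(row):
    return True


x = y = z = Counter({(42,): 1})  # schema (a) with the single row (42)

lhs1, rhs1 = x & (y + z), (x & y) + (x & z)
lhs2, rhs2 = select(lambda r: C(r) or D(r), x), select(C, x) + select(D, x)
print(lhs1[(42,)], rhs1[(42,)])  # 1 2
print(lhs2[(42,)], rhs2[(42,)])  # 1 2
```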