Module 10: Query Optimization

Module Outline
10.1 Outline of Query Optimization
10.2 Motivating Example
10.3 Equivalences in the relational algebra
10.4 Heuristic optimization
10.5 Explosion of search space
10.6 Dynamic programming strategy (System R)

[Figure: DBMS architecture. Components shown: Web Forms, Applications, SQL Interface, SQL Commands; Query Processor (Parser, Optimizer, Plan Executor, Operator Evaluator), with the Optimizer marked "You are here!"; Transaction Manager, Lock Manager, Concurrency Control, Recovery Manager; Files and Index Structures, Buffer Manager, Disk Space Manager; Database (Index Files, Data Files, System Catalog).]

10.1 Outline of Query Optimization

The success of relational database technology is largely due to the system's ability to automatically find evaluation plans for declaratively specified queries.

Given some (SQL) query Q, the system
1. parses and analyzes Q,
2. derives a relational algebra expression E that computes Q,
3. transforms and simplifies E, and
4. annotates the operators in E with access methods and operator algorithms to obtain an evaluation plan P.

Discussed here: Task 3 is often called algebraic (or rewrite) query optimization, while task 4 is also called non-algebraic (or cost-based) query optimization.

10.2 Motivating Example

From query to plan

Example: List the airports from which flights operated by Swiss (airline code LX) fly to any German (DE) airport.

Airport:
  code  country  name
  FRA   DE       Frankfurt
  ZRH   CH       Zurich
  MUC   DE       Munich

Flight:
  from  to   airline
  FRA   ZRH  LX
  ZRH   MUC  LX
  FRA   MUC  US

SQL query Q:
  SELECT f.from
  FROM   Flight f, Airport a
  WHERE  f.to = a.code
    AND  f.airline = 'LX' AND a.country = 'DE'
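To make the semantics of Q concrete, here is a minimal Python sketch (an illustration, not part of the original material) that evaluates Q literally on the three-row sample instances: form the Cartesian product of the FROM relations, keep the rows that satisfy the WHERE conjunction, and project onto f.from. Representing tuples as Python tuples and using set semantics are assumptions of the sketch; the function name is hypothetical.

```python
from itertools import product

# Sample instances from the slide: Airport(code, country, name), Flight(from, to, airline).
AIRPORT = [("FRA", "DE", "Frankfurt"), ("ZRH", "CH", "Zurich"), ("MUC", "DE", "Munich")]
FLIGHT = [("FRA", "ZRH", "LX"), ("ZRH", "MUC", "LX"), ("FRA", "MUC", "US")]


def evaluate_q_naively(flights, airports):
    """Evaluate Q literally: FROM = Cartesian product, WHERE = filter, SELECT = projection."""
    result = set()  # set semantics: tuple order (and duplicates) are ignored
    for (f_from, f_to, f_airline), (a_code, a_country, _a_name) in product(flights, airports):
        if f_to == a_code and f_airline == "LX" and a_country == "DE":
            result.add(f_from)
    return result


print(evaluate_q_naively(FLIGHT, AIRPORT))  # {'ZRH'}
```

Only the flight ZRH to MUC is operated by LX and lands at a German airport, so the answer is the single airport ZRH.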

From query to plan ...

Relational algebra expression E that computes Q:
$\pi_{from}(\sigma_{airline='LX'}(\sigma_{country='DE'}(Airport) \bowtie_{to=code} Flight))$

From query to plan ...

One (of many) plan(s) P to evaluate Q, written as an annotated operator tree (root first):
  $\pi_{from}$ (scan)
    $\sigma_{airline='LX'}$
      NL-⋈ on to = code
        outer: $\sigma_{country='DE'}$ (scan) over Airport (heap scan)
        inner: Flight (index scan on to)
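The plan P can be mimicked in a few lines of Python. This is a sketch under assumptions: a dictionary built on the fly stands in for the index on Flight.to, and the plan shape (Airport as the outer input filtered by country, probing Flight through the index, then filtering by airline and projecting onto from) is our reading of the tree above. The helper names are hypothetical.

```python
from collections import defaultdict

AIRPORT = [("FRA", "DE", "Frankfurt"), ("ZRH", "CH", "Zurich"), ("MUC", "DE", "Munich")]
FLIGHT = [("FRA", "ZRH", "LX"), ("ZRH", "MUC", "LX"), ("FRA", "MUC", "US")]


def build_index_on_to(flights):
    """Stand-in for an index on Flight.to: maps each destination code to its Flight tuples."""
    index = defaultdict(list)
    for flight in flights:
        index[flight[1]].append(flight)
    return index


def evaluate_plan_p(airports, flights):
    index_on_to = build_index_on_to(flights)
    result = set()
    for code, country, _name in airports:          # heap scan of Airport (outer input)
        if country != "DE":                         # sigma_{country='DE'}
            continue
        for f_from, _f_to, airline in index_on_to.get(code, []):  # index probe on Flight.to
            if airline == "LX":                     # sigma_{airline='LX'}
                result.add(f_from)                  # pi_{from}
    return result


print(evaluate_plan_p(AIRPORT, FLIGHT))  # {'ZRH'}
```

It returns {'ZRH'}, the same answer that a literal evaluation of Q gives on the sample data, but it only touches Flight tuples whose destination is a German airport instead of forming the full Cartesian product.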

10.3 Equivalences in the relational algebra

Two relational algebra expressions $E_1$, $E_2$ are equivalent if, on every legal database instance, the two expressions generate the same set of tuples. Note: the order of tuples is irrelevant.

Such equivalences are denoted by equivalence rules of the form $E_1 \equiv E_2$ (such a rule may be applied by the system in both directions, $\rightleftarrows$).

We know those equivalence rules from the course Information Systems.

Some equivalence rules

1. Conjunctive selections can be deconstructed into a sequence of individual selections:
$\sigma_{p_1 \wedge p_2}(E) \equiv \sigma_{p_1}(\sigma_{p_2}(E))$
2. Selection operations are commutative:
$\sigma_{p_1}(\sigma_{p_2}(E)) \equiv \sigma_{p_2}(\sigma_{p_1}(E))$
3. Only the last projection in a sequence of projections is needed; the others can be omitted:
$\pi_{L_1}(\pi_{L_2}(\cdots \pi_{L_n}(E) \cdots)) \equiv \pi_{L_1}(E)$
4. Selections can be combined with Cartesian products and joins:
i) $\sigma_p(E_1 \times E_2) \equiv E_1 \bowtie_p E_2$
ii) $\sigma_p(E_1 \bowtie_q E_2) \equiv E_1 \bowtie_{p \wedge q} E_2$

Pictorial description of 4 i):
[Figure: the operator tree $\sigma_p$ over $E_1 \times E_2$ rewritten into the tree $E_1 \bowtie_p E_2$.]

5. Join operations are commutative:
$E_1 \bowtie_p E_2 \equiv E_2 \bowtie_p E_1$
6. i) Natural joins (equality of common attributes) are associative:
$(E_1 \bowtie E_2) \bowtie E_3 \equiv E_1 \bowtie (E_2 \bowtie E_3)$
ii) General joins are associative in the following sense:
$(E_1 \bowtie_p E_2) \bowtie_{q \wedge r} E_3 \equiv E_1 \bowtie_{p \wedge q} (E_2 \bowtie_r E_3)$
where predicate r involves attributes of $E_2$, $E_3$ only.
7. Selection distributes over joins in the following ways:
i) If predicate p involves attributes of $E_1$ only:
$\sigma_p(E_1 \bowtie_q E_2) \equiv \sigma_p(E_1) \bowtie_q E_2$
ii) If predicate p involves only attributes of $E_1$ and q involves only attributes of $E_2$:
$\sigma_{p \wedge q}(E_1 \bowtie_r E_2) \equiv \sigma_p(E_1) \bowtie_r \sigma_q(E_2)$
(this is a consequence of rules 1 and 7 i), applying 7 i) to each operand with the help of commutativity, rule 5).

8. Projection distributes over join as follows:
$\pi_{L_1 \cup L_2}(E_1 \bowtie_p E_2) \equiv \pi_{L_1}(E_1) \bowtie_p \pi_{L_2}(E_2)$
if p involves attributes in $L_1 \cup L_2$ only and $L_i$ contains attributes of $E_i$ only.
9. The set operations union and intersection are commutative:
$E_1 \cup E_2 \equiv E_2 \cup E_1 \qquad E_1 \cap E_2 \equiv E_2 \cap E_1$
10. The set operations union and intersection are associative:
$(E_1 \cup E_2) \cup E_3 \equiv E_1 \cup (E_2 \cup E_3) \qquad (E_1 \cap E_2) \cap E_3 \equiv E_1 \cap (E_2 \cap E_3)$

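11. The selection operation distributes over $\cup$, $\cap$ and $\setminus$:
$\sigma_p(E_1 \cup E_2) \equiv \sigma_p(E_1) \cup \sigma_p(E_2)$
$\sigma_p(E_1 \cap E_2) \equiv \sigma_p(E_1) \cap \sigma_p(E_2)$
$\sigma_p(E_1 \setminus E_2) \equiv \sigma_p(E_1) \setminus \sigma_p(E_2)$
Also:
$\sigma_p(E_1 \cap E_2) \equiv \sigma_p(E_1) \cap E_2$
$\sigma_p(E_1 \setminus E_2) \equiv \sigma_p(E_1) \setminus E_2$
(this does not apply for $\cup$)
12. The projection operation distributes over $\cup$:
$\pi_L(E_1 \cup E_2) \equiv \pi_L(E_1) \cup \pi_L(E_2)$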
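Rules 11 and 12 can be spot-checked mechanically. The Python sketch below (an illustration, not part of the original material) models relations as Python sets of tuples, uses an arbitrary predicate p and randomly generated instances, and asserts the rules; random testing can only expose a wrong rule, it never proves one. It also exhibits two tempting rewrites that fail: the one-sided form for $\cup$ that the text excludes, and distributing $\pi$ over $\setminus$, which rule 12 does not claim. All helper names are hypothetical.

```python
import random


def select(pred, rel):
    """sigma_pred(rel) on a set of tuples."""
    return {t for t in rel if pred(t)}


def project(cols, rel):
    """pi_cols(rel) with set semantics (duplicates vanish)."""
    return {tuple(t[c] for c in cols) for t in rel}


def p(t):
    """An arbitrary selection predicate on the first attribute."""
    return t[0] < 2


def random_relation(rng, size=6, domain=4):
    """Random binary relation over a tiny domain, so that E1 and E2 overlap often."""
    return {(rng.randrange(domain), rng.randrange(domain)) for _ in range(size)}


def check_rules(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        e1, e2 = random_relation(rng), random_relation(rng)
        # Rule 11: sigma distributes over union, intersection and difference.
        assert select(p, e1 | e2) == select(p, e1) | select(p, e2)
        assert select(p, e1 & e2) == select(p, e1) & select(p, e2)
        assert select(p, e1 - e2) == select(p, e1) - select(p, e2)
        # Rule 11 ("also"): for intersection and difference, filtering E1 suffices.
        assert select(p, e1 & e2) == select(p, e1) & e2
        assert select(p, e1 - e2) == select(p, e1) - e2
        # Rule 12: projection distributes over union.
        assert project([1], e1 | e2) == project([1], e1) | project([1], e2)
    return True


def counterexamples():
    """Two tempting rewrites that are NOT equivalences."""
    e1, e2 = set(), {(3, 0)}
    one_sided_union_fails = select(p, e1 | e2) != select(p, e1) | e2
    f1, f2 = {(0, 1)}, {(1, 1)}
    projection_over_difference_fails = (
        project([1], f1 - f2) != project([1], f1) - project([1], f2)
    )
    return one_sided_union_fails, projection_over_difference_fails


print(check_rules(), counterexamples())  # True (True, True)
```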

10.4 Heuristic optimization

Query optimizers use the equivalence rules of relational algebra to improve the expected performance of a given query in most cases. The optimization is guided by the following heuristics:
(a) Break apart conjunctive selections into a sequence of simpler selections (rule 1; preparatory step for (b)).
(b) Move $\sigma$ down the query tree for the earliest possible execution (rules 2, 7, 11; reduces the number of tuples processed).
(c) Replace $\sigma$-$\times$ pairs by $\bowtie$ (rule 4 i); avoids large intermediate results).
(d) Break apart lists of projection attributes and move them as far down the tree as possible; create new projections where possible (rules 3, 8, 12; reduces tuple widths early).
(e) Perform the joins with the smallest expected result first.

Heuristic optimization: example

SQL query Q:
  SELECT p.ticketno
  FROM   Flight f, Passenger p, Crew c
  WHERE  f.flightno = p.flightno AND f.flightno = c.flightno
    AND  f.date = … AND f.to = 'FRA'
    AND  p.name = c.name AND c.job = 'Pilot'

(What would be a natural language formulation of Q?)

Canonical relational algebra expression (reflects the semantics of the SQL SELECT-FROM-WHERE block directly):
$\pi_{p.ticketno}(\sigma_{f.flightno=p.flightno \wedge f.flightno=c.flightno \wedge f.date=\ldots \wedge f.to='FRA' \wedge p.name=c.name \wedge c.job='Pilot'}((Flight\ f \times Passenger\ p) \times Crew\ c))$

Heuristic optimization: example (1)
Break apart the conjunctive selection to prepare the push-down of selections:
$\pi_{p.ticketno}(\sigma_{f.flightno=p.flightno}(\sigma_{f.flightno=c.flightno}(\sigma_{f.date=\ldots}(\sigma_{f.to='FRA'}(\sigma_{p.name=c.name}(\sigma_{c.job='Pilot'}((Flight\ f \times Passenger\ p) \times Crew\ c)))))))$

Heuristic optimization: example (2)
Push down selections as far as possible (but no further!):
$\pi_{p.ticketno}(\sigma_{f.flightno=c.flightno}(\sigma_{p.name=c.name}(\sigma_{f.flightno=p.flightno}(\sigma_{f.to='FRA'}(\sigma_{f.date=\ldots}(Flight\ f)) \times Passenger\ p) \times \sigma_{c.job='Pilot'}(Crew\ c))))$
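The push-down of step (2) can be sketched in Python. This is a toy under stated assumptions: predicates are represented only by their text and the set of attributes they mention, the tree uses hypothetical Rel/Product/Select classes, and the elided date literal stays elided as f.date=... . Each conjunct slips underneath existing selections (rule 2) and descends into whichever operand of a product supplies all of its attributes (rule 7 i); it stops where it needs both operands. Re-uniting selections, adding projections and forming joins (steps 3 to 5) are not covered.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Rel:
    name: str
    attrs: frozenset


@dataclass(frozen=True)
class Product:
    left: "Tree"
    right: "Tree"


@dataclass(frozen=True)
class Select:
    pred: str              # predicate text (only displayed, never evaluated)
    pred_attrs: frozenset  # attributes the predicate refers to
    child: "Tree"


Tree = Union[Rel, Product, Select]


def attrs(t: Tree) -> frozenset:
    """Attributes available at the root of subtree t."""
    if isinstance(t, Rel):
        return t.attrs
    if isinstance(t, Product):
        return attrs(t.left) | attrs(t.right)
    return attrs(t.child)


def push(pred: str, pred_attrs: frozenset, t: Tree) -> Tree:
    """Sink one conjunct as far down as possible, but no further."""
    if isinstance(t, Select):   # selections commute (rule 2): slip underneath
        return Select(t.pred, t.pred_attrs, push(pred, pred_attrs, t.child))
    if isinstance(t, Product):  # rule 7 i): go to the operand that has all attributes
        if pred_attrs <= attrs(t.left):
            return Product(push(pred, pred_attrs, t.left), t.right)
        if pred_attrs <= attrs(t.right):
            return Product(t.left, push(pred, pred_attrs, t.right))
    return Select(pred, pred_attrs, t)  # needs both operands, or reached a base relation


def push_down_all(tree: Tree, conjuncts) -> Tree:
    for pred, pred_attrs in conjuncts:  # conjuncts are already split apart (rule 1)
        tree = push(pred, frozenset(pred_attrs), tree)
    return tree


def show(t: Tree) -> str:
    if isinstance(t, Rel):
        return t.name
    if isinstance(t, Product):
        return f"({show(t.left)} × {show(t.right)})"
    return f"σ[{t.pred}]({show(t.child)})"


F = Rel("Flight f", frozenset({"f.flightno", "f.date", "f.to"}))
P = Rel("Passenger p", frozenset({"p.ticketno", "p.flightno", "p.name"}))
C = Rel("Crew c", frozenset({"c.flightno", "c.name", "c.job"}))
CONJUNCTS = [
    ("f.flightno=p.flightno", {"f.flightno", "p.flightno"}),
    ("f.flightno=c.flightno", {"f.flightno", "c.flightno"}),
    ("f.date=...", {"f.date"}),  # date literal elided on the slide
    ("f.to='FRA'", {"f.to"}),
    ("p.name=c.name", {"p.name", "c.name"}),
    ("c.job='Pilot'", {"c.job"}),
]

pushed = push_down_all(Product(Product(F, P), C), CONJUNCTS)
print(show(pushed))
```

Starting from the canonical product ((Flight f × Passenger p) × Crew c), the result keeps f.flightno = c.flightno and p.name = c.name above the upper product, attaches f.flightno = p.flightno to the lower product, and sinks the three comparisons with constants onto their base relations, which is the shape of the tree in step (2).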

Heuristic optimization: example (3)
Re-unite sequences of selections into single conjunctive selections:
$\pi_{p.ticketno}(\sigma_{f.flightno=c.flightno \wedge p.name=c.name}(\sigma_{f.flightno=p.flightno}(\sigma_{f.to='FRA' \wedge f.date=\ldots}(Flight\ f) \times Passenger\ p) \times \sigma_{c.job='Pilot'}(Crew\ c)))$

Heuristic optimization: example (4)
Introduce projections to reduce tuple widths:
$\pi_{p.ticketno}(\sigma_{f.flightno=c.flightno \wedge p.name=c.name}(\sigma_{f.flightno=p.flightno}(\pi_{f.flightno}(\sigma_{f.to='FRA' \wedge f.date=\ldots}(Flight\ f)) \times \pi_{p.ticketno,p.flightno,p.name}(Passenger\ p)) \times \pi_{c.flightno,c.name}(\sigma_{c.job='Pilot'}(Crew\ c))))$

Heuristic optimization: example (5)
Combine Cartesian products and selections into joins:
$\pi_{p.ticketno}((\pi_{f.flightno}(\sigma_{f.to='FRA' \wedge f.date=\ldots}(Flight\ f)) \bowtie_{f.flightno=p.flightno} \pi_{p.ticketno,p.flightno,p.name}(Passenger\ p)) \bowtie_{f.flightno=c.flightno \wedge p.name=c.name} \pi_{c.flightno,c.name}(\sigma_{c.job='Pilot'}(Crew\ c)))$

Heuristic optimization: example (6)
Relation Passenger presumably is the largest relation; re-order the joins (associativity of general joins, rule 6 ii)):
$\pi_{p.ticketno}((\pi_{f.flightno}(\sigma_{f.to='FRA' \wedge f.date=\ldots}(Flight\ f)) \bowtie_{f.flightno=c.flightno} \pi_{c.flightno,c.name}(\sigma_{c.job='Pilot'}(Crew\ c))) \bowtie_{f.flightno=p.flightno \wedge p.name=c.name} \pi_{p.ticketno,p.flightno,p.name}(Passenger\ p))$

Choosing an evaluation plan

When the optimizer annotates the resulting algebra expression E, it needs to consider the interaction of the chosen operator algorithms and access methods. Choosing the cheapest (in terms of I/O) algorithm for each operation independently may not yield the overall cheapest plan P.

Example: merge join may be costlier than nested loops join (its operands need to be sorted first), but it yields output in sorted order (good for subsequent duplicate elimination, selection, grouping, ...).

We need to consider all possible plans and then choose the best one in a cost-based fashion.

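10.5 Explosion of search space

Consider finding the best join order for the query $R_1 \bowtie R_2 \bowtie R_3 \bowtie R_4$.

Several join tree shapes are possible (due to associativity and commutativity of $\bowtie$), for example:
bushy: $(R_1 \bowtie R_2) \bowtie (R_3 \bowtie R_4)$
left-deep: $((R_1 \bowtie R_2) \bowtie R_3) \bowtie R_4$
right-deep: $R_1 \bowtie (R_2 \bowtie (R_3 \bowtie R_4))$

Number of different join orders for an n-way join:
$\frac{(2n-2)!}{(n-1)!}$ (n = 7: 665,280; n = 10: 17,643,225,600)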

Derivation of the number of possible join orderings

Let J(n) denote the number of different join orderings for a join of n argument relations. Obviously,
$J(n) = T(n) \cdot n!$
with T(n) the number of different binary tree shapes and n! the number of leaf permutations.¹

We can now derive T(n) inductively:
$T(1) = 1, \qquad T(n) = \sum_{i=1}^{n-1} T(i) \cdot T(n-i)$
namely, T(n) sums T(left subtree) · T(right subtree) over all possible sizes of the left subtree.

It turns out that $T(n) = C(n-1)$, for C(n) the n-th Catalan number,
$C(n) = \frac{1}{n+1}\binom{2n}{n} = \frac{(2n)!}{(n+1)!\,n!}$

Substituting $T(n) = C(n-1)$, we obtain
$J(n) = T(n) \cdot n! = \frac{(2(n-1))!}{n!\,(n-1)!} \cdot n! = \frac{(2(n-1))!}{(n-1)!}$

¹ See (Cormen et al., 1990).
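The derivation can be checked numerically. The Python sketch below (function names are hypothetical) computes T(n) directly from the recurrence, compares it with the Catalan number C(n−1), and confirms that T(n) · n! agrees with the closed form (2n−2)!/(n−1)!. The last column shows how much smaller the left-deep-only search space of n! orders is.

```python
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def tree_shapes(n: int) -> int:
    """T(n): number of binary tree shapes with n leaves, via the recurrence."""
    if n == 1:
        return 1
    return sum(tree_shapes(i) * tree_shapes(n - i) for i in range(1, n))


def catalan(n: int) -> int:
    """C(n) = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def join_orders(n: int) -> int:
    """J(n) = T(n) * n!: tree shapes times leaf permutations."""
    return tree_shapes(n) * factorial(n)


def closed_form(n: int) -> int:
    return factorial(2 * n - 2) // factorial(n - 1)


for n in (3, 4, 7, 10):
    assert tree_shapes(n) == catalan(n - 1)
    assert join_orders(n) == closed_form(n)
    print(n, join_orders(n), factorial(n))  # all join orders vs. left-deep orders only
```

For example, the line for n = 7 reads "7 665280 5040": restricting the search to left-deep trees shrinks it by a factor of 132 here.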

Restricting the search space

Fact: query optimization will not be able to find the overall best plan.
Instead: optimizers try to avoid the really bad plans (the I/O cost of different plans may differ substantially!).

Restrict the search space: consider left-deep join orders only (left is the outer relation, right is the inner), e.g. $((R_1 \bowtie R_2) \bowtie R_3) \bowtie R_4$.

Left-deep trees may be evaluated in a fully pipelined fashion (the inner input is a stored relation), intermediate results need not be written to temporary files, and (block) NL-⋈ may profit from available indexes on the inner relation.

The number of possible left-deep join orders for an n-way join is only n!.

Single relation plans

The optimizer enumerates (generates) all possible plans to assess their cost.
If the query involves a single relation R only:
Single relation plans: Consider each available method (e.g., heap scan, (un)clustered index scan) to access the tuples of R. Keep the access method involving the least estimated cost.

Cost estimates for single relation plans (System R style)

IBM System R (1970s): the first successful relational database system; it introduced most of the query optimization techniques still in use today.

Pragmatic yet successful cost model for access methods on relation R (|X| denotes the size of X in pages, ||X|| its size in tuples):

Access method                              Cost
access primary key index I                 Height(I) + 1 if I is a B+ tree; 2.2 if I is a hash index
clustered index I matching predicate p     (|I| + |R|) · sel(p) ²
unclustered index I matching predicate p   (|I| + ||R||) · sel(p) ²
sequential scan                            |R|

² If sel(p) is unknown, assume 1/…
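The cost table translates directly into a small function. This Python sketch is ours, not System R code: the method names and keyword parameters are hypothetical, and exact fractions are used so that the page counts come out without floating-point noise. With the figures of the example that follows (|R| = 500, |I_B| = 50, sel(B = c) = 1/V(B, R) = 1/10) it reproduces the 55 pages for the clustered index and 500 pages for the sequential scan.

```python
from fractions import Fraction


def access_cost(method, sel=Fraction(1), *, index_pages=0, rel_pages=0,
                rel_tuples=0, height=0, hash_index=False):
    """System R style page-I/O estimate for one access method on relation R.

    index_pages = |I|, rel_pages = |R|, rel_tuples = ||R||, sel = sel(p).
    """
    if method == "primary_key":
        return Fraction(22, 10) if hash_index else height + 1
    if method == "clustered":
        return (index_pages + rel_pages) * sel
    if method == "unclustered":
        # roughly one page fetch per qualifying tuple
        return (index_pages + rel_tuples) * sel
    if method == "scan":
        return rel_pages
    raise ValueError(f"unknown access method: {method}")


print(access_cost("clustered", Fraction(1, 10), index_pages=50, rel_pages=500))  # 55
print(access_cost("scan", rel_pages=500))  # 500
```

The unclustered case needs ||R||, which is why its figure depends on the tuple count rather than the page count of R.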

Cost estimates for a single relation plan

Query Q:
  SELECT A
  FROM   R
  WHERE  B = c

Database profile: $|R| = 500$, $\|R\| = \ldots$, $V(B, R) = 10$

$\|Q\| \approx \underbrace{1/V(B, R)}_{sel(B = c)} \cdot \|R\| = 1/10 \cdot \|R\| = \ldots$ tuples retrieved

1. Database maintains clustered index $I_B$ ($|I_B| = 50$) on attribute B:
   cost $= (|I_B| + |R|) \cdot 1/V(B, R) = (50 + 500) \cdot 1/10 = 55$ pages

2. Database maintains unclustered index $I_B$ ($|I_B| = 50$) on attribute B:
   cost $= (|I_B| + \|R\|) \cdot 1/V(B, R) = (50 + \|R\|) \cdot 1/10 = \ldots$ pages
3. No index support; use a sequential file scan to access R:
   cost $= |R| = 500$ pages

To evaluate query Q, use the clustered index $I_B$.

Plans for multiple relation (join) queries

We need to make sure not to miss the best left-deep join plan. Degrees of freedom left:
1. For each base relation in the query, consider all access methods.
2. For each join operation, select a join algorithm.

How many possible query plans are left now?

Back-of-envelope calculation (query with n relations): assume j join algorithms are available and i indexes per relation:
$\#plans \approx n! \cdot j^{\,n-1} \cdot (i+1)^n$

Example: with n = 3 relations and j = 3, i = 2:
$\#plans \approx 3! \cdot 3^2 \cdot 3^3 = 1\,458$
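The back-of-envelope formula is easy to tabulate. This Python sketch (the function name is hypothetical) evaluates n! · j^(n−1) · (i+1)^n for j = 3 join algorithms and i = 2 indexes per relation, where the factor i + 1 counts the heap scan plus one index scan per index.

```python
from math import factorial


def estimated_plans(n, j, i):
    """n! left-deep orders, j algorithms per join (n - 1 joins), i + 1 access methods per relation."""
    return factorial(n) * j ** (n - 1) * (i + 1) ** n


for n in range(3, 7):
    print(n, estimated_plans(n, j=3, i=2))
# 3 1458
# 4 52488
# 5 2361960
# 6 127545840
```

Even with the left-deep restriction, the count grows by one to two orders of magnitude per additional relation, which motivates the pruning and dynamic programming below.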

Plan enumeration (1): example setup

Example query (n = 3):
  SELECT a.name, f.airline, c.name
  FROM   Airport a, Flight f, Crew c
  WHERE  f.to = a.code AND f.flightno = c.flightno
(Airport = A, Flight = F, Crew = C)

Assumptions:
Available join algorithms: hash join, block NL-⋈, block INL-⋈
Available indexes: clustered B+ tree index I on attribute Flight.to, |I| = 50
|A| = 500, 80 tuples/page
|F| = 1,000, 100 tuples/page
|C| = 10
100 F ⋈ A tuples fit on a page

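Plan enumeration (2): candidate plans

Enumerate the n! left-deep join trees (3! = 6):
$(A \bowtie F) \bowtie C$, $(A \bowtie C) \bowtie F$, $(F \bowtie A) \bowtie C$, $(C \bowtie F) \bowtie A$, $(F \bowtie C) \bowtie A$, $(C \bowtie A) \bowtie F$

Prune plans with $\times$ (note: there is no join predicate between A and C) immediately!
4 candidate plans remain.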
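The enumeration and the cross-product pruning can be sketched in Python. Assumptions of this sketch: a permutation (R1, R2, R3) stands for the left-deep tree ((R1 ⋈ R2) ⋈ R3), and the join graph contains exactly the two predicates of the example query, f.to = a.code and f.flightno = c.flightno. The function names are hypothetical.

```python
from itertools import permutations

# Join graph of the example query: f.to = a.code (A-F) and f.flightno = c.flightno (F-C).
JOIN_PREDICATES = {frozenset("AF"), frozenset("FC")}


def left_deep_orders(relations):
    """Each permutation (R1, R2, ..., Rn) stands for (((R1 ⋈ R2) ⋈ ...) ⋈ Rn)."""
    return list(permutations(relations))


def has_cross_product(order):
    """True if some relation joins the tree built so far without any join predicate."""
    joined = {order[0]}
    for rel in order[1:]:
        if not any(frozenset((rel, r)) in JOIN_PREDICATES for r in joined):
            return True
        joined.add(rel)
    return False


candidates = [o for o in left_deep_orders("AFC") if not has_cross_product(o)]
print(len(left_deep_orders("AFC")), len(candidates))  # 6 4
```

The two orders that start with A and C, in either order, are the ones discarded, matching the four candidate plans above.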

Plan enumeration (3): join algorithm choices

Candidate plan: $(A \bowtie F) \bowtie C$

Possible join algorithm choices (first for $A \bowtie F$, then for the join with C):
  NL-⋈, NL-⋈
  NL-⋈, H-⋈
  H-⋈, NL-⋈
  H-⋈, H-⋈

Repeat for the remaining 3 candidate plans.

Plan enumeration (4): access method choices

Candidate plan: $(A \bowtie F) \bowtie C$, both joins NL-⋈

Possible access method choices:
  heap scans of A, F and C (both joins remain NL-⋈);
  heap scans of A and C, index scan on F.to for F (the join $A \bowtie F$ becomes INL-⋈).

Repeat for the remaining candidate plans.

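Plan enumeration (5): cost estimation

Estimate the cost for the candidate plan $(A \bowtie F) \bowtie C$ with a heap scan of A, INL-⋈ into F (index scan on F.to), and NL-⋈ with a heap scan of C:

Cost of heap scan of A: 500 (pages)
Cost of $A \bowtie F$: $\|A\| \cdot sel(A.code = F.to) \cdot (|F| + |I|) = \ldots$
$|A \bowtie F| = \|A \bowtie F\| / 100 = \|F\| / 100 = 100\,000 / 100 = 1\,000$ (pages), since each F tuple finds exactly one join partner in A
Cost of $(A \bowtie F) \bowtie C$: $|A \bowtie F| \cdot |C| = 1\,000 \cdot 10 = 10\,000$
Total estimated cost: …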

Current candidate plan: $(A \bowtie F) \bowtie C$, both joins NL-⋈, heap scans of A, F and C

Remember: $|A| = 500$, $|F| = 1\,000$, $|C| = 10$, $|A \bowtie F| = 1\,000$

NL-⋈: scan the left input + scan the right input once for each page in the left input

Total estimated cost: $|A| + |A| \cdot |F| + |A \bowtie F| \cdot |C| = 500 + 500 \cdot 1\,000 + 1\,000 \cdot 10 = 510\,500$
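The NL-⋈ rule stated above can be written as a tiny cost function. Assumptions of this sketch: the left input of the upper join is pipelined from the lower join, so it is not scanned again (which is how the total formula above treats it), and |A ⋈ F| = 1,000 pages is our reading of the example's size estimate ||F||/100. The function and constant names are hypothetical.

```python
def block_nl_cost(left_pages, right_pages, left_pipelined=False):
    """NL-⋈: scan the left input (unless it is pipelined from a sub-plan)
    + scan the right input once for each page in the left input."""
    scan_left = 0 if left_pipelined else left_pages
    return scan_left + left_pages * right_pages


A, F, C = 500, 1_000, 10  # pages
A_JOIN_F = 1_000          # |A ⋈ F| in pages (assumed, see lead-in)

total = block_nl_cost(A, F) + block_nl_cost(A_JOIN_F, C, left_pipelined=True)
print(total)  # 510500
```

Almost all of the cost comes from the term |A| · |F|, i.e., from rescanning Flight once per page of Airport.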

Current candidate plan: $(A \bowtie F) \bowtie C$, NL-⋈ for $A \bowtie F$ and H-⋈ for the join with C, heap scans of A, F and C

H-⋈ (assume 2 passes): 2 · (scan both inputs + hash both inputs into buckets) + read hash buckets with join partners

Total estimated cost: $|A| + |A| \cdot |F| + 2 \cdot |A \bowtie F| + 2 \cdot |C| + \ldots = \ldots$

Current candidate plan: $(A \bowtie F) \bowtie C$, H-⋈ for $A \bowtie F$ and NL-⋈ for the join with C, heap scans of A, F and C

Total estimated cost: $2 \cdot (|A| + |F|) + \ldots + |A \bowtie F| \cdot |C| = \ldots$

Current candidate plan: $(A \bowtie F) \bowtie C$, H-⋈ for both joins, heap scans of A, F and C

Total estimated cost: $2 \cdot (|A| + |F|) + \ldots + 2 \cdot (|A \bowtie F| + |C|) + |C| = \ldots$

Repeated enumeration of identical sub-plans

The plan enumeration reconsiders the same sub-plans over and over again. The cost and result size of a sub-plan are independent of the larger plan it is embedded in.

[Figure: several enumerated plans for $(A \bowtie F) \bowtie C$ that contain identical sub-plans for $A \bowtie F$ (NL-⋈ or H-⋈ over scans, or INL-⋈ with an index scan on F).]

Idea: Remember already considered sub-plans in a memoization data structure.
The resulting approach is known as dynamic programming.

10.6 Dynamic programming strategy (System R)

Divide plan enumeration into n passes (for a query with n joined relations):
Pass 1 (all 1-relation plans): Find the best 1-relation plan for each relation (i.e., select an access method).
Pass 2 (all 2-relation plans): Find the best way to join plans of Pass 1 to another relation (generate left-deep trees: sub-plans of Pass 1 appear as the outer input of the join).
…
Pass n (all n-relation plans): Find the best way to join plans of Pass n−1 to the nth relation (sub-plans of Pass n−1 appear as the outer input of the join).

A (k−1)-relation sub-plan P is not combined with a kth relation R unless there is a join condition between the relations in P and R, or all join conditions are already present in P (avoid $\times$ if possible).
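The pass structure can be sketched as a dynamic program over subsets of relations, with the dictionary best acting as the memoization structure. This is a simplified sketch, not System R itself: Pass 1 only knows heap scans, the only join algorithm is block NL-⋈ with a pipelined outer (cost of the outer sub-plan + |outer| · |inner|), and interesting orders are ignored. The sizes A = 500, F = 1,000, C = 10 pages come from the enumeration example; |A ⋈ F| = 1,000 pages is our reading of that example, and |F ⋈ C| = 100 pages is purely hypothetical.

```python
from itertools import combinations


def system_r_dp(relations, pages, edges, result_pages):
    """Left-deep DP over subsets; best[S] memoizes (cost, plan) for subset S."""
    best = {frozenset([r]): (pages[r], r) for r in relations}  # pass 1: heap scans only
    for k in range(2, len(relations) + 1):                      # passes 2, ..., n
        for subset in map(frozenset, combinations(relations, k)):
            for inner in subset:
                outer = subset - {inner}
                if outer not in best:
                    continue  # no admissible plan for the outer part
                if not any(frozenset((inner, r)) in edges for r in outer):
                    continue  # would need a cross product: skip
                outer_cost, outer_plan = best[outer]
                # Hypothetical cost model: block NL-⋈, outer pipelined, inner scanned per outer page.
                cost = outer_cost + result_pages[outer] * pages[inner]
                if subset not in best or cost < best[subset][0]:
                    best[subset] = (cost, f"({outer_plan} ⋈ {inner})")
    return best.get(frozenset(relations))


PAGES = {"A": 500, "F": 1_000, "C": 10}
EDGES = {frozenset("AF"), frozenset("FC")}
RESULT_PAGES = {
    frozenset("A"): 500, frozenset("F"): 1_000, frozenset("C"): 10,
    frozenset("AF"): 1_000,  # |A ⋈ F| (assumed, see lead-in)
    frozenset("FC"): 100,    # |F ⋈ C| (hypothetical, not given in the text)
}
print(system_r_dp(["A", "F", "C"], PAGES, EDGES, RESULT_PAGES))
# (60010, '((C ⋈ F) ⋈ A)')
```

Each subset is costed once and reused by every larger plan that contains it, and {A, C} never receives an entry because of cross-product avoidance. Which plan wins depends entirely on the assumed sizes.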

Plan enumeration: pruning, interesting orders

For each sub-plan obtained this way, remember cost and result size estimates!

Pruning: For each subset of relations joined, keep only
  the cheapest sub-plan overall, plus
  the cheapest sub-plans that generate an intermediate result with an interesting order of tuples.

An interesting order is determined by
  the presence of an SQL ORDER BY clause in the query,
  the presence of an SQL GROUP BY clause in the query,
  the join attributes of subsequent equi-joins (to prepare for merge-⋈).
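The pruning rule for one subset of relations can be sketched as follows. Sub-plans are modeled as hypothetical (name, cost, order) triples with made-up costs; the rule keeps the cheapest sub-plan overall and, for each interesting order, the cheapest sub-plan that produces it, and discards everything else.

```python
def prune(subplans, interesting_orders):
    """subplans: (name, cost, order) triples for the same set of joined relations."""
    keep = {min(subplans, key=lambda plan: plan[1])}            # cheapest overall
    for order in interesting_orders:
        producing = [plan for plan in subplans if plan[2] == order]
        if producing:
            keep.add(min(producing, key=lambda plan: plan[1]))  # cheapest per interesting order
    return sorted(keep, key=lambda plan: plan[1])


# Hypothetical sub-plans for one subset of relations (costs are made up).
candidates = [
    ("P1", 900, "flightno"),
    ("P2", 410, None),
    ("P3", 700, "to"),
    ("P4", 950, "flightno"),
]
print(prune(candidates, interesting_orders={"flightno"}))
# [('P2', 410, None), ('P1', 900, 'flightno')]
```

P3 is dropped even though it is cheaper than P1, because its order on to is of no use to any later operator in this setting.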

System R style plan enumeration

Example query:
  SELECT a.name, f.airline, c.name
  FROM   Airport a, Flight f, Crew c
  WHERE  f.to = a.code AND f.flightno = c.flightno

Now assume:
Available join algorithms: merge-⋈, block NL-⋈, block INL-⋈
Available indexes: clustered B+ tree index I on A.code, height(I) = 3, $|I_{leaf}| = 500$
|A| = …, 5 tuples/page
|F| = 10, 10 tuples/page
|C| = 10, 20 tuples/page
10 F ⋈ A tuples fit on a page, 10 F ⋈ C tuples fit on a page

System R: Pass 1 (1-relation plans)

Access methods for A:
1. heap scan: cost $= |A| = \ldots$
2. index scan on A.code, index I: cost $= |I| + |A| = \ldots$
Keep 1 and 2, since 2 has an interesting order on attribute code, a join attribute (f.to = a.code).

Access method for F:
1. heap scan: cost $= |F| = 10$

Access method for C:
1. heap scan: cost $= |C| = 10$

System R: Pass 2 (2-relation plans)

Start with the 1-relation plans to access A as outer: $A \bowtie F$

Heap scan of A as outer:
1. NL-⋈: cost = …
2. M-⋈ (assume 2-way sort/merge): cost = …

Index scan of A as outer:
3. NL-⋈: cost = …
4. M-⋈ (assume 2-way sort/merge): cost = …

Keep 4 only (N.B. it uses the interesting order of a non-optimal sub-plan!)

System R: Pass 2 (cont'd)

Start with F as outer: $F \bowtie A$ or $F \bowtie C$

A as inner:
1. NL-⋈, heap scan of A: cost $= |F| + |F| \cdot |A| = \ldots$
2. INL-⋈, index scan of A: cost $= |F| + \|F\| \cdot (height(I) + 1) = 10 + 100 \cdot (3 + 1) = 410$ (Keep!)
3. M-⋈, heap scan of A: cost $= |F| + |A| + 2 \cdot (|F| + |A|) = \ldots$
4. M-⋈, index scan of A: cost = …

C as inner:
5. NL-⋈: cost $= |F| + |F| \cdot |C| = 10 + 10 \cdot 10 = 110$
6. M-⋈: cost $= |F| + |C| + 2 \cdot (|F| + |C|) = 10 + 10 + 2 \cdot (10 + 10) = 60$ (Keep!)

System R: Pass 2 (cont'd)

Start with C as outer: $C \bowtie F$
1. NL-⋈: cost $= |C| + |C| \cdot |F| = 10 + 10 \cdot 10 = 110$
2. M-⋈: cost $= |C| + |F| + 2 \cdot (|C| + |F|) = 10 + 10 + 2 \cdot (10 + 10) = 60$ (Keep!)

N.B. $C \bowtie A$ is not enumerated because of cross product ($\times$) avoidance.
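The Pass 2 figures that survive above (410, 110 and 60 pages) follow from three simple formulas. This Python sketch restates them with hypothetical function names, using |F| = |C| = 10 pages, ||F|| = 100 tuples (10 tuples/page) and height(I) = 3 for the clustered B+ tree on A.code; the merge-join formula encodes the 2-way sort/merge assumption exactly as written on the slides.

```python
def nl_cost(outer_pages, inner_pages):
    """Block NL-⋈: scan the outer once, scan the inner once per outer page."""
    return outer_pages + outer_pages * inner_pages


def inl_cost(outer_pages, outer_tuples, index_height):
    """Block INL-⋈: scan the outer, then height(I) + 1 page accesses per outer tuple."""
    return outer_pages + outer_tuples * (index_height + 1)


def merge_cost(left_pages, right_pages):
    """M-⋈ under the 2-way sort/merge assumption: |L| + |R| + 2 * (|L| + |R|)."""
    return left_pages + right_pages + 2 * (left_pages + right_pages)


F_PAGES, C_PAGES = 10, 10
F_TUPLES = F_PAGES * 10  # 10 tuples/page
HEIGHT_I = 3             # clustered B+ tree index I on A.code

print(inl_cost(F_PAGES, F_TUPLES, HEIGHT_I))                 # 410
print(nl_cost(F_PAGES, C_PAGES), nl_cost(C_PAGES, F_PAGES))  # 110 110
print(merge_cost(F_PAGES, C_PAGES))                          # 60
```

NL-⋈ is symmetric here only because F and C happen to have the same number of pages.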

System R: further pruning of 2-relation plans

$A \bowtie F$:
1. M-⋈ (index scan of A, scan of F): cost = …, order on to
2. INL-⋈ (scan of F, index scan of A): cost = 410, no order

$C \bowtie F$:
3. M-⋈ (scan of C, scan of F): cost = 60, order on flightno
4. M-⋈ (scan of F, scan of C): cost = 60, order on flightno

Keep 2, and 3 or 4 (the order in 1 is not interesting for subsequent joins).

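System R: Pass 3 (3-relation plans)

Best $(A \bowtie F)$ sub-plan: F (scan) INL-⋈ A (index scan), cost = 410, no order, $|A \bowtie F| = 10$
1. (F INL-⋈ A) NL-⋈ C (scan of C): cost $= 410 + |A \bowtie F| \cdot |C| = \ldots$
2. (F INL-⋈ A) M-⋈ C (scan of C): cost $= 410 + |C| + 2 \cdot (|A \bowtie F| + |C|) = \ldots$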

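System R: Pass 3 (cont'd)

Best $(C \bowtie F)$ sub-plan: F (scan) M-⋈ C (scan), cost = 60, order on flightno, $|C \bowtie F| = 10$, $\|C \bowtie F\| = 100$
1. (F M-⋈ C) NL-⋈ A (heap scan of A): cost = …
2. (F M-⋈ C) M-⋈ A (heap scan of A): cost = …
3. (F M-⋈ C) INL-⋈ A (index scan of A): cost $= 60 + \|C \bowtie F\| \cdot (height(I) + 1) = 60 + 100 \cdot (3 + 1) = 460$
4. (F M-⋈ C) M-⋈ A (index scan of A): cost = …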
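The 460 pages of the index nested loops plan come from reusing the memoized (C ⋈ F) sub-plan. In this Python sketch (the function name is hypothetical) the outer input is pipelined from that sub-plan, so it is not read again; each of its ||C ⋈ F|| = 100 tuples costs height(I) + 1 = 4 page accesses into A through the clustered index on A.code.

```python
def inl_over_subplan(subplan_cost, subplan_tuples, index_height):
    """INL-⋈ with a pipelined sub-plan as the outer input: the outer is not re-read,
    and each of its tuples costs height(I) + 1 page accesses into the inner relation."""
    return subplan_cost + subplan_tuples * (index_height + 1)


# Best (C ⋈ F) sub-plan: cost 60 pages, 100 result tuples; index I on A.code has height 3.
print(inl_over_subplan(60, 100, 3))  # 460
```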

System R: And the winner is ...

(F M-⋈ C) INL-⋈ A, with scans of F and C and an index scan of A: cost = 460

Observations:
  The best plan mixes join algorithms and exploits indexes.
  The worst plan had cost > … (exact cost unknown due to pruning).
  Optimization yielded a 1000-fold improvement over the worst plan!

Bibliography

Astrahan, M. M., Schkolnick, M., and Kim, W. (1980). Performance of the System R access path selection mechanism. In IFIP Congress.
Chamberlin, D., Astrahan, M., Blasgen, M., Gray, J., King, W., Lindsay, B., Lorie, R., Mehl, J., Price, T., Putzolu, F., Selinger, P., Schkolnick, M., Slutz, D., Traiger, I., Wade, B., and Yost, R. (1981). History and evaluation of System/R. Communications of the ACM, 24(10).
Cormen, T. H., Leiserson, C. E., and Rivest, R. L. (1990). Introduction to Algorithms. MIT Press.
Jarke, M. and Koch, J. (1984). Query optimization in database systems. ACM Computing Surveys, 16(2).
Ramakrishnan, R. and Gehrke, J. (2003). Database Management Systems. McGraw-Hill, New York, 3rd edition.
Kim, W., Reiner, D. S., and Batory, D. S., editors (1985). Query Processing in Database Systems. Springer-Verlag.


More information

CS54100: Database Systems

CS54100: Database Systems CS54100: Database Systems Relational Algebra 3 February 2012 Prof. Walid Aref Core Relational Algebra A small set of operators that allow us to manipulate relations in limited but useful ways. The operators

More information

1 First-order logic. 1 Syntax of first-order logic. 2 Semantics of first-order logic. 3 First-order logic queries. 2 First-order query evaluation

1 First-order logic. 1 Syntax of first-order logic. 2 Semantics of first-order logic. 3 First-order logic queries. 2 First-order query evaluation Knowledge Bases and Databases Part 1: First-Order Queries Diego Calvanese Faculty of Computer Science Master of Science in Computer Science A.Y. 2007/2008 Overview of Part 1: First-order queries 1 First-order

More information

Outline. 1 Introduction. 3 Quicksort. 4 Analysis. 5 References. Idea. 1 Choose an element x and reorder the array as follows:

Outline. 1 Introduction. 3 Quicksort. 4 Analysis. 5 References. Idea. 1 Choose an element x and reorder the array as follows: Outline Computer Science 331 Quicksort Mike Jacobson Department of Computer Science University of Calgary Lecture #28 1 Introduction 2 Randomized 3 Quicksort Deterministic Quicksort Randomized Quicksort

More information

Each internal node v with d(v) children stores d 1 keys. k i 1 < key in i-th sub-tree k i, where we use k 0 = and k d =.

Each internal node v with d(v) children stores d 1 keys. k i 1 < key in i-th sub-tree k i, where we use k 0 = and k d =. 7.5 (a, b)-trees 7.5 (a, b)-trees Definition For b a an (a, b)-tree is a search tree with the following properties. all leaves have the same distance to the root. every internal non-root vertex v has at

More information

Sub-Queries in SQL SQL. 3 Types of Sub-Queries. 3 Types of Sub-Queries. Scalar Sub-Query Example. Scalar Sub-Query Example

Sub-Queries in SQL SQL. 3 Types of Sub-Queries. 3 Types of Sub-Queries. Scalar Sub-Query Example. Scalar Sub-Query Example SQL Sub-Queries in SQL Peter Y. Wu Department of Computer and Information Systems Robert Morris University Data Definition Language Create table for schema and constraints Drop table Data Manipulation

More information

Implementation and Optimization Issues of the ROLAP Algebra

Implementation and Optimization Issues of the ROLAP Algebra General Research Report Implementation and Optimization Issues of the ROLAP Algebra F. Ramsak, M.S. (UIUC) Dr. V. Markl Prof. R. Bayer, Ph.D. Contents Motivation ROLAP Algebra Recap Optimization Issues

More information

Mathematical Logic Part Three

Mathematical Logic Part Three Mathematical Logic Part hree riday our Square! oday at 4:15PM, Outside Gates Announcements Problem Set 3 due right now. Problem Set 4 goes out today. Checkpoint due Monday, October 22. Remainder due riday,

More information

CS/IT OPERATING SYSTEMS

CS/IT OPERATING SYSTEMS CS/IT 5 (CR) Total No. of Questions :09] [Total No. of Pages : 0 II/IV B.Tech. DEGREE EXAMINATIONS, DECEMBER- 06 CS/IT OPERATING SYSTEMS. a) System Boot Answer Question No. Compulsory. Answer One Question

More information

CS 347. Parallel and Distributed Data Processing. Spring Notes 11: MapReduce

CS 347. Parallel and Distributed Data Processing. Spring Notes 11: MapReduce CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 11: MapReduce Motivation Distribution makes simple computations complex Communication Load balancing Fault tolerance Not all applications

More information

Journal of Theoretical and Applied Information Technology 30 th November Vol.96. No ongoing JATIT & LLS QUERYING RDF DATA

Journal of Theoretical and Applied Information Technology 30 th November Vol.96. No ongoing JATIT & LLS QUERYING RDF DATA QUERYING RDF DATA 1,* ATTA-UR-RAHMAN, 2 FAHD ABDULSALAM ALHAIDARI 1,* Department of Computer Science, 2 Department of Computer Information System College of Computer Science and Information Technology,

More information

ICS141: Discrete Mathematics for Computer Science I

ICS141: Discrete Mathematics for Computer Science I ICS141: Discrete Mathematics for Computer Science I Dept. Information & Computer Sci., Originals slides by Dr. Baek and Dr. Still, adapted by J. Stelovsky Based on slides Dr. M. P. Frank and Dr. J.L. Gross

More information

Algorithms for Data Science

Algorithms for Data Science Algorithms for Data Science CSOR W4246 Eleni Drinea Computer Science Department Columbia University Tuesday, December 1, 2015 Outline 1 Recap Balls and bins 2 On randomized algorithms 3 Saving space: hashing-based

More information

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each

More information

Lecture 5: Efficient PAC Learning. 1 Consistent Learning: a Bound on Sample Complexity

Lecture 5: Efficient PAC Learning. 1 Consistent Learning: a Bound on Sample Complexity Universität zu Lübeck Institut für Theoretische Informatik Lecture notes on Knowledge-Based and Learning Systems by Maciej Liśkiewicz Lecture 5: Efficient PAC Learning 1 Consistent Learning: a Bound on

More information

Efficient query evaluation

Efficient query evaluation Efficient query evaluation Maria Luisa Sapino Set of values E.g. select * from EMPLOYEES where SALARY = 1500; Result of a query Sorted list E.g. select * from CAR-IMAGE where color = red ; 2 Queries as

More information

GIS CONCEPTS ARCGIS METHODS AND. 2 nd Edition, July David M. Theobald, Ph.D. Natural Resource Ecology Laboratory Colorado State University

GIS CONCEPTS ARCGIS METHODS AND. 2 nd Edition, July David M. Theobald, Ph.D. Natural Resource Ecology Laboratory Colorado State University GIS CONCEPTS AND ARCGIS METHODS 2 nd Edition, July 2005 David M. Theobald, Ph.D. Natural Resource Ecology Laboratory Colorado State University Copyright Copyright 2005 by David M. Theobald. All rights

More information

1 Approximate Quantiles and Summaries

1 Approximate Quantiles and Summaries CS 598CSC: Algorithms for Big Data Lecture date: Sept 25, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri Suppose we have a stream a 1, a 2,..., a n of objects from an ordered universe. For simplicity

More information

On the Study of Tree Pattern Matching Algorithms and Applications

On the Study of Tree Pattern Matching Algorithms and Applications On the Study of Tree Pattern Matching Algorithms and Applications by Fei Ma B.Sc., Simon Fraser University, 2004 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of

More information

Canadian Board of Examiners for Professional Surveyors Core Syllabus Item C 5: GEOSPATIAL INFORMATION SYSTEMS

Canadian Board of Examiners for Professional Surveyors Core Syllabus Item C 5: GEOSPATIAL INFORMATION SYSTEMS Study Guide: Canadian Board of Examiners for Professional Surveyors Core Syllabus Item C 5: GEOSPATIAL INFORMATION SYSTEMS This guide presents some study questions with specific referral to the essential

More information

Pairing Transitive Closure and Reduction to Efficiently Reason about Partially Ordered Events

Pairing Transitive Closure and Reduction to Efficiently Reason about Partially Ordered Events Pairing Transitive Closure and Reduction to Efficiently Reason about Partially Ordered Events Massimo Franceschet Angelo Montanari Dipartimento di Matematica e Informatica, Università di Udine Via delle

More information

Outline. Approximation: Theory and Algorithms. Application Scenario. 3 The q-gram Distance. Nikolaus Augsten. Definition and Properties

Outline. Approximation: Theory and Algorithms. Application Scenario. 3 The q-gram Distance. Nikolaus Augsten. Definition and Properties Outline Approximation: Theory and Algorithms Nikolaus Augsten Free University of Bozen-Bolzano Faculty of Computer Science DIS Unit 3 March 13, 2009 2 3 Nikolaus Augsten (DIS) Approximation: Theory and

More information

Query Processing. 3 steps: Parsing & Translation Optimization Evaluation

Query Processing. 3 steps: Parsing & Translation Optimization Evaluation rela%onal algebra Query Processing 3 steps: Parsing & Translation Optimization Evaluation 30 Simple set of algebraic operations on relations Journey of a query SQL select from where Rela%onal algebra π

More information

Approximate String Joins in a Database (Almost) for Free

Approximate String Joins in a Database (Almost) for Free Approximate String Joins in a Database (Almost) for Free Erratum Luis Gravano Panagiotis G. Ipeirotis H. V. Jagadish Columbia University Columbia University University of Michigan gravano@cs.columbia.edu

More information

Review Of Topics. Review: Induction

Review Of Topics. Review: Induction Review Of Topics Asymptotic notation Solving recurrences Sorting algorithms Insertion sort Merge sort Heap sort Quick sort Counting sort Radix sort Medians/order statistics Randomized algorithm Worst-case

More information

Spatial analysis in XML/GML/SVG based WebGIS

Spatial analysis in XML/GML/SVG based WebGIS Spatial analysis in XML/GML/SVG based WebGIS Haosheng Huang, Yan Li huang@cartography.tuwien.ac.at and yanli@scnu.edu.cn Research Group Cartography, Vienna University of Technology Spatial Information

More information

CSE 4502/5717: Big Data Analytics

CSE 4502/5717: Big Data Analytics CSE 4502/5717: Big Data Analytics otes by Anthony Hershberger Lecture 4 - January, 31st, 2018 1 Problem of Sorting A known lower bound for sorting is Ω is the input size; M is the core memory size; and

More information

Outline. Training Examples for EnjoySport. 2 lecture slides for textbook Machine Learning, c Tom M. Mitchell, McGraw Hill, 1997

Outline. Training Examples for EnjoySport. 2 lecture slides for textbook Machine Learning, c Tom M. Mitchell, McGraw Hill, 1997 Outline Training Examples for EnjoySport Learning from examples General-to-specific ordering over hypotheses [read Chapter 2] [suggested exercises 2.2, 2.3, 2.4, 2.6] Version spaces and candidate elimination

More information

A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997

A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997 A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997 Vasant Honavar Artificial Intelligence Research Laboratory Department of Computer Science

More information

Design and Analysis of Algorithms

Design and Analysis of Algorithms CSE 101, Winter 2018 Design and Analysis of Algorithms Lecture 5: Divide and Conquer (Part 2) Class URL: http://vlsicad.ucsd.edu/courses/cse101-w18/ A Lower Bound on Convex Hull Lecture 4 Task: sort the

More information

Abstract parsing: static analysis of dynamically generated string output using LR-parsing technology

Abstract parsing: static analysis of dynamically generated string output using LR-parsing technology Abstract parsing: static analysis of dynamically generated string output using LR-parsing technology Kyung-Goo Doh 1, Hyunha Kim 1, David A. Schmidt 2 1. Hanyang University, Ansan, South Korea 2. Kansas

More information

Query answering using views

Query answering using views Query answering using views General setting: database relations R 1,...,R n. Several views V 1,...,V k are defined as results of queries over the R i s. We have a query Q over R 1,...,R n. Question: Can

More information

QSQL: Incorporating Logic-based Retrieval Conditions into SQL

QSQL: Incorporating Logic-based Retrieval Conditions into SQL QSQL: Incorporating Logic-based Retrieval Conditions into SQL Sebastian Lehrack and Ingo Schmitt Brandenburg University of Technology Cottbus Institute of Computer Science Chair of Database and Information

More information

Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval. Sargur Srihari University at Buffalo The State University of New York

Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval. Sargur Srihari University at Buffalo The State University of New York Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval Sargur Srihari University at Buffalo The State University of New York 1 A Priori Algorithm for Association Rule Learning Association

More information

Compact Representation for Answer Sets of n-ary Regular Queries

Compact Representation for Answer Sets of n-ary Regular Queries Compact Representation for Answer Sets of n-ary Regular Queries Kazuhiro Inaba 1 and Haruo Hosoya 1 The University of Tokyo, {kinaba,hahosoya}@is.s.u-tokyo.ac.jp Abstract. An n-ary query over trees takes

More information

Fly Cheaply: On the Minimum Fuel Consumption Problem

Fly Cheaply: On the Minimum Fuel Consumption Problem Journal of Algorithms 41, 330 337 (2001) doi:10.1006/jagm.2001.1189, available online at http://www.idealibrary.com on Fly Cheaply: On the Minimum Fuel Consumption Problem Timothy M. Chan Department of

More information

Relational completeness of query languages for annotated databases

Relational completeness of query languages for annotated databases Relational completeness of query languages for annotated databases Floris Geerts 1,2 and Jan Van den Bussche 1 1 Hasselt University/Transnational University Limburg 2 University of Edinburgh Abstract.

More information

CSE 202 Homework 4 Matthias Springer, A

CSE 202 Homework 4 Matthias Springer, A CSE 202 Homework 4 Matthias Springer, A99500782 1 Problem 2 Basic Idea PERFECT ASSEMBLY N P: a permutation P of s i S is a certificate that can be checked in polynomial time by ensuring that P = S, and

More information

GAV-sound with conjunctive queries

GAV-sound with conjunctive queries GAV-sound with conjunctive queries Source and global schema as before: source R 1 (A, B),R 2 (B,C) Global schema: T 1 (A, C), T 2 (B,C) GAV mappings become sound: T 1 {x, y, z R 1 (x,y) R 2 (y,z)} T 2

More information

Regular n-ary Queries in Trees and Variable Independence

Regular n-ary Queries in Trees and Variable Independence Regular n-ary Queries in Trees and Variable Independence Emmanuel Filiot Sophie Tison Laboratoire d Informatique Fondamentale de Lille (LIFL) INRIA Lille Nord-Europe, Mostrare Project IFIP TCS, 2008 E.Filiot

More information

Sequential: Vector of Bits

Sequential: Vector of Bits Counting the Number of Accesses Sequential: Vector of Bits When estimating seek costs, we need to calculate the probability distribution for the distance between two subsequent qualifying cylinders. We

More information

Speculative Parallelism in Cilk++

Speculative Parallelism in Cilk++ Speculative Parallelism in Cilk++ Ruben Perez & Gregory Malecha MIT May 11, 2010 Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 1 / 33 Parallelizing Embarrassingly Parallel

More information

First Lemma. Lemma: The cost function C h has the ASI-Property. Proof: The proof can be derived from the definition of C H : C H (AUVB) = C H (A)

First Lemma. Lemma: The cost function C h has the ASI-Property. Proof: The proof can be derived from the definition of C H : C H (AUVB) = C H (A) IKKBZ First Lemma Lemma: The cost function C h has the ASI-Property. Proof: The proof can be derived from the definition of C H : and, hence, C H (AUVB) = C H (A) +T(A)C H (U) +T(A)T(U)C H (V) +T(A)T(U)T(V)C

More information

Chap 2: Classical models for information retrieval

Chap 2: Classical models for information retrieval Chap 2: Classical models for information retrieval Jean-Pierre Chevallet & Philippe Mulhem LIG-MRIM Sept 2016 Jean-Pierre Chevallet & Philippe Mulhem Models of IR 1 / 81 Outline Basic IR Models 1 Basic

More information

Databases. DBMS Architecture: Hashing Techniques (RDBMS) and Inverted Indexes (IR)

Databases. DBMS Architecture: Hashing Techniques (RDBMS) and Inverted Indexes (IR) Databases DBMS Architecture: Hashing Techniques (RDBMS) and Inverted Indexes (IR) References Hashing Techniques: Elmasri, 7th Ed. Chapter 16, section 8. Cormen, 3rd Ed. Chapter 11. Inverted indexing: Elmasri,

More information

On Ordering Descriptions in a Description Logic

On Ordering Descriptions in a Description Logic On Ordering Descriptions in a Description Logic Jeffrey Pound, Lubomir Stanchev, David Toman, and Grant Weddell David R. Cheriton School of Computer Science University of Waterloo, Canada Abstract. We

More information

Data Dependencies in the Presence of Difference

Data Dependencies in the Presence of Difference Data Dependencies in the Presence of Difference Tsinghua University sxsong@tsinghua.edu.cn Outline Introduction Application Foundation Discovery Conclusion and Future Work Data Dependencies in the Presence

More information

α-acyclic Joins Jef Wijsen May 4, 2017

α-acyclic Joins Jef Wijsen May 4, 2017 α-acyclic Joins Jef Wijsen May 4, 2017 1 Motivation Joins in a Distributed Environment Assume the following relations. 1 M[NN, Field of Study, Year] stores data about students of UMONS. For example, (19950423158,

More information

Outline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting

Outline for today. Information Retrieval. Cosine similarity between query and document. tf-idf weighting Outline for today Information Retrieval Efficient Scoring and Ranking Recap on ranked retrieval Jörg Tiedemann jorg.tiedemann@lingfil.uu.se Department of Linguistics and Philology Uppsala University Efficient

More information

A Humble Introduction to DIJKSTRA S A A DISCIPLINE OF PROGRAMMING

A Humble Introduction to DIJKSTRA S A A DISCIPLINE OF PROGRAMMING A Humble Introduction to DIJKSTRA S A A DISCIPLINE OF PROGRAMMING Do-Hyung Kim School of Computer Science and Engineering Sungshin Women s s University CONTENTS Bibliographic Information and Organization

More information

D B M G Data Base and Data Mining Group of Politecnico di Torino

D B M G Data Base and Data Mining Group of Politecnico di Torino Data Base and Data Mining Group of Politecnico di Torino Politecnico di Torino Association rules Objective extraction of frequent correlations or pattern from a transactional database Tickets at a supermarket

More information

Classification of Join Ordering Problems

Classification of Join Ordering Problems Classification of Join Ordering Problems We distinguish four different dimensions: 1. query graph class: chain, cycle, star, and clique 2. join tree structure: left-deep, zig-zag, or bushy trees 3. join

More information

Algorithms and Data Structures 2016 Week 5 solutions (Tues 9th - Fri 12th February)

Algorithms and Data Structures 2016 Week 5 solutions (Tues 9th - Fri 12th February) Algorithms and Data Structures 016 Week 5 solutions (Tues 9th - Fri 1th February) 1. Draw the decision tree (under the assumption of all-distinct inputs) Quicksort for n = 3. answer: (of course you should

More information

Single Axioms for Boolean Algebra

Single Axioms for Boolean Algebra ARGONNE NATIONAL LABORATORY 9700 South Cass Avenue Argonne, IL 60439 ANL/MCS-TM-243 Single Axioms for Boolean Algebra by William McCune http://www.mcs.anl.gov/ mccune Mathematics and Computer Science Division

More information

Lecture Notes on Inductive Definitions

Lecture Notes on Inductive Definitions Lecture Notes on Inductive Definitions 15-312: Foundations of Programming Languages Frank Pfenning Lecture 2 August 28, 2003 These supplementary notes review the notion of an inductive definition and give

More information

Association Rules. Fundamentals

Association Rules. Fundamentals Politecnico di Torino Politecnico di Torino 1 Association rules Objective extraction of frequent correlations or pattern from a transactional database Tickets at a supermarket counter Association rule

More information

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 14 Indexes for Multimedia Data 14 Indexes for Multimedia

More information

Predicate Logic - Introduction

Predicate Logic - Introduction Outline Motivation Predicate Logic - Introduction Predicates & Functions Quantifiers, Coming to Terms with Formulas Quantifier Scope & Bound Variables Free Variables & Sentences c 2001 M. Lawford 1 Motivation:

More information

Relational Nonlinear FIR Filters. Ronald K. Pearson

Relational Nonlinear FIR Filters. Ronald K. Pearson Relational Nonlinear FIR Filters Ronald K. Pearson Daniel Baugh Institute for Functional Genomics and Computational Biology Thomas Jefferson University Philadelphia, PA Moncef Gabbouj Institute of Signal

More information

Configuring Spatial Grids for Efficient Main Memory Joins

Configuring Spatial Grids for Efficient Main Memory Joins Configuring Spatial Grids for Efficient Main Memory Joins Farhan Tauheed, Thomas Heinis, and Anastasia Ailamaki École Polytechnique Fédérale de Lausanne (EPFL), Imperial College London Abstract. The performance

More information