Multi-join Query Evaluation on Big Data Lecture 2

Similar documents
Query Processing with Optimal Communication Cost

A Worst-Case Optimal Multi-Round Algorithm for Parallel Computation of Conjunctive Queries

Communication Steps for Parallel Query Processing

Parallel-Correctness and Transferability for Conjunctive Queries

Single-Round Multi-Join Evaluation. Bas Ketsman

Early stopping: the idea. TRB for benign failures. Early Stopping: The Protocol. Termination

7 RC Simulates RA. Lemma: For every RA expression E(A 1... A k ) there exists a DRC formula F with F V (F ) = {A 1,..., A k } and

c Copyright 2015 Paraschos Koutris

Lecture 26 December 1, 2015

Lecture 6. Today we shall use graph entropy to improve the obvious lower bound on good hash functions.

Factorized Relational Databases Olteanu and Závodný, University of Oxford

How to Optimally Allocate Resources for Coded Distributed Computing?

Complexity Theory VU , SS The Polynomial Hierarchy. Reinhard Pichler

Outline. Complexity Theory EXACT TSP. The Class DP. Definition. Problem EXACT TSP. Complexity of EXACT TSP. Proposition VU 181.

Notes on MapReduce Algorithms

COMS E F15. Lecture 22: Linearity Testing Sparse Fourier Transform

Database Theory VU , SS Complexity of Query Evaluation. Reinhard Pichler

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51

Fault-Tolerant Consensus

Limits to Approximability: When Algorithms Won't Help You. Note: Contents of today s lecture won t be on the exam

Computational Game Theory Spring Semester, 2005/6. Lecturer: Yishay Mansour Scribe: Ilan Cohen, Natan Rubin, Ophir Bleiberg*

CS 347 Parallel and Distributed Data Processing

variance of independent variables: sum of variances So chebyshev predicts won t stray beyond stdev.

Causality and Time. The Happens-Before Relation

Lower Bound Techniques for Multiparty Communication Complexity

Lecture 2: A Las Vegas Algorithm for finding the closest pair of points in the plane

Complexity theory for fellow CS students

Information-Theoretic Lower Bounds on the Storage Cost of Shared Memory Emulation

CS261: A Second Course in Algorithms Lecture #18: Five Essential Tools for the Analysis of Randomized Algorithms

Stream Computation and Arthur- Merlin Communication

Second main application of Chernoff: analysis of load balancing. Already saw balls in bins example. oblivious algorithms only consider self packet.

Randomized Algorithms. Lecture 4. Lecturer: Moni Naor Scribe by: Tamar Zondiner & Omer Tamuz Updated: November 25, 2010

A Lower Bound for Boolean Satisfiability on Turing Machines

Probabilistic Proofs of Existence of Rare Events. Noga Alon

Stochastic Submodular Cover with Limited Adaptivity

arxiv: v7 [cs.db] 22 Dec 2015

Lecture 13: 04/23/2014

Queries and Materialized Views on Probabilistic Databases

NP-Complete Reductions 2

Database Theory VU , SS Ehrenfeucht-Fraïssé Games. Reinhard Pichler

Section 6 Fault-Tolerant Consensus

The Communication Complexity of Correlation. Prahladh Harsha Rahul Jain David McAllester Jaikumar Radhakrishnan

Symmetry and hardness of submodular maximization problems

Finding similar items

(x 1 +x 2 )(x 1 x 2 )+(x 2 +x 3 )(x 2 x 3 )+(x 3 +x 1 )(x 3 x 1 ).

arxiv: v1 [cs.ai] 21 Sep 2014

Chapter 11. Approximation Algorithms. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved.

Ma/CS 117c Handout # 5 P vs. NP

Communication Complexity

Lecture 4 Chiu Yuen Koo Nikolai Yakovenko. 1 Summary. 2 Hybrid Encryption. CMSC 858K Advanced Topics in Cryptography February 5, 2004

Computational and Statistical Learning Theory

On Locating-Dominating Codes in Binary Hamming Spaces

6.842 Randomness and Computation Lecture 5

Lecture 4: Proof of Shannon s theorem and an explicit code

Lecture 21: P vs BPP 2

CS 347 Distributed Databases and Transaction Processing Notes03: Query Processing

10.4 The Kruskal Katona theorem

CSE 20 DISCRETE MATH WINTER

From Secure MPC to Efficient Zero-Knowledge

Lecture 38. CSE 331 Dec 4, 2017

Algorithms for Data Science

Flow Algorithms for Two Pipelined Filtering Problems

Lecture 12: Interactive Proofs

Application: Bucket Sort

6.852: Distributed Algorithms Fall, Class 10

Real-Time Course. Clock synchronization. June Peter van der TU/e Computer Science, System Architecture and Networking

CS425: Algorithms for Web Scale Data

Privacy-Preserving Data Mining

Polynomial Identity Testing and Circuit Lower Bounds

Lecture 5 - Information theory

Methods that transform the original network into a tighter and tighter representations

CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding

Distributed Oblivious RAM for Secure Two-Party Computation

PCP Theorem and Hardness of Approximation

Lecture and notes by: Alessio Guerrieri and Wei Jin Bloom filters and Hashing

Lecture 2: January 18

CS632 Notes on Relational Query Languages I

Theory of Computer Science to Msc Students, Spring Lecture 2

Notes on Complexity Theory Last updated: November, Lecture 10

INSTITUTE OF MATHEMATICS THE CZECH ACADEMY OF SCIENCES. A trade-off between length and width in resolution. Neil Thapen

Algorithms for Collective Communication. Design and Analysis of Parallel Algorithms

The Complexity of a Reliable Distributed System

Theoretical Cryptography, Lecture 10

Math 262A Lecture Notes - Nechiporuk s Theorem

CS505: Distributed Systems

Lecture 16: Communication Complexity

Parallel Pipelined Filter Ordering with Precedence Constraints

Algorithms and Theory of Computation. Lecture 11: Network Flow

1 Probability Review. CS 124 Section #8 Hashing, Skip Lists 3/20/17. Expectation (weighted average): the expectation of a random quantity X is:

New Attacks on the Concatenation and XOR Hash Combiners

Valency Arguments CHAPTER7

V.4 MapReduce. 1. System Architecture 2. Programming Model 3. Hadoop. Based on MRS Chapter 4 and RU Chapter 2 IR&DM 13/ 14 !74

Lecture 4: DES and block ciphers

Model Counting for Logical Theories

NP-COMPLETE PROBLEMS. 1. Characterizing NP. Proof

CSE 20 DISCRETE MATH SPRING

Notes for Lecture 27

On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms

Friendly Logics, Fall 2015, Lecture Notes 5

Knowledge Discovery and Data Mining 1 (VO) ( )

Transcription:

Multi-join Query Evaluation on Big Data, Lecture 2. Dan Suciu, March 2015.

Multi-join Query Evaluation: Outline. Part 1: Optimal Sequential Algorithms (Thursday 14:15-15:45). Part 2: Lower Bounds for Parallel Algorithms (Friday 14:15-15:45). Part 3: Optimal Parallel Algorithms (Saturday 9:00-10:30). Part 4: Data Skew (Saturday 11:00-12:30).

Brief Review of the AGM Bound. Q(x̄) = R_1(x̄_1), ..., R_l(x̄_l). Equal cardinalities: if |R_1| = ... = |R_l| = m, then |Q| ≤ m^{ρ*}, where ρ* is the fractional edge covering number of Q. General case: if |R_1| = m_1, ..., |R_l| = m_l, then |Q| ≤ min_u m_1^{u_1} ··· m_l^{u_l}, the minimum taken over all fractional edge covers u = (u_1, ..., u_l).
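
As a worked instance of the bound (added here for concreteness, not on the original slide), take the triangle query with equal cardinalities m:

\begin{align*}
Q(x,y,z) &= R(x,y),\, S(y,z),\, T(z,x), \qquad |R|=|S|=|T|=m,\\
u &= (\tfrac12,\tfrac12,\tfrac12) \text{ is a fractional edge cover, so } \rho^* = \tfrac32
  \text{ and } |Q| \le m^{1/2}\, m^{1/2}\, m^{1/2} = m^{3/2}.
\end{align*}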

Outline for Lecture 2: Background on Parallel Databases; The MPC Model; Algorithm for Triangles; Lower Bound on the Communication Cost; Lower Bound for General Queries; Summary.

Parallel Query Processing: Overview. Parallel Databases: queries and updates. GAMMA machine ('80s), today in all commercial DBMSs; typically tens of nodes; FO is in AC^0, so SQL is embarrassingly parallel. Parallel Data Analytics: queries only (this course). MapReduce, Hadoop, PigLatin, Dremel, Scope, Spark; typically hundreds or thousands of nodes; data reshuffling / communication is the new bottleneck. Distributed State: updates (transactions), will not discuss. Objects are replicated (3-5 times); the central problem is consistency. E.g. Google, Amazon, Yahoo, Microsoft, Ebay. Eventual consistency (NoSQL, Dynamo, BigTable, PNUTS); strong consistency (Paxos, Spanner).

Key Concepts. Input data of size m is partitioned on p servers, connected by a network. Query processing involves local computation and communication. Performance parameters [Gray & DeWitt '92]: Speedup: how does performance change when p increases? Ideal: linear speedup. Scaleup: how does performance change when p and m increase at the same rate? Ideal: constant scaleup. [Figure: speed vs. number of processors p (speedup), and speed vs. p and data size increased together (scaleup).]

Data Partition. Given a database of size m, partition it on p servers. Balanced partition: each server holds ≈ m/p data. Skewed partition: some server holds ≫ m/p data. Usually the input data is already partitioned, but we need to re-partition it for a particular problem: data reshuffling.

Example 1: Hash-Partitioned Join. Compute Q(x, y, z) = R(x, y) ⋈ S(y, z), where |R| = m_1, |S| = m_2. Input data: the input relations R, S are partitioned on the servers. Data reshuffling: given a hash function h : Domain → [p], send every tuple R(x, y) to server h(y) and every tuple S(y, z) to server h(y). Computation: in parallel, each server i computes the join R_i(x, y) ⋈ S_i(y, z) of its local fragments R_i, S_i. If y is a key, then the load per server is m_1/p + m_2/p (why?): linear speedup. Otherwise we may have skew (why?). What is the worst speedup?
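
A minimal single-process sketch of the reshuffling step just described (illustrative only; the relations, the number of servers p, and the use of Python's built-in hash as h are made up):

from collections import defaultdict

def hash_partition_join(R, S, p):
    """Hash-partitioned join of R(x, y) and S(y, z) on p logical servers.
    Each tuple is routed to server h(y); every server then joins its
    local fragments. Returns the union of the per-server results."""
    h = lambda y: hash(y) % p                  # stand-in for the hash function h
    R_frag = defaultdict(list)                 # R fragment on each server
    S_frag = defaultdict(list)                 # S fragment on each server
    for x, y in R:
        R_frag[h(y)].append((x, y))
    for y, z in S:
        S_frag[h(y)].append((y, z))

    out = []
    for u in range(p):                         # "in parallel" on each server u
        index = defaultdict(list)
        for x, y in R_frag[u]:
            index[y].append(x)
        for y, z in S_frag[u]:
            for x in index[y]:
                out.append((x, y, z))
    return out

# Tiny made-up instance on p = 4 servers.
R = [(1, 'a'), (2, 'b'), (3, 'a')]
S = [('a', 10), ('b', 20)]
print(hash_partition_join(R, S, p=4))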

Understanding Skew. Given: R(x, y) with m tuples, p servers. Send each tuple R(x, y) to server h(y), where h is a random function. Claim 1: for each server u, its expected load is E[L_u] = m/p. Why? Claim 2: we say that R is skewed if some value y occurs more than m/p times in R; then some server has L_u > m/p. Claim 3: if R is not skewed, then max_u L_u is O(m/p) with high probability. (Details in Lecture 4.) Take-away: if we assume no skew, then Max-load = O(Expected load).
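
A small simulation of Claims 1 and 3 (an added illustration; the parameters m, p, the domain size, and the number of trials are arbitrary):

import random
from collections import Counter

def avg_max_load(m, p, domain_size, trials=20):
    """Send m tuples R(x, y) to server h(y) for a random h; return the
    average of max_u L_u over several trials (to compare with m/p)."""
    total = 0
    for _ in range(trials):
        h = [random.randrange(p) for _ in range(domain_size)]   # a random function h on y-values
        ys = [random.randrange(domain_size) for _ in range(m)]  # y-values of the m tuples
        loads = Counter(h[y] for y in ys)
        total += max(loads.values())
    return total / trials

m, p = 50_000, 50
print("expected load m/p =", m / p)
# Skew-free: every y occurs only a few times, so the max load stays O(m/p).
print("avg max load, skew-free:", avg_max_load(m, p, domain_size=m))
# Skewed: if one y-value occurred more than m/p times, all of its tuples would
# land on the single server h(y), whose load would then exceed m/p (Claim 2).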

Example 2: Broadcast Join. Compute Q(x, y, z) = R(x, y) ⋈ S(y, z), where |R| = m_1, |S| = m_2. Input data: the input relations R, S are partitioned on the servers. Broadcast: send every tuple S(y, z) to every server. Computation: in parallel, each server u computes the join R_u(x, y) ⋈ S(y, z) of its local fragment R_u with S. If m_2 ≤ m_1/p then the Broadcast Join is very effective. Used a lot in practice.
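
A companion sketch of the broadcast variant (again illustrative; the relations and p are made up, and R's initial partition is simply round-robin):

def broadcast_join(R, S, p):
    """Broadcast join of R(x, y) and S(y, z): R stays partitioned across the p
    servers, the small relation S is copied to every server, and each server
    joins its local R-fragment with the full S."""
    R_frag = {u: [] for u in range(p)}
    for i, (x, y) in enumerate(R):
        R_frag[i % p].append((x, y))            # whatever partition R already has
    S_by_y = {}
    for y, z in S:
        S_by_y.setdefault(y, []).append(z)      # every server receives all of S
    out = []
    for u in range(p):                          # "in parallel" on each server u
        for x, y in R_frag[u]:
            for z in S_by_y.get(y, []):
                out.append((x, y, z))
    return out

R = [(1, 'a'), (2, 'b'), (3, 'a')]              # the larger relation (m_1 tuples)
S = [('a', 10), ('b', 20)]                      # the smaller relation (m_2 tuples)
print(broadcast_join(R, S, p=4))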

Massively Parallel Communication Model (MPC) [Beame '13]. The MPC model is the following: p servers are connected by a network; servers are infinitely powerful; input data of size m is initially uniformly partitioned; the computation proceeds in several rounds; one round = local computation plus global communication. The only cost is the communication. Definition (the load of an algorithm in the MPC model): L_u = the maximum amount of data received by server u during any round; L = max_u L_u.

Massively Parallel Communication Model (MPC). [Figure: the p servers receive the input of size m, O(m/p) per server; the algorithm runs in several rounds (Round 1, Round 2, Round 3, ...); in each round every server computes locally and then communicates; the maximum communication load per server in a round is L.]

Load/Rounds Tradeoff in the MPC Model. Naive 1-round: send the entire data to server 1, compute locally; L = m. Naive p-rounds: at each round, send an m/p-fragment of the data to server 1, then compute locally; L = m/p. Ideal algorithms: 1 round, load L = m/p (but rarely possible). Real algorithms: O(1) rounds, and L = O(m/p^{1-ε}), for 0 ≤ ε < 1.

Examples on the MPC Model. Example (Join): Q(x, y, z) = R(x, y), S(y, z). Round 1 (hash-partitioned join): send tuple R(x, y) to server h(y), send S(y, z) to server h(y), compute the join locally. Load: O(m/p) (assuming no skew). Example (Triangles): Q(x, y, z) = R(x, y), S(y, z), T(z, x). Round 1 (hash-partitioned join): Aux(x, y, z) = R(x, y), S(y, z). Round 2 (hash-partitioned join): Q(x, y, z) = Aux(x, y, z), T(z, x). Load: can be as high as m^2/p because of the intermediate result! Can we compute triangles with a smaller load?

A Simple Lower Bound for the MPC Model. Let Q be a query with |R_1| = ... = |R_l| = m. Fact: any algorithm for Q with r rounds and load L satisfies r·L ≥ m/p^{1/ρ*}. Proof: construct an instance where |Q| = m^{ρ*}. One server receives at most r·L tuples from each R_j, hence finds at most (r·L)^{ρ*} answers (by the AGM bound); all p servers together find at most p·(r·L)^{ρ*} answers. Since all m^{ρ*} answers must be found, p·(r·L)^{ρ*} ≥ m^{ρ*}, i.e. r·L ≥ m/p^{1/ρ*}. Question: for Q = R(x, y), S(y, z) the bound gives r·L ≥ m/p^{1/2}, but we computed the join with load L = m/p. Contradiction? Answer: the algorithm assumes skew-free data, while the lower bound uses a skewed instance.

One-Round Algorithm for Triangles [Afrati & Ullman '10, Suri & Vassilvitskii '11]. Q(x, y, z) = R(x, y), S(y, z), T(z, x). Place the p servers in a cube: [p] ≅ [p^{1/3}] × [p^{1/3}] × [p^{1/3}]. Round 1: in parallel, each server does the following: send R(x, y) to all servers (h_1(x), h_2(y), ∗); send S(y, z) to all servers (∗, h_2(y), h_3(z)); send T(z, x) to all servers (h_1(x), ∗, h_3(z)); then compute Q locally. Theorem (Beame '14): (1) If h_1, h_2, h_3 are independent hash functions, then the expected load at each server u is E[L_u] = (m_1 + m_2 + m_3)/p^{2/3} =: L. [Why do we need independence?] (2) If the data has no skew, then the maximum load is O(L) w.h.p.
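
A minimal sketch of this shuffle-then-join round for the triangle query (illustrative; the hash functions, the input graph, and p are made up, and p is assumed to be a perfect cube):

from itertools import product
from collections import defaultdict

def hypercube_triangles(R, S, T, p):
    """One-round HyperCube shuffle for Q(x,y,z) = R(x,y), S(y,z), T(z,x).
    Servers are points of a p^(1/3) x p^(1/3) x p^(1/3) cube; each relation
    is replicated along its missing dimension, then joined locally."""
    s = round(p ** (1 / 3))                    # side of the cube; assumes p = s^3
    h1 = h2 = h3 = lambda v: hash(v) % s       # the real algorithm uses independent hashes
    frag = defaultdict(lambda: ([], [], []))   # (R, S, T) fragments per server
    for x, y in R:
        for k in range(s):                     # broadcast along the z-dimension
            frag[(h1(x), h2(y), k)][0].append((x, y))
    for y, z in S:
        for i in range(s):                     # broadcast along the x-dimension
            frag[(i, h2(y), h3(z))][1].append((y, z))
    for z, x in T:
        for j in range(s):                     # broadcast along the y-dimension
            frag[(h1(x), j, h3(z))][2].append((z, x))

    out = set()
    for Ru, Su, Tu in frag.values():           # each server joins its local fragments
        Tset = set(Tu)
        for (x, y), (y2, z) in product(Ru, Su):
            if y == y2 and (z, x) in Tset:
                out.add((x, y, z))
    return out

# Made-up graph with one triangle 1-2-3.
edges = [(1, 2), (2, 3), (3, 1), (1, 4)]
print(hypercube_triangles(edges, edges, edges, p=8))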

Discussion. We will call this algorithm HyperCube, following [Beame '13]. Each tuple R(x, y) is replicated p^{1/3} times; hence the load is L = m/p^{2/3}. The partitioning R(x, y) → (h_1(x), h_2(y), ∗) is more tolerant to skew: each value x is sent to p^{1/3} buckets and each y is sent to p^{1/3} buckets, so it can tolerate degrees up to m/p^{1/3} (better than m/p). Notice that the algorithm only shuffles the data: each server still has to compute the query locally, and it is important to use a worst-case optimal algorithm for that. Non-linear speedup, because the load is m/p^{2/3}. Can we compute triangles with a load of m/p?

Lower Bound for Triangle Queries. Q(x, y, z) = R(x, y), S(y, z), T(z, x). Assume |R| + |S| + |T| = m. Denote L_lower = m/p^{2/3}. Theorem: any 1-round algorithm that computes Q must have load L ≥ L_lower, even on database instances without skew. Hence, HyperCube is optimal for computing triangles on permutations. Next, we prove the theorem.

Lower Bound for Triangle Queries. Q(x, y, z) = R(x, y), S(y, z), T(z, x). We assume R, S, T are permutations over a domain of size n. Example (n = 4): R = {(1,3), (2,1), (3,4), (4,2)}, S = {(1,4), (2,2), (3,1), (4,3)}, T = {(1,1), (2,2), (3,3), (4,4)}; then Q = {(1,3,1), (3,4,3)}. Question: what is E[|Q|], over random permutations R, S, T? Theorem (Beame '13): denote L_lower = 3n/p^{2/3}. Let A be an algorithm for Q with load L < L_lower. Then the expected number of triangles returned by A is E[|A|] ≤ (L/L_lower)^{3/2} · E[|Q|].

Discussion: Inputs, Messages, Bits. Initially, the relations R, S, T are on disjoint servers; this is w.l.o.g. Stronger: assume that R is on server 1, S on server 2, T on server 3. Any server u receives three messages: msg_1 about R, msg_2 about S, msg_3 about T, with |msg_1| + |msg_2| + |msg_3| ≤ L bits. Messages may encode arbitrary information: e.g. bit 1 says "R is an even permutation", bit 2 says "R has a 17-cycle", etc. No lower bound is possible for a fixed input R, S, T: an algorithm may simply check for that input and encode it using L = O(1) bits. Instead, the lower bound is a statement about all permutations.

Notation: Bits vs. Tuples. Size of the db in #tuples: m = |R| + |S| + |T|. Size of the db in #bits: M = |R| log n + |S| log n + |T| log n = m log n, where n = size of the domain. If R is a random permutation, then size(R) = log(n!) ≈ n log n bits. L_lower = m/p^{2/3} tuples becomes L_lower = 3 log(n!)/p^{2/3} bits. We will prove the following, where L and L_lower are expressed in bits: E[|A|] ≤ (L/L_lower)^{3/2} · E[|Q|].

Proof Part 1: E[|Q|] on Random Inputs. Q(x, y, z) = R(x, y), S(y, z), T(z, x). Lemma: E[|Q|] = 1, where the expectation is over random permutations R, S, T. Proof: note that for all i, j, k ∈ [n], P((i, j) ∈ R) = P((j, k) ∈ S) = P((k, i) ∈ T) = 1/n. Then E[|Q|] = Σ_{i,j,k} P((i, j) ∈ R ∧ (j, k) ∈ S ∧ (k, i) ∈ T) = Σ_{i,j,k} P((i, j) ∈ R) · P((j, k) ∈ S) · P((k, i) ∈ T) (by independence of R, S, T) = n^3 · (1/n) · (1/n) · (1/n) = 1.
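
A quick Monte Carlo check of the lemma (an added illustration; n and the number of trials are arbitrary, and permutations are represented as dicts i -> j):

import random

def count_triangles(R, S, T):
    """|Q| for Q(x,y,z) = R(x,y), S(y,z), T(z,x), relations given as dicts."""
    return sum(1 for i in R if T[S[R[i]]] == i)   # (i, R[i], S[R[i]]) is a triangle iff T maps back to i

def random_perm(n):
    vals = list(range(n))
    random.shuffle(vals)
    return dict(enumerate(vals))

n, trials = 50, 20_000
avg = sum(count_triangles(random_perm(n), random_perm(n), random_perm(n))
          for _ in range(trials)) / trials
print("E[|Q|] ~", avg)   # should be close to 1, independent of n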

Discussion. In expectation, there is one triangle: E[|Q|] = 1. We will prove: if algorithm A has load L < L_lower, then E[|A|] ≤ (L/L_lower)^{3/2}.

Proof Part 2: What We Learn from a Message. Fix a server u ∈ [p], and a message msg_1 it received about R. Definition: the set of known tuples is K_1(msg_1) = {(i, j) | for every R with msg_1(R) = msg_1, (i, j) ∈ R}. Similarly K_2(msg_2), K_3(msg_3) are the known tuples in S and T. Observation: upon receiving msg_1, msg_2, msg_3, server u can output the triangle (i, j, k) iff (i, j) ∈ K_1(msg_1), (j, k) ∈ K_2(msg_2), and (k, i) ∈ K_3(msg_3).

Discussion. The useful information we learn from a message is the set of known tuples. We show next: if L_1 = size(msg_1) is small, then K_1(msg_1) is small. If you know only a few bits, then you know only a few tuples. Later: if K_1, K_2, K_3 are small, then the server knows only a few triangles (AGM).

Proof Part 3: With Few Bits You Know Only Few Tuples. Denote H(R) = log(n!), the entropy of R. Proposition: let f_1 < 1. If msg_1(R) has ≤ f_1·H(R) bits, then E[|K_1(msg_1(R))|] ≤ f_1·n, where the expectation is over random permutations R. Proof: denote k = |K_1(m_1)|. First, H(R | m_1) ≤ log[(n−k)!] ≤ (1 − k/n)·H(R), because log[(n−k)!]/(n−k) ≤ log(n!)/n. Then H(R) = H(R, msg_1(R)) (since R determines msg_1(R)) = H(msg_1(R)) + Σ_{m_1} P(m_1)·H(R | m_1) (chain rule) ≤ f_1·H(R) + Σ_{m_1} P(m_1)·(1 − |K_1(m_1)|/n)·H(R) = f_1·H(R) + [1 − E[|K_1(m_1)|]/n]·H(R). It follows that E[|K_1(m_1)|] ≤ f_1·n.
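
The same chain of (in)equalities, displayed for readability (a restatement of the slide, no new content):

\begin{align*}
H(R) &= H(R,\mathrm{msg}_1(R)) && \text{$R$ determines $\mathrm{msg}_1(R)$}\\
     &= H(\mathrm{msg}_1(R)) + \sum_{m_1} P(m_1)\, H(R \mid m_1) && \text{chain rule}\\
     &\le f_1 H(R) + \sum_{m_1} P(m_1)\Bigl(1 - \frac{|K_1(m_1)|}{n}\Bigr) H(R)\\
     &= f_1 H(R) + \Bigl(1 - \frac{\mathbb{E}[|K_1(m_1)|]}{n}\Bigr) H(R),
\end{align*}
and rearranging gives \(\mathbb{E}[|K_1(m_1)|] \le f_1 n\).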

Proof Part 4: Few Known Tuples Form Few Triangles. Continue to fix one server u; its load is L = L_1 + L_2 + L_3. Denote a_ij = P((i, j) ∈ K_1(msg_1(R))); similarly b_jk, c_ki for S, T. Denote by A_u the set of triangles returned by u. Then E[|A_u|] = Σ_{i,j,k} a_ij·b_jk·c_ki ≤ ((Σ_ij a_ij²)·(Σ_jk b_jk²)·(Σ_ki c_ki²))^{1/2} (Friedgut). Since a_ij ≤ 1/n (why?) and Σ_ij a_ij = E[|K_1(msg_1(R))|], it follows that Σ_ij a_ij² ≤ (1/n)·Σ_ij a_ij ≤ (1/n)·(L_1/log(n!))·n = L_1/log(n!) (why?). Hence E[|A_u|] ≤ (L_1/log(n!) · L_2/log(n!) · L_3/log(n!))^{1/2} ≤ ((L_1+L_2+L_3)/(3·log(n!)))^{3/2} = (L/M)^{3/2} triangles. All p servers together return E[|A|] ≤ p·(L/M)^{3/2} = (L/(M/p^{2/3}))^{3/2} = (L/L_lower)^{3/2} triangles. QED
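
The chain of inequalities in this step, displayed (a restatement of the slide; the middle step uses the AM-GM inequality on L_1, L_2, L_3):

\begin{align*}
\mathbb{E}[|A_u|] &= \sum_{i,j,k} a_{ij}\, b_{jk}\, c_{ki}
  \;\le\; \Bigl(\sum_{ij} a_{ij}^2 \cdot \sum_{jk} b_{jk}^2 \cdot \sum_{ki} c_{ki}^2\Bigr)^{1/2}
  \;\le\; \Bigl(\frac{L_1}{\log n!}\cdot\frac{L_2}{\log n!}\cdot\frac{L_3}{\log n!}\Bigr)^{1/2}\\
 &\le \Bigl(\frac{L_1+L_2+L_3}{3\log n!}\Bigr)^{3/2} = \Bigl(\frac{L}{M}\Bigr)^{3/2},
\qquad
\mathbb{E}[|A|] \;\le\; p\,\Bigl(\frac{L}{M}\Bigr)^{3/2}
  = \Bigl(\frac{L}{M/p^{2/3}}\Bigr)^{3/2}
  = \Bigl(\frac{L}{L_{\mathrm{lower}}}\Bigr)^{3/2}.
\end{align*}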

Discussion of the Lower Bound for the Triangle Query. For any algorithm A with load L < L_lower = m/p^{2/3}: E[|A|] ≤ (L/L_lower)^{3/2} · E[|Q|]. The result is for skew-free data. There exists at least one instance on which A fails (trivial). Even if A is randomized, there exists at least one instance where A fails with high probability (Yao's lemma). Parallelism gets harder as p increases: an algorithm with load L = O(m/p^{1−ε}), with ε < 1/3, reports only O(1/p^{(1−3ε)/2}) triangles in expectation, i.e. fewer and fewer as p increases!

Generalization to Full Conjunctive Queries. Will discuss the equal-cardinality case today, and the general case in Lecture 3.

The Fractional Vertex Cover / Edge Packing. Hypergraph of Q = R_1(x̄_1), ..., R_l(x̄_l): nodes x_1, ..., x_k, edges R_1, ..., R_l. Definition: a fractional vertex cover of Q is a sequence v_1 ≥ 0, ..., v_k ≥ 0 such that for every edge R_j: Σ_{i : x_i ∈ x̄_j} v_i ≥ 1. A fractional edge packing of Q is a sequence u_1 ≥ 0, ..., u_l ≥ 0 such that for every node x_i: Σ_{j : x_i ∈ x̄_j} u_j ≤ 1. By LP duality: min_v Σ_i v_i = max_u Σ_j u_j = τ*.
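
A small numerical illustration of these two dual LPs for the triangle query (assumes scipy is available; the query and its hypergraph are hard-coded):

from scipy.optimize import linprog

# Triangle query Q(x,y,z) = R(x,y), S(y,z), T(z,x):
# nodes x, y, z (indices 0, 1, 2); edges R = {x,y}, S = {y,z}, T = {z,x}.
edges = [(0, 1), (1, 2), (2, 0)]
k, l = 3, 3

# Fractional vertex cover: minimize sum(v) s.t. for every edge, sum of its v_i >= 1.
A_cover = [[-1 if i in e else 0 for i in range(k)] for e in edges]
cover = linprog(c=[1] * k, A_ub=A_cover, b_ub=[-1] * l)

# Fractional edge packing: maximize sum(u) s.t. for every node, sum of incident u_j <= 1.
A_pack = [[1 if i in e else 0 for e in edges] for i in range(k)]
packing = linprog(c=[-1] * l, A_ub=A_pack, b_ub=[1] * k)

print("vertex cover v =", cover.x, " tau* =", cover.fun)       # (0.5, 0.5, 0.5), 1.5
print("edge packing u =", packing.x, " tau* =", -packing.fun)  # (0.5, 0.5, 0.5), 1.5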

The HyperCube Algorithm for General Queries. Q(x̄) = R_1(x̄_1), ..., R_l(x̄_l), with |R_1| = ... = |R_l| = m and p servers. Let v = (v_1, ..., v_k) be any fractional vertex cover, and v_0 = Σ_i v_i. Organize the p servers in a hypercube: [p] ≅ [p^{v_1/v_0}] × ... × [p^{v_k/v_0}]. Choose k independent hash functions h_1, ..., h_k. Round 1: each server sends each tuple R_j(x_{j_1}, x_{j_2}, ...) to all servers whose coordinates j_1, j_2, ... are h_{j_1}(x_{j_1}), h_{j_2}(x_{j_2}), ..., broadcasting along the missing dimensions. Then each server computes Q on its local data. Theorem: the load of the HyperCube algorithm is L = l·m/p^{1/v_0} = O(m/p^{1/v_0}). Proof: fix a server; E[# tuples of R_j at that server] = m/p^{Σ_{i : x_i ∈ x̄_j} v_i/v_0} ≤ m/p^{1/v_0}, since Σ_{i : x_i ∈ x̄_j} v_i ≥ 1.
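
A sketch of how the shares and the destination servers could be computed from a fractional vertex cover (illustrative; the variable names, the example cover, and the crude rounding of shares are made up):

from itertools import product

def hypercube_destinations(p, cover, atom_vars, tup):
    """Destination coordinates for one tuple of an atom, given a fractional
    vertex cover {var: weight}. The share of variable x_i is ~ p^(v_i / v_0);
    the tuple is hashed on the atom's variables and broadcast on the rest."""
    v0 = sum(cover.values())
    share = {x: max(1, round(p ** (v / v0))) for x, v in cover.items()}   # crude rounding
    coords = []
    for x in cover:
        if x in atom_vars:                       # hashed dimension
            coords.append([hash(tup[atom_vars.index(x)]) % share[x]])
        else:                                    # missing dimension: broadcast
            coords.append(range(share[x]))
    return list(product(*coords))

# Triangle query with cover v = (1/2, 1/2, 1/2): each share is ~ p^(1/3).
cover = {'x': 0.5, 'y': 0.5, 'z': 0.5}
print(hypercube_destinations(p=64, cover=cover, atom_vars=('x', 'y'), tup=(7, 3)))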

Lower Bound for General Queries. Definition: R ⊆ [n]^r is called a matching of arity r if |R| = n and every column is a key. Example (matching of arity 3, n = 4): R = {(1,3,2), (2,1,4), (3,4,3), (4,2,1)}. Theorem: suppose all arities are ≥ 2. For any edge packing u, denote L_lower = m/p^{1/Σ_j u_j}. Then any algorithm A with load L < L_lower returns E[|A|] ≤ (L/L_lower)^{Σ_j u_j} · E[|Q|] answers, where the expectation is over random matchings. Proof in the section today. (Q: what about arities 1?)
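
A tiny helper illustrating the definition (an added illustration; it generates a random matching of arity r by stacking independent random permutations of [n]):

import random

def random_matching(n, r):
    """A matching of arity r over [n]: n tuples, every column a permutation of [n]."""
    cols = []
    for _ in range(r):
        col = list(range(1, n + 1))
        random.shuffle(col)
        cols.append(col)
    return [tuple(col[i] for col in cols) for i in range(n)]

print(random_matching(4, 3))   # e.g. [(1,3,2), (2,1,4), (3,4,3), (4,2,1)]: every column is a key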

Lower Bound for General Queries. Corollary: the HyperCube algorithm is optimal (because min_v Σ_i v_i = max_u Σ_j u_j = τ*).

Summary of Lecture 2. Parallel query engines today compute one join at a time; the issues are communication rounds, intermediate results, and skew. The HyperCube algorithm: one round. Restrictions: equal cardinalities (will generalize in Lecture 3); skew-free databases (skew is open, but will be discussed in Lecture 4); HyperCube is more resilient to skew than a plain join. A surprising fact: for parallel algorithms the lower bound is given by the fractional edge packing, while for sequential algorithms the lower bound is given by the fractional edge cover. Multiple rounds: all lower bounds in [Beame '13] use a weaker model. Open problem: show that an algorithm with load O(n/p) cannot compute Q = R_1(x_0, x_1), R_2(x_1, x_2), R_3(x_2, x_3), R_4(x_3, x_4), R_5(x_4, x_5) in two rounds, over random permutations.