Multi-join Query Evaluation on Big Data Lecture 2


Dan Suciu, March 2015.

Multi-join Query Evaluation: Outline
Part 1: Optimal Sequential Algorithms (Thursday 14:15-15:45).
Part 2: Lower Bounds for Parallel Algorithms (Friday 14:15-15:45).
Part 3: Optimal Parallel Algorithms (Saturday 9:00-10:30).
Part 4: Data Skew (Saturday 11:00-12:30).

Brief Review of the AGM Bound
$Q(\bar x) = R_1(\bar x_1), \ldots, R_\ell(\bar x_\ell)$.
Equal cardinalities: if $|R_1| = \cdots = |R_\ell| = m$, then $|Q| \le m^{\rho^*}$, where $\rho^*$ is the fractional edge cover number of $Q$.
General case: if $|R_1| = m_1, \ldots, |R_\ell| = m_\ell$, then $|Q| \le \min_{\bar u} m_1^{u_1} \cdots m_\ell^{u_\ell}$, where the minimum ranges over fractional edge covers $\bar u$.
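To make the bound concrete, here is a standard worked instance (our addition, not on the slide): the triangle query under equal cardinalities.

```latex
% Triangle query: Q(x,y,z) = R(x,y), S(y,z), T(z,x).
% Weight u = 1/2 on each of the three atoms covers every variable
% (each variable occurs in two atoms), so rho* = 3 * (1/2) = 3/2 and
\[
  |Q| \;\le\; |R|^{1/2}\,|S|^{1/2}\,|T|^{1/2} \;=\; m^{3/2}
  \qquad\text{when } |R| = |S| = |T| = m .
\]
```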

Outline for Lecture 2: Background: Parallel Databases; The MPC Model; Algorithm for Triangles; Lower Bound on the Communication Cost; Lower Bound for General Queries; Summary.

Parallel Query Processing: Overview
Parallel Databases: queries and updates. GAMMA machine (1980s); today in all commercial DBMSs. Typically tens of nodes. FO is in AC^0; SQL is embarrassingly parallel.
Parallel Data Analytics: queries only (this course). MapReduce, Hadoop, PigLatin, Dremel, Scope, Spark. Typically hundreds or thousands of nodes. Data reshuffling / communication is the new bottleneck.
Distributed State: updates (transactions); will not discuss. Replicate objects (3-5 times); the central problem is consistency. E.g. Google, Amazon, Yahoo, Microsoft, Ebay. Eventual consistency (NoSQL, Dynamo, BigTable, PNUTS). Strong consistency: Paxos, Spanner.

Key Concepts
Input data of size m is partitioned on p servers, connected by a network. Query processing involves local computation and communication. Performance parameters [Gray & DeWitt 92]:
Speedup: how does performance change when p increases? Ideal: linear speedup.
Scaleup: how does performance change when p and m increase at the same rate? Ideal: constant scaleup.
[Figure: two plots, speed vs. number of processors p (speedup), and speed vs. p and data size growing together (scaleup).]

Data Partition
Given a database of size m, partition it on p servers. Balanced partition: each server holds about $m/p$ data. Skewed partition: some server holds much more than $m/p$ data. Usually the input data is already partitioned, but we need to re-partition it for a particular problem: data reshuffling.

Example 1: Hash-Partitioned Join
Compute $Q(x,y,z) = R(x,y) \bowtie S(y,z)$, where $|R| = m_1$, $|S| = m_2$.
Input data: the input relations R, S are partitioned on the servers.
Data reshuffling: given a hash function $h : \mathrm{Domain} \to [p]$, send every tuple $R(x,y)$ to server $h(y)$ and every tuple $S(y,z)$ to server $h(y)$.
Computation: in parallel, each server $u$ computes the join $R_u(x,y) \bowtie S_u(y,z)$ of its local fragments $R_u, S_u$.
If y is a key, then the load per server is $m_1/p + m_2/p$ (why?): linear speedup. Otherwise we may have skew (why?). What is the worst speedup?
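The reshuffling step is easy to simulate on one machine. A minimal sketch (our code; the name `hash_partition_join` and the data layout are ours, not from the lecture):

```python
from collections import defaultdict

def hash_partition_join(R, S, p):
    """Simulate a hash-partitioned join of R(x,y) and S(y,z) on p servers.

    Every tuple is routed to server hash(y) % p, so joining tuples
    always meet on the same server; each server then joins locally.
    """
    servers_R = defaultdict(list)
    servers_S = defaultdict(list)
    for (x, y) in R:                      # reshuffle R on y
        servers_R[hash(y) % p].append((x, y))
    for (y, z) in S:                      # reshuffle S on y
        servers_S[hash(y) % p].append((y, z))

    result, max_load = [], 0
    for u in range(p):                    # local join at each server
        max_load = max(max_load, len(servers_R[u]) + len(servers_S[u]))
        by_y = defaultdict(list)
        for (x, y) in servers_R[u]:
            by_y[y].append(x)
        for (y, z) in servers_S[u]:
            result.extend((x, y, z) for x in by_y[y])
    return result, max_load
```

With a key attribute y, `max_load` comes out near $(m_1 + m_2)/p$; a heavy-hitter y drives it up, which is the skew discussed next.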

Understanding Skew
Given: $R(x,y)$ with m items, p servers. Send each tuple $R(x,y)$ to server $h(y)$, where h is a random function.
Claim 1: for each server u, the expected load is $E[L_u] = m/p$. Why?
Claim 2: we say that R is skewed if some value y occurs more than $m/p$ times in R. Then some server has $L_u > m/p$.
Claim 3: if R is not skewed, then $\max_u L_u$ is $O(m/p)$ with high probability. (Details in Lecture 4.)
Take-away: we assume no skew; then max load = O(expected load).
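A quick empirical check of these claims (our sketch; the heavy-hitter construction, half the tuples sharing one y value, is an assumed example):

```python
import random
from collections import Counter

def max_load(m, p, skewed=False):
    """Hash m tuples R(x,y) to p servers by y and report the max load.

    With m distinct-ish y values (no skew) the max load stays close to
    m/p; if one y value occupies half the tuples, one server receives
    about m/2 tuples no matter how large p is.
    """
    ys = [random.randrange(m) for _ in range(m)]
    if skewed:
        ys[: m // 2] = [0] * (m // 2)     # heavy hitter: y = 0 in half the tuples
    loads = Counter(hash(y) % p for y in ys)
    return max(loads.values())

# m/p = 1000 here; expect max load near 1000 without skew, near 50000 with it.
print(max_load(100_000, 100), max_load(100_000, 100, skewed=True))
```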

Example 2: Broadcast Join
Compute $Q(x,y,z) = R(x,y) \bowtie S(y,z)$, where $|R| = m_1 \gg |S| = m_2$.
Input data: the input relations R, S are partitioned on the servers.
Broadcast: send every tuple $S(y,z)$ to every server.
Computation: in parallel, each server u computes the join $R_u(x,y) \bowtie S(y,z)$ of its local fragment $R_u$ with the entire relation S.
If $m_2 \le m_1/p$ then the broadcast join is very effective. Used a lot in practice.
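A rule-of-thumb comparison of the two plans' per-server communication (our sketch, following the load estimates above; the function name is ours):

```python
def prefer_broadcast(m1, m2, p):
    """Communication load per server for the two join plans.

    Hash-partitioned join: both relations are reshuffled, so each server
    receives about (m1 + m2) / p tuples.
    Broadcast join: R stays in place; each server receives all of S, i.e.
    m2 tuples. Broadcasting wins roughly when m2 <= m1 / p.
    """
    return m2 <= (m1 + m2) / p

# A tiny dimension table S next to a huge R: broadcast is the better plan.
print(prefer_broadcast(m1=10**9, m2=10**4, p=100))  # True
```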

Massively Parallel Communication Model (MPC) [Beame 13]
The MPC model is the following: p servers are connected by a network; servers are infinitely powerful; input data of size m is initially uniformly partitioned. Computation = several rounds; one round = local computation plus global communication. The only cost is the communication.
Definition (the load of an algorithm in the MPC model): $L_u$ = maximum amount of data received by server u during any round; $L = \max_u L_u$.

Massively Parallel Communication Model (MPC)
[Figure: p servers, each initially holding an O(m/p) fragment of the size-m input; the algorithm runs in rounds (Round 1, Round 2, Round 3, ...), each round consisting of local computation followed by communication, with maximum communication load L per server per round.]

Load/Rounds Tradeoff in the MPC Model
Naive 1 round: send the entire data to server 1, compute locally. $L = m$.
Naive p rounds: at each round, send an $m/p$-fragment of the data to server 1, then compute locally. $L = m/p$.
Ideal algorithms: 1 round, load $L = m/p$ (but rarely possible).
Real algorithms: $O(1)$ rounds and $L = O(m/p^{1-\varepsilon})$, for some $0 \le \varepsilon < 1$.

Examples in the MPC Model
Example: join $Q(x,y,z) = R(x,y), S(y,z)$. Round 1 (hash-partitioned join): send tuple $R(x,y)$ to server $h(y)$, send $S(y,z)$ to server $h(y)$, compute the join locally. Load: $O(m/p)$ (assuming no skew).
Example: triangles $Q(x,y,z) = R(x,y), S(y,z), T(z,x)$. Round 1, hash-partitioned join: $Aux(x,y,z) = R(x,y), S(y,z)$. Round 2, hash-partitioned join: $Q(x,y,z) = Aux(x,y,z), T(z,x)$. Load: can be as high as $m^2/p$ because of the intermediate result! Can we compute triangles with a smaller load?

A Simple Lower Bound for the MPC Model
Let Q be a query with $|R_1| = \cdots = |R_\ell| = m$.
Fact: any algorithm for Q with r rounds and load L satisfies $r \cdot L \ge m/p^{1/\rho^*}$.
Proof: construct an instance where $|Q| = m^{\rho^*}$. One server receives at most $r \cdot L$ tuples from each $R_j$, hence finds at most $(r L)^{\rho^*}$ answers (by the AGM bound). All p servers together find at most $p (r L)^{\rho^*}$ answers; this must be at least $m^{\rho^*}$, so $r L \ge m/p^{1/\rho^*}$.
Question: for $Q = R(x,y), S(y,z)$ this says $r \cdot L \ge m/p^{1/2}$, but we computed the join with load $L = m/p$ in one round. Contradiction?
Answer: no; the algorithm is for skew-free data, while the lower-bound instance is skewed.
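Instantiating the Fact for the triangle query (a routine specialization, our addition):

```latex
% For triangles, rho* = 3/2, so any r-round algorithm satisfies
\[
  r \cdot L \;\ge\; \frac{m}{p^{1/\rho^*}} \;=\; \frac{m}{p^{2/3}},
\]
% i.e. on worst-case (skewed) inputs even a one-round algorithm needs
% load at least m / p^{2/3}, matching the HyperCube load derived below.
```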

One-Round Algorithm for Triangles [Afrati & Ullman 10, Suri & Vassilvitskii 11]
$Q(x,y,z) = R(x,y), S(y,z), T(z,x)$. Place the p servers in a cube: $[p] \simeq [p^{1/3}] \times [p^{1/3}] \times [p^{1/3}]$.
Round 1: in parallel, each server does the following: send $R(x,y)$ to all servers $(h_1(x), h_2(y), *)$; send $S(y,z)$ to all servers $(*, h_2(y), h_3(z))$; send $T(z,x)$ to all servers $(h_1(x), *, h_3(z))$. Then compute Q locally.
Theorem (Beame 14): (1) If $h_1, h_2, h_3$ are independent hash functions, then the expected load at every server u is $E[L_u] = (m_1 + m_2 + m_3)/p^{2/3} \overset{\text{def}}{=} L$. [Why do we need independence?] (2) If the data has no skew, then the maximum load is $O(L)$ w.h.p.
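A single-machine sketch of this routing scheme (our code; in the real algorithm $h_1, h_2, h_3$ are independent hashes, and the local computation would use a worst-case optimal join rather than the naive loop below):

```python
from collections import defaultdict

def hypercube_triangles(R, S, T, p_side):
    """One-round HyperCube sketch for Q(x,y,z) = R(x,y), S(y,z), T(z,x).

    p = p_side**3 servers sit in a cube [p_side]^3. Each R-tuple is sent
    to p_side servers (one per missing z-coordinate), similarly for S and
    T, so every triangle (x,y,z) meets at server (h1(x), h2(y), h3(z)).
    """
    h1 = h2 = h3 = lambda v: hash(v) % p_side    # independent hashes in reality
    srv = defaultdict(lambda: ([], [], []))       # (R-frag, S-frag, T-frag)
    for (x, y) in R:
        for k in range(p_side):
            srv[(h1(x), h2(y), k)][0].append((x, y))
    for (y, z) in S:
        for i in range(p_side):
            srv[(i, h2(y), h3(z))][1].append((y, z))
    for (z, x) in T:
        for j in range(p_side):
            srv[(h1(x), j, h3(z))][2].append((z, x))

    out = set()
    for Ru, Su, Tu in srv.values():               # local join at each server
        Sset, Tset = set(Su), set(Tu)
        for (x, y) in Ru:
            for (y2, z) in Sset:
                if y2 == y and (z, x) in Tset:
                    out.add((x, y, z))
    return out
```

Each tuple is replicated `p_side` = $p^{1/3}$ times, which is exactly where the $m/p^{2/3}$ load in the theorem comes from.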

Discussion
We will call the algorithm HyperCube, following [Beame 13]. Each tuple $R(x,y)$ is replicated $p^{1/3}$ times; hence the load is $L = m/p^{2/3}$.
The partitioning $R(x,y) \to (h_1(x), h_2(y), *)$ is more tolerant to skew: each value x is spread over $p^{1/3}$ buckets, and each value y is spread over $p^{1/3}$ buckets. It can tolerate degrees up to $m/p^{1/3}$ (better than $m/p$).
Notice that the algorithm only shuffles the data: each server still has to compute the query locally. It is important to use a worst-case optimal algorithm there.
Non-linear speedup, because the load is $m/p^{2/3}$. Can we compute triangles with a load of $m/p$?

Lower Bound for Triangle Queries
$Q(x,y,z) = R(x,y), S(y,z), T(z,x)$. Assume $|R| + |S| + |T| = m$ and denote $L_{lower} = m/p^{2/3}$.
Theorem: any one-round algorithm that computes Q must have load $L \ge L_{lower}$, even on database instances without skew. Hence HyperCube is optimal for computing triangles on permutations. Next, we prove the theorem.

Lower Bound for Triangle Queries
$Q(x,y,z) = R(x,y), S(y,z), T(z,x)$. We assume R, S, T are permutations over a domain of size n.
[Example with n = 4: three permutation relations R(x, y), S(y, z), T(z, x) joined into Q(x, y, z); table omitted.]
Question: what is $E[|Q|]$ over random permutations R, S, T?
Theorem (Beame 13): denote $L_{lower} = 3n/p^{2/3}$. Let A be an algorithm for Q with load $L < L_{lower}$. Then the expected number of triangles returned by A is $E[|A|] \le (L/L_{lower})^{3/2} \cdot E[|Q|]$.

Discussion: Inputs, Messages, Bits
Initially the relations R, S, T are on disjoint servers; this is w.l.o.g. Stronger: assume that R is on server 1, S on server 2, T on server 3.
Any server u receives three messages: $msg_1$ about R, $msg_2$ about S, $msg_3$ about T, with $|msg_1| + |msg_2| + |msg_3| \le L$ bits.
Messages may encode arbitrary information; e.g., bit 1 says "R is even", bit 2 says "R has a 17-cycle", etc.
No lower bound is possible for a fixed input R, S, T: an algorithm may simply check for that input and encode it using $L = O(1)$ bits. Instead, the lower bound is a statement about all permutations.

Notation: Bits vs. Tuples
Size of the database in tuples: $m = |R| + |S| + |T|$. Size of the database in bits: $M = |R| \log n + |S| \log n + |T| \log n = m \log n$, where n is the size of the domain.
If R is a random permutation, then $\mathrm{size}(R) = \log(n!) \approx n \log n$ bits. Thus $L_{lower} = m/p^{2/3}$ tuples becomes $L_{lower} = 3 \log(n!)/p^{2/3}$ bits.
We will prove the following, where L and $L_{lower}$ are expressed in bits: $E[|A|] \le (L/L_{lower})^{3/2} \cdot E[|Q|]$.

Proof Part 1: $E[|Q|]$ on Random Inputs
$Q(x,y,z) = R(x,y), S(y,z), T(z,x)$.
Lemma: $E[|Q|] = 1$, where the expectation is over random permutations R, S, T.
Proof: note that for all $i, j, k \in [n]$: $P((i,j) \in R) = P((j,k) \in S) = P((k,i) \in T) = 1/n$. Then
$E[|Q|] = \sum_{i,j,k} P((i,j) \in R \wedge (j,k) \in S \wedge (k,i) \in T) = \sum_{i,j,k} P((i,j) \in R) \cdot P((j,k) \in S) \cdot P((k,i) \in T) = n^3 \cdot \tfrac{1}{n} \cdot \tfrac{1}{n} \cdot \tfrac{1}{n} = 1$,
where the second equality uses the independence of R, S, T.
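The lemma is easy to check by simulation (our sketch; the triangle count equals the number of fixed points of the composed permutation $T \circ S \circ R$):

```python
import random

def expected_triangles(n, trials=2000):
    """Monte Carlo check: over random permutations R, S, T on [n],
    the expected number of triangles in their join is 1."""
    total = 0
    for _ in range(trials):
        R = list(range(n)); random.shuffle(R)   # R = {(i, R[i])}
        S = list(range(n)); random.shuffle(S)
        T = list(range(n)); random.shuffle(T)
        # (i,j) in R, (j,k) in S, (k,i) in T  <=>  T[S[R[i]]] == i
        total += sum(1 for i in range(n) if T[S[R[i]]] == i)
    return total / trials

print(expected_triangles(20))   # hovers around 1.0
```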

Discussion
In expectation, there is one triangle: $E[|Q|] = 1$. We will prove: if algorithm A has load $L < L_{lower}$, then $E[|A|] \le (L/L_{lower})^{3/2}$.

Proof Part 2: What We Learn from a Message
Fix a server $u \in [p]$ and a message $msg_1$ it received about R.
Definition: the set of known tuples is $K_1(msg_1) = \{(i,j) \mid \forall R:\ msg_1(R) = msg_1 \Rightarrow (i,j) \in R\}$. Similarly, $K_2(msg_2)$ and $K_3(msg_3)$ are the known tuples in S and T.
Observation: upon receiving $msg_1, msg_2, msg_3$, server u can output the triangle $(i,j,k)$ iff $(i,j) \in K_1(msg_1)$, $(j,k) \in K_2(msg_2)$, and $(k,i) \in K_3(msg_3)$.

Discussion
The useful information we learn from a message is the set of known tuples. We show next: if $L_1 = \mathrm{size}(msg_1)$ is small, then $K_1(msg_1)$ is small. If you know only a few bits, then you know only a few tuples. Later: if $K_1, K_2, K_3$ are small, the server knows only a few triangles (AGM).

Proof Part 3: With Few Bits You Know Only Few Tuples
Denote $H(R) = \log(n!)$, the entropy of R.
Proposition: let $f_1 < 1$. If $msg_1(R)$ has at most $f_1 \cdot H(R)$ bits, then $E[|K_1(msg_1(R))|] \le f_1 \cdot n$, where the expectation is over random permutations R.
Proof: denote $k = |K_1(m_1)|$. First, $H(R \mid m_1) \le \log[(n-k)!] \le (1 - \tfrac{k}{n}) H(R)$, because $\log[(n-k)!]/(n-k) \le \log(n!)/n$. Then:
$H(R) = H(R, msg_1(R))$ (R determines $msg_1(R)$)
$= H(msg_1(R)) + \sum_{m_1} H(R \mid m_1) P(m_1)$ (chain rule)
$\le f_1 H(R) + \sum_{m_1} (1 - \tfrac{|K_1(m_1)|}{n}) H(R) P(m_1)$
$= f_1 H(R) + \big[1 - \tfrac{\sum_{m_1} |K_1(m_1)| P(m_1)}{n}\big] H(R) = f_1 H(R) + \big[1 - \tfrac{E[|K_1(m_1)|]}{n}\big] H(R)$.
It follows that $E[|K_1(m_1)|] \le f_1 n$.

Proof Part 4: Few Known Tuples Form Few Triangles
Continue to fix one server u; its load is $L = L_1 + L_2 + L_3$. Denote $a_{ij} = P((i,j) \in K_1(msg_1(R)))$, and similarly $b_{jk}$, $c_{ki}$ for S, T. Denote by $A_u$ the set of triangles returned by u. Then:
$E[|A_u|] = \sum_{i,j,k} a_{ij} b_{jk} c_{ki} \le \big((\sum_{ij} a_{ij}^2)(\sum_{jk} b_{jk}^2)(\sum_{ki} c_{ki}^2)\big)^{1/2}$ (Friedgut).
$a_{ij} \le 1/n$ (why?) and $\sum_{ij} a_{ij} = E[|K_1(msg_1(R))|] \le \tfrac{L_1}{\log(n!)} \cdot n$ (by Part 3). It follows that $\sum_{ij} a_{ij}^2 \le \tfrac{1}{n} \sum_{ij} a_{ij} \le \tfrac{L_1}{\log(n!)}$ (why?).
Hence $E[|A_u|] \le \big(\tfrac{L_1}{\log(n!)} \cdot \tfrac{L_2}{\log(n!)} \cdot \tfrac{L_3}{\log(n!)}\big)^{1/2} \le \big(\tfrac{L_1+L_2+L_3}{3 \log(n!)}\big)^{3/2} = \big(\tfrac{L}{M}\big)^{3/2}$ triangles.
All p servers return $E[|A|] \le p \cdot \big(\tfrac{L}{M}\big)^{3/2} = \big(\tfrac{L}{M/p^{2/3}}\big)^{3/2} = \big(\tfrac{L}{L_{lower}}\big)^{3/2}$ triangles. QED

Discussion of the Lower Bound for the Triangle Query
Any algorithm A with load $L < L_{lower} = m/p^{2/3}$ satisfies $E[|A|] \le (L/L_{lower})^{3/2} \cdot E[|Q|]$.
The result is for skew-free data. There exists at least one instance on which A fails (trivial). Even if A is randomized, there exists at least one instance where A fails with high probability (Yao's lemma).
Parallelism gets harder as p increases: an algorithm with load $L = O(m/p^{1-\varepsilon})$, with $\varepsilon < 1/3$, reports only $O\big((1/p^{1/3-\varepsilon})^{3/2}\big)$ triangles in expectation; fewer as p increases!

Generalization to Full Conjunctive Queries
Will discuss the equal-cardinality case today; will discuss the general case in Lecture 3.

The Fractional Vertex Cover / Edge Packing
Hypergraph of $Q = R_1(\bar x_1), \ldots, R_\ell(\bar x_\ell)$: nodes $x_1, \ldots, x_k$, edges $R_1, \ldots, R_\ell$.
Definition: a fractional vertex cover of Q is a sequence $v_1 \ge 0, \ldots, v_k \ge 0$ such that for every edge $R_j$: $\sum_{i : x_i \in R_j} v_i \ge 1$. A fractional edge packing of Q is a sequence $u_1 \ge 0, \ldots, u_\ell \ge 0$ such that for every node $x_i$: $\sum_{j : x_i \in R_j} u_j \le 1$.
By LP duality: $\min_{\bar v} \sum_i v_i = \max_{\bar u} \sum_j u_j = \tau^*$.
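$\tau^*$ is the value of a small linear program, so it can be computed directly (our sketch; assumes scipy is available, and `vertex_cover_tau` is our name):

```python
from scipy.optimize import linprog

def vertex_cover_tau(num_vars, edges):
    """Fractional vertex cover number tau* of a query hypergraph by LP.

    minimize sum_i v_i  subject to  sum_{i in e} v_i >= 1 for every edge e,
    v_i >= 0.  By LP duality this equals the max fractional edge packing.
    (linprog minimizes c.x under A_ub x <= b_ub, hence the negated rows.)
    """
    c = [1.0] * num_vars
    A_ub = [[-1.0 if i in e else 0.0 for i in range(num_vars)] for e in edges]
    b_ub = [-1.0] * len(edges)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * num_vars)
    return res.fun, res.x

# Triangle query: variables x,y,z = 0,1,2; edges R(x,y), S(y,z), T(z,x).
tau, v = vertex_cover_tau(3, [{0, 1}, {1, 2}, {2, 0}])
print(tau, v)   # tau* = 1.5 with v = (1/2, 1/2, 1/2)
```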

The HyperCube Algorithm for General Queries
$Q(\bar x) = R_1(\bar x_1), \ldots, R_\ell(\bar x_\ell)$, with $|R_1| = \cdots = |R_\ell| = m$ and p servers.
Let $\bar v = (v_1, \ldots, v_k)$ be any fractional vertex cover and $v_0 = \sum_i v_i$. Organize the p servers in a hypercube: $[p] \simeq [p^{v_1/v_0}] \times \cdots \times [p^{v_k/v_0}]$. Choose k independent hash functions $h_1, \ldots, h_k$.
Round 1: each server sends each tuple $R_j(x_{j_1}, x_{j_2}, \ldots)$ to all servers whose coordinates $j_1, j_2, \ldots$ are $h_{j_1}(x_{j_1}), h_{j_2}(x_{j_2}), \ldots$, broadcasting along the missing dimensions. Then each server computes Q on its local data.
Theorem: the load of the HyperCube algorithm is $L \le \ell \cdot m/p^{1/v_0}$ in expectation.
Proof: fix a server. $E[\#\text{tuples of } R_j] = m/p^{\sum_{i : x_i \in R_j} v_i / v_0} \le m/p^{1/v_0}$, since $\sum_{i : x_i \in R_j} v_i \ge 1$.
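The server grid ("shares") follows directly from the chosen cover (our sketch; `hypercube_shares` is our name):

```python
def hypercube_shares(p, v):
    """Side lengths of the server hypercube for a fractional vertex cover v.

    Variable x_i gets share p ** (v_i / v0), so the shares multiply to p,
    and relation R_j incurs per-server load
    m / p ** (sum_{i in R_j} v_i / v0) <= m / p ** (1 / v0).
    """
    v0 = sum(v)
    return [p ** (vi / v0) for vi in v]

# Triangle query with the cover v = (1/2, 1/2, 1/2): share p^{1/3} per variable.
print(hypercube_shares(64, [0.5, 0.5, 0.5]))   # ~[4.0, 4.0, 4.0]
```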

Lower Bound for General Queries
Definition: $R \subseteq [n]^r$ is called a matching of arity r if $|R| = n$ and every column is a key. [Example: a matching of arity 3 with n = 4; table omitted.]
Theorem: suppose all arities are $\ge 2$. For any packing $\bar u$, denote $L_{lower} = m/p^{1/\sum_j u_j}$. Then any algorithm A with load $L < L_{lower}$ returns $E[|A|] \le (L/L_{lower})^{\sum_j u_j} \cdot E[|Q|]$ answers, where the expectation is over random matchings.
Proof in the section today. (Question: what about arities of 1?)

Lower Bound for General Queries
Corollary: the HyperCube algorithm is optimal (because $\min_{\bar v} \sum_i v_i = \max_{\bar u} \sum_j u_j = \tau^*$).

Summary of Lecture 2
Parallel query engines today compute one join at a time. Issues: communication rounds, intermediate results, skew.
The HyperCube algorithm: one round. Restrictions: equal cardinalities (will generalize in Lecture 3); skew-free databases (skew is open, but will be discussed in Lecture 4). HyperCube is more resilient to skew than a binary join.
A surprising fact: for parallel algorithms the lower bound is given by the fractional edge packing, while for sequential algorithms the lower bound is given by the fractional edge cover.
Multiple rounds: all lower bounds in [Beame 13] use a weaker model. Open problem: show that an algorithm with load O(n/p) cannot compute $Q = R_1(x_0,x_1), R_2(x_1,x_2), R_3(x_2,x_3), R_4(x_3,x_4), R_5(x_4,x_5)$ in two rounds, over random permutations.


More information

Lecture 2: January 18

Lecture 2: January 18 CS271 Randomness & Computation Spring 2018 Instructor: Alistair Sinclair Lecture 2: January 18 Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They

More information

CS632 Notes on Relational Query Languages I

CS632 Notes on Relational Query Languages I CS632 Notes on Relational Query Languages I A. Demers 6 Feb 2003 1 Introduction Here we define relations, and introduce our notational conventions, which are taken almost directly from [AD93]. We begin

More information

Theory of Computer Science to Msc Students, Spring Lecture 2

Theory of Computer Science to Msc Students, Spring Lecture 2 Theory of Computer Science to Msc Students, Spring 2007 Lecture 2 Lecturer: Dorit Aharonov Scribe: Bar Shalem and Amitai Gilad Revised: Shahar Dobzinski, March 2007 1 BPP and NP The theory of computer

More information

Notes on Complexity Theory Last updated: November, Lecture 10

Notes on Complexity Theory Last updated: November, Lecture 10 Notes on Complexity Theory Last updated: November, 2015 Lecture 10 Notes by Jonathan Katz, lightly edited by Dov Gordon. 1 Randomized Time Complexity 1.1 How Large is BPP? We know that P ZPP = RP corp

More information

INSTITUTE OF MATHEMATICS THE CZECH ACADEMY OF SCIENCES. A trade-off between length and width in resolution. Neil Thapen

INSTITUTE OF MATHEMATICS THE CZECH ACADEMY OF SCIENCES. A trade-off between length and width in resolution. Neil Thapen INSTITUTE OF MATHEMATICS THE CZECH ACADEMY OF SCIENCES A trade-off between length and width in resolution Neil Thapen Preprint No. 59-2015 PRAHA 2015 A trade-off between length and width in resolution

More information

Algorithms for Collective Communication. Design and Analysis of Parallel Algorithms

Algorithms for Collective Communication. Design and Analysis of Parallel Algorithms Algorithms for Collective Communication Design and Analysis of Parallel Algorithms Source A. Grama, A. Gupta, G. Karypis, and V. Kumar. Introduction to Parallel Computing, Chapter 4, 2003. Outline One-to-all

More information

The Complexity of a Reliable Distributed System

The Complexity of a Reliable Distributed System The Complexity of a Reliable Distributed System Rachid Guerraoui EPFL Alexandre Maurer EPFL Abstract Studying the complexity of distributed algorithms typically boils down to evaluating how the number

More information

Theoretical Cryptography, Lecture 10

Theoretical Cryptography, Lecture 10 Theoretical Cryptography, Lecture 0 Instructor: Manuel Blum Scribe: Ryan Williams Feb 20, 2006 Introduction Today we will look at: The String Equality problem, revisited What does a random permutation

More information

Math 262A Lecture Notes - Nechiporuk s Theorem

Math 262A Lecture Notes - Nechiporuk s Theorem Math 6A Lecture Notes - Nechiporuk s Theore Lecturer: Sa Buss Scribe: Stefan Schneider October, 013 Nechiporuk [1] gives a ethod to derive lower bounds on forula size over the full binary basis B The lower

More information

CS505: Distributed Systems

CS505: Distributed Systems Department of Computer Science CS505: Distributed Systems Lecture 10: Consensus Outline Consensus impossibility result Consensus with S Consensus with Ω Consensus Most famous problem in distributed computing

More information

Lecture 16: Communication Complexity

Lecture 16: Communication Complexity CSE 531: Computational Complexity I Winter 2016 Lecture 16: Communication Complexity Mar 2, 2016 Lecturer: Paul Beame Scribe: Paul Beame 1 Communication Complexity In (two-party) communication complexity

More information

Parallel Pipelined Filter Ordering with Precedence Constraints

Parallel Pipelined Filter Ordering with Precedence Constraints Parallel Pipelined Filter Ordering with Precedence Constraints Amol Deshpande #, Lisa Hellerstein 2 # University of Maryland, College Park, MD USA amol@cs.umd.edu Polytechnic Institute of NYU, Brooklyn,

More information

Algorithms and Theory of Computation. Lecture 11: Network Flow

Algorithms and Theory of Computation. Lecture 11: Network Flow Algorithms and Theory of Computation Lecture 11: Network Flow Xiaohui Bei MAS 714 September 18, 2018 Nanyang Technological University MAS 714 September 18, 2018 1 / 26 Flow Network A flow network is a

More information

1 Probability Review. CS 124 Section #8 Hashing, Skip Lists 3/20/17. Expectation (weighted average): the expectation of a random quantity X is:

1 Probability Review. CS 124 Section #8 Hashing, Skip Lists 3/20/17. Expectation (weighted average): the expectation of a random quantity X is: CS 24 Section #8 Hashing, Skip Lists 3/20/7 Probability Review Expectation (weighted average): the expectation of a random quantity X is: x= x P (X = x) For each value x that X can take on, we look at

More information

New Attacks on the Concatenation and XOR Hash Combiners

New Attacks on the Concatenation and XOR Hash Combiners New Attacks on the Concatenation and XOR Hash Combiners Itai Dinur Department of Computer Science, Ben-Gurion University, Israel Abstract. We study the security of the concatenation combiner H 1(M) H 2(M)

More information

Valency Arguments CHAPTER7

Valency Arguments CHAPTER7 CHAPTER7 Valency Arguments In a valency argument, configurations are classified as either univalent or multivalent. Starting from a univalent configuration, all terminating executions (from some class)

More information

V.4 MapReduce. 1. System Architecture 2. Programming Model 3. Hadoop. Based on MRS Chapter 4 and RU Chapter 2 IR&DM 13/ 14 !74

V.4 MapReduce. 1. System Architecture 2. Programming Model 3. Hadoop. Based on MRS Chapter 4 and RU Chapter 2 IR&DM 13/ 14 !74 V.4 MapReduce. System Architecture 2. Programming Model 3. Hadoop Based on MRS Chapter 4 and RU Chapter 2!74 Why MapReduce? Large clusters of commodity computers (as opposed to few supercomputers) Challenges:

More information

Lecture 4: DES and block ciphers

Lecture 4: DES and block ciphers Lecture 4: DES and block ciphers Johan Håstad, transcribed by Ernir Erlingsson 2006-01-25 1 DES DES is a 64 bit block cipher with a 56 bit key. It selects a 64 bit block and modifies it depending on the

More information

Model Counting for Logical Theories

Model Counting for Logical Theories Model Counting for Logical Theories Wednesday Dmitry Chistikov Rayna Dimitrova Department of Computer Science University of Oxford, UK Max Planck Institute for Software Systems (MPI-SWS) Kaiserslautern

More information

NP-COMPLETE PROBLEMS. 1. Characterizing NP. Proof

NP-COMPLETE PROBLEMS. 1. Characterizing NP. Proof T-79.5103 / Autumn 2006 NP-complete problems 1 NP-COMPLETE PROBLEMS Characterizing NP Variants of satisfiability Graph-theoretic problems Coloring problems Sets and numbers Pseudopolynomial algorithms

More information

CSE 20 DISCRETE MATH SPRING

CSE 20 DISCRETE MATH SPRING CSE 20 DISCRETE MATH SPRING 2016 http://cseweb.ucsd.edu/classes/sp16/cse20-ac/ Today's learning goals Describe computer representation of sets with bitstrings Define and compute the cardinality of finite

More information

Notes for Lecture 27

Notes for Lecture 27 U.C. Berkeley CS276: Cryptography Handout N27 Luca Trevisan April 30, 2009 Notes for Lecture 27 Scribed by Madhur Tulsiani, posted May 16, 2009 Summary In this lecture we begin the construction and analysis

More information

On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms

On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms Anne Benoit, Fanny Dufossé and Yves Robert LIP, École Normale Supérieure de Lyon, France {Anne.Benoit Fanny.Dufosse

More information

Friendly Logics, Fall 2015, Lecture Notes 5

Friendly Logics, Fall 2015, Lecture Notes 5 Friendly Logics, Fall 2015, Lecture Notes 5 Val Tannen 1 FO definability In these lecture notes we restrict attention to relational vocabularies i.e., vocabularies consisting only of relation symbols (or

More information

Knowledge Discovery and Data Mining 1 (VO) ( )

Knowledge Discovery and Data Mining 1 (VO) ( ) Knowledge Discovery and Data Mining 1 (VO) (707.003) Map-Reduce Denis Helic KTI, TU Graz Oct 24, 2013 Denis Helic (KTI, TU Graz) KDDM1 Oct 24, 2013 1 / 82 Big picture: KDDM Probability Theory Linear Algebra

More information