B669 Sublinear Algorithms for Big Data

Size: px
Start display at page:

Download "B669 Sublinear Algorithms for Big Data"

Transcription

1 B669 Sublinear Algorithms for Big Data Qin Zhang 1-1

2 2-1 Part 1: Sublinear in Space

3 The model and challenge The data stream model (Alon, Matias and Szegedy 1996) a n a 2 a 1 RAM CPU Why hard? Cannot store everything. Applications: Internet router, stock data, ad auction, flight logs on tape, etc. (next 4 slides, in courtesy of Jeff Phillps) 3-1

4 Network routers Packets limited space Router Internet Router data per day: at least 1 Terabyte packet takes 8 nanoseconds to pass through router few million packets per second What statistics can we keep on data? For example, want to detect anomalies for security. 4-1

5 Telephone Switch txt, msg limited space Switch Cell phones connect through switches each message 1000 Bytes 500 million calls / day 1 Terabyte per month second Search for characteristics for dropped calls? 5-1

6 Ad Auction page view ad click keyword search limited space Server ad served delivery model Serving Ads on web Google, Yahoo!, Microsoft Yahoo.com viewed 77 trillion times 2 million / hour Each page serves ads; which ones? How to update ad delivery model? 6-1

7 Flight Logs on Tape CPU statistics All airplane logs over Washington, DC About flights per day. 50 years, total about 9 million flights Each flight has trajectory, passenger count, control dialog. Stored on Tape. Can only make 1 (or O(1)) pass! What statistics can be gathered? 7-1

8 1.1 Distinct Elements RAM How many distinct elements? Approximation needed. CPU 8-1

9 The FM sketch Denote the stream by A = a 1, a 2,..., a m, where m is the length of the stream, which is unknown at the beginning. Let n be the item universe. Let f i be the frequency of item i in the steam. a i = (i, ) denote f i f i +. This algorithm only works for = 1 (insertion only). 9-1

10 The FM sketch Denote the stream by A = a 1, a 2,..., a m, where m is the length of the stream, which is unknown at the beginning. Let n be the item universe. Let f i be the frequency of item i in the steam. a i = (i, ) denote f i f i +. This algorithm only works for = 1 (insertion only). The algorithm (Flajoet and Martin 83) 1. Choose a random hash function h : [n] [n] from a 2-universal family. Set z = 0. Let zeros(h(e)) be the # tailing zeros of the binary representation of h(e). 2. For each new coming item e, if zeros(h(e)) > z, then set z = zeros(h(e)); 3. Output 2 z

11 The FM sketch Denote the stream by A = a 1, a 2,..., a m, where m is the length of the stream, which is unknown at the beginning. Let n be the item universe. Let f i be the frequency of item i in the steam. a i = (i, ) denote f i f i +. This algorithm only works for = 1 (insertion only). The algorithm (Flajoet and Martin 83) 1. Choose a random hash function h : [n] [n] from a 2-universal family. Set z = 0. Let zeros(h(e)) be the # tailing zeros of the binary representation of h(e). 2. For each new coming item e, if zeros(h(e)) > z, then set z = zeros(h(e)); 3. Output 2 z+0.5. Analysis (on board) 9-3

12 The FM sketch 9-4 Denote the stream by A = a 1, a 2,..., a m, where m is the length of the stream, which is unknown at the beginning. Let n be the item universe. Let f i be the frequency of item i in the steam. a i = (i, ) denote f i f i +. This algorithm only works for = 1 (insertion only). The algorithm (Flajoet and Martin 83) 1. Choose a random hash function h : [n] [n] from a 2-universal family. Set z = 0. Let zeros(h(e)) be the # tailing zeros of the binary representation of h(e). 2. For each new coming item e, if zeros(h(e)) > z, then set z = zeros(h(e)); 3. Output 2 z+0.5. Analysis (on board) Theorem The number of distinct elements can be O(1)-approximated with probability 2/3 using O(log n) bits.

13 Probability amplification Can we boost the success probability to 1 δ? The idea is to run k = Θ(log(1/δ)) copies of this algorithm in parallel, using mutually independent random hash functions, and output the median of the k answers. 10-1

14 An improved algorithm Idea: two-level hashing. The algorithm (Bar-Yossef et al. 02) 1. Choose a random hash function h : [n] [n] from a 2-universal family. Set z = 0, B =. Choose a secondary 2-universal hash function g : [n] [(log n/ɛ) O(1) ]. 2. For each new coming item e, if zeros(h(e)) z, then (a) set B = B {g(e)}; (b) if B > c/ɛ 2 then set z = z + 1 and rehash B. 3. Output B 2 z. 11-1

15 An improved algorithm Idea: two-level hashing. The algorithm (Bar-Yossef et al. 02) 1. Choose a random hash function h : [n] [n] from a 2-universal family. Set z = 0, B =. Choose a secondary 2-universal hash function g : [n] [(log n/ɛ) O(1) ]. 2. For each new coming item e, if zeros(h(e)) z, then (a) set B = B {g(e)}; (b) if B > c/ɛ 2 then set z = z + 1 and rehash B. 3. Output B 2 z. Analysis (on board) 11-2

16 An improved algorithm Idea: two-level hashing. The algorithm (Bar-Yossef et al. 02) 1. Choose a random hash function h : [n] [n] from a 2-universal family. Set z = 0, B =. Choose a secondary 2-universal hash function g : [n] [(log n/ɛ) O(1) ]. 2. For each new coming item e, if zeros(h(e)) z, then (a) set B = B {g(e)}; (b) if B > c/ɛ 2 then set z = z + 1 and rehash B. 3. Output B 2 z. Analysis (on board) Theorem The number of distinct elements can be (1 + ɛ)-approximated with probability 2/3 using O(log n + 1/ɛ 2 (log(1/ɛ) + log log n)) bits. 11-3

17 Nice algorithms, but only work for insertion-only sequences

18 Linear sketch Random linear projection M : R n R k that preserves properties of any v R n with high prob. where k n. M = Mv answer v 13-1

19 Linear sketch Random linear projection M : R n R k that preserves properties of any v R n with high prob. where k n. M = Mv answer v Simple and useful Perfect for streaming and distributed computations. Work for insertion+deletion sequences (that is, can be either positive or negative). 13-2

20 Distinct elements using linear sketches Search version Decision version Let D be # distinct elements: If D T (1 + ɛ), then answer YES. If D T /(1 + ɛ), then answer NO. Try T = 1, (1 + ɛ), (1 + ɛ) 2,

21 Now, the decision problem The algorithm 1. Select a random set S {1, 2,..., n}, s.t. for each i, independently, we have Pr[i S] = 1/T 2. Make a pass over the stream, maintaining Sum S (x) = i S x i Note: this is a linear sketch. 3. If Sum S (x) > 0, return YES, otherwise return NO. 15-1

22 Now, the decision problem 15-2 The algorithm 1. Select a random set S {1, 2,..., n}, s.t. for each i, independently, we have Pr[i S] = 1/T 2. Make a pass over the stream, maintaining Sum S (x) = i S x i Note: this is a linear sketch. 3. If Sum S (x) > 0, return YES, otherwise return NO. Lemma Let P = Pr[Sum S (x) = 0]. If T is large enough, and ɛ is small enough, then If D T (1 + ɛ), then P < 1/e ɛ/3. If D T /(1 + ɛ), then P > 1/e + ɛ/3. Proof (on board)

23 Amplify the success probability Repeat to amplify the success probability 1. Select k sets S 1,..., S k as in previous algorithm, for k = C log(1/δ)/ɛ 2, C > 0 2. Let Z be the number of values of Sum Sj (x) that are equal to 0, j = 1,..., k. 3. If Z < k/e then report YES, otherwise report NO. 16-1

24 Amplify the success probability Repeat to amplify the success probability 1. Select k sets S 1,..., S k as in previous algorithm, for k = C log(1/δ)/ɛ 2, C > 0 2. Let Z be the number of values of Sum Sj (x) that are equal to 0, j = 1,..., k. 3. If Z < k/e then report YES, otherwise report NO. Lemma If the constant C is large enough, then this algorithm reports a correct answer with probability 1 δ. 16-2

25 Amplify the success probability Repeat to amplify the success probability 1. Select k sets S 1,..., S k as in previous algorithm, for k = C log(1/δ)/ɛ 2, C > 0 2. Let Z be the number of values of Sum Sj (x) that are equal to 0, j = 1,..., k. 3. If Z < k/e then report YES, otherwise report NO. Lemma If the constant C is large enough, then this algorithm reports a correct answer with probability 1 δ. Proof (on board) 16-3

26 Amplify the success probability Repeat to amplify the success probability 1. Select k sets S 1,..., S k as in previous algorithm, for k = C log(1/δ)/ɛ 2, C > 0 2. Let Z be the number of values of Sum Sj (x) that are equal to 0, j = 1,..., k. 3. If Z < k/e then report YES, otherwise report NO. Lemma If the constant C is large enough, then this algorithm reports a correct answer with probability 1 δ. Proof (on board) 16-4 Theorem The number of distinct elements can be (1 ± ɛ)-approximated with probability 1 δ using O(log 2 n log(1/δ)/ɛ 3 ) bits.

27 Question: can we make FM sketch linear? 17-1

28 Heavy Hitters

29 Heavy hitters L p heavy hitter set: HH p φ (x) = {i : x i φ x p } 19-1

30 Heavy hitters L p heavy hitter set: HH p φ (x) = {i : x i φ x p } L p Heavy Hitter Problem: Given φ, φ, (often φ = φ ɛ), return a set S such that HH p φ (x) S HHp φ (x) 19-2

31 Heavy hitters L p heavy hitter set: HH p φ (x) = {i : x i φ x p } L p Heavy Hitter Problem: Given φ, φ, (often φ = φ ɛ), return a set S such that HH p φ (x) S HHp φ (x) L p Point Query Problem: Given ɛ, after reading the whole stream, given i, report x i = x i ± ɛ x p 19-3

32 Space-saving: an algorithm for insertion only Algorithm Space-saving [Metwally et al. 05] When a new item e comes, we have two cases. 1. If e is already in the array. We just increment f (e) by 1 and reinsert the (e, f (e)) into the array. 2. If e is not in the array, we create a new tuple (e, MIN + 1) where MIN = min{f (e i ) : e i is in the array}. We always keep the array sorted according to f (e i ), and then MIN is just the estimated frequency of the last item. If the length array is larger than 1/ɛ, we delete the last tuple. Cost: O(1/ɛ log n) bits. 20-1

33 Space-saving: an algorithm for insertion only Algorithm Space-saving [Metwally et al. 05] When a new item e comes, we have two cases. 1. If e is already in the array. We just increment f (e) by 1 and reinsert the (e, f (e)) into the array. 2. If e is not in the array, we create a new tuple (e, MIN + 1) where MIN = min{f (e i ) : e i is in the array}. We always keep the array sorted according to f (e i ), and then MIN is just the estimated frequency of the last item. If the length array is larger than 1/ɛ, we delete the last tuple. Cost: O(1/ɛ log n) bits. Proof (on board) 20-2

34 L 1 point query Algorithm Count-Min [Cormode and Muthu 05] Pick d (d = log(1/δ)) independent hash functions h 1,..., h d where h i : {1,..., n} {1,..., w} (w = 2/ɛ) from a 2-universal family. Maintain d vectors Z 1,..., Z d where Z t = {Z1, t..., Zw t } such that = i:h t (i)=j x i Z t j Estimator: x i = min t Z t h t (i) 21-1

35 L 1 point query Algorithm Count-Min [Cormode and Muthu 05] Pick d (d = log(1/δ)) independent hash functions h 1,..., h d where h i : {1,..., n} {1,..., w} (w = 2/ɛ) from a 2-universal family. Maintain d vectors Z 1,..., Z d where Z t = {Z1, t..., Zw t } such that = i:h t (i)=j x i Z t j Estimator: x i = min t Z t h t (i) Analysis (on board) 21-2

36 L 1 point query Algorithm Count-Min [Cormode and Muthu 05] Pick d (d = log(1/δ)) independent hash functions h 1,..., h d where h i : {1,..., n} {1,..., w} (w = 2/ɛ) from a 2-universal family. Maintain d vectors Z 1,..., Z d where Z t = {Z1, t..., Zw t } such that = i:h t (i)=j x i Z t j Estimator: x i = min t Z t h t (i) Analysis (on board) Theorem We can solve L 1 point query, with approximation ɛ, and failure probability δ by storing O(1/ɛ log(1/δ) log n) bits. 21-3

37 L 2 point query Algorithm Count-Sketch [Charikar et. al. 05] Pick d (d = log(1/δ)) independent hash functions h 1,..., h d where h i : {1,..., n} {1,..., w} (w = 3/ɛ 2 ) from a 2-universal family. Pick d (d = log(1/δ)) independent hash functions g 1,..., g d where g i : {1,..., n} { 1, 1} from a 2-universal family. Maintain d vectors Z 1,..., Z d where Z t = {Z1, t..., Zw t } such that Zj t = i:h t (i)=j g t(i)x i Estimator: x i = median{z 1 h 1 (i),..., Z d h d (i)} 22-1

38 L 2 point query Algorithm Count-Sketch [Charikar et. al. 05] Pick d (d = log(1/δ)) independent hash functions h 1,..., h d where h i : {1,..., n} {1,..., w} (w = 3/ɛ 2 ) from a 2-universal family. Pick d (d = log(1/δ)) independent hash functions g 1,..., g d where g i : {1,..., n} { 1, 1} from a 2-universal family. Maintain d vectors Z 1,..., Z d where Z t = {Z1, t..., Zw t } such that Zj t = i:h t (i)=j g t(i)x i Estimator: x i Analysis (on board) = median{z 1 h 1 (i),..., Z d h d (i)} 22-2

39 L 2 point query Algorithm Count-Sketch [Charikar et. al. 05] Pick d (d = log(1/δ)) independent hash functions h 1,..., h d where h i : {1,..., n} {1,..., w} (w = 3/ɛ 2 ) from a 2-universal family. Pick d (d = log(1/δ)) independent hash functions g 1,..., g d where g i : {1,..., n} { 1, 1} from a 2-universal family. Maintain d vectors Z 1,..., Z d where Z t = {Z1, t..., Zw t } such that Zj t = i:h t (i)=j g t(i)x i Estimator: x i Analysis (on board) Theorem = median{z 1 h 1 (i),..., Z d h d (i)} We can solve L 2 point query, with approximation ɛ, and failure probability δ by storing O(1/ɛ 2 log(1/δ) log n) bits. 22-3

40 L 2 point query (an alternative approach) The algorithm: [Gilbert, Kotidis, Muthukrishnan and Strauss 01] Maintain a sketch Rx such that s = Rx 2 = (1 ± ɛ) x 2 (R is a O(1/ɛ 2 log(1/δ)) n matrix, which can be constructed, e.g., by taking each cell to be N (0, 1)) Estimator: x i = (1 Rx/s Re i 2 2 /2)s 23-1

41 L 2 point query (an alternative approach) The algorithm: [Gilbert, Kotidis, Muthukrishnan and Strauss 01] Maintain a sketch Rx such that s = Rx 2 = (1 ± ɛ) x 2 (R is a O(1/ɛ 2 log(1/δ)) n matrix, which can be constructed, e.g., by taking each cell to be N (0, 1)) Estimator: x i = (1 Rx/s Re i 2 2 /2)s Theorem Johnson-Linderstrauss Lemma x, we have (1 ɛ) x 2 Rx 2 (1 + ɛ) x 2 w.p. 1 δ. 23-2

42 L 2 point query (an alternative approach) The algorithm: [Gilbert, Kotidis, Muthukrishnan and Strauss 01] Maintain a sketch Rx such that s = Rx 2 = (1 ± ɛ) x 2 (R is a O(1/ɛ 2 log(1/δ)) n matrix, which can be constructed, e.g., by taking each cell to be N (0, 1)) Estimator: x i = (1 Rx/s Re i 2 2 /2)s Theorem Johnson-Linderstrauss Lemma x, we have (1 ɛ) x 2 Rx 2 (1 + ɛ) x 2 w.p. 1 δ. Theorem We can solve L 2 point query, with approximation ɛ, and failure probability δ by storing O(1/ɛ 2 log(1/δ) log n) bits. 23-3

43 24-1 Which one is better?

44 Attribution Some of the contents are borrowed from Amit Chakrabarti s course Pitor Indyk s course and Andrew McGregor s course CS711S12/index.html 25-1

Linear Sketches A Useful Tool in Streaming and Compressive Sensing

Linear Sketches A Useful Tool in Streaming and Compressive Sensing Linear Sketches A Useful Tool in Streaming and Compressive Sensing Qin Zhang 1-1 Linear sketch Random linear projection M : R n R k that preserves properties of any v R n with high prob. where k n. M =

More information

Heavy Hitters. Piotr Indyk MIT. Lecture 4

Heavy Hitters. Piotr Indyk MIT. Lecture 4 Heavy Hitters Piotr Indyk MIT Last Few Lectures Recap (last few lectures) Update a vector x Maintain a linear sketch Can compute L p norm of x (in zillion different ways) Questions: Can we do anything

More information

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018 Today we ll talk about a topic that is both very old (as far as computer science

More information

Lecture 6 September 13, 2016

Lecture 6 September 13, 2016 CS 395T: Sublinear Algorithms Fall 206 Prof. Eric Price Lecture 6 September 3, 206 Scribe: Shanshan Wu, Yitao Chen Overview Recap of last lecture. We talked about Johnson-Lindenstrauss (JL) lemma [JL84]

More information

Data Streams & Communication Complexity

Data Streams & Communication Complexity Data Streams & Communication Complexity Lecture 1: Simple Stream Statistics in Small Space Andrew McGregor, UMass Amherst 1/25 Data Stream Model Stream: m elements from universe of size n, e.g., x 1, x

More information

Wavelet decomposition of data streams. by Dragana Veljkovic

Wavelet decomposition of data streams. by Dragana Veljkovic Wavelet decomposition of data streams by Dragana Veljkovic Motivation Continuous data streams arise naturally in: telecommunication and internet traffic retail and banking transactions web server log records

More information

Title: Count-Min Sketch Name: Graham Cormode 1 Affil./Addr. Department of Computer Science, University of Warwick,

Title: Count-Min Sketch Name: Graham Cormode 1 Affil./Addr. Department of Computer Science, University of Warwick, Title: Count-Min Sketch Name: Graham Cormode 1 Affil./Addr. Department of Computer Science, University of Warwick, Coventry, UK Keywords: streaming algorithms; frequent items; approximate counting, sketch

More information

Lecture 3 Sept. 4, 2014

Lecture 3 Sept. 4, 2014 CS 395T: Sublinear Algorithms Fall 2014 Prof. Eric Price Lecture 3 Sept. 4, 2014 Scribe: Zhao Song In today s lecture, we will discuss the following problems: 1. Distinct elements 2. Turnstile model 3.

More information

11 Heavy Hitters Streaming Majority

11 Heavy Hitters Streaming Majority 11 Heavy Hitters A core mining problem is to find items that occur more than one would expect. These may be called outliers, anomalies, or other terms. Statistical models can be layered on top of or underneath

More information

4/26/2017. More algorithms for streams: Each element of data stream is a tuple Given a list of keys S Determine which tuples of stream are in S

4/26/2017. More algorithms for streams: Each element of data stream is a tuple Given a list of keys S Determine which tuples of stream are in S Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

12 Count-Min Sketch and Apriori Algorithm (and Bloom Filters)

12 Count-Min Sketch and Apriori Algorithm (and Bloom Filters) 12 Count-Min Sketch and Apriori Algorithm (and Bloom Filters) Many streaming algorithms use random hashing functions to compress data. They basically randomly map some data items on top of each other.

More information

Distributed Statistical Estimation of Matrix Products with Applications

Distributed Statistical Estimation of Matrix Products with Applications Distributed Statistical Estimation of Matrix Products with Applications David Woodruff CMU Qin Zhang IUB PODS 2018 June, 2018 1-1 The Distributed Computation Model p-norms, heavy-hitters,... A {0, 1} m

More information

CMSC 858F: Algorithmic Lower Bounds: Fun with Hardness Proofs Fall 2014 Introduction to Streaming Algorithms

CMSC 858F: Algorithmic Lower Bounds: Fun with Hardness Proofs Fall 2014 Introduction to Streaming Algorithms CMSC 858F: Algorithmic Lower Bounds: Fun with Hardness Proofs Fall 2014 Introduction to Streaming Algorithms Instructor: Mohammad T. Hajiaghayi Scribe: Huijing Gong November 11, 2014 1 Overview In the

More information

Mining Data Streams. The Stream Model. The Stream Model Sliding Windows Counting 1 s

Mining Data Streams. The Stream Model. The Stream Model Sliding Windows Counting 1 s Mining Data Streams The Stream Model Sliding Windows Counting 1 s 1 The Stream Model Data enters at a rapid rate from one or more input ports. The system cannot store the entire stream. How do you make

More information

Finding Frequent Items in Probabilistic Data

Finding Frequent Items in Probabilistic Data Finding Frequent Items in Probabilistic Data Qin Zhang, Hong Kong University of Science & Technology Feifei Li, Florida State University Ke Yi, Hong Kong University of Science & Technology SIGMOD 2008

More information

Algorithms for Querying Noisy Distributed/Streaming Datasets

Algorithms for Querying Noisy Distributed/Streaming Datasets Algorithms for Querying Noisy Distributed/Streaming Datasets Qin Zhang Indiana University Bloomington Sublinear Algo Workshop @ JHU Jan 9, 2016 1-1 The big data models The streaming model (Alon, Matias

More information

Sublinear Algorithms for Big Data

Sublinear Algorithms for Big Data Sublinear Algorithms for Big Data Qin Zhang 1-1 2-1 Part 2: Sublinear in Communication Sublinear in communication The model x 1 = 010011 x 2 = 111011 x 3 = 111111 x k = 100011 Applicaitons They want to

More information

Computing the Entropy of a Stream

Computing the Entropy of a Stream Computing the Entropy of a Stream To appear in SODA 2007 Graham Cormode graham@research.att.com Amit Chakrabarti Dartmouth College Andrew McGregor U. Penn / UCSD Outline Introduction Entropy Upper Bound

More information

Data Stream Methods. Graham Cormode S. Muthukrishnan

Data Stream Methods. Graham Cormode S. Muthukrishnan Data Stream Methods Graham Cormode graham@dimacs.rutgers.edu S. Muthukrishnan muthu@cs.rutgers.edu Plan of attack Frequent Items / Heavy Hitters Counting Distinct Elements Clustering items in Streams Motivating

More information

CS 598CSC: Algorithms for Big Data Lecture date: Sept 11, 2014

CS 598CSC: Algorithms for Big Data Lecture date: Sept 11, 2014 CS 598CSC: Algorithms for Big Data Lecture date: Sept 11, 2014 Instructor: Chandra Cheuri Scribe: Chandra Cheuri The Misra-Greis deterministic counting guarantees that all items with frequency > F 1 /

More information

6 Filtering and Streaming

6 Filtering and Streaming Casus ubique valet; semper tibi pendeat hamus: Quo minime credas gurgite, piscis erit. [Luck affects everything. Let your hook always be cast. Where you least expect it, there will be a fish.] Publius

More information

Estimating Dominance Norms of Multiple Data Streams

Estimating Dominance Norms of Multiple Data Streams Estimating Dominance Norms of Multiple Data Streams Graham Cormode and S. Muthukrishnan Abstract. There is much focus in the algorithms and database communities on designing tools to manage and mine data

More information

Improved Algorithms for Distributed Entropy Monitoring

Improved Algorithms for Distributed Entropy Monitoring Improved Algorithms for Distributed Entropy Monitoring Jiecao Chen Indiana University Bloomington jiecchen@umail.iu.edu Qin Zhang Indiana University Bloomington qzhangcs@indiana.edu Abstract Modern data

More information

Tight Bounds for Distributed Functional Monitoring

Tight Bounds for Distributed Functional Monitoring Tight Bounds for Distributed Functional Monitoring Qin Zhang MADALGO, Aarhus University Joint with David Woodruff, IBM Almaden NII Shonan meeting, Japan Jan. 2012 1-1 The distributed streaming model (a.k.a.

More information

The Count-Min-Sketch and its Applications

The Count-Min-Sketch and its Applications The Count-Min-Sketch and its Applications Jannik Sundermeier Abstract In this thesis, we want to reveal how to get rid of a huge amount of data which is at least dicult or even impossible to store in local

More information

Lecture 10. Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1

Lecture 10. Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1 Lecture 10 Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1 Recap Sublinear time algorithms Deterministic + exact: binary search Deterministic + inexact: estimating diameter in a

More information

Mining Data Streams. The Stream Model Sliding Windows Counting 1 s

Mining Data Streams. The Stream Model Sliding Windows Counting 1 s Mining Data Streams The Stream Model Sliding Windows Counting 1 s 1 The Stream Model Data enters at a rapid rate from one or more input ports. The system cannot store the entire stream. How do you make

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/26/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 2 More algorithms

More information

Lower Bound Techniques for Multiparty Communication Complexity

Lower Bound Techniques for Multiparty Communication Complexity Lower Bound Techniques for Multiparty Communication Complexity Qin Zhang Indiana University Bloomington Based on works with Jeff Phillips, Elad Verbin and David Woodruff 1-1 The multiparty number-in-hand

More information

Block Heavy Hitters Alexandr Andoni, Khanh Do Ba, and Piotr Indyk

Block Heavy Hitters Alexandr Andoni, Khanh Do Ba, and Piotr Indyk Computer Science and Artificial Intelligence Laboratory Technical Report -CSAIL-TR-2008-024 May 2, 2008 Block Heavy Hitters Alexandr Andoni, Khanh Do Ba, and Piotr Indyk massachusetts institute of technology,

More information

Big Data. Big data arises in many forms: Common themes:

Big Data. Big data arises in many forms: Common themes: Big Data Big data arises in many forms: Physical Measurements: from science (physics, astronomy) Medical data: genetic sequences, detailed time series Activity data: GPS location, social network activity

More information

Estimating Frequencies and Finding Heavy Hitters Jonas Nicolai Hovmand, Morten Houmøller Nygaard,

Estimating Frequencies and Finding Heavy Hitters Jonas Nicolai Hovmand, Morten Houmøller Nygaard, Estimating Frequencies and Finding Heavy Hitters Jonas Nicolai Hovmand, 2011 3884 Morten Houmøller Nygaard, 2011 4582 Master s Thesis, Computer Science June 2016 Main Supervisor: Gerth Stølting Brodal

More information

Declaring Independence via the Sketching of Sketches. Until August Hire Me!

Declaring Independence via the Sketching of Sketches. Until August Hire Me! Declaring Independence via the Sketching of Sketches Piotr Indyk Andrew McGregor Massachusetts Institute of Technology University of California, San Diego Until August 08 -- Hire Me! The Problem The Problem

More information

2 How many distinct elements are in a stream?

2 How many distinct elements are in a stream? Dealing with Massive Data January 31, 2011 Lecture 2: Distinct Element Counting Lecturer: Sergei Vassilvitskii Scribe:Ido Rosen & Yoonji Shin 1 Introduction We begin by defining the stream formally. Definition

More information

A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window

A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch 1 and Srikanta Tirthapura 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY

More information

Tight Bounds for Distributed Streaming

Tight Bounds for Distributed Streaming Tight Bounds for Distributed Streaming (a.k.a., Distributed Functional Monitoring) David Woodruff IBM Research Almaden Qin Zhang MADALGO, Aarhus Univ. STOC 12 May 22, 2012 1-1 The distributed streaming

More information

Sparse analysis Lecture VII: Combining geometry and combinatorics, sparse matrices for sparse signal recovery

Sparse analysis Lecture VII: Combining geometry and combinatorics, sparse matrices for sparse signal recovery Sparse analysis Lecture VII: Combining geometry and combinatorics, sparse matrices for sparse signal recovery Anna C. Gilbert Department of Mathematics University of Michigan Sparse signal recovery measurements:

More information

An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems

An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems Arnab Bhattacharyya, Palash Dey, and David P. Woodruff Indian Institute of Science, Bangalore {arnabb,palash}@csa.iisc.ernet.in

More information

Improved Range-Summable Random Variable Construction Algorithms

Improved Range-Summable Random Variable Construction Algorithms Improved Range-Summable Random Variable Construction Algorithms A. R. Calderbank A. Gilbert K. Levchenko S. Muthukrishnan M. Strauss January 19, 2005 Abstract Range-summable universal hash functions, also

More information

Combining geometry and combinatorics

Combining geometry and combinatorics Combining geometry and combinatorics A unified approach to sparse signal recovery Anna C. Gilbert University of Michigan joint work with R. Berinde (MIT), P. Indyk (MIT), H. Karloff (AT&T), M. Strauss

More information

Leveraging Big Data: Lecture 13

Leveraging Big Data: Lecture 13 Leveraging Big Data: Lecture 13 http://www.cohenwang.com/edith/bigdataclass2013 Instructors: Edith Cohen Amos Fiat Haim Kaplan Tova Milo What are Linear Sketches? Linear Transformations of the input vector

More information

CS341 info session is on Thu 3/1 5pm in Gates415. CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS341 info session is on Thu 3/1 5pm in Gates415. CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS341 info session is on Thu 3/1 5pm in Gates415 CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/28/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets,

More information

1 Estimating Frequency Moments in Streams

1 Estimating Frequency Moments in Streams CS 598CSC: Algorithms for Big Data Lecture date: August 28, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri 1 Estimating Frequency Moments in Streams A significant fraction of streaming literature

More information

Sketching in Adversarial Environments

Sketching in Adversarial Environments Sketching in Adversarial Environments Ilya Mironov Moni Naor Gil Segev Abstract We formalize a realistic model for computations over massive data sets. The model, referred to as the adversarial sketch

More information

A Simpler and More Efficient Deterministic Scheme for Finding Frequent Items over Sliding Windows

A Simpler and More Efficient Deterministic Scheme for Finding Frequent Items over Sliding Windows A Simpler and More Efficient Deterministic Scheme for Finding Frequent Items over Sliding Windows ABSTRACT L.K. Lee Department of Computer Science The University of Hong Kong Pokfulam, Hong Kong lklee@cs.hku.hk

More information

CR-precis: A deterministic summary structure for update data streams

CR-precis: A deterministic summary structure for update data streams CR-precis: A deterministic summary structure for update data streams Sumit Ganguly 1 and Anirban Majumder 2 1 Indian Institute of Technology, Kanpur 2 Lucent Technologies, Bangalore Abstract. We present

More information

Sparse analysis Lecture V: From Sparse Approximation to Sparse Signal Recovery

Sparse analysis Lecture V: From Sparse Approximation to Sparse Signal Recovery Sparse analysis Lecture V: From Sparse Approximation to Sparse Signal Recovery Anna C. Gilbert Department of Mathematics University of Michigan Connection between... Sparse Approximation and Compressed

More information

Hierarchical Heavy Hitters with the Space Saving Algorithm

Hierarchical Heavy Hitters with the Space Saving Algorithm Hierarchical Heavy Hitters with the Space Saving Algorithm Michael Mitzenmacher Thomas Steinke Justin Thaler School of Engineering and Applied Sciences Harvard University, Cambridge, MA 02138 Email: {jthaler,

More information

Lecture 2 Sept. 8, 2015

Lecture 2 Sept. 8, 2015 CS 9r: Algorithms for Big Data Fall 5 Prof. Jelani Nelson Lecture Sept. 8, 5 Scribe: Jeffrey Ling Probability Recap Chebyshev: P ( X EX > λ) < V ar[x] λ Chernoff: For X,..., X n independent in [, ],

More information

A Pass-Efficient Algorithm for Clustering Census Data

A Pass-Efficient Algorithm for Clustering Census Data A Pass-Efficient Algorithm for Clustering Census Data Kevin Chang Yale University Ravi Kannan Yale University Abstract We present a number of streaming algorithms for a basic clustering problem for massive

More information

Classic Network Measurements meets Software Defined Networking

Classic Network Measurements meets Software Defined Networking Classic Network s meets Software Defined Networking Ran Ben Basat, Technion, Israel Joint work with Gil Einziger and Erez Waisbard (Nokia Bell Labs) Roy Friedman (Technion) and Marcello Luzieli (UFGRS)

More information

Some notes on streaming algorithms continued

Some notes on streaming algorithms continued U.C. Berkeley CS170: Algorithms Handout LN-11-9 Christos Papadimitriou & Luca Trevisan November 9, 016 Some notes on streaming algorithms continued Today we complete our quick review of streaming algorithms.

More information

1 Approximate Quantiles and Summaries

1 Approximate Quantiles and Summaries CS 598CSC: Algorithms for Big Data Lecture date: Sept 25, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri Suppose we have a stream a 1, a 2,..., a n of objects from an ordered universe. For simplicity

More information

Lecture 3 Frequency Moments, Heavy Hitters

Lecture 3 Frequency Moments, Heavy Hitters COMS E6998-9: Algorithmic Techniques for Massive Data Sep 15, 2015 Lecture 3 Frequency Moments, Heavy Hitters Instructor: Alex Andoni Scribes: Daniel Alabi, Wangda Zhang 1 Introduction This lecture is

More information

Data Stream Algorithms. S. Muthu Muthukrishnan

Data Stream Algorithms. S. Muthu Muthukrishnan Data Stream Algorithms Notes from a series of lectures by S. Muthu Muthukrishnan Guest Lecturer: Andrew McGregor The 2009 Barbados Workshop on Computational Complexity March 1st March 8th, 2009 Organizer:

More information

Improved Concentration Bounds for Count-Sketch

Improved Concentration Bounds for Count-Sketch Improved Concentration Bounds for Count-Sketch Gregory T. Minton 1 Eric Price 2 1 MIT MSR New England 2 MIT IBM Almaden UT Austin 2014-01-06 Gregory T. Minton, Eric Price (IBM) Improved Concentration Bounds

More information

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Lecture 9

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Lecture 9 CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Lecture 9 Allan Borodin March 13, 2016 1 / 33 Lecture 9 Announcements 1 I have the assignments graded by Lalla. 2 I have now posted five questions

More information

An Improved Frequent Items Algorithm with Applications to Web Caching

An Improved Frequent Items Algorithm with Applications to Web Caching An Improved Frequent Items Algorithm with Applications to Web Caching Kevin Chen and Satish Rao UC Berkeley Abstract. We present a simple, intuitive algorithm for the problem of finding an approximate

More information

Efficient Sketches for Earth-Mover Distance, with Applications

Efficient Sketches for Earth-Mover Distance, with Applications Efficient Sketches for Earth-Mover Distance, with Applications Alexandr Andoni MIT andoni@mit.edu Khanh Do Ba MIT doba@mit.edu Piotr Indyk MIT indyk@mit.edu David Woodruff IBM Almaden dpwoodru@us.ibm.com

More information

A Mergeable Summaries

A Mergeable Summaries A Mergeable Summaries Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, Jeff M. Phillips, Zhewei Wei, and Ke Yi We study the mergeability of data summaries. Informally speaking, mergeability requires

More information

Succinct Approximate Counting of Skewed Data

Succinct Approximate Counting of Skewed Data Succinct Approximate Counting of Skewed Data David Talbot Google Inc., Mountain View, CA, USA talbot@google.com Abstract Practical data analysis relies on the ability to count observations of objects succinctly

More information

Zero-One Laws for Sliding Windows and Universal Sketches

Zero-One Laws for Sliding Windows and Universal Sketches Zero-One Laws for Sliding Windows and Universal Sketches Vladimir Braverman 1, Rafail Ostrovsky 2, and Alan Roytman 3 1 Department of Computer Science, Johns Hopkins University, USA vova@cs.jhu.edu 2 Department

More information

Lecture 5: Hashing. David Woodruff Carnegie Mellon University

Lecture 5: Hashing. David Woodruff Carnegie Mellon University Lecture 5: Hashing David Woodruff Carnegie Mellon University Hashing Universal hashing Perfect hashing Maintaining a Dictionary Let U be a universe of keys U could be all strings of ASCII characters of

More information

Biased Quantiles. Flip Korn Graham Cormode S. Muthukrishnan

Biased Quantiles. Flip Korn Graham Cormode S. Muthukrishnan Biased Quantiles Graham Cormode cormode@bell-labs.com S. Muthukrishnan muthu@cs.rutgers.edu Flip Korn flip@research.att.com Divesh Srivastava divesh@research.att.com Quantiles Quantiles summarize data

More information

Data Stream Analytics

Data Stream Analytics Data Stream Analytics V. CHRISTOPHIDES vassilis.christophides@inria.fr https://who.rocq.inria.fr/vassilis.christophides/big/ Ecole CentraleSupélec Winter 2018 1 Traffic control Big Data : Velocity IP network

More information

Streaming Graph Computations with a Helpful Advisor. Justin Thaler Graham Cormode and Michael Mitzenmacher

Streaming Graph Computations with a Helpful Advisor. Justin Thaler Graham Cormode and Michael Mitzenmacher Streaming Graph Computations with a Helpful Advisor Justin Thaler Graham Cormode and Michael Mitzenmacher Thanks to Andrew McGregor A few slides borrowed from IITK Workshop on Algorithms for Processing

More information

Lecture 2: Streaming Algorithms

Lecture 2: Streaming Algorithms CS369G: Algorithmic Techniques for Big Data Spring 2015-2016 Lecture 2: Streaming Algorithms Prof. Moses Chariar Scribes: Stephen Mussmann 1 Overview In this lecture, we first derive a concentration inequality

More information

Sketching and Streaming for Distributions

Sketching and Streaming for Distributions Sketching and Streaming for Distributions Piotr Indyk Andrew McGregor Massachusetts Institute of Technology University of California, San Diego Main Material: Stable distributions, pseudo-random generators,

More information

Tracking Distributed Aggregates over Time-based Sliding Windows

Tracking Distributed Aggregates over Time-based Sliding Windows Tracking Distributed Aggregates over Time-based Sliding Windows Graham Cormode and Ke Yi Abstract. The area of distributed monitoring requires tracking the value of a function of distributed data as new

More information

Streaming - 2. Bloom Filters, Distinct Item counting, Computing moments. credits:www.mmds.org.

Streaming - 2. Bloom Filters, Distinct Item counting, Computing moments. credits:www.mmds.org. Streaming - 2 Bloom Filters, Distinct Item counting, Computing moments credits:www.mmds.org http://www.mmds.org Outline More algorithms for streams: 2 Outline More algorithms for streams: (1) Filtering

More information

1 Maintaining a Dictionary

1 Maintaining a Dictionary 15-451/651: Design & Analysis of Algorithms February 1, 2016 Lecture #7: Hashing last changed: January 29, 2016 Hashing is a great practical tool, with an interesting and subtle theory too. In addition

More information

A Near-Optimal Algorithm for Computing the Entropy of a Stream

A Near-Optimal Algorithm for Computing the Entropy of a Stream A Near-Optimal Algorithm for Computing the Entropy of a Stream Amit Chakrabarti ac@cs.dartmouth.edu Graham Cormode graham@research.att.com Andrew McGregor andrewm@seas.upenn.edu Abstract We describe a

More information

Finding Frequent Items in Data Streams

Finding Frequent Items in Data Streams Finding Frequent Items in Data Streams Moses Charikar 1, Kevin Chen 2, and Martin Farach-Colton 3 1 Princeton University moses@cs.princeton.edu 2 UC Berkeley kevinc@cs.berkeley.edu 3 Rutgers University

More information

Distributed Summaries

Distributed Summaries Distributed Summaries Graham Cormode graham@research.att.com Pankaj Agarwal(Duke) Zengfeng Huang (HKUST) Jeff Philips (Utah) Zheiwei Wei (HKUST) Ke Yi (HKUST) Summaries Summaries allow approximate computations:

More information

Introduction to Hashtables

Introduction to Hashtables Introduction to HashTables Boise State University March 5th 2015 Hash Tables: What Problem Do They Solve What Problem Do They Solve? Why not use arrays for everything? 1 Arrays can be very wasteful: Example

More information

Trusting the Cloud with Practical Interactive Proofs

Trusting the Cloud with Practical Interactive Proofs Trusting the Cloud with Practical Interactive Proofs Graham Cormode G.Cormode@warwick.ac.uk Amit Chakrabarti (Dartmouth) Andrew McGregor (U Mass Amherst) Justin Thaler (Harvard/Yahoo!) Suresh Venkatasubramanian

More information

Lecture 2. Frequency problems

Lecture 2. Frequency problems 1 / 43 Lecture 2. Frequency problems Ricard Gavaldà MIRI Seminar on Data Streams, Spring 2015 Contents 2 / 43 1 Frequency problems in data streams 2 Approximating inner product 3 Computing frequency moments

More information

compare to comparison and pointer based sorting, binary trees

compare to comparison and pointer based sorting, binary trees Admin Hashing Dictionaries Model Operations. makeset, insert, delete, find keys are integers in M = {1,..., m} (so assume machine word size, or unit time, is log m) can store in array of size M using power:

More information

Locality Sensitive Hashing

Locality Sensitive Hashing Locality Sensitive Hashing February 1, 016 1 LSH in Hamming space The following discussion focuses on the notion of Locality Sensitive Hashing which was first introduced in [5]. We focus in the case of

More information

Lecture 1 September 3, 2013

Lecture 1 September 3, 2013 CS 229r: Algorithms for Big Data Fall 2013 Prof. Jelani Nelson Lecture 1 September 3, 2013 Scribes: Andrew Wang and Andrew Liu 1 Course Logistics The problem sets can be found on the course website: http://people.seas.harvard.edu/~minilek/cs229r/index.html

More information

Presenting Tree Inventory. Tomislav Sapic GIS Technologist Faculty of Natural Resources Management Lakehead University

Presenting Tree Inventory. Tomislav Sapic GIS Technologist Faculty of Natural Resources Management Lakehead University Presenting Tree Inventory Tomislav Sapic GIS Technologist Faculty of Natural Resources Management Lakehead University Suggested Options 1. Print out a Google Maps satellite image of the inventoried block

More information

Part 1: Hashing and Its Many Applications

Part 1: Hashing and Its Many Applications 1 Part 1: Hashing and Its Many Applications Sid C-K Chau Chi-Kin.Chau@cl.cam.ac.u http://www.cl.cam.ac.u/~cc25/teaching Why Randomized Algorithms? 2 Randomized Algorithms are algorithms that mae random

More information

A Mergeable Summaries

A Mergeable Summaries A Mergeable Summaries Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, Jeff M. Phillips, Zhewei Wei, and Ke Yi We study the mergeability of data summaries. Informally speaking, mergeability requires

More information

Lecture Lecture 9 October 1, 2015

Lecture Lecture 9 October 1, 2015 CS 229r: Algorithms for Big Data Fall 2015 Lecture Lecture 9 October 1, 2015 Prof. Jelani Nelson Scribe: Rachit Singh 1 Overview In the last lecture we covered the distance to monotonicity (DTM) and longest

More information

B490 Mining the Big Data

B490 Mining the Big Data B490 Mining the Big Data 1 Finding Similar Items Qin Zhang 1-1 Motivations Finding similar documents/webpages/images (Approximate) mirror sites. Application: Don t want to show both when Google. 2-1 Motivations

More information

Sketching in Adversarial Environments

Sketching in Adversarial Environments Sketching in Adversarial Environments Ilya Mironov Moni Naor Gil Segev Abstract We formalize a realistic model for computations over massive data sets. The model, referred to as the adversarial sketch

More information

Communication-Efficient Computation on Distributed Noisy Datasets. Qin Zhang Indiana University Bloomington

Communication-Efficient Computation on Distributed Noisy Datasets. Qin Zhang Indiana University Bloomington Communication-Efficient Computation on Distributed Noisy Datasets Qin Zhang Indiana University Bloomington SPAA 15 June 15, 2015 1-1 Model of computation The coordinator model: k sites and 1 coordinator.

More information

Hierarchical Sampling from Sketches: Estimating Functions over Data Streams

Hierarchical Sampling from Sketches: Estimating Functions over Data Streams Hierarchical Sampling from Sketches: Estimating Functions over Data Streams Sumit Ganguly 1 and Lakshminath Bhuvanagiri 2 1 Indian Institute of Technology, Kanpur 2 Google Inc., Bangalore Abstract. We

More information

Hashing. Data organization in main memory or disk

Hashing. Data organization in main memory or disk Hashing Data organization in main memory or disk sequential, binary trees, The location of a key depends on other keys => unnecessary key comparisons to find a key Question: find key with a single comparison

More information

Approximate Counts and Quantiles over Sliding Windows

Approximate Counts and Quantiles over Sliding Windows Approximate Counts and Quantiles over Sliding Windows Arvind Arasu Stanford University arvinda@cs.stanford.edu Gurmeet Singh Manku Stanford University manku@cs.stanford.edu Abstract We consider the problem

More information

Lecture 01 August 31, 2017

Lecture 01 August 31, 2017 Sketching Algorithms for Big Data Fall 2017 Prof. Jelani Nelson Lecture 01 August 31, 2017 Scribe: Vinh-Kha Le 1 Overview In this lecture, we overviewed the six main topics covered in the course, reviewed

More information

Improved Sketching of Hamming Distance with Error Correcting

Improved Sketching of Hamming Distance with Error Correcting Improved Setching of Hamming Distance with Error Correcting Ohad Lipsy Bar-Ilan University Ely Porat Bar-Ilan University Abstract We address the problem of setching the hamming distance of data streams.

More information

Priority sampling for estimation of arbitrary subset sums

Priority sampling for estimation of arbitrary subset sums Priority sampling for estimation of arbitrary subset sums NICK DUFFIELD AT&T Labs Research and CARSTEN LUND AT&T Labs Research and MIKKEL THORUP AT&T Labs Research From a high volume stream of weighted

More information

arxiv: v1 [cs.ds] 3 Feb 2018

arxiv: v1 [cs.ds] 3 Feb 2018 A Model for Learned Bloom Filters and Related Structures Michael Mitzenmacher 1 arxiv:1802.00884v1 [cs.ds] 3 Feb 2018 Abstract Recent work has suggested enhancing Bloom filters by using a pre-filter, based

More information

Lecture Lecture 25 November 25, 2014

Lecture Lecture 25 November 25, 2014 CS 224: Advanced Algorithms Fall 2014 Lecture Lecture 25 November 25, 2014 Prof. Jelani Nelson Scribe: Keno Fischer 1 Today Finish faster exponential time algorithms (Inclusion-Exclusion/Zeta Transform,

More information

Join-Distinct Aggregate Estimation over Update Streams

Join-Distinct Aggregate Estimation over Update Streams Join-Distinct Aggregate Estimation over Update Streams Sumit Ganguly IIT Kanpur sganguly@cse.iitk.ac.in Minos Garofalakis Bell Laboratories minos@bell-labs.com Amit Kumar IIT Delhi amitk@cse.iitd.ernet.in

More information

Introduction to Randomized Algorithms III

Introduction to Randomized Algorithms III Introduction to Randomized Algorithms III Joaquim Madeira Version 0.1 November 2017 U. Aveiro, November 2017 1 Overview Probabilistic counters Counting with probability 1 / 2 Counting with probability

More information

Algorithms lecture notes 1. Hashing, and Universal Hash functions

Algorithms lecture notes 1. Hashing, and Universal Hash functions Algorithms lecture notes 1 Hashing, and Universal Hash functions Algorithms lecture notes 2 Can we maintain a dictionary with O(1) per operation? Not in the deterministic sense. But in expectation, yes.

More information

Range-efficient computation of F 0 over massive data streams

Range-efficient computation of F 0 over massive data streams Range-efficient computation of F 0 over massive data streams A. Pavan Dept. of Computer Science Iowa State University pavan@cs.iastate.edu Srikanta Tirthapura Dept. of Elec. and Computer Engg. Iowa State

More information

Sketching Probabilistic Data Streams

Sketching Probabilistic Data Streams Sketching Probabilistic Data Streams Graham Cormode AT&T Labs Research 18 Park Avenue, Florham Park NJ graham@research.att.com Minos Garofalakis Yahoo! Research and UC Berkeley 2821 Mission College Blvd,

More information