Lecture 2 Sept. 8, 2015
|
|
- Primrose Johnson
- 5 years ago
- Views:
Transcription
1 CS 9r: Algorithms for Big Data Fall 5 Prof. Jelani Nelson Lecture Sept. 8, 5 Scribe: Jeffrey Ling Probability Recap Chebyshev: P ( X EX > λ) < V ar[x] λ Chernoff: For X,..., X n independent in [, ], < ɛ <, and µ E i X i, P ( i X i µ > ɛµ) < e ɛ µ/3 Today Distinct elements Norm estimation (if there s time) 3 Distinct elements (F ) Problem: Given a stream of integers i,..., i m [n] where [n] : {,,..., n}, we want to output the number of distinct elements seen. 3. Straightforward algorithms. Keep a bit array of length n. Flip bit if a number is seen.. Store the whole stream. Takes m lg n bits. We can solve with O(min(n, m lg n)) bits. 3. Randomized approximation We can settle for outputting t s.t. P ( t t > ɛt) < δ. The original solution was by Flajolet and Martin [].
2 3.3 Idealized algorithm. Pick random function h : [n] [, ] (idealized, since we can t actually nicely store this). Maintain counter X min i stream h(i) 3. Output /X Intuition. X is a random variable that s the minimum of t i.i.d Unif(, ) r.v.s. Claim. EX t+. Proof. EX t + P (X > λ)dλ P ( i str, h(i) > λ)dλ t P (h(i r ) > λ)dλ r ( λ) t dλ Claim. EX (t+)(t+) Proof. EX P (X > λ)dλ P (X > λ)dλ ( λ) t dλ u λ u t ( u)du (t + )(t + ) This gives V ar[x] EX (EX) t, and furthermore V ar[x] < (EX). (t+) (t+) (t+)
3 4 FM+ We average together multiple estimates from the idealized algorithm FM.. Instantiate q /ɛ η FMs independently. Let X i come from FM i. 3. Output /Z, where Z q i X i. We have that E(Z) t+, and V ar(z) t q (t+) (t+) <. q(t+) Claim 3. P ( Z t+ > ɛ t+ ) < η Proof. Chebyshev. P ( Z t + > ɛ (t + ) ) < t + ɛ q(t + ) η Claim 4. P ( ( Z ) t > O(ɛ)t) < η Proof. By the previous claim, with probability η we have ( ± ɛ) t+ ( ± O(ɛ))(t + ) ( ± O(ɛ))t ± O(ɛ) 5 FM++ We take the median of multiple estimates from FM+.. Instantiate s 36 ln(/δ) independent copies of FM+ with η /3.. Output the median t of {/Z j } s j where Z j is from the jth copy of FM+. Claim 5. P ( t t > ɛt) < δ Proof. Let Y j { if (/Z j ) t > ɛt else We have EY j P (Y j ) < /3 from the choice of η. The probability we seek to bound is equivalent to the probability that the median fails, i.e. at least half of the FM+ estimates have Y j. In other words, s Y j > s/ j 3
4 We then get that P ( Y j > s/) P ( Y j s/3 > s/6) () Make the simplifying assumption that EY j /3 (this turns out to be stronger than EY j < /3. Then equation becomes using Chernoff, as desired. P ( Y j E Y j > E Y j ) < e ( ) s/3 3 < δ The final space required, ignoring h, is O( lg(/δ) ɛ ) for O(lg(/δ)) copies of FM+ that require O(/ɛ ) space each. 6 k-wise independent functions Definition 6. A family H of functions mapping [a] to [b] is k-wise independent if j,..., j k [b] and distinct i,..., i k [a], P h H (h(i ) j... h(i k ) j k ) /b k Example. The set H of all functions [a] [b] is k-wise independent for every k. H b a so h is representable in a lg b bits. Example. Let a b q for q p r a prime power, then H poly, the set of degree k polynomials with coefficients in F q, the finite field of order q. H poly q k so h is representable in k lg p bits. Claim 7. H poly is k-wise independent. Proof. Interpolation. 7 Non-idealized FM First, we get an O()-approximation in O(lg n) bits, i.e. our estimate t satisfies t/c t Ct for some constant C.. Pick h from -wise family [n] [n], for n a power of (round up if necessary). Maintain X max i str lsb(h(i)) where lsb is the least significant bit of a number 3. Output X 4
5 For fixed j, let Z j be the number of i in stream with lsb(h(i)) j. Let Z >j be the number of i with lsb(h(i)) > j. Let Y i { lsb(h(i)) j else Then Z j i str Y i. We can compute EZ j t/ j+ and similarly EZ >j t( j ) < t/j+ j+3 and also V ar[z j ] V ar[ Y i ] E( Y i ) (E Y i ) i,i E(Y i Y i ) Since h is from a -wise family, Y i are pairwise independent, so E(Y i Y i ) E(Y i )E(Y i ). We can then show V ar[z j ] < t/ j+ Now for j lg t 5, we have 6 EZ j 3 P (Z j ) P ( Z j EZ j 6) < /5 by Chebyshev. For j lg t + 5 EZ >j /6 P (Z >j ) < /6 by Markov. This means with good probability the max lsb will be above j but below j, in a constant range. This gives us a 3-approximation, i.e. constant approximation. 8 Refine to + ɛ Trivial solution. Algorithm TS stores first C/ɛ distinct elements. This is correct if t C/ɛ. Algorithm.. Instantiate TS,..., TS lg n. Pick g : [n] [n] from -wise family 3. Feed i to TS lsb(g(i)) 4. Output j+ out where t/ j+ /ɛ. 5
6 Let B j be the number of distinct elements hashed by g to TS j. Then EB j t/ j+ Q j. By Chebyshev B j Q j ± O( Q j ) with good probability. This equals ( ± O(ɛ))Q j if Q j /ɛ. Final space: C ɛ (lg n) O( ɛ lg n) bits. It is known that space O(/ɛ + log n) is achievable [4], and furthermore this is optimal [, 5] (also see [3]). References [] Noga Alon, Yossi Matias, Mario Szegedy The Space Complexity of Approximating the Frequency Moments. J. Comput. Syst. Sci. 58(): 37 47, 999. [] Philippe Flajolet, G. Nigel Martin Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 3():8 9, 985. [3] T. S. Jayram, Ravi Kumar, D. Sivakumar: The One-Way Communication Complexity of Hamming Distance. Theory of Computing 4(): 9 35, 8. [4] Daniel M. Kane, Jelani Nelson, David P. Woodruff An optimal algorithm for the distinct elements problem. In Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), pages 4 5,. [5] David P. Woodruff. Optimal space lower bounds for all frequency moments. In SODA, pages 67 75, 4. 6
1 Estimating Frequency Moments in Streams
CS 598CSC: Algorithms for Big Data Lecture date: August 28, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri 1 Estimating Frequency Moments in Streams A significant fraction of streaming literature
More informationLecture 3 Sept. 4, 2014
CS 395T: Sublinear Algorithms Fall 2014 Prof. Eric Price Lecture 3 Sept. 4, 2014 Scribe: Zhao Song In today s lecture, we will discuss the following problems: 1. Distinct elements 2. Turnstile model 3.
More informationLecture 2: Streaming Algorithms
CS369G: Algorithmic Techniques for Big Data Spring 2015-2016 Lecture 2: Streaming Algorithms Prof. Moses Chariar Scribes: Stephen Mussmann 1 Overview In this lecture, we first derive a concentration inequality
More informationLecture Lecture 25 November 25, 2014
CS 224: Advanced Algorithms Fall 2014 Lecture Lecture 25 November 25, 2014 Prof. Jelani Nelson Scribe: Keno Fischer 1 Today Finish faster exponential time algorithms (Inclusion-Exclusion/Zeta Transform,
More informationCSCI8980 Algorithmic Techniques for Big Data September 12, Lecture 2
CSCI8980 Algorithmic Techniques for Big Data September, 03 Dr. Barna Saha Lecture Scribe: Matt Nohelty Overview We continue our discussion on data streaming models where streams of elements are coming
More informationLecture Lecture 9 October 1, 2015
CS 229r: Algorithms for Big Data Fall 2015 Lecture Lecture 9 October 1, 2015 Prof. Jelani Nelson Scribe: Rachit Singh 1 Overview In the last lecture we covered the distance to monotonicity (DTM) and longest
More informationLecture 6 September 13, 2016
CS 395T: Sublinear Algorithms Fall 206 Prof. Eric Price Lecture 6 September 3, 206 Scribe: Shanshan Wu, Yitao Chen Overview Recap of last lecture. We talked about Johnson-Lindenstrauss (JL) lemma [JL84]
More informationCS 591, Lecture 9 Data Analytics: Theory and Applications Boston University
CS 591, Lecture 9 Data Analytics: Theory and Applications Boston University Babis Tsourakakis February 22nd, 2017 Announcement We will cover the Monday s 2/20 lecture (President s day) by appending half
More informationLecture 2. Frequency problems
1 / 43 Lecture 2. Frequency problems Ricard Gavaldà MIRI Seminar on Data Streams, Spring 2015 Contents 2 / 43 1 Frequency problems in data streams 2 Approximating inner product 3 Computing frequency moments
More informationLecture 01 August 31, 2017
Sketching Algorithms for Big Data Fall 2017 Prof. Jelani Nelson Lecture 01 August 31, 2017 Scribe: Vinh-Kha Le 1 Overview In this lecture, we overviewed the six main topics covered in the course, reviewed
More informationRange-efficient computation of F 0 over massive data streams
Range-efficient computation of F 0 over massive data streams A. Pavan Dept. of Computer Science Iowa State University pavan@cs.iastate.edu Srikanta Tirthapura Dept. of Elec. and Computer Engg. Iowa State
More informationLecture 1 September 3, 2013
CS 229r: Algorithms for Big Data Fall 2013 Prof. Jelani Nelson Lecture 1 September 3, 2013 Scribes: Andrew Wang and Andrew Liu 1 Course Logistics The problem sets can be found on the course website: http://people.seas.harvard.edu/~minilek/cs229r/index.html
More informationCS 598CSC: Algorithms for Big Data Lecture date: Sept 11, 2014
CS 598CSC: Algorithms for Big Data Lecture date: Sept 11, 2014 Instructor: Chandra Cheuri Scribe: Chandra Cheuri The Misra-Greis deterministic counting guarantees that all items with frequency > F 1 /
More informationCS 229r: Algorithms for Big Data Fall Lecture 17 10/28
CS 229r: Algorithms for Big Data Fall 2015 Prof. Jelani Nelson Lecture 17 10/28 Scribe: Morris Yau 1 Overview In the last lecture we defined subspace embeddings a subspace embedding is a linear transformation
More informationLecture 10. Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1
Lecture 10 Sublinear Time Algorithms (contd) CSC2420 Allan Borodin & Nisarg Shah 1 Recap Sublinear time algorithms Deterministic + exact: binary search Deterministic + inexact: estimating diameter in a
More informationAn Optimal Algorithm for Large Frequency Moments Using O(n 1 2/k ) Bits
An Optimal Algorithm for Large Frequency Moments Using O(n 1 2/k ) Bits Vladimir Braverman 1, Jonathan Katzman 2, Charles Seidell 3, and Gregory Vorsanger 4 1 Johns Hopkins University, Department of Computer
More informationLecture 4: Hashing and Streaming Algorithms
CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 4: Hashing and Streaming Algorithms Lecturer: Shayan Oveis Gharan 01/18/2017 Scribe: Yuqing Ai Disclaimer: These notes have not been subjected
More informationRandomness and Computation March 13, Lecture 3
0368.4163 Randomness and Computation March 13, 2009 Lecture 3 Lecturer: Ronitt Rubinfeld Scribe: Roza Pogalnikova and Yaron Orenstein Announcements Homework 1 is released, due 25/03. Lecture Plan 1. Do
More informationLecture 3 Frequency Moments, Heavy Hitters
COMS E6998-9: Algorithmic Techniques for Massive Data Sep 15, 2015 Lecture 3 Frequency Moments, Heavy Hitters Instructor: Alex Andoni Scribes: Daniel Alabi, Wangda Zhang 1 Introduction This lecture is
More informationLecture 1: Introduction to Sublinear Algorithms
CSE 522: Sublinear (and Streaming) Algorithms Spring 2014 Lecture 1: Introduction to Sublinear Algorithms March 31, 2014 Lecturer: Paul Beame Scribe: Paul Beame Too much data, too little time, space for
More informationLecture 13 March 7, 2017
CS 224: Advanced Algorithms Spring 2017 Prof. Jelani Nelson Lecture 13 March 7, 2017 Scribe: Hongyao Ma Today PTAS/FPTAS/FPRAS examples PTAS: knapsack FPTAS: knapsack FPRAS: DNF counting Approximation
More informationLecture 2 Sep 5, 2017
CS 388R: Randomized Algorithms Fall 2017 Lecture 2 Sep 5, 2017 Prof. Eric Price Scribe: V. Orestis Papadigenopoulos and Patrick Rall NOTE: THESE NOTES HAVE NOT BEEN EDITED OR CHECKED FOR CORRECTNESS 1
More information6.842 Randomness and Computation Lecture 5
6.842 Randomness and Computation 2012-02-22 Lecture 5 Lecturer: Ronitt Rubinfeld Scribe: Michael Forbes 1 Overview Today we will define the notion of a pairwise independent hash function, and discuss its
More informationData Stream Methods. Graham Cormode S. Muthukrishnan
Data Stream Methods Graham Cormode graham@dimacs.rutgers.edu S. Muthukrishnan muthu@cs.rutgers.edu Plan of attack Frequent Items / Heavy Hitters Counting Distinct Elements Clustering items in Streams Motivating
More informationConsistent Sampling with Replacement
Consistent Sampling with Replacement Ronald L. Rivest MIT CSAIL rivest@mit.edu arxiv:1808.10016v1 [cs.ds] 29 Aug 2018 August 31, 2018 Abstract We describe a very simple method for consistent sampling that
More informationAs mentioned, we will relax the conditions of our dictionary data structure. The relaxations we
CSE 203A: Advanced Algorithms Prof. Daniel Kane Lecture : Dictionary Data Structures and Load Balancing Lecture Date: 10/27 P Chitimireddi Recap This lecture continues the discussion of dictionary data
More informationLecture 4: Sampling, Tail Inequalities
Lecture 4: Sampling, Tail Inequalities Variance and Covariance Moment and Deviation Concentration and Tail Inequalities Sampling and Estimation c Hung Q. Ngo (SUNY at Buffalo) CSE 694 A Fun Course 1 /
More informationA Near-Optimal Algorithm for Computing the Entropy of a Stream
A Near-Optimal Algorithm for Computing the Entropy of a Stream Amit Chakrabarti ac@cs.dartmouth.edu Graham Cormode graham@research.att.com Andrew McGregor andrewm@seas.upenn.edu Abstract We describe a
More informationLecture Notes 3 Convergence (Chapter 5)
Lecture Notes 3 Convergence (Chapter 5) 1 Convergence of Random Variables Let X 1, X 2,... be a sequence of random variables and let X be another random variable. Let F n denote the cdf of X n and let
More informationSome notes on streaming algorithms continued
U.C. Berkeley CS170: Algorithms Handout LN-11-9 Christos Papadimitriou & Luca Trevisan November 9, 016 Some notes on streaming algorithms continued Today we complete our quick review of streaming algorithms.
More informationLecture 4 Thursday Sep 11, 2014
CS 224: Advanced Algorithms Fall 2014 Lecture 4 Thursday Sep 11, 2014 Prof. Jelani Nelson Scribe: Marco Gentili 1 Overview Today we re going to talk about: 1. linear probing (show with 5-wise independence)
More information6 Filtering and Streaming
Casus ubique valet; semper tibi pendeat hamus: Quo minime credas gurgite, piscis erit. [Luck affects everything. Let your hook always be cast. Where you least expect it, there will be a fish.] Publius
More information4/26/2017. More algorithms for streams: Each element of data stream is a tuple Given a list of keys S Determine which tuples of stream are in S
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
More informationLinear Sketches A Useful Tool in Streaming and Compressive Sensing
Linear Sketches A Useful Tool in Streaming and Compressive Sensing Qin Zhang 1-1 Linear sketch Random linear projection M : R n R k that preserves properties of any v R n with high prob. where k n. M =
More informationLecture Lecture 3 Tuesday Sep 09, 2014
CS 4: Advanced Algorithms Fall 04 Lecture Lecture 3 Tuesday Sep 09, 04 Prof. Jelani Nelson Scribe: Thibaut Horel Overview In the previous lecture we finished covering data structures for the predecessor
More informationLecture 4 February 2nd, 2017
CS 224: Advanced Algorithms Spring 2017 Prof. Jelani Nelson Lecture 4 February 2nd, 2017 Scribe: Rohil Prasad 1 Overview In the last lecture we covered topics in hashing, including load balancing, k-wise
More informationCMSC 858F: Algorithmic Lower Bounds: Fun with Hardness Proofs Fall 2014 Introduction to Streaming Algorithms
CMSC 858F: Algorithmic Lower Bounds: Fun with Hardness Proofs Fall 2014 Introduction to Streaming Algorithms Instructor: Mohammad T. Hajiaghayi Scribe: Huijing Gong November 11, 2014 1 Overview In the
More informationTopics in Probabilistic Combinatorics and Algorithms Winter, Basic Derandomization Techniques
Topics in Probabilistic Combinatorics and Algorithms Winter, 016 3. Basic Derandomization Techniques Definition. DTIME(t(n)) : {L : L can be decided deterministically in time O(t(n)).} EXP = { L: L can
More information14.1 Finding frequent elements in stream
Chapter 14 Streaming Data Model 14.1 Finding frequent elements in stream A very useful statistics for many applications is to keep track of elements that occur more frequently. It can come in many flavours
More informationBasic Probabilistic Checking 3
CS294: Probabilistically Checkable and Interactive Proofs February 21, 2017 Basic Probabilistic Checking 3 Instructor: Alessandro Chiesa & Igor Shinkar Scribe: Izaak Meckler Today we prove the following
More informationLecture 11 October 7, 2013
CS 4: Advanced Algorithms Fall 03 Prof. Jelani Nelson Lecture October 7, 03 Scribe: David Ding Overview In the last lecture we talked about set cover: Sets S,..., S m {,..., n}. S has cost c S. Goal: Cover
More informationBig Data. Big data arises in many forms: Common themes:
Big Data Big data arises in many forms: Physical Measurements: from science (physics, astronomy) Medical data: genetic sequences, detailed time series Activity data: GPS location, social network activity
More informationSublinear-Time Algorithms
Lecture 20 Sublinear-Time Algorithms Supplemental reading in CLRS: None If we settle for approximations, we can sometimes get much more efficient algorithms In Lecture 8, we saw polynomial-time approximations
More informationSparse Johnson-Lindenstrauss Transforms
Sparse Johnson-Lindenstrauss Transforms Jelani Nelson MIT May 24, 211 joint work with Daniel Kane (Harvard) Metric Johnson-Lindenstrauss lemma Metric JL (MJL) Lemma, 1984 Every set of n points in Euclidean
More informationHomework 4 Solutions
CS 174: Combinatorics and Discrete Probability Fall 01 Homework 4 Solutions Problem 1. (Exercise 3.4 from MU 5 points) Recall the randomized algorithm discussed in class for finding the median of a set
More informationData Streams & Communication Complexity
Data Streams & Communication Complexity Lecture 1: Simple Stream Statistics in Small Space Andrew McGregor, UMass Amherst 1/25 Data Stream Model Stream: m elements from universe of size n, e.g., x 1, x
More informationFinding Frequent Items in Data Streams
Finding Frequent Items in Data Streams Moses Charikar 1, Kevin Chen 2, and Martin Farach-Colton 3 1 Princeton University moses@cs.princeton.edu 2 UC Berkeley kevinc@cs.berkeley.edu 3 Rutgers University
More informationProblem 1: (Chernoff Bounds via Negative Dependence - from MU Ex 5.15)
Problem 1: Chernoff Bounds via Negative Dependence - from MU Ex 5.15) While deriving lower bounds on the load of the maximum loaded bin when n balls are thrown in n bins, we saw the use of negative dependence.
More informationLecture 13 October 6, Covering Numbers and Maurey s Empirical Method
CS 395T: Sublinear Algorithms Fall 2016 Prof. Eric Price Lecture 13 October 6, 2016 Scribe: Kiyeon Jeon and Loc Hoang 1 Overview In the last lecture we covered the lower bound for p th moment (p > 2) and
More informationArthur-Merlin Streaming Complexity
Weizmann Institute of Science Joint work with Ran Raz Data Streams The data stream model is an abstraction commonly used for algorithms that process network traffic using sublinear space. A data stream
More informationNotes for Lecture 14 v0.9
U.C. Berkeley CS27: Computational Complexity Handout N14 v0.9 Professor Luca Trevisan 10/25/2002 Notes for Lecture 14 v0.9 These notes are in a draft version. Please give me any comments you may have,
More information1 Approximate Quantiles and Summaries
CS 598CSC: Algorithms for Big Data Lecture date: Sept 25, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri Suppose we have a stream a 1, a 2,..., a n of objects from an ordered universe. For simplicity
More informationThe space complexity of approximating the frequency moments
The space complexity of approximating the frequency moments Felix Biermeier November 24, 2015 1 Overview Introduction Approximations of frequency moments lower bounds 2 Frequency moments Problem Estimate
More informationCMPUT 675: Approximation Algorithms Fall 2014
CMPUT 675: Approximation Algorithms Fall 204 Lecture 25 (Nov 3 & 5): Group Steiner Tree Lecturer: Zachary Friggstad Scribe: Zachary Friggstad 25. Group Steiner Tree In this problem, we are given a graph
More informationB669 Sublinear Algorithms for Big Data
B669 Sublinear Algorithms for Big Data Qin Zhang 1-1 2-1 Part 1: Sublinear in Space The model and challenge The data stream model (Alon, Matias and Szegedy 1996) a n a 2 a 1 RAM CPU Why hard? Cannot store
More informationHow Philippe Flipped Coins to Count Data
1/18 How Philippe Flipped Coins to Count Data Jérémie Lumbroso LIP6 / INRIA Rocquencourt December 16th, 2011 0. DATA STREAMING ALGORITHMS Stream: a (very large) sequence S over (also very large) domain
More informationLecture 7: Fingerprinting. David Woodruff Carnegie Mellon University
Lecture 7: Fingerprinting David Woodruff Carnegie Mellon University How to Pick a Random Prime How to pick a random prime in the range {1, 2,, M}? How to pick a random integer X? Pick a uniformly random
More informationAn Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems
An Optimal Algorithm for l 1 -Heavy Hitters in Insertion Streams and Related Problems Arnab Bhattacharyya, Palash Dey, and David P. Woodruff Indian Institute of Science, Bangalore {arnabb,palash}@csa.iisc.ernet.in
More informationAdvanced Algorithm Design: Hashing and Applications to Compact Data Representation
Advanced Algorithm Design: Hashing and Applications to Compact Data Representation Lectured by Prof. Moses Chariar Transcribed by John McSpedon Feb th, 20 Cucoo Hashing Recall from last lecture the dictionary
More informationRevisiting Frequency Moment Estimation in Random Order Streams
Revisiting Frequency Moment Estimation in Random Order Streams arxiv:803.02270v [cs.ds] 6 Mar 208 Vladimir Braverman, Johns Hopkins University David Woodruff, Carnegie Mellon University March 6, 208 Abstract
More information1-Pass Relative-Error L p -Sampling with Applications
-Pass Relative-Error L p -Sampling with Applications Morteza Monemizadeh David P Woodruff Abstract For any p [0, 2], we give a -pass poly(ε log n)-space algorithm which, given a data stream of length m
More informationCSC2420 Fall 2012: Algorithm Design, Analysis and Theory Lecture 9
CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Lecture 9 Allan Borodin March 13, 2016 1 / 33 Lecture 9 Announcements 1 I have the assignments graded by Lalla. 2 I have now posted five questions
More informationLecture 10 September 27, 2016
CS 395T: Sublinear Algorithms Fall 2016 Prof. Eric Price Lecture 10 September 27, 2016 Scribes: Quinten McNamara & William Hoza 1 Overview In this lecture, we focus on constructing coresets, which are
More informationBlock Heavy Hitters Alexandr Andoni, Khanh Do Ba, and Piotr Indyk
Computer Science and Artificial Intelligence Laboratory Technical Report -CSAIL-TR-2008-024 May 2, 2008 Block Heavy Hitters Alexandr Andoni, Khanh Do Ba, and Piotr Indyk massachusetts institute of technology,
More informationCS261: A Second Course in Algorithms Lecture #18: Five Essential Tools for the Analysis of Randomized Algorithms
CS261: A Second Course in Algorithms Lecture #18: Five Essential Tools for the Analysis of Randomized Algorithms Tim Roughgarden March 3, 2016 1 Preamble In CS109 and CS161, you learned some tricks of
More informationLower Bounds for Testing Bipartiteness in Dense Graphs
Lower Bounds for Testing Bipartiteness in Dense Graphs Andrej Bogdanov Luca Trevisan Abstract We consider the problem of testing bipartiteness in the adjacency matrix model. The best known algorithm, due
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/26/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu 2 More algorithms
More information25.2 Last Time: Matrix Multiplication in Streaming Model
EE 381V: Large Scale Learning Fall 01 Lecture 5 April 18 Lecturer: Caramanis & Sanghavi Scribe: Kai-Yang Chiang 5.1 Review of Streaming Model Streaming model is a new model for presenting massive data.
More informationThe quantum complexity of approximating the frequency moments
The quantum complexity of approximating the frequency moments Ashley Montanaro May 1, 2015 Abstract The th frequency moment of a sequence of integers is defined as F = j n j, where n j is the number of
More informationSketching Probabilistic Data Streams
Sketching Probabilistic Data Streams Graham Cormode AT&T Labs Research 18 Park Avenue, Florham Park NJ graham@research.att.com Minos Garofalakis Yahoo! Research and UC Berkeley 2821 Mission College Blvd,
More information1 Maintaining a Dictionary
15-451/651: Design & Analysis of Algorithms February 1, 2016 Lecture #7: Hashing last changed: January 29, 2016 Hashing is a great practical tool, with an interesting and subtle theory too. In addition
More informationVery Sparse Random Projections
Very Sparse Random Projections Ping Li, Trevor Hastie and Kenneth Church [KDD 06] Presented by: Aditya Menon UCSD March 4, 2009 Presented by: Aditya Menon (UCSD) Very Sparse Random Projections March 4,
More information1 Approximate Counting by Random Sampling
COMP8601: Advanced Topics in Theoretical Computer Science Lecture 5: More Measure Concentration: Counting DNF Satisfying Assignments, Hoeffding s Inequality Lecturer: Hubert Chan Date: 19 Sep 2013 These
More informationLecture 2 September 4, 2014
CS 224: Advanced Algorithms Fall 2014 Prof. Jelani Nelson Lecture 2 September 4, 2014 Scribe: David Liu 1 Overview In the last lecture we introduced the word RAM model and covered veb trees to solve the
More informationCS 229r Notes. Sam Elder. November 26, 2013
CS 9r Notes Sam Elder November 6, 013 Tuesday, September 3, 013 Standard information: Instructor: Jelani Nelson Grades (see course website for details) will be based on scribing (10%), psets (40%) 1, and
More informationLecture 13 (Part 2): Deviation from mean: Markov s inequality, variance and its properties, Chebyshev s inequality
Lecture 13 (Part 2): Deviation from mean: Markov s inequality, variance and its properties, Chebyshev s inequality Discrete Structures II (Summer 2018) Rutgers University Instructor: Abhishek Bhrushundi
More informationTopic: Sampling, Medians of Means method and DNF counting Date: October 6, 2004 Scribe: Florin Oprea
15-859(M): Randomized Algorithms Lecturer: Shuchi Chawla Topic: Sampling, Medians of Means method and DNF counting Date: October 6, 200 Scribe: Florin Oprea 8.1 Introduction In this lecture we will consider
More informationOptimality of the Johnson-Lindenstrauss Lemma
Optimality of the Johnson-Lindenstrauss Lemma Kasper Green Larsen Jelani Nelson September 7, 2016 Abstract For any integers d, n 2 and 1/(min{n, d}) 0.4999 < ε < 1, we show the existence of a set of n
More information6.841/18.405J: Advanced Complexity Wednesday, April 2, Lecture Lecture 14
6.841/18.405J: Advanced Complexity Wednesday, April 2, 2003 Lecture Lecture 14 Instructor: Madhu Sudan In this lecture we cover IP = PSPACE Interactive proof for straightline programs. Straightline program
More informationInterval Selection in the streaming model
Interval Selection in the streaming model Pascal Bemmann Abstract In the interval selection problem we are given a set of intervals via a stream and want to nd the maximum set of pairwise independent intervals.
More informationLecture 5: Two-point Sampling
Randomized Algorithms Lecture 5: Two-point Sampling Sotiris Nikoletseas Professor CEID - ETY Course 2017-2018 Sotiris Nikoletseas, Professor Randomized Algorithms - Lecture 5 1 / 26 Overview A. Pairwise
More informationTime lower bounds for nonadaptive turnstile streaming algorithms
Time lower bounds for nonadaptive turnstile streaming algorithms Kasper Green Larsen Jelani Nelson Huy L. Nguy ên July 8, 2014 Abstract We say a turnstile streaming algorithm is non-adaptive if, during
More informationProbability Revealing Samples
Krzysztof Onak IBM Research Xiaorui Sun Microsoft Research Abstract In the most popular distribution testing and parameter estimation model, one can obtain information about an underlying distribution
More informationFrequency Estimators
Frequency Estimators Outline for Today Randomized Data Structures Our next approach to improving performance. Count-Min Sketches A simple and powerful data structure for estimating frequencies. Count Sketches
More informationSparser Johnson-Lindenstrauss Transforms
Sparser Johnson-Lindenstrauss Transforms Jelani Nelson Princeton February 16, 212 joint work with Daniel Kane (Stanford) Random Projections x R d, d huge store y = Sx, where S is a k d matrix (compression)
More informationExpectation, inequalities and laws of large numbers
Chapter 3 Expectation, inequalities and laws of large numbers 3. Expectation and Variance Indicator random variable Let us suppose that the event A partitions the sample space S, i.e. A A S. The indicator
More informationLower Bounds for Quantile Estimation in Random-Order and Multi-Pass Streaming
Lower Bounds for Quantile Estimation in Random-Order and Multi-Pass Streaming Sudipto Guha 1 and Andrew McGregor 2 1 University of Pennsylvania sudipto@cis.upenn.edu 2 University of California, San Diego
More informationStanford University CS254: Computational Complexity Handout 8 Luca Trevisan 4/21/2010
Stanford University CS254: Computational Complexity Handout 8 Luca Trevisan 4/2/200 Counting Problems Today we describe counting problems and the class #P that they define, and we show that every counting
More informationAn Improved Frequent Items Algorithm with Applications to Web Caching
An Improved Frequent Items Algorithm with Applications to Web Caching Kevin Chen and Satish Rao UC Berkeley Abstract. We present a simple, intuitive algorithm for the problem of finding an approximate
More information2 Completing the Hardness of approximation of Set Cover
CSE 533: The PCP Theorem and Hardness of Approximation (Autumn 2005) Lecture 15: Set Cover hardness and testing Long Codes Nov. 21, 2005 Lecturer: Venkat Guruswami Scribe: Atri Rudra 1 Recap We will first
More informationCS341 info session is on Thu 3/1 5pm in Gates415. CS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS341 info session is on Thu 3/1 5pm in Gates415 CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2/28/18 Jure Leskovec, Stanford CS246: Mining Massive Datasets,
More informationCS5112: Algorithms and Data Structures for Applications
CS5112: Algorithms and Data Structures for Applications Lecture 14: Exponential decay; convolution Ramin Zabih Some content from: Piotr Indyk; Wikipedia/Google image search; J. Leskovec, A. Rajaraman,
More informationStreaming and communication complexity of Hamming distance
Streaming and communication complexity of Hamming distance Tatiana Starikovskaya IRIF, Université Paris-Diderot (Joint work with Raphaël Clifford, ICALP 16) Approximate pattern matching Problem Pattern
More informationHierarchical Sampling from Sketches: Estimating Functions over Data Streams
Hierarchical Sampling from Sketches: Estimating Functions over Data Streams Sumit Ganguly 1 and Lakshminath Bhuvanagiri 2 1 Indian Institute of Technology, Kanpur 2 Google Inc., Bangalore Abstract. We
More informationCMPSCI 711: More Advanced Algorithms
CMPSCI 711: More Advanced Algorithms Section 1-1: Sampling Andrew McGregor Last Compiled: April 29, 2012 1/14 Concentration Bounds Theorem (Markov) Let X be a non-negative random variable with expectation
More informationPRGs for space-bounded computation: INW, Nisan
0368-4283: Space-Bounded Computation 15/5/2018 Lecture 9 PRGs for space-bounded computation: INW, Nisan Amnon Ta-Shma and Dean Doron 1 PRGs Definition 1. Let C be a collection of functions C : Σ n {0,
More information1 Randomized Computation
CS 6743 Lecture 17 1 Fall 2007 1 Randomized Computation Why is randomness useful? Imagine you have a stack of bank notes, with very few counterfeit ones. You want to choose a genuine bank note to pay at
More informationKousha Etessami. U. of Edinburgh, UK. Kousha Etessami (U. of Edinburgh, UK) Discrete Mathematics (Chapter 7) 1 / 13
Discrete Mathematics & Mathematical Reasoning Chapter 7 (continued): Markov and Chebyshev s Inequalities; and Examples in probability: the birthday problem Kousha Etessami U. of Edinburgh, UK Kousha Etessami
More informationLecture 4: Two-point Sampling, Coupon Collector s problem
Randomized Algorithms Lecture 4: Two-point Sampling, Coupon Collector s problem Sotiris Nikoletseas Associate Professor CEID - ETY Course 2013-2014 Sotiris Nikoletseas, Associate Professor Randomized Algorithms
More informationTight Bounds for Distributed Streaming
Tight Bounds for Distributed Streaming (a.k.a., Distributed Functional Monitoring) David Woodruff IBM Research Almaden Qin Zhang MADALGO, Aarhus Univ. STOC 12 May 22, 2012 1-1 The distributed streaming
More information