CSCI8980 Algorithmic Techniques for Big Data September 12, Lecture 2
September 12, 2013
Dr. Barna Saha
Lecture 2
Scribe: Matt Nohelty

Overview

We continue our discussion of data streaming models, where streams of elements arrive and main memory is not sufficient to hold all the data. We begin by discussing the Chernoff Bound and demonstrating its proof. We then look at the Universal Hash Family and discuss pairwise, k-wise, and fully independent hash functions. Finally, we dive deeper into counting distinct items in a stream, presenting and analyzing two algorithms.

1 Chernoff Bound

The Chernoff Bound is commonly used to show that randomized algorithms produce results of acceptable quality, or to determine the number of runs needed to achieve a result with a certain probability. Many data streaming algorithms have randomized components, so the Chernoff Bound is frequently used with these algorithms. The Chernoff Bound produces tighter bounds than the Markov Inequality or the Chebyshev Inequality, but it requires an assumption that those two do not: its input must be a sum of independent Bernoulli random variables.

Theorem 1 (The Chernoff Bound). Let $X_1, X_2, \ldots, X_n$ be $n$ independent Bernoulli random variables with $\Pr(X_i = 1) = p_i$. Let $X = \sum_{i=1}^{n} X_i$. Hence,

$$E[X] = E\Big[\sum_{i=1}^{n} X_i\Big] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \Pr(X_i = 1) = \sum_{i=1}^{n} p_i = \mu \text{ (say)}.$$

Then the Chernoff Bound says that for any $\varepsilon > 0$

$$\Pr(X > (1+\varepsilon)\mu) \le \left(\frac{e^{\varepsilon}}{(1+\varepsilon)^{(1+\varepsilon)}}\right)^{\mu}$$

and, for $0 < \varepsilon < 1$,

$$\Pr(X < (1-\varepsilon)\mu) \le \left(\frac{e^{-\varepsilon}}{(1-\varepsilon)^{(1-\varepsilon)}}\right)^{\mu}.$$

When $0 < \varepsilon < 1$ the above expressions can be further simplified to

$$\Pr(X > (1+\varepsilon)\mu) \le e^{-\mu\varepsilon^2/3} \quad\text{and}\quad \Pr(X < (1-\varepsilon)\mu) \le e^{-\mu\varepsilon^2/2}.$$

Hence

$$\Pr(|X - \mu| > \varepsilon\mu) \le 2e^{-\mu\varepsilon^2/3}.$$
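To get a feel for how loose or tight these bounds are, the following sketch compares the exact upper tail of a binomial variable (a sum of identically distributed Bernoulli variables, which is one special case of the theorem) with the general and the simplified Chernoff upper bounds. The function names and the parameter values `n = 200`, `p = 0.1`, `eps = 0.5` are illustrative choices, not part of the lecture.

```python
import math


def binomial_upper_tail(n, p, threshold):
    """Exact Pr(X > threshold) for X ~ Binomial(n, p)."""
    start = math.floor(threshold) + 1
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(start, n + 1)
    )


def chernoff_upper_general(mu, eps):
    """(e^eps / (1 + eps)^(1 + eps))^mu, stated for any eps > 0."""
    return math.exp(mu * (eps - (1 + eps) * math.log(1 + eps)))


def chernoff_upper_simple(mu, eps):
    """exp(-mu * eps^2 / 3), stated for 0 < eps < 1."""
    return math.exp(-mu * eps**2 / 3)


# Illustrative parameters (assumptions for this sketch only).
n, p, eps = 200, 0.1, 0.5
mu = n * p

exact = binomial_upper_tail(n, p, (1 + eps) * mu)
general = chernoff_upper_general(mu, eps)
simple = chernoff_upper_simple(mu, eps)

print(f"exact tail      = {exact:.6f}")
print(f"general bound   = {general:.6f}")
print(f"simplified bound = {simple:.6f}")
```

For these parameters the three quantities should come out ordered as exact tail ≤ general bound ≤ simplified bound, which matches the chain of inequalities in the theorem.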
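Proof of the Chernoff Upper Bound

The upper bound of the Chernoff Bound states, for $0 < \varepsilon < 1$:

$$\Pr(X > (1+\varepsilon)\mu) \le e^{-\mu\varepsilon^2/3}.$$

Proof. For any $t > 0$,

$$\Pr(X \ge (1+\varepsilon)\mu) = \Pr\big(e^{tX} \ge e^{t(1+\varepsilon)\mu}\big) \le \frac{E[e^{tX}]}{e^{t(1+\varepsilon)\mu}} \quad\text{by the Markov Inequality.}$$

Expand $X$ in the numerator:

$$\begin{aligned}
E[e^{tX}] &= E\big[e^{t\sum_i X_i}\big] = E\big[e^{tX_1} e^{tX_2} \cdots e^{tX_n}\big] \\
&= \prod_{i=1}^{n} E\big[e^{tX_i}\big] && \text{(the $X_i$ are independent by the assumption of the Chernoff Bound)} \\
&= \prod_{i=1}^{n} \big[p_i e^{t} + (1 - p_i)\big] \\
&= \prod_{i=1}^{n} \big[1 + p_i(e^{t} - 1)\big] \\
&\le \prod_{i=1}^{n} e^{p_i(e^{t} - 1)} && \text{(because $e^{x} \ge 1 + x$)} \\
&= e^{\sum_{i=1}^{n} p_i (e^{t} - 1)} = e^{(e^{t} - 1)\mu}.
\end{aligned}$$

Using the simplified numerator in the bound yields

$$\Pr(X \ge (1+\varepsilon)\mu) \le \frac{e^{(e^{t} - 1)\mu}}{e^{t(1+\varepsilon)\mu}}.$$

Differentiating to find the $t$ at which the right-hand side is minimized gives $t = \ln(1+\varepsilon)$. Returning to the upper bound with this value of $t$:

$$\begin{aligned}
\Pr(X \ge (1+\varepsilon)\mu) &\le \frac{e^{(e^{\ln(1+\varepsilon)} - 1)\mu}}{e^{(1+\varepsilon)\ln(1+\varepsilon)\mu}} = \left(\frac{e^{\varepsilon}}{(1+\varepsilon)^{(1+\varepsilon)}}\right)^{\mu} \\
&= e^{-\mu[(1+\varepsilon)\ln(1+\varepsilon) - \varepsilon]} \\
&= e^{-\mu\left[(1+\varepsilon)\left(\varepsilon - \frac{\varepsilon^2}{2} + \frac{\varepsilon^3}{3} - \cdots\right) - \varepsilon\right]} \\
&\le e^{-\mu\left[\frac{\varepsilon^2}{2} - \frac{\varepsilon^3}{6}\right]} \\
&= e^{-\mu\frac{\varepsilon^2}{2}\left(1 - \frac{\varepsilon}{3}\right)} \\
&\le e^{-\mu\varepsilon^2/3} && \text{(since $\varepsilon < 1$)},
\end{aligned}$$

which is the upper bound of the Chernoff Bound.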
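The minimization step in the proof is stated without its calculation. A short worked version, writing $f(t)$ (a name introduced here only for convenience) for the exponent of the bound, is:

```latex
\begin{aligned}
f(t) &= (e^{t} - 1)\mu - t(1+\varepsilon)\mu, \\
f'(t) &= \mu e^{t} - (1+\varepsilon)\mu = 0
  \;\Longrightarrow\; e^{t} = 1+\varepsilon
  \;\Longrightarrow\; t = \ln(1+\varepsilon), \\
f''(t) &= \mu e^{t} > 0 \quad\text{(so this stationary point is a minimum)}, \\
f(\ln(1+\varepsilon)) &= \varepsilon\mu - (1+\varepsilon)\ln(1+\varepsilon)\,\mu
  = -\mu\big[(1+\varepsilon)\ln(1+\varepsilon) - \varepsilon\big].
\end{aligned}
```

Since $\varepsilon > 0$, the minimizer $t = \ln(1+\varepsilon)$ is positive, as the Markov step requires.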
The lower bound of the Chernoff Bound can be proved with logic similar to the proof of the upper bound.

2 Universal Hash Family

A family of hash functions $H = \{h \mid h : [N] \to [M]\}$ is called a pairwise independent family of hash functions if for all $i \ne j \in [N]$ and any $k, l \in [M]$

$$\Pr_{h \in H}[h(i) = k \wedge h(j) = l] = \frac{1}{M^2}. \qquad (1)$$

Such a family is also called a strongly universal hash family. A hash family is pairwise independent if property (1) holds. This definition can be extended to k-wise independent hash families as well. k-wise independent hash functions are important because they allow for efficient construction of hash families, whereas fully independent hash functions generally require a large amount of space.

Hash functions drawn from a pairwise independent family are uniform over $[M]$:

$$\Pr_{h \in H}[h(i) = k] = \frac{1}{M}. \qquad (2)$$

A family that satisfies only the weaker collision condition, for all $i \ne j$,

$$\Pr_{h \in H}[h(i) = h(j)] \le \frac{1}{M} \qquad (3)$$

is called a weakly universal hash family.

To construct a pairwise independent hash family: let $p$ be a prime. For any $a, b \in Z_p = \{0, 1, 2, \ldots, p-1\}$, define $h_{a,b} : Z_p \to Z_p$ by $h_{a,b}(x) = (ax + b) \bmod p$. The resulting collection of functions $H = \{h_{a,b} \mid a, b \in Z_p\}$ is a pairwise independent hash family.

3 Counting Distinct Items

Given a stream of data $a$, find the total number of distinct items in the stream. For the purpose of this discussion, we assume the stream is too large to be stored in main memory.

$$a = a_1 a_2 a_3 \ldots a_m, \qquad a_i = (j, \mu) \text{ where } j \in [1, n] \text{ and } \mu = 1.$$

Here $m$ represents the number of elements in the stream and $n$ represents the maximum number of distinct elements that could be in the stream. The goal is to find the actual number of distinct elements, $DE$. However, because we cannot store $a$ in main memory, we must approximate $DE$. This approximation will be denoted $\widehat{DE}$. We want to find $\widehat{DE}$ such that the following constraint holds with probability $(1 - \delta)$.
$$(1 - \varepsilon)DE \le \widehat{DE} \le (1 + \varepsilon)DE \quad\text{for } \varepsilon > 0 \qquad (4)$$

4 Algorithm 1 - Count Distinct Items

The following algorithm attempts to guess the actual value of $DE$ by looping through exponentially growing values of $t$. For each guess, the algorithm calls ESTIMATE, which returns YES if there are at least $t$ distinct values and NO otherwise. ESTIMATE returns the correct answer with probability $(1 - \delta')$, as we will see later. Following the for loop, we have a list of YES/NO values, one for each $t$. The algorithm returns the value of $t$ at which the answers switch from YES to NO.

Algorithm 1 COUNT DISTINCT ITEMS[$a, \varepsilon, \delta$]
    $\varepsilon' = \varepsilon/2$
    for $t = 1, (1+\varepsilon'), (1+\varepsilon')^2, \ldots, (1+\varepsilon')^{\log_{1+\varepsilon'} n}$ do
        $\delta' = \frac{\varepsilon'\delta}{\log n}$
        {Run in parallel}
        $b_t$ = ESTIMATE($a, t, \varepsilon', \delta'$) {$b_t$ is a boolean variable YES/NO}
    end for
    return the smallest value of $t$ such that $b_{t/(1+\varepsilon')}$ = YES and $b_t$ = NO
    if no such $t$ exists, return $n$

Below is an example of the output produced by the for loop in Algorithm 1. This is the likely output in the case where $(1+\varepsilon') < DE < (1+\varepsilon')^2$.

    $t = 1$                      YES
    $t = (1+\varepsilon')$       YES
    $t = (1+\varepsilon')^2$     NO
    $t = (1+\varepsilon')^3$     NO
    ...
    $t = n$                      NO

As the example illustrates, the resulting $\widehat{DE}$ satisfies the constraint $(1 - \varepsilon)DE \le \widehat{DE} \le (1 + \varepsilon)DE$.

Proof. For each $t$, we get the correct result with probability at least $1 - \delta'$, where $\delta' = \frac{\varepsilon'\delta}{\log n}$, and there are $\log_{1+\varepsilon'} n$ different values of $t$.

$$\Pr(\text{error for a given } t) \le \delta'$$

$$\Pr(\text{error in at least one } t) \le \sum_{t} \Pr(\text{error for } t) \le \log_{1+\varepsilon'} n \cdot \delta' \approx \frac{1}{\varepsilon'} \log n \cdot \delta' = \delta$$

$$\Pr(\text{no error in any } t) \ge 1 - \delta$$
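The guessing loop of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under several assumptions: the stream is a replayable Python list (the lecture's version runs all guesses in parallel over a single pass), the return rule is read as "the first guess answered NO whose predecessor answered YES", and `exact_oracle` is a hypothetical stand-in for ESTIMATE that simply counts distinct items so the control flow can be checked in isolation.

```python
import math


def count_distinct_items(stream, n, eps, delta, estimate):
    """Sketch of the guessing loop: return the guess where YES flips to NO.

    `estimate(stream, t, eps1, delta1)` must return True for YES, False for NO.
    Assumes n > 1 and a non-empty stream.
    """
    eps1 = eps / 2
    delta1 = eps1 * delta / math.log(n)

    # Geometric grid of guesses 1, (1+eps'), (1+eps')^2, ..., capped at n.
    guesses = []
    t = 1.0
    while t < n:
        guesses.append(t)
        t *= 1 + eps1
    guesses.append(float(n))

    answers = [estimate(stream, t, eps1, delta1) for t in guesses]

    # Smallest guess answered NO whose predecessor was answered YES.
    for prev, cur, t in zip(answers, answers[1:], guesses[1:]):
        if prev and not cur:
            return t
    return float(n)  # no YES-to-NO switch found


def exact_oracle(stream, t, eps1, delta1):
    """Hypothetical stand-in for ESTIMATE: exact test 'at least t distinct items'."""
    return len(set(stream)) >= t


# Usage: a stream with exactly 100 distinct items drawn from a universe of size 1024.
stream = [i % 100 for i in range(1000)]
result = count_distinct_items(stream, n=1024, eps=0.2, delta=0.1, estimate=exact_oracle)
print(result)
```

With an exact oracle the returned guess overshoots the true count by at most a factor of $(1+\varepsilon')$, so it lands inside the $(1 \pm \varepsilon)$ window; with the randomized ESTIMATE the same holds only with the probability analyzed in the proof above.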
5 Algorithm 2 - ESTIMATE

ESTIMATE randomly selects $\frac{c}{\varepsilon'^2} \log \frac{1}{\delta'}$ hash functions from a fully independent hash family. Each hash function $h$ is of the form $h : [1 \ldots n] \to [1 \ldots t]$. We then compute the hash value of every element in the stream under each hash function. If a hash function ever returns 1, record YES for that run, otherwise record NO. Finally, count the number of NO values; if it is greater than or equal to $\frac{1}{e} \cdot \frac{c}{\varepsilon'^2} \log \frac{1}{\delta'}$, return NO, otherwise return YES. ESTIMATE returns the correct answer with probability $(1 - \delta')$ because the final answer aggregates the outcomes of $\frac{c}{\varepsilon'^2} \log \frac{1}{\delta'}$ independent runs, which reduces the impact of the randomness in any single hash function.

Algorithm 2 ESTIMATE($a, t, \varepsilon', \delta'$)
    count = 0
    for $i = 1, \ldots, \frac{c}{\varepsilon'^2} \log \frac{1}{\delta'}$ do
        Select a hash function $h_i$ uniformly at random from a fully independent hash family $H$ {run in parallel}
        $b_t^i$ = NO
        repeat
            Consider the current element in the stream $a$, say $a_i = (j, \mu)$
            if $h_i(j) = 1$ then
                $b_t^i$ = YES, BREAK
            end if
        until $a$ is exhausted
        if $b_t^i$ = NO then
            count = count + 1
        end if
    end for
    if count $\ge \frac{1}{e} \cdot \frac{c}{\varepsilon'^2} \log \frac{1}{\delta'}$ then
        return NO
    else
        return YES
    end if

Proof. The goal is to return YES when $DE > (1 + \varepsilon')t$ and to return NO when $DE < (1 - \varepsilon')t$. Let $h_i$ be the hash function used in the $i$-th run through the for loop. There are $k$ runs, where $k = \frac{c}{\varepsilon'^2} \log \frac{1}{\delta'}$.

$$\Pr(h_i(j) = 1) = \frac{1}{t} \quad\text{by definition of } h$$

$$\Pr(\text{return NO for the } i\text{-th run}) = \Pr(\text{none of the distinct elements are mapped to 1 by } h_i) = \left(1 - \frac{1}{t}\right)^{DE}$$
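A small simulation of ESTIMATE can make the threshold rule concrete. The sketch below is not space-efficient and is only meant to illustrate the logic: it simulates a fully independent hash function by lazily sampling, for each distinct item, whether it maps to 1 (with probability $1/t$), which stores one entry per distinct item and so defeats the purpose of streaming. The constant `c = 30.0`, the seeded generator, and the sample streams are assumptions made for this example; the lecture does not fix a value of $c$.

```python
import math
import random


def estimate(stream, t, eps, delta, c=30.0, rng=None):
    """Simulated ESTIMATE: True means YES, False means NO.

    Each run draws a fresh simulated hash function; the run records YES if
    any stream element hashes to 1. The final answer is NO when at least a
    1/e fraction of the runs recorded NO.
    """
    rng = rng or random.Random()
    k = math.ceil(c / eps**2 * math.log(1 / delta))  # number of runs
    no_runs = 0
    for _ in range(k):
        maps_to_one = {}  # lazily sampled outcome "h(j) == 1" per distinct j
        hit = False
        for j in stream:
            if j not in maps_to_one:
                maps_to_one[j] = rng.random() < 1.0 / t  # Pr(h(j) = 1) = 1/t
            if maps_to_one[j]:
                hit = True
                break
        if not hit:
            no_runs += 1
    return no_runs < k / math.e


# Usage with illustrative streams: one far above the guess, one far below it.
rng = random.Random(0)
many = [i % 200 for i in range(1000)]  # 200 distinct items
few = [i % 5 for i in range(500)]      # 5 distinct items
answer_many = estimate(many, t=10, eps=0.5, delta=0.1, rng=rng)
answer_few = estimate(few, t=1000, eps=0.5, delta=0.1, rng=rng)
print(answer_many, answer_few)
```

In the first call the number of distinct items is far above the guess, so almost every run sees some element hash to 1 and the answer is YES; in the second call almost every run records NO, so the NO count exceeds the $k/e$ threshold.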
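Lemma 2. Consider the $i$-th round of ESTIMATE($a, t, \varepsilon', \delta'$) for any $i \in \left[\frac{c}{\varepsilon'^2} \log \frac{1}{\delta'}\right]$.

If $DE > (1 + \varepsilon')t$ and $\varepsilon' < 1$ then $\Pr[b_t^i = \text{NO}] \le \frac{1}{e} - \frac{\varepsilon'}{2e}$:

$$\begin{aligned}
\Pr(i\text{-th run returns NO}) &\le \left(1 - \frac{1}{t}\right)^{(1+\varepsilon')t} \approx e^{-(1+\varepsilon')} && \text{(when $t$ is large)} \\
&= \frac{1}{e}\left(1 - \varepsilon' + \frac{\varepsilon'^2}{2} - \cdots\right) \\
&\le \frac{1}{e} - \frac{\varepsilon'}{e} + \frac{\varepsilon'^2}{2e} \\
&\le \frac{1}{e} - \frac{\varepsilon'}{2e}.
\end{aligned}$$

If $DE < (1 - \varepsilon')t$ and $\varepsilon' < 1$ then $\Pr[b_t^i = \text{NO}] \ge \frac{1}{e} + \frac{\varepsilon'}{2e}$:

$$\Pr(i\text{-th run returns NO}) \ge \left(1 - \frac{1}{t}\right)^{(1-\varepsilon')t} \ge \frac{1}{e} + \frac{\varepsilon'}{2e} \quad\text{by the same logic as above.}$$

Lemma 3. This lemma demonstrates the bounds on the error in Algorithm 2.

If $DE > (1 + \varepsilon')t$ then $\Pr[b_t = \text{NO}] \le \frac{\delta'}{2}$.
If $DE < (1 - \varepsilon')t$ then $\Pr[b_t = \text{YES}] \le \frac{\delta'}{2}$.

Consider the first case. We have

$$\Pr(\text{algorithm returns NO}) = \Pr\left(x \ge \frac{k}{e}\right)$$

because we return NO if at least $\frac{k}{e} = \frac{1}{e} \cdot \frac{c}{\varepsilon'^2} \log \frac{1}{\delta'}$ runs return NO. Here we define a random variable $x_i = 1$ if the $i$-th run returns NO and $x_i = 0$ otherwise, and let $x = \sum_{i=1}^{k} x_i$. Then

$$E[x] = \sum_{i=1}^{k} \Pr(x_i = 1) = \sum_{i=1}^{k} \Pr(i\text{-th run returns NO}) \le k\left(\frac{1}{e} - \frac{\varepsilon'}{2e}\right) \quad\text{by Lemma 2.}$$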
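Re-write $\Pr\left(x \ge \frac{k}{e}\right)$ in the form of the Chernoff Bound, $\Pr(x > (1 + \varepsilon'')\mu)$, taking $\mu = k\left(\frac{1}{e} - \frac{\varepsilon'}{2e}\right)$ from the bound on $E[x]$ above:

$$(1 + \varepsilon'')\,k\left(\frac{1}{e} - \frac{\varepsilon'}{2e}\right) = \frac{k}{e} \;\Longrightarrow\; (1 + \varepsilon'')\left(1 - \frac{\varepsilon'}{2}\right) = 1 \;\Longrightarrow\; \varepsilon'' \ge \frac{\varepsilon'}{2}.$$

$$\Pr\left(x \ge \frac{k}{e}\right) \le e^{-\varepsilon''^2 \mu / 3} \le e^{-c_0 \varepsilon'^2 k} \le \frac{\delta'}{2}$$

for a constant $c_0 > 0$, using $k = \frac{c}{\varepsilon'^2} \log \frac{1}{\delta'}$ with $c$ chosen large enough.

The lower bound can be demonstrated with logic similar to what was done to prove the upper bound above. This shows that when ESTIMATE uses enough runs, namely $\frac{c}{\varepsilon'^2} \log \frac{1}{\delta'}$, the probability of error is reduced to a sufficient level.

Lemma 4. If $|DE - t| > \varepsilon' t$ then $\Pr[\text{error}] \le \delta'$.

Using the Union Bound, we know the total $\Pr[\text{error}]$ cannot exceed the sum of the $\Pr[\text{error}]$ of the lower bound and the $\Pr[\text{error}]$ of the upper bound:

$$\frac{\delta'}{2} + \frac{\delta'}{2} = \delta'.$$

Lemma 5. Over all $t$ such that $|DE - t| > \varepsilon' t$, the total $\Pr[\text{error}] \le \delta$.

Theorem 6. Algorithm 1 returns an estimate of $DE$ within $(1 \pm \varepsilon)$ with probability $(1 - \delta)$.

Theorem 6 shows that this algorithm to count distinct items achieves our goal of computing $\widehat{DE}$ under the accuracy constraint

$$(1 - \varepsilon)DE \le \widehat{DE} \le (1 + \varepsilon)DE \quad\text{for } \varepsilon > 0,$$

and does so with probability $(1 - \delta)$.

6 Space and Time Complexity of Count Distinct Items

Space Complexity: $O\left(\frac{1}{\varepsilon^3} \log n \left(\log \frac{1}{\delta} + \log\log n + \log \frac{1}{\varepsilon}\right)\right)$

Time Complexity: $O\left(\frac{1}{\varepsilon^3} \log n \left(\log \frac{1}{\delta} + \log\log n + \log \frac{1}{\varepsilon}\right)\right)$

Ignoring constants, there are $\frac{1}{\varepsilon} \log n$ copies of ESTIMATE that need to be stored (one per guess $t$), and each run within a copy requires 1 bit. The space complexity of ESTIMATE is

$$\frac{1}{\varepsilon^2} \log \frac{\log n}{\varepsilon\delta}.$$

Expanding this space complexity yields

$$\frac{1}{\varepsilon^2}\left(\log\log n + \log \frac{1}{\varepsilon} + \log \frac{1}{\delta}\right).$$

Combining the space complexity and the number of copies yields the total space complexity: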
$$O\left(\frac{1}{\varepsilon^3} \log n \left(\log \frac{1}{\delta} + \log\log n + \log \frac{1}{\varepsilon}\right)\right) \qquad (5)$$

The time complexity can be computed in the same way as the space complexity. In practice, the space and time dependency on $\frac{1}{\varepsilon^3}$ is generally problematic. The optimal lower bound on space complexity for counting distinct items in a stream was shown to be $\Omega\left(\frac{1}{\varepsilon^2} + \log n\right)$.

References

[1] Daniel M. Kane, Jelani Nelson and David P. Woodruff. An Optimal Algorithm for the Distinct Elements Problem. PODS 2010.
More informationPRAMs. M 1 M 2 M p. globaler Speicher
PRAMs A PRAM (parallel random access machine) consists of p many identical processors M,..., M p (RAMs). Processors can read from/write to a shared (global) memory. Processors work synchronously. M M 2
More informationImpagliazzo s Hardcore Lemma
Average Case Complexity February 8, 011 Impagliazzo s Hardcore Lemma ofessor: Valentine Kabanets Scribe: Hoda Akbari 1 Average-Case Hard Boolean Functions w.r.t. Circuits In this lecture, we are going
More informationLecture 24: Approximate Counting
CS 710: Complexity Theory 12/1/2011 Lecture 24: Approximate Counting Instructor: Dieter van Melkebeek Scribe: David Guild and Gautam Prakriya Last time we introduced counting problems and defined the class
More informationRandomized Algorithms Multiple Choice Test
4435 Randomized Algorithms Multiple Choice Test Sample test: only 8 questions 24 minutes (Real test has 30 questions 90 minutes) Årskort Name Each of the following 8 questions has 4 possible answers of
More informationLecture 6. Today we shall use graph entropy to improve the obvious lower bound on good hash functions.
CSE533: Information Theory in Computer Science September 8, 010 Lecturer: Anup Rao Lecture 6 Scribe: Lukas Svec 1 A lower bound for perfect hash functions Today we shall use graph entropy to improve the
More informationIn a five-minute period, you get a certain number m of requests. Each needs to be served from one of your n servers.
Suppose you are a content delivery network. In a five-minute period, you get a certain number m of requests. Each needs to be served from one of your n servers. How to distribute requests to balance the
More informationRandomized Algorithms. Zhou Jun
Randomized Algorithms Zhou Jun 1 Content 13.1 Contention Resolution 13.2 Global Minimum Cut 13.3 *Random Variables and Expectation 13.4 Randomized Approximation Algorithm for MAX 3- SAT 13.6 Hashing 13.7
More informationLecture Lecture 25 November 25, 2014
CS 224: Advanced Algorithms Fall 2014 Lecture Lecture 25 November 25, 2014 Prof. Jelani Nelson Scribe: Keno Fischer 1 Today Finish faster exponential time algorithms (Inclusion-Exclusion/Zeta Transform,
More informationThe diameter of a random Cayley graph of Z q
The diameter of a random Cayley graph of Z q Gideon Amir Ori Gurel-Gurevich September 4, 009 Abstract Consider the Cayley graph of the cyclic group of prime order q with k uniformly chosen generators.
More informationNon-Interactive Zero Knowledge (II)
Non-Interactive Zero Knowledge (II) CS 601.442/642 Modern Cryptography Fall 2017 S 601.442/642 Modern CryptographyNon-Interactive Zero Knowledge (II) Fall 2017 1 / 18 NIZKs for NP: Roadmap Last-time: Transformation
More informationSolutions to Problem Set 4
UC Berkeley, CS 174: Combinatorics and Discrete Probability (Fall 010 Solutions to Problem Set 4 1. (MU 5.4 In a lecture hall containing 100 people, you consider whether or not there are three people in
More information