Application: Bucket Sort
- Grant Hubbard
- 6 years ago
Bucket sort breaks the Omega(n log n) lower bound for standard comparison-based sorting, under certain assumptions on the input. We want to sort a set of n = 2^m integers chosen independently and uniformly at random from the range [0, 2^k), where k >= m. Using Bucket sort, we can sort the numbers in expected time O(n). The expectation is over the choice of the random input; Bucket sort itself is a deterministic algorithm.

MAT RandAl, Spring Feb

Bucket sort works in two stages. First we place the elements into n buckets. The j-th bucket holds all elements whose first m binary digits correspond to the number j. E.g., if n = 2^10, bucket 3 contains all elements whose first 10 binary digits are 0000000011. When j < l, the elements of the j-th bucket all come before those in the l-th bucket in the sorted order. Assuming that each element can be placed in the appropriate bucket in O(1) time, this stage requires only O(n) time.
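The two-stage algorithm can be sketched in Python; the parameter values and the insertion-sort inner loop below are illustrative choices, not from the source:

```python
import random

def bucket_sort(elems, m, k):
    """Sort integers drawn from [0, 2**k), k >= m, using 2**m buckets.

    Stage 1: distribute each element into the bucket keyed by its
    top m bits. Stage 2: sort each bucket with a quadratic-time sort
    and concatenate the buckets in index order.
    """
    n = 2 ** m
    buckets = [[] for _ in range(n)]
    for x in elems:
        # The bucket index is the number formed by the first m binary digits.
        buckets[x >> (k - m)].append(x)
    out = []
    for b in buckets:
        # Any standard quadratic-time sort works here; insertion sort shown.
        for i in range(1, len(b)):
            j = i
            while j > 0 and b[j - 1] > b[j]:
                b[j - 1], b[j] = b[j], b[j - 1]
                j -= 1
        out.extend(b)
    return out

m, k = 10, 32
data = [random.randrange(2 ** k) for _ in range(2 ** m)]
assert bucket_sort(data, m, k) == sorted(data)
```

The first stage is a single O(n) pass; the linear expected total time of the second stage is what the analysis below establishes.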
Because the elements to be sorted are chosen uniformly, the number of elements that land in a specific bucket follows a binomial distribution B(n, 1/n). Buckets can be implemented using linked lists. In the second stage, each bucket is sorted using any standard quadratic-time algorithm (e.g., Bubble sort or Insertion sort). Concatenating the sorted lists from each bucket in order gives the sorted order for the elements. It remains to show that the expected time spent in the second stage is only O(n).

The result relies on our assumption regarding the input distribution. Under the uniform distribution, Bucket sort falls naturally into the balls-and-bins model: the elements are balls, buckets are bins, and each ball falls uniformly at random into a bin. Let X_j be the number of elements that land in the j-th bucket. The time to sort the j-th bucket is then at most c(X_j)^2 for some constant c.
The expected time spent sorting in the second stage is at most

E[ sum_{j=1}^n c X_j^2 ] = c sum_{j=1}^n E[X_j^2] = c n E[X_1^2].

The second equality follows from symmetry: E[X_j^2] is the same for all buckets. Since X_1 ~ B(n, 1/n), using earlier results yields E[X_1^2] = n(n-1)(1/n)^2 + 1 = 2 - 1/n < 2. Hence the total expected time spent in the second stage is at most 2cn, so Bucket sort runs in expected linear time.

The Poisson Distribution

We now consider the probability that a given bin is empty in the balls-and-bins model with m balls and n bins, as well as the expected number of empty bins. For the first bin to be empty, it must be missed by all m balls. Since each ball hits the first bin with probability 1/n, the probability the first bin remains empty is (1 - 1/n)^m ~ e^{-m/n}.
By symmetry, this probability is the same for all bins. If X_j is a random variable that is 1 when the j-th bin is empty and 0 otherwise, then E[X_j] = (1 - 1/n)^m. Let X represent the number of empty bins. Then, by the linearity of expectations, E[X] = n(1 - 1/n)^m ~ n e^{-m/n}. Thus, the expected fraction of empty bins is approximately e^{-m/n}. This approximation is very good even for moderately sized values of m and n.

Generalize to find the expected fraction of bins with r balls for any constant r. The probability that a given bin has exactly r balls is

C(m, r) (1/n)^r (1 - 1/n)^{m-r} = (1/r!) * [m(m-1)...(m-r+1) / n^r] * (1 - 1/n)^{m-r}.

When m and n are large compared to r, the second factor on the RHS is approximately (m/n)^r, and the third factor is approximately e^{-m/n}.
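The e^{-m/n} approximation for the fraction of empty bins is easy to check by simulation; a minimal sketch (the trial count and bin count are arbitrary choices):

```python
import math
import random

def empty_bin_fraction(m, n, trials=200):
    """Average fraction of empty bins when m balls land independently
    and uniformly at random in n bins."""
    total = 0.0
    for _ in range(trials):
        loads = [0] * n
        for _ in range(m):
            loads[random.randrange(n)] += 1
        total += sum(1 for load in loads if load == 0) / n
    return total / trials

m = n = 1000
observed = empty_bin_fraction(m, n)
predicted = math.exp(-m / n)  # e^{-m/n}
# The simulated fraction tracks the approximation closely.
assert abs(observed - predicted) < 0.02
```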
Hence the probability p_r that a given bin has exactly r balls is approximately p_r ~ e^{-m/n} (m/n)^r / r!, and the expected number of bins with exactly r balls is approximately n p_r.

Definition 5.1: A discrete Poisson random variable X with parameter mu is given by the following probability distribution on j = 0, 1, 2, ...:

Pr(X = j) = e^{-mu} mu^j / j!.

The expectation of this random variable is mu:

E[X] = sum_{j>=0} j Pr(X = j) = sum_{j>=1} j e^{-mu} mu^j / j! = mu sum_{j>=1} e^{-mu} mu^{j-1} / (j-1)! = mu,

because the probabilities of a Poisson random variable sum to 1.
In the context of throwing m balls into n bins, the distribution of the number of balls in a bin is approximately Poisson with mu = m/n, which is exactly the average number of balls per bin, as one might expect.

Lemma 5.2: The sum of a finite number of independent Poisson random variables is a Poisson random variable.

Lemma 5.3: The moment generating function of a Poisson random variable with parameter mu is M_X(t) = e^{mu(e^t - 1)}.

Proof: For any t,

E[e^{tX}] = sum_{k>=0} e^{tk} e^{-mu} mu^k / k! = e^{-mu} sum_{k>=0} (mu e^t)^k / k! = e^{-mu} e^{mu e^t} = e^{mu(e^t - 1)}.
Differentiating yields M_X'(t) = mu e^t e^{mu(e^t - 1)} and M_X''(t) = mu e^t (mu e^t + 1) e^{mu(e^t - 1)}. Setting t = 0 gives E[X] = mu and E[X^2] = mu(mu + 1).

Given two independent Poisson random variables X and Y with means mu_1 and mu_2, apply Theorem 4.3 to prove

M_{X+Y}(t) = M_X(t) M_Y(t) = e^{(mu_1 + mu_2)(e^t - 1)}.

This is the MGF of a Poisson random variable with mean mu_1 + mu_2. By Theorem 4.2, the MGF uniquely defines the distribution, and hence the sum X + Y is a Poisson random variable with mean mu_1 + mu_2.
Theorem 5.4: Let X be a Poisson random variable with parameter mu.
1. If x > mu, then Pr(X >= x) <= e^{-mu} (e mu)^x / x^x;
2. If x < mu, then Pr(X <= x) <= e^{-mu} (e mu)^x / x^x.

Proof: For any t > 0 and x > mu,

Pr(X >= x) = Pr(e^{tX} >= e^{tx}) <= E[e^{tX}] / e^{tx}.

Plugging in the expression for the MGF of the Poisson distribution, we have Pr(X >= x) <= e^{mu(e^t - 1) - tx}. Choosing t = ln(x/mu) > 0 gives

Pr(X >= x) <= e^{x - mu - x ln(x/mu)} = e^{-mu} (e mu)^x / x^x.

The proof of 2 is similar.
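The bound in part 1 can be sanity-checked against the exact upper tail; a small sketch (the values of mu and the range of x are arbitrary choices):

```python
import math

def poisson_pmf(mu, j):
    return math.exp(-mu) * mu ** j / math.factorial(j)

def upper_tail(mu, x):
    """Exact Pr(X >= x) for integer x, via the complement of the lower part."""
    return 1.0 - sum(poisson_pmf(mu, j) for j in range(x))

def chernoff_bound(mu, x):
    """The Theorem 5.4 bound e^{-mu} (e*mu)^x / x^x, valid for x > mu."""
    return math.exp(-mu) * (math.e * mu) ** x / x ** x

mu = 4.0
for x in range(5, 20):
    # The Chernoff-style bound dominates the exact tail for x > mu.
    assert upper_tail(mu, x) <= chernoff_bound(mu, x)
```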
Limit of the Binomial Distribution

The Poisson distribution is the limit distribution of the binomial distribution with parameters n and p, when n is large and p is small.

Theorem 5.5: Let X_n ~ B(n, p), where p is a function of n and lim_{n->inf} np = lambda is a constant that is independent of n. Then, for any fixed k,

lim_{n->inf} Pr(X_n = k) = e^{-lambda} lambda^k / k!.

This theorem directly applies to the balls-and-bins scenario. Consider the situation where there are m balls and n bins, where m is a function of n and lim_{n->inf} m/n = lambda. Let X_m be the number of balls in a specific bin. Then X_m ~ B(m, 1/n). Theorem 5.5 thus applies and says that

lim_{m->inf} Pr(X_m = r) = e^{-lambda} lambda^r / r!,

matching the earlier approximation.
Consider the number of spelling or grammatical mistakes in a book. Model such mistakes so that each word independently contains an error with some very small probability p. The number of errors is then a binomial random variable with large n and small p, and can be treated as a Poisson random variable. As another example, consider the number of chocolate chips inside a chocolate chip cookie. Model this by splitting the volume of the cookie into a large number of small disjoint compartments, so that a chip lands in each with some small probability. The number of chips in a cookie then roughly follows a Poisson distribution.

The Poisson Approximation

The main difficulty in balls-and-bins problems is handling the dependencies that naturally arise. If, e.g., bin 1 is empty, then it is less likely that bin 2 is empty, because the balls must now be distributed among n - 1 bins. More concretely: if we know the number of balls in the first n - 1 bins, then the number of balls in the last bin is completely determined. The loads of the bins are not independent.
The distribution of the number of balls in a given bin is approximately Poisson with mean m/n. We would like to say that the joint distribution of the number of balls in all the bins is well approximated by assuming the load at each bin is an independent Poisson random variable with mean m/n. This would allow us to treat bin loads as independent random variables. We show here that we can do this when we are concerned with sufficiently rare events.

Suppose that m balls are thrown into n bins independently and uniformly at random, and let X_i^(m) be the number of balls in the i-th bin, where 1 <= i <= n. Let Y_1^(m), ..., Y_n^(m) be independent Poisson random variables with mean m/n. In the first case, there are m balls in total. In the second case we know only that m is the expected total number of balls in all of the bins. If, using the Poisson distribution, we end up with m balls, then we do indeed have that the distribution is the same as if we threw m balls into n bins randomly.
Theorem 5.6: The distribution of (Y_1^(m), ..., Y_n^(m)) conditioned on sum_i Y_i^(m) = k is the same as the distribution of (X_1^(k), ..., X_n^(k)), regardless of the value of m.

Proof: When throwing k balls into n bins, the probability that (X_1^(k), ..., X_n^(k)) = (k_1, ..., k_n) for any k_1, ..., k_n satisfying sum_i k_i = k is given by the multinomial distribution:

C(k; k_1, ..., k_n) / n^k = k! / (k_1! k_2! ... k_n! n^k).

Now, for any k_1, ..., k_n with sum_i k_i = k, consider the probability that (Y_1^(m), ..., Y_n^(m)) = (k_1, ..., k_n) conditioned on sum_i Y_i^(m) = k:

Pr( (Y_1^(m), ..., Y_n^(m)) = (k_1, ..., k_n) | sum_i Y_i^(m) = k )
= Pr( (Y_1^(m) = k_1) and ... and (Y_n^(m) = k_n) ) / Pr( sum_i Y_i^(m) = k ).
The probability that Y_i^(m) = k_i is e^{-m/n} (m/n)^{k_i} / k_i!, since the Y_i^(m) are independent Poisson random variables with mean m/n. Also, by Lemma 5.2, the sum of the Y_i^(m) is itself a Poisson random variable with mean m. Hence we have

[ prod_{i=1}^n e^{-m/n} (m/n)^{k_i} / k_i! ] / [ e^{-m} m^k / k! ] = k! / (k_1! k_2! ... k_n! n^k),

proving the theorem.

With this we can prove strong results about any function on the loads of the bins.

Theorem 5.7: Let f(x_1, ..., x_n) be a nonnegative function. Then

E[ f(X_1^(m), ..., X_n^(m)) ] <= e sqrt(m) E[ f(Y_1^(m), ..., Y_n^(m)) ].

This holds for any nonnegative function on the number of balls in the bins. In particular, if f is an indicator that is 1 if some event occurs and 0 otherwise, then the theorem gives bounds on the probability of events.
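Theorem 5.6 can be verified numerically on a tiny instance: the conditional Poisson probability matches the multinomial probability and does not depend on m. A sketch (the particular loads and values of m are arbitrary choices):

```python
import math

def exact_prob(ks, n):
    """Multinomial probability that the bin loads are (k_1, ..., k_n)
    when k = sum(ks) balls are thrown into n bins uniformly at random."""
    k = sum(ks)
    p = math.factorial(k) / n ** k
    for ki in ks:
        p /= math.factorial(ki)
    return p

def poisson_conditional(ks, n, m):
    """Pr(Y_1..Y_n = ks | sum Y_i = k) for independent Poisson(m/n) Y_i."""
    k = sum(ks)
    mu = m / n
    num = 1.0
    for ki in ks:
        num *= math.exp(-mu) * mu ** ki / math.factorial(ki)
    denom = math.exp(-m) * m ** k / math.factorial(k)  # sum is Poisson(m)
    return num / denom

ks, n = (2, 0, 1), 3
for m in (1, 3, 10):  # the conditional distribution is independent of m
    assert abs(poisson_conditional(ks, n, m) - exact_prob(ks, n)) < 1e-9
```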
We call the scenario in which the numbers of balls in the bins are taken to be independent Poisson random variables with mean m/n the Poisson case. The scenario where m balls are thrown into n bins independently and uniformly at random is the exact case.

Corollary 5.9: Any event that takes place with probability p in the Poisson case takes place with probability at most p e sqrt(m) in the exact case.

Proof: Let f be the indicator function of the event. In this case, E[f] is just the probability that the event occurs, and the result follows immediately from Theorem 5.7.

Any event that happens with small probability in the Poisson case also happens with small probability in the exact case. In the analysis of algorithms we often want to show that certain events happen with small probability. This result says that we can utilize an analysis of the Poisson approximation to obtain a bound for the exact case. The Poisson approximation is easier to analyze because the numbers of balls in the bins are independent random variables.
We can actually do even a little bit better in many natural cases.

Theorem 5.10: Let f(x_1, ..., x_n) be a nonnegative function such that E[ f(X_1^(m), ..., X_n^(m)) ] is either monotonically increasing or monotonically decreasing in m. Then

E[ f(X_1^(m), ..., X_n^(m)) ] <= 2 E[ f(Y_1^(m), ..., Y_n^(m)) ].

The following corollary is immediate:

Corollary 5.11: Let E be an event whose probability is either monotonically increasing or monotonically decreasing in the number of balls. If E has probability p in the Poisson case, then E has probability at most 2p in the exact case.

Consider again the maximum load problem for the case m = n. A union bound argument shows that the maximum load is at most 3 ln n / ln ln n w.h.p. Using the Poisson approximation, we prove the following almost-matching lower bound on the maximum load.
Lemma 5.12: When n balls are thrown into n bins independently and uniformly at random, the maximum load is at least M = ln n / ln ln n with probability at least 1 - 1/n for n sufficiently large.

Proof: In the Poisson case, the probability that bin 1 has load at least M is at least 1/(e M!), which is the probability it has load exactly M: Pr(Y_1 = M) = e^{-1} / M!. In the Poisson case, all bins are independent, so the probability that no bin has load at least M is at most

(1 - 1/(e M!))^n <= e^{-n/(e M!)}.

We need to choose M so that e^{-n/(e M!)} <= n^{-2}, for then (by Thm 5.7) we will have that the probability that the maximum load is not at least M in the exact case is at most e sqrt(n) n^{-2} < 1/n. This will give the lemma. Because the maximum load is clearly monotonically increasing in the number of balls, we could also apply the slightly better Thm 5.10, but this would not affect the argument substantially.

It therefore suffices to show that n/(e M!) >= 2 ln n, or equivalently that M! <= n/(2e ln n). From Lemma 5.8 (not shown), it follows that:
M! <= e sqrt(M) (M/e)^M when n (and hence M = ln n / ln ln n) are suitably large. Hence, for suitably large n,

ln M! <= 1 + (ln M)/2 + M ln M - M
= 1 + (ln M)/2 + (ln n / ln ln n)(ln ln n - ln ln ln n) - M
<= ln n - (ln n * ln ln ln n) / ln ln n + ln M
<= ln n - 2 ln ln n
<= ln( n / (2e ln n) ).

The last two inequalities use the fact that ln ln n = o( (ln n * ln ln ln n) / ln ln n ).

Application: Hashing

Consider a password checker, which prevents people from using easily cracked passwords by keeping a dictionary of unacceptable ones. The application checks whether the requested password is unacceptable. A checker could store the unacceptable passwords alphabetically and do a binary search on the dictionary to check a proposed password. A binary search would require Theta(log m) time for m words.
Chain Hashing

Another possibility is to place the words into bins and search the appropriate bin for the word. Words in a bin are represented by a linked list. The placement of words into bins is accomplished by using a hash function. A hash function f from a universe U into a range [0, n - 1] can be thought of as a way of placing items from the universe into n bins.

Here the universe U consists of possible password strings. The collection of bins is called a hash table. This approach to hashing is called chain hashing. Using a hash table turns the dictionary problem into a balls-and-bins problem. If our dictionary of unacceptable passwords consists of m words and the range of the hash function is [0, n - 1], then we can model the distribution of words in bins with the same distribution as m balls placed randomly in n bins.
It is a strong assumption to presume that a hash function maps words into bins in a fashion that appears random, so that the location of each word is independent and identically distributed (i.i.d.). We assume that for each x, the probability that f(x) = j is 1/n (for 0 <= j <= n - 1) and that the values of f(x) for different x are independent of each other. This does not mean that every evaluation of f(x) yields a different random answer: the value of f(x) is fixed for all time; it is just equally likely to take on any value in the range.

Consider the search time when there are n bins and m words. To search for an item, we first hash it to find the bin that it lies in and then search sequentially through the linked list for it. If we search for a word that is not in our dictionary, the expected number of words in the bin the word hashes to is m/n. If we search for a word that is in our dictionary, the expected number of other words in that word's bin is (m - 1)/n, so the expected number of words in the bin is 1 + (m - 1)/n.
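A chain-hashing dictionary can be sketched as follows; Python's built-in hash() stands in for the idealized random hash function assumed in the analysis, and the sample words are illustrative:

```python
class ChainHashTable:
    """Chain hashing: n bins, each holding a list of words.

    Python's built-in hash() is used as a stand-in for the idealized
    i.i.d. uniform hash function of the analysis.
    """

    def __init__(self, n):
        self.n = n
        self.bins = [[] for _ in range(n)]

    def insert(self, word):
        self.bins[hash(word) % self.n].append(word)

    def contains(self, word):
        # Hash to a bin, then scan its list sequentially.
        return word in self.bins[hash(word) % self.n]

words = ["password", "123456", "qwerty", "letmein"]
table = ChainHashTable(n=len(words))  # n = m gives O(1) expected search
for w in words:
    table.insert(w)
assert all(table.contains(w) for w in words)
assert not table.contains("correct-horse-battery-staple")
```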
If we choose n = m bins for our hash table, then the expected number of words we must search through in a bin is constant. If the hashing takes constant time, then the total expected time for the search is constant. The maximum time to search for a word, however, is proportional to the maximum number of words in a bin. We have shown that when n = m this maximum load is Theta(ln n / ln ln n) with probability close to 1, and hence w.h.p. this is the maximum search time in such a hash table.

While this is still faster than the required time for standard binary search, it is much slower than the average, which can be a drawback for many applications. Another drawback of chain hashing can be wasted space. If we use n bins for n items, several of the bins will be empty, potentially leading to wasted space. The space wasted can be traded off against the search time by making the average number of words per bin larger than 1.
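The Theta(ln n / ln ln n) maximum load is easy to observe by simulation; a sketch (the choice n = 100,000 is arbitrary, and the asserted bounds hold only with high probability):

```python
import math
import random

def max_load(n):
    """Throw n balls into n bins i.u.a.r. and return the maximum load."""
    loads = [0] * n
    for _ in range(n):
        loads[random.randrange(n)] += 1
    return max(loads)

n = 100_000
M = math.log(n) / math.log(math.log(n))       # lower-bound scale ln n / ln ln n
upper = 3 * math.log(n) / math.log(math.log(n))  # w.h.p. upper bound
load = max_load(n)
# W.h.p. the observed maximum load falls below the 3 ln n / ln ln n bound.
assert 2 <= load <= upper
```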
Hashing: Bit Strings

Now we save space instead of time. Consider, again, the problem of keeping a dictionary of unsuitable passwords. Assume that a password is restricted to eight ASCII characters, which requires 64 bits (8 bytes) to represent. Suppose we use a hash function to map each word into a 32-bit string. This string is a short fingerprint for the word.

We keep the fingerprints in a sorted list. To check if a proposed password is unacceptable, we calculate its fingerprint and look for it on the list, say by a binary search. If the fingerprint is on the list, we declare the password unacceptable. In this case, our password checker may not give the correct answer! It is possible that an acceptable password is rejected because its fingerprint matches the fingerprint of an unacceptable password.
Hence there is some chance that hashing will yield a false positive: it may falsely declare a match when there is not an actual match. The fingerprints do not uniquely identify the associated word. This is the only type of mistake this algorithm can make. Allowing false positives means our algorithm is overly conservative, which is probably acceptable; letting easily cracked passwords through, however, would probably not be.

To place the problem in a more general context, we describe it as an approximate set membership problem. Suppose we have a set S = {s_1, ..., s_m} of m elements from a large universe U. We want to be able to quickly answer queries of the form "Is x an element of S?" We would also like the representation to take as little space as possible. To save space, we are willing to allow occasional mistakes in the form of false positives. Here the unallowable passwords correspond to our set S.
How large should the range of the hash function used to create the fingerprints be? How many bits should be in a fingerprint? Obviously, we want to choose the number of bits that gives an acceptable probability for a false positive match. If we use b-bit fingerprints, the probability that an acceptable password has a fingerprint that is different from any specific unallowable password in S is 1 - 1/2^b. If the set S has size m, then the probability of a false positive for an acceptable password is

1 - (1 - 1/2^b)^m >= 1 - e^{-m/2^b}.

If we want this probability of a false positive to be less than a constant c, we need e^{-m/2^b} >= 1 - c, which implies that b >= log_2( m / ln(1/(1-c)) ). That is, we need b = Omega(log_2 m) bits. If, however, we use b = 2 log_2 m bits, then the probability of a false positive falls to 1 - (1 - 1/m^2)^m < 1/m. If we have 2^16 = 65,536 words, then using 32 bits yields a false positive probability of just less than 1/65,536.
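A fingerprint-based checker can be sketched as follows; truncated SHA-256 stands in for the idealized random hash function, and the word list is illustrative:

```python
import bisect
import hashlib

def fingerprint(word, bits=32):
    """Map a word to a bits-bit fingerprint.

    SHA-256 (truncated) is a stand-in for the idealized random hash
    function assumed in the analysis.
    """
    digest = hashlib.sha256(word.encode()).digest()
    return int.from_bytes(digest[:8], "big") % (1 << bits)

unacceptable = ["password", "123456", "qwerty", "letmein"]
stored = sorted(fingerprint(w) for w in unacceptable)  # sorted fingerprint list

def is_unacceptable(word):
    """Binary-search the fingerprint list; may return false positives,
    never false negatives."""
    fp = fingerprint(word)
    i = bisect.bisect_left(stored, fp)
    return i < len(stored) and stored[i] == fp

# Every stored word is always rejected; a safe password is accepted unless
# its 32-bit fingerprint collides, which happens with probability ~ m/2^32.
assert all(is_unacceptable(w) for w in unacceptable)
```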
5.6. Random Graphs

There are many NP-hard computational problems defined on graphs: Hamiltonian cycle, independent set, vertex cover, ... Are these problems hard for most inputs or just for a relatively small fraction of all graphs? Random graph models provide a probabilistic setting for studying such questions. Most of the work on random graphs has focused on two closely related models, G_{n,p} and G_{n,N}.

Random Graph Models

In G_{n,p} we consider all undirected graphs on n distinct vertices v_1, ..., v_n. A graph with a given set of m edges has probability p^m (1 - p)^{C(n,2) - m}. One way to generate a random graph in G_{n,p} is to consider each of the C(n,2) possible edges in some order and then independently add each edge to the graph with probability p.
The expected number of edges in the graph is therefore C(n,2) p, and each vertex has expected degree (n - 1)p. In the G_{n,N} model, we consider all undirected graphs on n vertices with exactly N edges. There are C(C(n,2), N) possible graphs, each selected with equal probability. One way to generate a graph uniformly from the graphs in G_{n,N} is to start with a graph with no edges. Choose one of the C(n,2) possible edges uniformly at random and add it to the edges in the graph. Now choose one of the remaining C(n,2) - 1 possible edges independently and uniformly at random and add it to the graph. Continue similarly until there are N edges.

The G_{n,p} and G_{n,N} models are related: when p = N / C(n,2), the number of edges in a random graph in G_{n,p} is concentrated around N, and conditioned on a graph from G_{n,p} having N edges, that graph is uniform over all the graphs from G_{n,N}.
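Both generation procedures can be sketched directly; the edge probability and edge count below are arbitrary choices:

```python
import itertools
import random

def gnp(n, p):
    """G_{n,p}: include each of the C(n,2) possible edges independently
    with probability p."""
    return [e for e in itertools.combinations(range(n), 2)
            if random.random() < p]

def gnN(n, N):
    """G_{n,N}: choose exactly N of the C(n,2) possible edges uniformly
    at random (equivalent to adding distinct random edges one by one)."""
    return random.sample(list(itertools.combinations(range(n), 2)), N)

n, p = 50, 0.1
possible = n * (n - 1) // 2        # C(n,2) possible edges
g1 = gnp(n, p)                     # expected number of edges: C(n,2) * p
assert 0 <= len(g1) <= possible
g2 = gnN(n, 30)
assert len(g2) == 30 and len(set(g2)) == 30  # exactly 30 distinct edges
```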
There are many similarities between random graphs and the balls-and-bins models. Throwing edges into the graph as in the G_{n,N} model is like throwing balls into bins. However, since each edge has two endpoints, each edge is like throwing two balls at once into two different bins. The pairing adds a rich structure that does not exist in the balls-and-bins model. Yet we can often utilize the relation between the two models to simplify analysis in random graph models.
We use the Chernoff bound for the Poisson distribution (Theorem 5.4) to bound this probability, writing the bound as Pr(X >= x) <= e^{x - m - x ln(x/m)}. For x = m + sqrt(2m ln m),

Pr(X >= x) <= e^{sqrt(2m ln m) - (m + sqrt(2m ln m))(sqrt(2 ln m / m) - ln m / m)}.
More informationLecture 4 Thursday Sep 11, 2014
CS 224: Advanced Algorithms Fall 2014 Lecture 4 Thursday Sep 11, 2014 Prof. Jelani Nelson Scribe: Marco Gentili 1 Overview Today we re going to talk about: 1. linear probing (show with 5-wise independence)
More information4. Suppose that we roll two die and let X be equal to the maximum of the two rolls. Find P (X {1, 3, 5}) and draw the PMF for X.
Math 10B with Professor Stankova Worksheet, Midterm #2; Wednesday, 3/21/2018 GSI name: Roy Zhao 1 Problems 1.1 Bayes Theorem 1. Suppose a test is 99% accurate and 1% of people have a disease. What is the
More information1 True/False. Math 10B with Professor Stankova Worksheet, Discussion #9; Thursday, 2/15/2018 GSI name: Roy Zhao
Math 10B with Professor Stankova Worksheet, Discussion #9; Thursday, 2/15/2018 GSI name: Roy Zhao 1 True/False 1. True False When we solve a problem one way, it is not useful to try to solve it in a second
More informationAssignment 4: Solutions
Math 340: Discrete Structures II Assignment 4: Solutions. Random Walks. Consider a random walk on an connected, non-bipartite, undirected graph G. Show that, in the long run, the walk will traverse each
More informationLecture 5: January 30
CS71 Randomness & Computation Spring 018 Instructor: Alistair Sinclair Lecture 5: January 30 Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They
More informationCSE548, AMS542: Analysis of Algorithms, Fall 2016 Date: Nov 30. Final In-Class Exam. ( 7:05 PM 8:20 PM : 75 Minutes )
CSE548, AMS542: Analysis of Algorithms, Fall 2016 Date: Nov 30 Final In-Class Exam ( 7:05 PM 8:20 PM : 75 Minutes ) This exam will account for either 15% or 30% of your overall grade depending on your
More informationInsert Sorted List Insert as the Last element (the First element?) Delete Chaining. 2 Slide courtesy of Dr. Sang-Eon Park
1617 Preview Data Structure Review COSC COSC Data Structure Review Linked Lists Stacks Queues Linked Lists Singly Linked List Doubly Linked List Typical Functions s Hash Functions Collision Resolution
More informationCS 591, Lecture 7 Data Analytics: Theory and Applications Boston University
CS 591, Lecture 7 Data Analytics: Theory and Applications Boston University Babis Tsourakakis February 13th, 2017 Bloom Filter Approximate membership problem Highly space-efficient randomized data structure
More informationCMPT307: Complexity Classes: P and N P Week 13-1
CMPT307: Complexity Classes: P and N P Week 13-1 Xian Qiu Simon Fraser University xianq@sfu.ca Strings and Languages an alphabet Σ is a finite set of symbols {0, 1}, {T, F}, {a, b,..., z}, N a string x
More information1 The Basic Counting Principles
1 The Basic Counting Principles The Multiplication Rule If an operation consists of k steps and the first step can be performed in n 1 ways, the second step can be performed in n ways [regardless of how
More information34.1 Polynomial time. Abstract problems
< Day Day Up > 34.1 Polynomial time We begin our study of NP-completeness by formalizing our notion of polynomial-time solvable problems. These problems are generally regarded as tractable, but for philosophical,
More informationProblem set 1, Real Analysis I, Spring, 2015.
Problem set 1, Real Analysis I, Spring, 015. (1) Let f n : D R be a sequence of functions with domain D R n. Recall that f n f uniformly if and only if for all ɛ > 0, there is an N = N(ɛ) so that if n
More informationLecture 7: Fingerprinting. David Woodruff Carnegie Mellon University
Lecture 7: Fingerprinting David Woodruff Carnegie Mellon University How to Pick a Random Prime How to pick a random prime in the range {1, 2,, M}? How to pick a random integer X? Pick a uniformly random
More informationTheory of Computer Science
Theory of Computer Science E1. Complexity Theory: Motivation and Introduction Malte Helmert University of Basel May 18, 2016 Overview: Course contents of this course: logic How can knowledge be represented?
More information1 More finite deterministic automata
CS 125 Section #6 Finite automata October 18, 2016 1 More finite deterministic automata Exercise. Consider the following game with two players: Repeatedly flip a coin. On heads, player 1 gets a point.
More informationTrees/Intro to counting
Trees/Intro to counting Russell Impagliazzo and Miles Jones Thanks to Janine Tiefenbruck http://cseweb.ucsd.edu/classes/sp16/cse21-bd/ April 29, 2016 Equivalence between rooted and unrooted trees Goal
More informationCuckoo Hashing and Cuckoo Filters
Cuckoo Hashing and Cuckoo Filters Noah Fleming May 7, 208 Preliminaries A dictionary is an abstract data type that stores a collection of elements, located by their key. It supports operations: insert,
More informationBloom Filters, Minhashes, and Other Random Stuff
Bloom Filters, Minhashes, and Other Random Stuff Brian Brubach University of Maryland, College Park StringBio 2018, University of Central Florida What? Probabilistic Space-efficient Fast Not exact Why?
More informationCSCE 750 Final Exam Answer Key Wednesday December 7, 2005
CSCE 750 Final Exam Answer Key Wednesday December 7, 2005 Do all problems. Put your answers on blank paper or in a test booklet. There are 00 points total in the exam. You have 80 minutes. Please note
More informationComputational Complexity. IE 496 Lecture 6. Dr. Ted Ralphs
Computational Complexity IE 496 Lecture 6 Dr. Ted Ralphs IE496 Lecture 6 1 Reading for This Lecture N&W Sections I.5.1 and I.5.2 Wolsey Chapter 6 Kozen Lectures 21-25 IE496 Lecture 6 2 Introduction to
More informationLecture 4: Proof of Shannon s theorem and an explicit code
CSE 533: Error-Correcting Codes (Autumn 006 Lecture 4: Proof of Shannon s theorem and an explicit code October 11, 006 Lecturer: Venkatesan Guruswami Scribe: Atri Rudra 1 Overview Last lecture we stated
More informationTheory of Computer Science. Theory of Computer Science. E1.1 Motivation. E1.2 How to Measure Runtime? E1.3 Decision Problems. E1.
Theory of Computer Science May 18, 2016 E1. Complexity Theory: Motivation and Introduction Theory of Computer Science E1. Complexity Theory: Motivation and Introduction Malte Helmert University of Basel
More informationa zoo of (discrete) random variables
discrete uniform random variables A discrete random variable X equally liely to tae any (integer) value between integers a and b, inclusive, is uniform. Notation: X ~ Unif(a,b) a zoo of (discrete) random
More informationPROBLEMS OF MARRIAGE Eugene Mukhin
PROBLEMS OF MARRIAGE Eugene Mukhin 1. The best strategy to find the best spouse. A person A is looking for a spouse, so A starts dating. After A dates the person B, A decides whether s/he wants to marry
More informationFilters. Alden Walker Advisor: Miller Maley. May 23, 2007
Filters Alden Walker Advisor: Miller Maley May 23, 2007 Contents 1 Introduction 4 1.1 Motivation..................................... 4 1.2 Mathematical Abstraction............................ 5 1.2.1
More informationTopic 4 Randomized algorithms
CSE 103: Probability and statistics Winter 010 Topic 4 Randomized algorithms 4.1 Finding percentiles 4.1.1 The mean as a summary statistic Suppose UCSD tracks this year s graduating class in computer science
More informationCS 151 Complexity Theory Spring Solution Set 5
CS 151 Complexity Theory Spring 2017 Solution Set 5 Posted: May 17 Chris Umans 1. We are given a Boolean circuit C on n variables x 1, x 2,..., x n with m, and gates. Our 3-CNF formula will have m auxiliary
More informationRainbow Tables ENEE 457/CMSC 498E
Rainbow Tables ENEE 457/CMSC 498E How are Passwords Stored? Option 1: Store all passwords in a table in the clear. Problem: If Server is compromised then all passwords are leaked. Option 2: Store only
More informationP is the class of problems for which there are algorithms that solve the problem in time O(n k ) for some constant k.
Complexity Theory Problems are divided into complexity classes. Informally: So far in this course, almost all algorithms had polynomial running time, i.e., on inputs of size n, worst-case running time
More informationData Compression Techniques
Data Compression Techniques Part 1: Entropy Coding Lecture 4: Asymmetric Numeral Systems Juha Kärkkäinen 08.11.2017 1 / 19 Asymmetric Numeral Systems Asymmetric numeral systems (ANS) is a recent entropy
More informationDominating Set. Chapter 7
Chapter 7 Dominating Set In this chapter we present another randomized algorithm that demonstrates the power of randomization to break symmetries. We study the problem of finding a small dominating set
More informationTHIS work is motivated by the goal of finding the capacity
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 8, AUGUST 2007 2693 Improved Lower Bounds for the Capacity of i.i.d. Deletion Duplication Channels Eleni Drinea, Member, IEEE, Michael Mitzenmacher,
More informationCS5314 Randomized Algorithms. Lecture 18: Probabilistic Method (De-randomization, Sample-and-Modify)
CS5314 Randomized Algorithms Lecture 18: Probabilistic Method (De-randomization, Sample-and-Modify) 1 Introduce two topics: De-randomize by conditional expectation provides a deterministic way to construct
More informationEvery binary word is, almost, a shuffle of twin subsequences a theorem of Axenovich, Person and Puzynina
Every binary word is, almost, a shuffle of twin subsequences a theorem of Axenovich, Person and Puzynina Martin Klazar August 17, 2015 A twin in a word u = a 1 a 2... a n is a pair (u 1, u 2 ) of disjoint
More informationSection 1.1. Chapter 1. Quadratics. Parabolas. Example. Example. ( ) = ax 2 + bx + c -2-1
Chapter 1 Quadratic Functions and Factoring Section 1.1 Graph Quadratic Functions in Standard Form Quadratics The polynomial form of a quadratic function is: f x The graph of a quadratic function is a
More informationSeifert s RSA Fault Attack: Simplified Analysis and Generalizations
Seifert s RSA Fault Attack: Simplified Analysis and Generalizations James A. Muir School of Computer Science Carleton University jamuir@scs.carleton.ca 15 December 2005 21:11:36 EST Abstract Seifert recently
More informationFINAL EXAM PRACTICE PROBLEMS CMSC 451 (Spring 2016)
FINAL EXAM PRACTICE PROBLEMS CMSC 451 (Spring 2016) The final exam will be on Thursday, May 12, from 8:00 10:00 am, at our regular class location (CSI 2117). It will be closed-book and closed-notes, except
More informationSliding Windows with Limited Storage
Electronic Colloquium on Computational Complexity, Report No. 178 (2012) Sliding Windows with Limited Storage Paul Beame Computer Science and Engineering University of Washington Seattle, WA 98195-2350
More informationCS 124 Math Review Section January 29, 2018
CS 124 Math Review Section CS 124 is more math intensive than most of the introductory courses in the department. You re going to need to be able to do two things: 1. Perform some clever calculations to
More informationHAMILTON CYCLES IN RANDOM REGULAR DIGRAPHS
HAMILTON CYCLES IN RANDOM REGULAR DIGRAPHS Colin Cooper School of Mathematical Sciences, Polytechnic of North London, London, U.K. and Alan Frieze and Michael Molloy Department of Mathematics, Carnegie-Mellon
More informationLecture 5: The Principle of Deferred Decisions. Chernoff Bounds
Randomized Algorithms Lecture 5: The Principle of Deferred Decisions. Chernoff Bounds Sotiris Nikoletseas Associate Professor CEID - ETY Course 2013-2014 Sotiris Nikoletseas, Associate Professor Randomized
More informationNew Attacks on the Concatenation and XOR Hash Combiners
New Attacks on the Concatenation and XOR Hash Combiners Itai Dinur Department of Computer Science, Ben-Gurion University, Israel Abstract. We study the security of the concatenation combiner H 1(M) H 2(M)
More informationHashing Data Structures. Ananda Gunawardena
Hashing 15-121 Data Structures Ananda Gunawardena Hashing Why do we need hashing? Many applications deal with lots of data Search engines and web pages There are myriad look ups. The look ups are time
More information5 ProbabilisticAnalysisandRandomized Algorithms
5 ProbabilisticAnalysisandRandomized Algorithms This chapter introduces probabilistic analysis and randomized algorithms. If you are unfamiliar with the basics of probability theory, you should read Appendix
More informationComputation Theory Finite Automata
Computation Theory Dept. of Computing ITT Dublin October 14, 2010 Computation Theory I 1 We would like a model that captures the general nature of computation Consider two simple problems: 2 Design a program
More information1 Complex Networks - A Brief Overview
Power-law Degree Distributions 1 Complex Networks - A Brief Overview Complex networks occur in many social, technological and scientific settings. Examples of complex networks include World Wide Web, Internet,
More information