Interval Selection in the Streaming Model


Pascal Bemmann

Abstract. In the interval selection problem we are given a set of intervals via a stream and want to find a maximum set of pairwise independent intervals. Let $\alpha(I)$ denote the size of an optimal solution. We present the results of S. Cabello and P. Pérez-Lantero for estimating $\alpha(I)$ in the streaming model, where only one pass over the data is allowed, the endpoints of intervals lie within the range $\{1, \dots, n\}$, and the memory is constrained. For intervals of potentially different sizes we provide an algorithm that computes an estimate $\hat\alpha$ of $\alpha(I)$ such that $\frac{1}{2}(1-\varepsilon)\alpha(I) \le \hat\alpha \le \alpha(I)$ holds with probability at least 2/3. For same-size intervals we explain an algorithm that computes an estimate $\hat\alpha$ of $\alpha(I)$ for which $\frac{2}{3}(1-\varepsilon)\alpha(I) \le \hat\alpha \le \alpha(I)$ holds with probability at least 2/3. The required space is polynomial in $\varepsilon^{-1}$ and $\log n$. We also present approximation algorithms for the interval selection problem which use $O(\alpha(I))$ space and which are used in the mentioned estimates.

Contents: 1 Introduction; 2 Basics and Definitions (Intervals, Sampling); 3 H-random samples; 4 A 2-approximation algorithm; 5 Estimating the size of an optimal solution (Segments, Algorithms in the Streaming Model); 6 Same-size intervals (Largest independent set of same-size intervals, Size of largest independent set of same-size intervals); 7 Conclusion and other results.

1 Introduction

In this work we present results developed by Cabello and Pérez-Lantero [1]. We consider problems in the streaming model, in which typically huge data sets arrive sequentially and the task is to solve problems with limited memory. We assume that we are not able to look at input items again unless we saved them in our memory; in other words, the algorithm makes only one pass over the input. Furthermore we assume that the input stream is too big to be stored as a whole in memory.

Pascal Bemmann, Universität Paderborn, Warburger Str. 100, Paderborn, pbemmann@mail.upb.de

Within this model we analyze the interval selection problem. As input we receive intervals within a predetermined range. The task is to find the biggest set of intervals that are pairwise disjoint while using memory as efficiently as possible. A related problem is to estimate only the size of an optimal solution without providing an actual set of intervals. Both of these problems are analyzed in the following chapters. Note that the interval selection problem is a generalization of the distinct elements problem. This fundamental problem deals with the task of computing the number of pairwise different elements of a data stream: if for all intervals of the interval selection problem both endpoints of each interval are equal, the task becomes to count the number of different points (elements).

We start by providing general definitions and tools that we use to approach the interval selection problem. After that we give the idea of how to design a 2-approximation algorithm in the mentioned setting. This algorithm is used together with other general results to construct an algorithm that estimates the size of an optimal solution for the interval selection problem. At the end we show how the presented results can be improved if we assume that all input intervals are of the same size.

2 Basics and Definitions

In this section we provide definitions and useful tools which we will use in later proofs. To shorten notation we write $[n]$ for the set of integers $\{1, \dots, n\}$. We also assume for all later constructions that $0 < \varepsilon < 1/2$ holds.

Definition 1 (Interval selection problem). Given a set $I$ of (input) intervals, the task is to find a largest subset of intervals that are pairwise disjoint. Such intervals are also called independent. $\alpha(I)$ denotes the size of an optimal solution for this problem.

Another problem that arises from this definition is to estimate $\alpha(I)$ without outputting an independent subset of the input intervals. We consider both of these problems in this work.
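For reference, the offline optimum $\alpha(I)$ is easy to compute when the whole input fits in memory: the classical earliest-right-endpoint greedy is optimal. The following sketch is only a baseline for comparison with the streaming algorithms below; it is not part of the construction of [1].

```python
def alpha(intervals):
    """Offline optimum for interval selection: repeatedly pick the
    interval with the smallest right endpoint among those disjoint
    from the previously picked one."""
    chosen_right = None
    size = 0
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if chosen_right is None or left > chosen_right:  # closed intervals: strict >
            chosen_right = right
            size += 1
    return size

print(alpha([(1, 3), (2, 5), (4, 7), (6, 9)]))  # 2, e.g. {[1,3], [4,7]}
```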

2.1 Intervals

We consider the input intervals of the interval selection problem to be closed. Intervals constructed during the algorithm in Section 4 will be called windows to distinguish them from the input intervals. For the same reason we use the term segment for an interval used in the segment tree in Section 5.1. We say that an interval $I = [x, y]$ is contained in another interval $I'$ if both endpoints $x, y$ are elements of $I'$.

Definition 2 (Leftmost and Rightmost intervals). Given a window $W$ and a set of input intervals $I$, we define: $\mathrm{Leftmost}(W)$ is the interval with the smallest right endpoint among the intervals of $I$ contained in $W$; we use the left endpoint as a tiebreaker, choosing the interval with the largest left endpoint. $\mathrm{Rightmost}(W)$ is the interval with the largest left endpoint among the intervals of $I$ contained in $W$; we use the right endpoint as a tiebreaker, choosing the interval with the smallest right endpoint. In case $W$ contains no input interval, both $\mathrm{Leftmost}(W)$ and $\mathrm{Rightmost}(W)$ are undefined.

If $W$ contains just a single input interval $I \in I$, then $\mathrm{Leftmost}(W) = \mathrm{Rightmost}(W) = I$. Also, the intersection of all input intervals contained in $W$ is equal to $\mathrm{Leftmost}(W) \cap \mathrm{Rightmost}(W)$; otherwise the definition of $\mathrm{Leftmost}(W)$ or $\mathrm{Rightmost}(W)$ would be contradicted. It will be clear from context to which set of input intervals $\mathrm{Leftmost}(W)$ and $\mathrm{Rightmost}(W)$ refer.

2.2 Sampling

Definition 3 (ε-min-wise independence). A family of permutations $H \subseteq \{h : [n] \to [n]\}$ is ε-min-wise independent if for all $X \subseteq [n]$ and all $y \in X$:
$$\frac{1-\varepsilon}{|X|} \le \Pr_{h \in H}[h(y) = \min h(X)] \le \frac{1+\varepsilon}{|X|}.$$
Note that we only consider proper subfamilies of the set of all permutations on $[n]$: for the family of all permutations, $\min h(X)$ is attained by each $y \in X$ with probability exactly $1/|X|$, so $\varepsilon = 0$ would suffice.

Based on this definition we can use the results of Indyk [2] to obtain a family of permutations with properties that we will need later.

Lemma 1. For every $\varepsilon \in (0, 1/2)$ and $n > 0$ there exists a family of permutations $H(n, \varepsilon) = \{h : [n] \to [n]\}$ with the following properties:
(i) $H(n, \varepsilon)$ has $n^{O(\log(1/\varepsilon))}$ permutations,
(ii) $H(n, \varepsilon)$ is ε-min-wise independent,
(iii) an element of $H(n, \varepsilon)$ can be chosen uniformly at random in $O(\log(1/\varepsilon))$ time,
(iv) for $h \in H(n, \varepsilon)$ and $x, y \in [n]$, we can decide with $O(\log(1/\varepsilon))$ arithmetic operations whether $h(x) < h(y)$.

Proof. We only give a rough idea of how to prove the above properties. The results of Indyk [2] grant a family $H'$ of $\varepsilon'$-wise independent hash functions, with $\varepsilon'$ depending on $\varepsilon$ and some constant factors. It can be shown that each hash function $h' \in H'$ can be used to create an ε-min-wise independent permutation using the lexicographic order of the pairs $(h'(i), i)$ over all $i \in [n]$. Standard constructions over finite fields grant a family of hash functions satisfying conditions (i), (iii) and (iv); transforming these hash functions into permutations using the above approach grants property (ii).

3 H-random samples

We use the result of the previous lemma to obtain H-random samples. These are elements that are chosen nearly uniformly at random, while we still maintain some characteristic information about the samples. The general idea is based on the work of Datar and Muthukrishnan [4]. We consider a fixed subset $X \subseteq [n]$ and a family of permutations $H = H(n, \varepsilon)$ as stated in Lemma 1. To obtain an H-random element $s$ of $X$ we choose a permutation $h \in H$ uniformly at random and set $s = \arg\min\{h(x) \mid x \in X\}$. It is important to note that $s$ is not chosen completely uniformly at random. From the definition of ε-min-wise independence we obtain for all $x \in X$:
$$\frac{1-\varepsilon}{|X|} \le \Pr[s = x] \le \frac{1+\varepsilon}{|X|}.$$
This follows from the observation that for fixed $h$ we have $h(x) = h(y) \iff x = y$. Moreover, with $\Pr[s \in Y] = \sum_{y \in Y} \Pr[s = y]$ we can conclude that for all $Y \subseteq X$:
$$(1-\varepsilon)\frac{|Y|}{|X|} \le \Pr[s \in Y] \le (1+\varepsilon)\frac{|Y|}{|X|}. \qquad (1)$$
This gives us the opportunity to estimate the ratio $|Y|/|X|$ for a fixed $Y$: we keep computing H-random samples from $X$ and count how many of the samples are elements of $Y$. The probability that an H-random sample is an element of $Y$ is proportional to the ratio between $|Y|$ and $|X|$.

Furthermore, H-random samples can be maintained during the stream. After choosing $h \in H$ uniformly at random, we check for each new element $a$ of the stream whether $h(a) < h(s)$ holds. If so, $a$ is the new minimum of $X$ and we update $s = a$.
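The following sketch shows how such a sample is maintained over a stream. A salted built-in hash stands in for the ε-min-wise independent family $H(n, \varepsilon)$ of Lemma 1, which would require Indyk's finite-field construction; the update rule is the same either way.

```python
import random

class HRandomSample:
    """Maintains s = argmin h(x) over the stream elements seen so far.
    The random-salt hash below is only a stand-in for the
    epsilon-min-wise independent family H(n, eps) of Lemma 1."""
    def __init__(self):
        self.salt = random.getrandbits(64)   # plays the role of choosing h in H
        self.sample = None
        self.key = None

    def h(self, x):
        return hash((self.salt, x))

    def update(self, a):
        if self.key is None or self.h(a) < self.key:
            self.sample, self.key = a, self.h(a)

s = HRandomSample()
for a in [5, 12, 3, 5, 9]:   # stream over X; duplicates do not bias the sample
    s.update(a)
print(s.sample)  # a near-uniform element of {3, 5, 9, 12}
```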

We will also use H-random samples for conditional sampling, where we sample elements until we obtain an element satisfying certain properties. To analyze later results we need the following observation.

Lemma 2. Let $Y \subseteq X \subseteq [n]$ and $\varepsilon \in (0, 1/2)$. Consider a family of permutations $H = H(n, \varepsilon)$ with the properties of Lemma 1 and an H-random sample $s$ from $X$. Then for all $y \in Y$:
$$\frac{1-4\varepsilon}{|Y|} \le \Pr[s = y \mid s \in Y] \le \frac{1+4\varepsilon}{|Y|}.$$

Proof. Fix an arbitrary $y \in Y$. Since $s = y$ implies $s \in Y$, with the considerations above we observe
$$\Pr[s = y \mid s \in Y] = \frac{\Pr[s = y \text{ and } s \in Y]}{\Pr[s \in Y]} = \frac{\Pr[s = y]}{\Pr[s \in Y]} \le \frac{(1+\varepsilon)/|X|}{(1-\varepsilon)|Y|/|X|} = \frac{1+\varepsilon}{1-\varepsilon}\cdot\frac{1}{|Y|} \stackrel{(*)}{\le} (1+4\varepsilon)\frac{1}{|Y|}.$$
The inequality marked with $(*)$ follows since
$$\frac{1+\varepsilon}{1-\varepsilon} \le 1+4\varepsilon \iff 1+\varepsilon \le (1+4\varepsilon)(1-\varepsilon) = 1 + 3\varepsilon - 4\varepsilon^2 \iff 0 \le 2\varepsilon - 4\varepsilon^2$$
and $2\varepsilon - 4\varepsilon^2 = \varepsilon(2-4\varepsilon) \ge \varepsilon(2 - 4\cdot\tfrac12) = 0$. Similarly we conclude that $\Pr[s = y \mid s \in Y] \ge (1-4\varepsilon)\frac{1}{|Y|}$.

4 A 2-approximation algorithm

The goal of this section is to construct a 2-approximation algorithm for the interval selection problem using $O(\alpha(I))$ space. The algorithm maintains a set $\mathcal{W}$ which is a partition of the real line. We call the elements of $\mathcal{W}$ windows; these are intervals for which both the inclusion and the exclusion of the endpoints are allowed. More specifically, all elements of $\mathcal{W}$ are pairwise disjoint and the union of all elements of $\mathcal{W}$ is the whole of $\mathbb{R}$. We formalize this desired set and its consequential properties in the next lemma.

Lemma 3. Let $I$ be a set of intervals and let $\mathcal{W}$ be a partition of the real line with the following properties: each window of $\mathcal{W}$ contains at least one interval from $I$, and for each window $W \in \mathcal{W}$ the intervals of $I$ contained in $W$ pairwise intersect. Let $J$ be any set of intervals constructed by selecting, for each window $W$ of $\mathcal{W}$, one interval of $I$ contained in $W$. Then $|J| > \alpha(I)/2$.

Fig. 1 At the bottom is a partition of the real line; filled circles represent included endpoints, empty circles excluded endpoints. At the top we split an optimal solution $J^*$ (marked in blue) into $J'$ and $J''$.

Proof. Consider a partition $\mathcal{W}$ of the real line with the above properties. To shorten notation we set $k = |\mathcal{W}|$. Let $J^* \subseteq I$ be an optimal solution of the interval selection problem; by definition $|J^*| = \alpha(I)$. We split $J^*$ into two disjoint sets $J'$ and $J''$ (see Figure 1 for an example). $J'$ is the set of intervals of $J^*$ which are fully contained in some window of $\mathcal{W}$. By assumption all intervals contained in a window of $\mathcal{W}$ pairwise intersect, so at most one interval of $J'$ is contained in each window; since $\mathcal{W}$ has $k$ elements we obtain $|J'| \le k$. Every interval of $J^*$ which intersects at least two successive windows of $\mathcal{W}$ belongs to $J''$. Since the $k$ windows of $\mathcal{W}$ are separated by $k-1$ boundaries and the intervals of $J''$ are pairwise disjoint, $|J''| \le k-1$ holds. Since each element of $J^*$ is contained in either $J'$ or $J''$ we combine the above results to obtain
$$\alpha(I) = |J^*| = |J'| + |J''| \le k + k - 1 = 2k - 1.$$
Since $J$ is constructed by choosing one interval from each window, $|J| = k$ and we conclude that $2|J| = 2k > 2k-1 \ge \alpha(I)$, which completes the proof.

We now present the general idea of an algorithm that maintains such a partition throughout the stream; for an example of such a partition see Figure 2. The overall goal is to partition the real line while storing $\mathrm{Leftmost}(W)$ and $\mathrm{Rightmost}(W)$ for all windows $W \in \mathcal{W}$. When receiving the first input interval $I_0$ of the stream we set $\mathcal{W} = \{\mathbb{R}\}$ and $\mathrm{Leftmost}(\mathbb{R}) = \mathrm{Rightmost}(\mathbb{R}) = I_0$; at this point Lemma 3 holds. We now show that we can insert new intervals while keeping the above conditions. Let $I$ be a new interval of the stream. If $I$ is not contained in any window of $\mathcal{W}$, no update is needed: such an interval can be disregarded, because the algorithm chooses its final intervals from the set of intervals contained in a window of $\mathcal{W}$. Otherwise $I$ is contained in some window $W$ and we distinguish two cases. If $I$ intersects all intervals contained in $W$, we check whether we have to update $\mathrm{Leftmost}(W)$ or $\mathrm{Rightmost}(W)$. Otherwise we have to split the window $W$ into two windows $W_1$ and $W_2$. If both endpoints of $I$ lie to the right of $\mathrm{Leftmost}(W) \cap \mathrm{Rightmost}(W)$, we use the right endpoint of $\mathrm{Leftmost}(W)$ as the splitting value and set $W_1$ to the part containing $\mathrm{Leftmost}(W)$ and $W_2$ to the part containing $I$. If both endpoints of $I$ lie to the left of $\mathrm{Leftmost}(W) \cap \mathrm{Rightmost}(W)$, we use the same approach with $\mathrm{Rightmost}(W)$ instead of $\mathrm{Leftmost}(W)$. With these operations we ensure that our partition satisfies Lemma 3; the formal proof that these instructions maintain such a partition is a simple case distinction using inductive arguments.
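A minimal sketch of this maintenance procedure follows. It represents the partition by its sorted list of split coordinates and stores the pair $(\mathrm{Leftmost}, \mathrm{Rightmost})$ per window; as a simplification it ignores the open/closed endpoint bookkeeping and the tiebreaking rules of Definition 2.

```python
import bisect

class TwoApprox:
    """Sketch of the 2-approximation: maintain a partition of the real
    line into windows, storing Leftmost/Rightmost per window.  Windows
    are the gaps between consecutive cut points; endpoint open/closed
    bookkeeping is simplified by assuming generic interval endpoints."""
    def __init__(self):
        self.cuts = []        # sorted split coordinates
        self.info = {}        # window index -> (leftmost, rightmost)

    def _shift(self, w):      # renumber windows after inserting a cut at index w
        self.info = {(k if k <= w else k + 1): v for k, v in self.info.items()}

    def insert(self, iv):
        x, y = iv
        w = bisect.bisect_left(self.cuts, x)
        if w != bisect.bisect_left(self.cuts, y):
            return                      # iv not contained in a single window
        if w not in self.info:
            self.info[w] = (iv, iv)     # first interval seen in this window
            return
        lm, rm = self.info[w]
        if x > lm[1]:                   # iv lies right of Leftmost(W): split
            bisect.insort(self.cuts, lm[1])
            self._shift(w)
            self.info[w], self.info[w + 1] = (lm, lm), (iv, iv)
        elif y < rm[0]:                 # iv lies left of Rightmost(W): split
            bisect.insort(self.cuts, rm[0])
            self._shift(w)
            self.info[w], self.info[w + 1] = (iv, iv), (rm, rm)
        else:                           # iv intersects all intervals in W
            if iv[1] < lm[1]: lm = iv   # smaller right endpoint (tiebreakers omitted)
            if iv[0] > rm[0]: rm = iv   # larger left endpoint
            self.info[w] = (lm, rm)

    def solution(self):
        return [lm for lm, _ in self.info.values()]  # one interval per window

t = TwoApprox()
for iv in [(1, 4), (2, 6), (5, 8), (7, 9)]:
    t.insert(iv)
print(len(t.solution()))  # at least alpha(I)/2 intervals, here 2
```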

By storing $\mathcal{W}$ in a dynamic binary search tree the algorithm needs $O(|\mathcal{W}|) = O(\alpha(I))$ space, and all operations on this tree can be handled in $O(\log |\mathcal{W}|) = O(\log \alpha(I))$ time. By choosing one arbitrary input interval from each window we end up with a 2-approximate solution for the interval selection problem.

Fig. 2 Maintenance of a partition of the real line by the 2-approximation algorithm.

5 Estimating the size of an optimal solution

In this section we estimate the size of an optimal solution of the interval selection problem. To this end we split the interval $[1, n]$ into segments and apply the 2-approximation algorithm from Section 4 to each of these segments. We construct the segments in such a way that each segment contains neither too many nor too few input intervals. We start with results independent of the streaming model; after that we explain how to use them in algorithms in the streaming model.

5.1 Segments

For our overall approach we use a segment tree $T$. This is a balanced binary tree over the segments $[i, i+1)$ with $i \in [n]$. Each leaf of $T$ corresponds to a segment $[i, i+1)$ for some $i$, including the left endpoint and excluding the right endpoint. Note that the order of the leaves in the tree is the same as the order of the corresponding segments on the real line. For any inner node $v$ of $T$ the corresponding segment $S(v)$ is the (disjoint) union of the segments of $v$'s children; for the root node $r$ the corresponding segment $S(r)$ is therefore equal to $[1, n+1)$. With $\mathcal{S}$ we denote the set of segments corresponding to the nodes of $T$; since $T$ is a balanced binary tree with $n$ leaves, the size of $\mathcal{S}$ is $2n-1$. For an example see Figure 3. To refer to the parent of a segment $S \in \mathcal{S}$ with $S \ne S(r)$ we write $\pi(S)$; this is the segment corresponding to the parent node of the node $v$ for which $S(v) = S$ holds.

For the upcoming constructions we denote by $\beta(S)$ the size of the largest independent subset if we only consider input intervals which are contained in $S \in \mathcal{S}$. Analogously we use $\hat\beta(S)$ for the size of a solution computed by the 2-approximation algorithm of Section 4 applied only to input intervals which are contained in $S$. From these definitions we directly conclude that
$$\forall S \in \mathcal{S}: \quad \beta(S) \ge \hat\beta(S) \ge \beta(S)/2. \qquad (2)$$

The next lemma tells us that if we restrict the 2-approximation algorithm to segments of $\mathcal{S}$ with certain properties and apply it to the input intervals contained in them, we obtain a $(1/2 - \varepsilon)$-approximation of the size of an optimal solution.

Fig. 3 Segment tree for $n = 8$.

Lemma 4. Let $\mathcal{S}' \subseteq \mathcal{S}$ be such that (i) $S(r)$ is the disjoint union of the segments in $\mathcal{S}'$ and (ii) for each $S \in \mathcal{S}'$ it holds that $\beta(\pi(S)) \ge 2\varepsilon^{-1}\log n$. Then
$$\alpha(I) \ge \sum_{S \in \mathcal{S}'} \hat\beta(S) \ge \left(\tfrac12 - \varepsilon\right)\alpha(I).$$

Proof. Consider a set $\mathcal{S}'$ with the above properties. We can merge the solutions produced by the 2-approximation algorithm applied independently to each $S \in \mathcal{S}'$; no input interval is chosen multiple times because the segments in $\mathcal{S}'$ are disjoint. With inequality (2) the first inequality follows:
$$\alpha(I) \ge \sum_{S \in \mathcal{S}'} \beta(S) \ge \sum_{S \in \mathcal{S}'} \hat\beta(S).$$
To obtain the second inequality we look at the set $\bar{\mathcal{S}}$ of leafmost elements in the set of parents $\{\pi(S) \mid S \in \mathcal{S}'\}$: each element of $\bar{\mathcal{S}}$ is a parent of some segment in $\mathcal{S}'$ and has no proper descendant with this property. By definition, for each segment $S \in \mathcal{S}'$ there exists an $\bar S \in \bar{\mathcal{S}}$ such that the parent of $S$ is on the path $\Pi_T(\bar S)$ in $T$ from the root to $\bar S$; otherwise we would have found a leafmost parent node which is not an element of $\bar{\mathcal{S}}$. For each $\bar S \in \bar{\mathcal{S}}$, using (ii), it holds that $\beta(\bar S) \ge 2\varepsilon^{-1}\log n$.

Now we link these considerations to an optimal solution $J^* \subseteq I$ of the interval selection problem. For each segment $S \in \mathcal{S}'$, $J^*$ contains at most two intervals that intersect $S$ but are not completely contained in $S$ (at most one at each endpoint of $S$; otherwise two intervals of the optimal solution would intersect, a contradiction). Therefore we can conclude for all $S \in \mathcal{S}'$:
$$|\{J \in J^* \mid J \cap S \ne \emptyset\}| \le |\{J \in J^* \mid J \subseteq S\}| + 2 \le \beta(S) + 2. \qquad (3)$$
By definition the segments in $\bar{\mathcal{S}}$ are pairwise disjoint, as otherwise one would be a descendant of another and thus not leafmost. Hence we can join the single solutions restricted to segments of $\bar{\mathcal{S}}$ to a solution for the whole input; together with (ii) we obtain
$$|J^*| \ge \sum_{\bar S \in \bar{\mathcal{S}}} \beta(\bar S) \ge |\bar{\mathcal{S}}| \cdot 2\varepsilon^{-1}\log n. \qquad (4)$$

The maximum path length in $T$ is $\log n + 1$ since $T$ is a balanced tree. Then for all $\bar S \in \bar{\mathcal{S}}$ the path from the root to $\bar S$ has at most $\log n$ vertices, because $\bar S$ is a parent node. Each $S \in \mathcal{S}'$ has its parent on the path from the root to some $\bar S \in \bar{\mathcal{S}}$, and each node on such a path has at most two children, so at most $2\log n$ segments of $\mathcal{S}'$ can be charged to each $\bar S \in \bar{\mathcal{S}}$. Rearranging (4) to bound $|\bar{\mathcal{S}}|$ we obtain
$$|\mathcal{S}'| \le 2\log n \cdot |\bar{\mathcal{S}}| \le 2\log n \cdot \frac{|J^*|}{2\varepsilon^{-1}\log n} = \varepsilon|J^*|.$$
Since the segments of $\mathcal{S}'$ form a disjoint union of $S(r)$, we can conclude with (3) that
$$|J^*| \le \sum_{S \in \mathcal{S}'} |\{J \in J^* \mid J \cap S \ne \emptyset\}| \le \sum_{S \in \mathcal{S}'} (\beta(S) + 2) \le 2|\mathcal{S}'| + \sum_{S \in \mathcal{S}'} \beta(S) \le 2\varepsilon|J^*| + \sum_{S \in \mathcal{S}'} \beta(S).$$
Because $\hat\beta(S)$ is a 2-approximation of $\beta(S)$ and $|J^*| = \alpha(I)$, the next chain proves the second inequality of the lemma:
$$|J^*| \le 2\varepsilon|J^*| + \sum_{S \in \mathcal{S}'} \beta(S) \le 2\varepsilon|J^*| + 2\sum_{S \in \mathcal{S}'} \hat\beta(S) \iff (1-2\varepsilon)|J^*| \le 2\sum_{S \in \mathcal{S}'} \hat\beta(S) \iff \left(\tfrac12 - \varepsilon\right)|J^*| \le \sum_{S \in \mathcal{S}'} \hat\beta(S).$$

The next goal is to find a set which satisfies the properties of Lemma 4. To determine whether a segment $S$ belongs to this set $\mathcal{S}'$ we want to use only local information which does not require knowledge about other segments, in order to minimize our space requirements. The output $\hat\beta(S)$ of the 2-approximation algorithm on segments is not suitable for this task because it is possible that $\hat\beta(\pi(S)) < \hat\beta(S)$ holds for some segment $S \in \mathcal{S}\setminus\{S(r)\}$, which would cause problems in our overall construction. Instead we define another estimate which is monotone nondecreasing along paths to the root. In particular, we define for each segment $S \in \mathcal{S}$
$$\gamma(S) = |\{S' \in \mathcal{S} \mid S' \subseteq S \text{ and } \exists I \in I \text{ s.t. } I \subseteq S'\}|,$$
the number of segments of $\mathcal{S}$ that are contained in $S$ and contain at least one input interval. These segments correspond to nodes in the segment tree that are descendants of $S$ (or $S$ itself) and contain some input interval. For this estimate we can prove the following properties.

Lemma 5. For all $S \in \mathcal{S}$ we have the following properties:
(i) $\gamma(S) \le \gamma(\pi(S))$ if $S \ne S(r)$,
(ii) $\gamma(S) \le \beta(S)\log n$,
(iii) $\gamma(S) \ge \beta(S)$, and
(iv) $\gamma(S)$ can be computed in $O(\gamma(S))$ space using the portion of the stream after the first interval contained in $S$.

Proof. The first statement follows immediately from the definition of the segment tree, because each parent node contains all input intervals which are contained in its children. To prove the remaining properties we fix some $S \in \mathcal{S}$ and define $\mathcal{S}_S := \{S' \in \mathcal{S} \mid S' \subseteq S \text{ and } \exists I \in I: I \subseteq S'\}$, the set of segments contained in $S$ which themselves contain at least one input interval; these segments are associated with descendants of $S$ in the segment tree.

Let $T_S$ be the subtree with root $S$. Since $T$ is a balanced tree, $T_S$ has at most $\log n$ levels. Because $\gamma(S)$ is exactly the size of $\mathcal{S}_S$, the pigeonhole principle yields a level $L$ of $T_S$ which contains at least $\gamma(S)/\log n$ pairwise distinct elements of $\mathcal{S}_S$. All these segments are disjoint because they are on the same level of the segment tree. This means we can pick an arbitrary input interval from each of these segments to obtain an independent subset of the input intervals, resulting in $\beta(S) \ge \gamma(S)/\log n$; rearranging grants the second property.

To prove (iii) we consider an optimal solution $J^*$ constrained to $S$. Each interval $J$ of $J^*$ is contained in some segment of $\mathcal{S}_S$. Let $S(J)$ be the smallest of the segments containing $J$. Then $J$ contains the middle point of $S(J)$: otherwise $J$ would lie entirely in one half of $S(J)$, and that half would be a smaller segment containing $J$. Therefore, for distinct $J \in J^*$ the segments $S(J)$ are pairwise distinct, or else two intervals of the optimal solution would both contain the same middle point and hence intersect. Note that these segments do not have to be disjoint, as their associated nodes might be on different levels of the segment tree. Since these minimal segments are elements of $\mathcal{S}_S$ we can conclude that
$$\gamma(S) = |\mathcal{S}_S| \ge |\{S(J) \mid J \in J^*\}| = |J^*| = \beta(S).$$

For the fourth property we can use a binary search tree. For a new input interval $I$ we check whether our tree already contains the segments which are contained in $S$ and contain $I$; if not, we add those segments to the structure. Then $\gamma(S)$ corresponds to the number of stored segments and can be computed by traversing the tree. The space needed for such a tree is proportional to the number of elements stored, i.e. $O(\gamma(S))$.
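A sketch of the computation in property (iv) follows, under the assumption that segments are the dyadic half-open ranges of Section 5.1 with $n$ a power of two. For each arriving interval contained in $S$ we walk down from $S$ and record every segment that contains the interval; the `seen` set holds exactly the segments of $\mathcal{S}_S$ observed so far, i.e. $O(\gamma(S))$ space.

```python
def gamma_stream(S, intervals):
    """Sketch of Lemma 5 (iv): count the segments contained in S that
    contain at least one input interval, storing only those segments.
    Segments are half-open dyadic ranges (lo, hi); S itself is assumed
    to be one of them, e.g. the root (1, n+1) with n a power of two."""
    seen = set()                      # segments of S_S observed so far
    for x, y in intervals:
        lo, hi = S
        if not (lo <= x and y < hi):  # interval not contained in S
            continue
        while True:
            seen.add((lo, hi))        # [x, y] is contained in this segment
            if hi - lo == 1:
                break
            mid = (lo + hi) // 2      # descend into the half still containing [x, y]
            if y < mid:
                hi = mid
            elif x >= mid:
                lo = mid
            else:
                break                 # [x, y] straddles mid: no smaller segment contains it
    return len(seen)

# gamma of the root segment [1, 9) for three intervals, n = 8
print(gamma_stream((1, 9), [(1, 2), (3, 3), (6, 8)]))  # 6
```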
Equipped with this estimate we can define a certain type of segments which will help us to find a set satisfying Lemma 4.

Definition 4. A segment $S \in \mathcal{S}$ with $S \ne S(r)$ is relevant if (i) $\gamma(\pi(S)) \ge 2\varepsilon^{-1}\log^2 n$ and (ii) $1 \le \gamma(S) < 2\varepsilon^{-1}\log^2 n$. Then $\mathcal{S}_{rel} \subseteq \mathcal{S}$ denotes the set of relevant segments of $\mathcal{S}$. In case $\mathcal{S}_{rel}$ would be empty, we set $\mathcal{S}_{rel} = \{S(r)\}$.

This formalizes the idea that relevant segments contain at least one input interval but not too many, and that the parents of relevant segments contain a certain amount of input intervals. We now analyze the result of applying the 2-approximation algorithm to relevant segments; recall that $\hat\beta(S)$ denotes the size of the solution produced by the 2-approximation algorithm.

Lemma 6. It holds that
$$\alpha(I) \ge \sum_{S \in \mathcal{S}_{rel}} \hat\beta(S) \ge \left(\tfrac12 - \varepsilon\right)\alpha(I).$$

Proof. If $\gamma(S(r)) < 2\varepsilon^{-1}\log^2 n$, then since $\gamma(\cdot)$ is by Lemma 5 nondecreasing along paths from the leaves to the root in $T$, no node $S$ can have a parent for which $\gamma(\pi(S)) \ge 2\varepsilon^{-1}\log^2 n$ holds, and we set $\mathcal{S}_{rel} = \{S(r)\}$. Then the above inequality follows directly from the fact that $\hat\beta(S(r))$ is the output of a 2-approximation algorithm applied to the whole input.

Therefore we can assume that $\gamma(S(r)) \ge 2\varepsilon^{-1}\log^2 n$; then the root node of $T$ is not an element of $\mathcal{S}_{rel}$. Define
$$\mathcal{S}_0 = \{S \in \mathcal{S}\setminus\{S(r)\} \mid \gamma(S) = 0 \text{ and } \gamma(\pi(S)) \ge 2\varepsilon^{-1}\log^2 n\}.$$
Let $S$ be the first node on a path from the root to a leaf for which $\gamma(\pi(S)) \ge 2\varepsilon^{-1}\log^2 n$ and $\gamma(S) < 2\varepsilon^{-1}\log^2 n$ holds. In case it contains any input interval it follows that $S \in \mathcal{S}_{rel}$, and $S \in \mathcal{S}_0$ otherwise. Therefore $\mathcal{S}_{rel} \cup \mathcal{S}_0$ forms a disjoint union of $S(r)$. By rearranging Lemma 5 (ii) to bound $\beta$, using the definition of relevant segments and the assumption $\gamma(S(r)) \ge 2\varepsilon^{-1}\log^2 n$, we obtain $\beta(\pi(S)) \ge \gamma(\pi(S))/\log n \ge 2\varepsilon^{-1}\log n$ for all $S \in \mathcal{S}_{rel} \cup \mathcal{S}_0$, implying that $\mathcal{S}' = \mathcal{S}_{rel} \cup \mathcal{S}_0$ satisfies Lemma 4. Because no $S \in \mathcal{S}_0$ contains any input interval, it holds that $\gamma(S) = \hat\beta(S) = 0$ for those segments, so they contribute nothing to the sum, which grants the above statement.

Fig. 4 Active segments (dotted, marked in red) in a segment tree, caused by an interval $I$.

Another important type of segments we need for our overall estimate are active segments.

Definition 5. A segment $S \in \mathcal{S}$ is active if $S = S(r)$ or its parent $\pi(S)$ contains some input interval.

For an example of active segments see Figure 4. Note that every relevant segment is also an active segment, because relevant segments contain at least one input interval. Later we will use H-random samples to estimate the ratio between relevant and active segments.

5.2 Algorithms in the Streaming Model

In this section we use the findings of the previous section to construct algorithms estimating the number of active segments, the ratio between active and relevant segments, and the average value of $\hat\beta(S)$ over the relevant segments. Putting these estimates together we obtain an estimate for the size of an optimal solution.

To provide an estimate for the number of active segments we denote by $\sigma_{\mathcal{S}}(I)$ the sequence of segments that become active because of the input interval $I$, ordered nonincreasingly by segment size. The sequence thus starts with the root node, followed by the nodes whose parents contain $I$; since the segment of a parent node is bigger than those of its children, parent nodes appear before their children in $\sigma_{\mathcal{S}}(I)$. The length of the sequence is bounded by $2\log n + 1$: the nodes whose segments contain $I$ form a path in the tree, all children of these nodes are active by definition, and since $T$ is balanced this gives, together with the root node, at most $2\log n + 1$ active segments per interval. Note that we do not need to store the whole segment tree to compute $\sigma_{\mathcal{S}}(I)$, because the segments associated to nodes are independent of the input intervals. We can therefore calculate the segments associated to nodes of the segment tree on the fly and output the at most $2\log n + 1$ active segments using only $O(\log n)$ space, as in the sketch below. With this we can obtain an estimate for the number of active segments $N_{act}$.
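The following sketch computes $\sigma_{\mathcal{S}}(I)$ on the fly for the dyadic segment tree over $[1, n+1)$, with $n$ assumed to be a power of two: it outputs the root and the children of every node whose segment contains $I$, using $O(\log n)$ space.

```python
def active_segments(n, interval):
    """Sketch: the sequence sigma_S(I) of segments activated by I,
    computed on the fly.  A segment is active if it is the root or its
    parent contains I; the nodes containing I form the descent path
    from the root towards I."""
    x, y = interval
    out = [(1, n + 1)]                # the root is always active
    lo, hi = 1, n + 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        out.append((lo, mid))         # children of a node containing I
        out.append((mid, hi))         # are active by definition
        if y < mid:
            lo, hi = lo, mid          # I lies in the left half
        elif x >= mid:
            lo, hi = mid, hi          # I lies in the right half
        else:
            break                     # I straddles mid: descent stops
    return out

print(active_segments(8, (3, 3)))
# [(1, 9), (1, 5), (5, 9), (1, 3), (3, 5), (3, 4), (4, 5)] -- 2 log n + 1 segments,
# ordered nonincreasingly by size, as required for sigma_S(I)
```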

Lemma 7. There is an algorithm in the data stream model that in $O(\varepsilon^{-2} + \log n)$ space computes a value $\hat N_{act}$ such that $\Pr[|N_{act} - \hat N_{act}| \le \varepsilon N_{act}] \ge 11/12$.

Proof. The stream of input intervals $I_1, I_2, \dots$ defines the stream of active segments $\sigma = \sigma_{\mathcal{S}}(I_1), \sigma_{\mathcal{S}}(I_2), \dots$, where $\mathcal{S}$ is the set of segments of the balanced segment tree over $[n]$. Recall that this stream is $O(\log n)$ times longer than the input stream. If we count the distinct elements in the stream $\sigma$ we obtain the number of active segments; we have therefore reduced our problem to counting the distinct elements of the stream $\sigma$. For this we can use the results of Kane, Nelson and Woodruff [3]: their algorithm grants a $(1\pm\varepsilon)$-approximation for the distinct elements problem using $O(\varepsilon^{-2} + \log|\mathcal{S}|) = O(\varepsilon^{-2} + \log n)$ space with success probability 2/3, and the space needed is proven to match the optimal bound up to constant factors. The success probability can be amplified to $1 - 1/12 = 11/12$ by a standard technique: run a constant number of instances of the algorithm in parallel and return the median of their estimates. The general idea of their algorithm is to run several constant-factor approximations in parallel and to keep several counters updated to compensate for the constant factors. With the help of their algorithm we obtain a value $\hat N_{act}$ with the stated property.

Given the estimate for the number of active segments, we can use it to compute an estimate for the number of relevant segments, which form a subset of the active segments.

Lemma 8. There is an algorithm in the data stream model that uses $O(\varepsilon^{-4}\log^4 n)$ space and computes a value $\hat N_{rel}$ such that $\Pr[|N_{rel} - \hat N_{rel}| \le \varepsilon N_{rel}] \ge 10/12$.

Proof. First we estimate the number of active segments using the algorithm from Lemma 7. Using H-random samples we will then be able to estimate the ratio between active and relevant segments, which can be turned into an estimate for $N_{rel} = (N_{rel}/N_{act})\cdot N_{act}$.

To ensure that the samples we take later are representative, we need a lower bound on $N_{rel}/N_{act}$. In our segment tree $T$, each relevant segment $S' \in \mathcal{S}_{rel}$ has at most $2\gamma(S') < 4\varepsilon^{-1}\log^2 n$ active segments below it: every active segment below $S'$ is a child of a segment containing an input interval, each such segment contributes at most two children, and by the definition of $\gamma(\cdot)$ and of relevant segments there are fewer than $2\varepsilon^{-1}\log^2 n$ such segments below $S'$. Also there are at most $2\log n$ active segments whose parent is an ancestor of $S'$; this upper bound is attained exactly when $S'$ is a leaf of $T$. Charging every active segment to a relevant segment in this way, each relevant segment is charged at most
$$4\varepsilon^{-1}\log^2 n + 2\log n \le 6\varepsilon^{-1}\log^2 n$$
active segments. Using this upper bound we conclude that
$$\frac{N_{rel}}{N_{act}} \ge \frac{1}{6\varepsilon^{-1}\log^2 n} = \frac{\varepsilon}{6\log^2 n}, \qquad (5)$$
which we will use later.

Now consider an arbitrary injective mapping $b$ between $\mathcal{S}$ and $[n^2]$ that can be easily computed; we use $n^2$ because $\mathcal{S}$ contains at most $2n-1$ elements, so an injective function exists. Lemma 1 gives us a family $H = H(n^2, \varepsilon)$ of ε-min-wise independent permutations. For a fixed $h \in H$, the concatenation $h \circ b$ defines an order among the elements of $\mathcal{S}$; this ordering is used to compute H-random samples. We choose $k = 72\log^2 n/(\varepsilon^3(1-\varepsilon)) \in \Theta(\varepsilon^{-3}\log^2 n)$ and pick $k$ permutations $h_1, \dots, h_k \in H$ uniformly and independently at random; we will see shortly why $k$ was chosen in this specific way. For each permutation $h_j$ with $j = 1, \dots, k$ our H-random sample $S_j$ is defined as
$$S_j = \arg\min\{h_j(b(S)) \mid S \in \mathcal{S} \text{ is active}\},$$
i.e. $S_j$ is the active segment of $\mathcal{S}$ which minimizes $(h_j \circ b)(\cdot)$. Then $S_j$ is approximately a random active segment of $\mathcal{S}$; it is not completely uniformly random, because we keep just enough structure to apply the properties of ε-min-wise independent permutations.

By defining the random variable
$$X = |\{j \in \{1, \dots, k\} \mid S_j \text{ is relevant}\}|$$
we have formalized the number of H-random samples which are relevant (we defer the discussion of how $X$ can be computed to the end of the proof). Then $N_{rel}/N_{act}$ is of the order of $X/k$. To bound $X$ we define
$$p = \Pr_{h_j \in H}[S_j \text{ is relevant}]$$
and recall that every relevant segment is also an active segment. We can then apply inequality (1) from Section 3 with $Y = \mathcal{S}_{rel}$ and $X = \mathcal{S}_{act}$ to obtain
$$(1-\varepsilon)\frac{N_{rel}}{N_{act}} \le p \le (1+\varepsilon)\frac{N_{rel}}{N_{act}}. \qquad (6)$$
By the definition of $k$, the lower bound on $p$ and the lower bound in (5) it holds that
$$kp \ge \frac{72\log^2 n}{\varepsilon^3(1-\varepsilon)}\cdot(1-\varepsilon)\frac{N_{rel}}{N_{act}} \ge \frac{72\log^2 n}{\varepsilon^3}\cdot\frac{\varepsilon}{6\log^2 n} = \frac{12}{\varepsilon^2}. \qquad (7)$$
Each indicator of the event that $S_j$ is relevant is a binary random variable, so $X$ is the sum of $k$ independent random variables and $E[X] = kp$. Using Chebyshev's inequality, the lower bound for $kp$ in (7) and the fact that $\mathrm{Var}[X] = kp(1-p)$ for sums of $k$ independent indicators, it holds that
$$\Pr\left[\left|\frac{X}{k} - p\right| \ge \varepsilon p\right] = \Pr[|X - kp| \ge \varepsilon kp] \le \frac{\mathrm{Var}[X]}{(\varepsilon kp)^2} = \frac{kp(1-p)}{(\varepsilon kp)^2} \le \frac{1}{kp\varepsilon^2} \le \frac{1}{12}. \qquad (8)$$

This allows us to formalize our final estimator. We use the estimator $\hat N_{act}$ for the number of active segments as stated in Lemma 7 and define $\hat N_{rel} = \hat N_{act}\cdot(X/k)$. For now assume that the estimator of $N_{act}$ is successful and that $X/k$ lies within an ε-range around $p$; formally, that both
$$\left[|N_{act} - \hat N_{act}| \le \varepsilon N_{act}\right] \quad\text{and}\quad \left[\left|\frac{X}{k} - p\right| \le \varepsilon p\right]$$
hold. We use this together with the upper bound from (6) to bound our estimate $\hat N_{rel}$:
$$\hat N_{rel} = \hat N_{act}\cdot\frac{X}{k} \le (1+\varepsilon)N_{act}\cdot(1+\varepsilon)p \le (1+\varepsilon)^2 N_{act}\cdot(1+\varepsilon)\frac{N_{rel}}{N_{act}} = (1+\varepsilon)^3 N_{rel}.$$
With $\varepsilon < 1/2$ it holds that $(1+\varepsilon)^3 \le 1+7\varepsilon$, which grants $\hat N_{rel} \le (1+7\varepsilon)N_{rel}$. Analogously, using the lower bounds, we conclude that $\hat N_{rel} \ge (1-7\varepsilon)N_{rel}$.

It remains to analyze the overall success probability, which we do via the probability that the estimates for $\hat N_{act}$ and $p$ fail:
$$\Pr[\hat N_{rel} = (1\pm 7\varepsilon)N_{rel}] \ge 1 - \Pr[|N_{act} - \hat N_{act}| > \varepsilon N_{act}] - \Pr\left[\left|\frac{X}{k} - p\right| > \varepsilon p\right] \ge 1 - \frac{1}{12} - \frac{1}{12} = \frac{10}{12}.$$
We can rescale $\varepsilon$ by a factor $1/7$ to obtain the claimed bound; this is a valid operation because $\varepsilon$ stays within the range $(0, 1/2)$.

It is left to analyze how $X$ can be computed and the space needed for the estimate. For each $j \in [k]$ we store the H-random element $S_j$ over all active segments seen so far using $h_j$. Moreover, we store information about the choice of $h_j$ and about $\gamma(S_j)$ and $\gamma(\pi(S_j))$, so that we can decide whether $S_j$ is relevant. Let $I_1, I_2, \dots$ be the stream of input intervals and $\sigma = \sigma_{\mathcal{S}}(I_1), \sigma_{\mathcal{S}}(I_2), \dots$ the stream of active segments described earlier.

When given a segment $S$ of $\sigma$ we have to update $S_j$ if $h_j(b(S)) < h_j(b(S_j))$. Since the segments of $\sigma_{\mathcal{S}}(I)$ are ordered nonincreasingly in size, we can keep $\gamma(\pi(S_j))$ updated: $S_j$ becomes active for the first time when its parent contains some input interval, so from then on $\gamma(\pi(S_j)) > 0$ and the following parts of $\sigma$ can be used to compute $\gamma(S_j)$ and $\gamma(\pi(S_j))$ using Lemma 5 (iv). To stay within the desired space bounds we use the following trick. If at some point $\gamma(S_j) \ge 2\varepsilon^{-1}\log^2 n$ violates the condition for a relevant segment, we only store the fact that $S_j$ is not relevant and nothing more; no later arriving input interval could possibly change this. Similarly, if $\gamma(\pi(S_j)) \ge 2\varepsilon^{-1}\log^2 n$ we only store that this condition for relevance of $S_j$ is satisfied; maintaining a counter beyond this threshold is not necessary. Then for each $j$ we need $O(\varepsilon^{-1}\log^2 n)$ space, and with the definition of $k$ we obtain a space bound of at most $O(k\varepsilon^{-1}\log^2 n) = O(\varepsilon^{-4}\log^4 n)$.
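The sampling core of this proof — estimating the fraction of relevant segments by $k$ independent min-hash samples — can be simulated as follows. The salted hash again stands in for the family $H(n^2, \varepsilon)$, and an arbitrary predicate stands in for the relevance test, which the real algorithm decides from the lazily maintained counters $\gamma(S_j)$ and $\gamma(\pi(S_j))$. Note that taking the minimum over the stream automatically samples over distinct elements.

```python
import random

def estimate_ratio(stream, is_relevant, k=1000):
    """Sketch of the sampling step in Lemma 8: k independent min-hash
    samples over the stream elements; X/k estimates the fraction of
    distinct elements satisfying the predicate."""
    salts = [random.getrandbits(64) for _ in range(k)]   # k choices of h_j
    best = [None] * k                                    # current S_j per j
    for s in stream:
        for j, salt in enumerate(salts):
            if best[j] is None or hash((salt, s)) < hash((salt, best[j])):
                best[j] = s
    X = sum(1 for s in best if is_relevant(s))
    return X / k

segments = list(range(600))                  # stand-in for the active segments
print(estimate_ratio(segments, lambda s: s < 100))  # close to N_rel/N_act = 1/6
```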

The last ingredient for our final result concerns the average value of the 2-approximation over the relevant segments. Specifically, we define
$$\rho = \Big(\sum_{S \in \mathcal{S}_{rel}} \hat\beta(S)\Big)\Big/\,|\mathcal{S}_{rel}|.$$

Lemma 9. There is an algorithm in the data stream model that uses $O(\varepsilon^{-5}\log^6 n)$ space and computes a value $\hat\rho$ such that $\Pr[|\rho - \hat\rho| \le \varepsilon\rho] \ge 10/12$.

Proof. To estimate $\rho$ we use conditional sampling: we compute H-random samples until we get a sample satisfying a certain condition. As in the previous lemma let $b$ be an arbitrary injective mapping between $\mathcal{S}$ and $[n^2]$, and consider a family $H = H(n^2, \varepsilon)$ from Lemma 1; then $h \circ b$ defines an order among the elements of $\mathcal{S}$. In the following, $\mathcal{S}_{act}$ denotes the set of active segments. We repeatedly sample $h \in H$ uniformly at random until $S_1 = \arg\min_{S \in \mathcal{S}_{act}} h(b(S))$ is a relevant segment, and then set the random variable $Y_1 = \hat\beta(S_1)$. Using Lemma 2 with $X = \mathcal{S}_{act}$ and $Y = \mathcal{S}_{rel}$ we obtain
$$\forall S \in \mathcal{S}_{rel}: \quad \frac{1-4\varepsilon}{|\mathcal{S}_{rel}|} \le \Pr[S_1 = S] \le \frac{1+4\varepsilon}{|\mathcal{S}_{rel}|}.$$
Using the upper bound and the definition of $\rho$, it follows from the definition of the expected value that $E[Y_1] \le (1+4\varepsilon)\rho$; similarly, the lower bound gives $E[Y_1] \ge (1-4\varepsilon)\rho$. With Lemma 5 (iii) we have $\hat\beta(S) \le \beta(S) \le \gamma(S)$. Using the definition of relevant segments and of $\rho$ we obtain an upper bound for the variance of $Y_1$:
$$\mathrm{Var}[Y_1] = E[Y_1^2] - (E[Y_1])^2 \le E[Y_1^2] = \sum_{S \in \mathcal{S}_{rel}} \Pr[S_1 = S]\,\hat\beta(S)^2 \le \frac{1+4\varepsilon}{|\mathcal{S}_{rel}|}\cdot 2\varepsilon^{-1}\log^2 n \sum_{S \in \mathcal{S}_{rel}} \hat\beta(S) \le \frac{6\log^2 n}{\varepsilon}\,\rho.$$
Since $\gamma(S) \ge 1$ means that $S$ contains at least one input interval, the 2-approximation algorithm constrained to $S$ also chooses at least one input interval; therefore $\hat\beta(S) \ge 1$ for all relevant segments, which leads to $\rho \ge 1$.

Consider some integer $\bar k$, to be chosen later, which is the number of relevant samples we use to estimate $\rho$. Define $Y_2, \dots, Y_{\bar k}$ as independent random variables with the same distribution as $Y_1$ and our estimate $\hat\rho$ as the average over those random variables: $\hat\rho = (\sum_{i=1}^{\bar k} Y_i)/\bar k$. With Chebyshev's inequality and $\rho \ge 1$ it holds that
$$\Pr[|\hat\rho - E[Y_1]| \ge \varepsilon\rho] = \Pr[|\hat\rho\bar k - E[Y_1]\bar k| \ge \varepsilon\bar k\rho] \le \frac{\mathrm{Var}[\hat\rho\bar k]}{(\varepsilon\bar k\rho)^2} \le \frac{6\log^2 n}{\bar k\varepsilon^3}.$$
By setting $\bar k = 72\log^2 n/\varepsilon^3$ we get $\Pr[|\hat\rho - E[Y_1]| \ge \varepsilon\rho] \le 1/12$.

With this as a basis we apply the same approach as in Lemma 8. First we set $k_0 = 12\log^2 n\,\bar k/(\varepsilon(1-\varepsilon)) \in \Theta(\varepsilon^{-4}\log^4 n)$. Then for each $j \in [k_0]$ we compute an H-random sample by choosing $h_j \in H$ uniformly at random and setting $S_j = \arg\min\{h_j(b(S)) \mid S \in \mathcal{S} \text{ is active}\}$. For the further analysis let $X$ denote the number of relevant segments among $S_1, \dots, S_{k_0}$ and $p = \Pr[S_1 \in \mathcal{S}_{rel}]$. Using the lower bound of inequality (6) and the bound (5) on the ratio between relevant and active segments from Lemma 8, we get
$$k_0 p \ge \frac{12\log^2 n\,\bar k}{\varepsilon(1-\varepsilon)}\cdot(1-\varepsilon)\frac{N_{rel}}{N_{act}} \ge \frac{12\log^2 n\,\bar k}{\varepsilon}\cdot\frac{\varepsilon}{6\log^2 n} = 2\bar k.$$
This result, Chebyshev's inequality and the fact that $\mathrm{Var}[X] = k_0 p(1-p)$ (as $X$ is a sum of independent binary variables) lead to
$$\Pr[|X - k_0 p| \ge k_0 p/2] \le \frac{\mathrm{Var}[X]}{(k_0 p/2)^2} = \frac{4k_0 p(1-p)}{k_0^2 p^2} < \frac{4}{k_0 p} \le \frac{4}{2\bar k} \le \frac{1}{12},$$
where we also used that $p > 0$ and $k_0 p \ge 2\bar k$. This implies that with probability at least $11/12$ the samples $S_1, \dots, S_{k_0}$ contain at least $(1/2)k_0 p \ge \bar k$ relevant segments, and with the first $\bar k$ relevant segments among them we are able to compute the estimate $\hat\rho$ defined above. As before we can use the failure probabilities for $X$ and $\hat\rho$ to obtain probability $1 - 1/12 - 1/12 = 10/12$ that both $[|X - k_0 p| \le k_0 p/2]$ and $[|\hat\rho - E[Y_1]| \le \varepsilon\rho]$ hold. In case of success we get, using the upper bounds and our results for the expected value of $Y_1$,
$$\hat\rho \le \varepsilon\rho + E[Y_1] \le \varepsilon\rho + (1+4\varepsilon)\rho = (1+5\varepsilon)\rho,$$
and analogously, using the lower bounds, $\hat\rho \ge (1-5\varepsilon)\rho$. Combining these two inequalities and the above success probability grants
$$\Pr[|\rho - \hat\rho| \le 5\varepsilon\rho] \ge 10/12.$$
As in the previous lemma we can rescale $\varepsilon$, this time by the factor $1/5$. The argument for the space bound also stays essentially the same: for each $j \in [k_0]$ we keep information about $h_j$, $\gamma(S_j)$, $\gamma(\pi(S_j))$ and $\hat\beta(S_j)$. As discussed before, $\hat\beta(S_j) \le \beta(S_j) \le \gamma(S_j)$ holds, so the space bound per index $j$ is the same as in Lemma 8, namely $O(\varepsilon^{-1}\log^2 n)$. Because we have $k_0$ indices we need $O(k_0\varepsilon^{-1}\log^2 n) = O(\varepsilon^{-5}\log^6 n)$ space in total.

Now we can combine all the algorithms presented so far into one algorithm that computes an estimate for the size of an optimal solution of the interval selection problem.

Theorem 1. Let $\varepsilon \in (0, 1/2)$ and let $I$ be a set of intervals with endpoints in $[n]$ that arrive in a data stream. There is an algorithm that uses $O(\varepsilon^{-5}\log^6 n)$ space and computes a value $\hat\alpha$ such that
$$\Pr[(1/2 - \varepsilon)\,\alpha(I) \le \hat\alpha \le \alpha(I)] \ge 2/3.$$

Proof. We start by estimating $N_{rel}$ and $\rho$ using Lemma 8 and Lemma 9, respectively, obtaining the estimates $\hat N_{rel}$ and $\hat\rho$. We combine them into the single estimate $\hat\alpha_0 = \hat N_{rel}\cdot\hat\rho$, which we now investigate further. The success probability of $\hat\alpha_0$ is at least $1 - 2/12 - 2/12 = 2/3$, using that the failure probabilities of $\hat N_{rel}$ and $\hat\rho$ are each at most $2/12$. In case of success both $[|N_{rel} - \hat N_{rel}| \le \varepsilon N_{rel}]$ and $[|\rho - \hat\rho| \le \varepsilon\rho]$ hold.

Using the definitions of $N_{rel}$ and $\rho$, the fact that $N_{rel} = |\mathcal{S}_{rel}|$, and Lemma 6, we can show that
$$\hat\alpha_0 \le (1+\varepsilon)N_{rel}\cdot(1+\varepsilon)\rho = (1+\varepsilon)^2 N_{rel}\Big(\sum_{S \in \mathcal{S}_{rel}} \hat\beta(S)\Big)\Big/\,|\mathcal{S}_{rel}| = (1+\varepsilon)^2 \sum_{S \in \mathcal{S}_{rel}} \hat\beta(S) \le (1+\varepsilon)^2\,\alpha(I).$$
Similarly one can show that $\hat\alpha_0 \ge (1-\varepsilon)^2(\tfrac12 - \varepsilon)\,\alpha(I)$ holds, using the lower bounds of $\hat N_{rel}$ and $\hat\rho$. Combining these two results leads to
$$\Pr\left[(1-\varepsilon)^2\left(\tfrac12 - \varepsilon\right)\alpha(I) \le \hat\alpha_0 \le (1+\varepsilon)^2\,\alpha(I)\right] \ge \tfrac23.$$
To obtain the final result we use that
$$\frac{(1-\varepsilon)^2}{(1+\varepsilon)^2}\left(\tfrac12 - \varepsilon\right) \ge \tfrac12 - 3\varepsilon \qquad \text{for all } \varepsilon \in (0, 1/2).$$
This holds because
$$\frac{(1-\varepsilon)^2}{(1+\varepsilon)^2}\left(\tfrac12 - \varepsilon\right) = \left(1 - \frac{4\varepsilon}{(1+\varepsilon)^2}\right)\left(\tfrac12 - \varepsilon\right) = \tfrac12 - \varepsilon - \frac{4\varepsilon}{(1+\varepsilon)^2}\left(\tfrac12 - \varepsilon\right) \stackrel{(*)}{\ge} \tfrac12 - \varepsilon - 2\varepsilon = \tfrac12 - 3\varepsilon,$$
with $(*)$ using
$$\frac{4\varepsilon}{(1+\varepsilon)^2}\left(\tfrac12 - \varepsilon\right) = \frac{2\varepsilon(1-2\varepsilon)}{(1+\varepsilon)^2} \le 2\varepsilon(1-2\varepsilon) \le 2\varepsilon.$$
If we now set $\hat\alpha = \hat\alpha_0/(1+\varepsilon)^2$ to avoid overestimation, then in case of success $\hat\alpha \le \alpha(I)$ and $\hat\alpha \ge (\tfrac12 - 3\varepsilon)\,\alpha(I)$; rescaling $\varepsilon$ by the constant factor $1/3$ gives the claimed bounds. The space needed for this algorithm is precisely the space needed for the two estimates $\hat N_{rel}$ and $\hat\rho$ from Lemma 8 and Lemma 9, which completes the proof.

6 Same-size intervals

In this section we show how the results presented so far can be improved if we assume that all input intervals have the same length $\lambda > 0$.

6.1 Largest independent set of same-size intervals

The first approach is again to compute an approximation of a largest independent set. Using the shifting technique of Hochbaum and Maass [6] we obtain a $(3/2)$-approximation using $O(\alpha(I))$ space. We maintain partitions of the real line into windows of length $3\lambda$. For $l \in \mathbb{R}$ we define the window $W_l = [l, l+3\lambda)$, including the left endpoint and excluding the right endpoint. For $a \in \{0, 1, 2\}$ we define
$$\mathcal{W}_a = \{W_{(a+3j)\lambda} \mid j \in \mathbb{Z}\};$$
note that each $\mathcal{W}_a$ is a partition of the real line. Furthermore we define $I_a$, for $a \in \{0, 1, 2\}$, as the set of input intervals that are contained in some window of $\mathcal{W}_a$; formally,
$$I_a = \{I \in I \mid \exists j \in \mathbb{Z}: I \subseteq W_{(a+3j)\lambda}\}.$$
Each interval of length $\lambda$ is contained in windows of exactly two of the three partitions $\mathcal{W}_0, \mathcal{W}_1, \mathcal{W}_2$. It can then be shown that
$$\max\{\alpha(I_0), \alpha(I_1), \alpha(I_2)\} \ge \tfrac23\,\alpha(I),$$
where $\alpha(I_a)$ denotes the size of an optimal solution restricted to the intervals contained in $I_a$; indeed, each interval of an optimal solution is counted in two of the three restricted problems, so the three restricted optima sum to at least $2\alpha(I)$. Because at most two disjoint intervals of length $\lambda$ fit in a window of length $3\lambda$, we are able to compute and store an optimal solution $J_a$ restricted to the input intervals of $I_a$, for each $a \in \{0, 1, 2\}$. By returning the largest of $J_0$, $J_1$ and $J_2$ we obtain a $(3/2)$-approximation.

We now describe how an algorithm can maintain these solutions $J_a$ throughout the stream; a sketch follows below. We use the same approach as the algorithm in Section 4: we store $\mathrm{Leftmost}(W)$ and $\mathrm{Rightmost}(W)$ for each window $W \in \mathcal{W}_a$ with $a \in \{0, 1, 2\}$. In addition we store a boolean value $\mathrm{active}(W)$ indicating whether some interval earlier in the stream is contained in $W$. If $\mathrm{active}(W) = \text{false}$, $W$ does not contain any input interval and we declare $\mathrm{Leftmost}(W)$ and $\mathrm{Rightmost}(W)$ undefined. When receiving a new interval $I$ of the stream we look at the windows $W \in \mathcal{W}_a$ containing $I$, for each $a \in \{0, 1, 2\}$. If $W$ is not active, we set $\mathrm{active}(W) = \text{true}$ and add the input interval to $J_a$. If $W$ is already active and its intervals so far pairwise intersect, we check whether there is an interval of $W$ that is disjoint from $I$; if so, these two intervals are added to $J_a$ in place of the single interval chosen for $W$ before. In the other cases there is nothing to do. Following these instructions we indeed maintain an optimal solution $J_a$ restricted to the intervals of $I_a$. By using a binary search tree to store the at most $O(\alpha(I))$ active windows we can execute all necessary operations in $O(\log \alpha(I))$ time and $O(\alpha(I))$ space. This grants a $(3/2)$-approximation to the largest independent set.
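A sketch of this procedure follows; the window length $3\lambda$ and the three shifted partitions are as above, while the constant LAM and the interval values are hypothetical test data. Tiebreaking and the full Leftmost/Rightmost bookkeeping are simplified.

```python
import math

LAM = 2.0   # common interval length lambda (hypothetical value)

def window_index(x, a):
    """Index j of the window [(a+3j)*LAM, (a+3j+3)*LAM) of the
    partition W_a that contains the point x."""
    return math.floor((x / LAM - a) / 3)

def solve_same_size(stream):
    """Sketch of the (3/2)-approximation for same-size intervals:
    per shift a and window, keep Leftmost/Rightmost and, once found,
    a pair of disjoint intervals contained in the window."""
    state = [dict() for _ in range(3)]   # per shift: window j -> [lm, rm, pair]
    for x, y in stream:                  # y = x + LAM
        for a in range(3):
            j = window_index(x, a)
            if j != window_index(y, a):  # not contained in one window of W_a
                continue
            if j not in state[a]:
                state[a][j] = [(x, y), (x, y), None]   # window becomes active
                continue
            lm, rm, pair = state[a][j]
            if pair is None and x > lm[1]:   # disjoint from Leftmost: type 2
                state[a][j][2] = (lm, (x, y))
            elif pair is None and y < rm[0]:
                state[a][j][2] = ((x, y), rm)
            else:                            # update Leftmost / Rightmost
                if y < lm[1]: state[a][j][0] = (x, y)
                if x > rm[0]: state[a][j][1] = (x, y)
    best = []
    for a in range(3):
        sol = []
        for lm, rm, pair in state[a].values():
            sol.extend(pair if pair else [lm])   # two intervals or one per window
        if len(sol) > len(best):
            best = sol
    return best

print(solve_same_size([(0.5, 2.5), (3.0, 5.0), (6.5, 8.5), (9.0, 11.0)]))
```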

6.2 Size of largest independent set of same-size intervals

To estimate the size of an optimal solution when all intervals are of the same size, we use H-random samples again. First we show how to estimate a solution constrained to $I_a$ for $a \in \{0, 1, 2\}$; then we use this result to get a final estimate.

Lemma 10. Let $a \in \{0, 1, 2\}$ and $\varepsilon \in (0, 1)$. There is an algorithm in the data stream model that in $O(\varepsilon^{-2}\log(1/\varepsilon) + \log n)$ space computes a value $\hat\alpha_a$ such that $\Pr[|\alpha(I_a) - \hat\alpha_a| \le \varepsilon\,\alpha(I_a)] \ge 8/9$.

Proof. Fix some $a \in \{0, 1, 2\}$. We define the type $i$ of a window $W$ of $\mathcal{W}_a$ as the maximum number of disjoint input intervals contained in $W$. Since each window can contain at most two disjoint intervals, a window is of type 0, 1 or 2. By $\gamma_i$, for $i = 0, 1, 2$, we denote the number of windows of type $i$ in $\mathcal{W}_a$. Then $\alpha(I_a) = \gamma_1 + 2\gamma_2$, since windows counted by $\gamma_0$ contain no input interval, type-1 windows contribute one interval each and type-2 windows contribute two. As in Section 5 we use a distinct-elements estimator for the number $\gamma_1 + \gamma_2$ of nonempty windows and H-random samples for the ratio $\gamma_2/(\gamma_1 + \gamma_2)$.

First we describe how to estimate $\gamma_1 + \gamma_2$. Given the stream of input intervals $I = I_1, I_2, \dots$ we can compute the sequence of windows $W(I) = W(I_1), W(I_2), \dots$, with $W(I_i)$ denoting the window of $\mathcal{W}_a$ that contains $I_i$; if such a window does not exist we skip $I_i$. Then $\gamma_1 + \gamma_2$ is the number of distinct elements in $W(I)$. Again using the results of Kane, Nelson and Woodruff [3] we can compute an estimate $\hat\gamma$ for which $\Pr[(1-\varepsilon)(\gamma_1+\gamma_2) \le \hat\gamma \le (1+\varepsilon)(\gamma_1+\gamma_2)] \ge 17/18$ holds, using $O(\varepsilon^{-2} + \log n)$ space.

Next we estimate the ratio $\gamma_2/(\gamma_1+\gamma_2)$. For this we use H-random samples, very similarly to the approach in Lemma 8. Take a family $H = H(n, \varepsilon)$ of permutations $[n] \to [n]$ as stated in Lemma 1. We choose $k = 18\varepsilon^{-2} \in \Theta(\varepsilon^{-2})$ and pick $h_1, \dots, h_k \in H$ uniformly and independently at random. For each $h_j$ with $j \in [k]$ let $W_j$ be the window $[l, l+3\lambda)$ of $\mathcal{W}_a$ that contains at least one input interval and minimizes $h_j(l)$:
$$W_j = \arg\min\{h_j(l) \mid [l, l+3\lambda) \in \mathcal{W}_a \text{ and } \exists I \in I: I \subseteq [l, l+3\lambda)\}.$$
Then $W_j$ is a nearly uniformly random window among the windows of $\mathcal{W}_a$ which contain at least one input interval. We define the random variable
$$M = |\{j \in [k] \mid W_j \text{ is of type } 2\}|.$$
By the considerations above, $M\hat\gamma/k$ is roughly $\gamma_2$. Applying Chebyshev's inequality to $M$ and using the choice of $k$, it is possible to show that $M\hat\gamma/k = \gamma_2 \pm \varepsilon(\gamma_1+\gamma_2)$ with probability at least $17/18$. Combining both results we output the desired estimate $\hat\alpha_a = \hat\gamma(1 + M/k)$. Note that $M$ can be computed in $O(\varepsilon^{-2}\log(1/\varepsilon))$ space by keeping information about $h_j$ and the current window $W_j$ for each index $j$; we also store $\mathrm{Leftmost}(W_j)$ and $\mathrm{Rightmost}(W_j)$ to decide whether $W_j$ is of type 1 or 2.

Theorem 2. Let $\varepsilon \in (0, 1/2)$ and let $I$ be a set of intervals of length $\lambda$ with endpoints in $[n]$ that arrive in a data stream. There is an algorithm that uses $O(\varepsilon^{-2}\log(1/\varepsilon) + \log n)$ space and computes a value $\hat\alpha$ such that
$$\Pr[(2/3 - \varepsilon)\,\alpha(I) \le \hat\alpha \le \alpha(I)] \ge 2/3.$$

Proof. For each $a \in \{0, 1, 2\}$ we use Lemma 10 to obtain the estimate $\hat\alpha_a$ for $\alpha(I_a)$. The probability that all three estimates are successful is at least $1 - 1/9 - 1/9 - 1/9 = 2/3$. With the properties of these estimates and $\max_a \alpha(I_a) \ge \tfrac23\,\alpha(I)$ we conclude that
$$\tfrac23(1-\varepsilon)\,\alpha(I) \le \max\{\hat\alpha_0, \hat\alpha_1, \hat\alpha_2\} \le (1+\varepsilon)\,\alpha(I).$$
Rescaling the maximum by $1/(1+\varepsilon)$ to avoid overestimation and replacing $\varepsilon$ by $\varepsilon/2$ completes the proof.

7 Conclusion and other results

We have shown how to get a 2-approximation for the interval selection problem and how to use it to obtain an estimate for the size of an optimal solution. It is also possible to show lower bounds for both problems considered. Emek, Halldórsson and Rosén [5] showed that no streaming algorithm for the interval selection problem can achieve an approximation ratio of $r$ for any constant $r < 2$; for same-size intervals no ratio below $3/2$ is possible. In [1], Cabello and Pérez-Lantero showed similar results for the problem of estimating $\alpha(I)$. For this they reduce the INDEX problem, in which the task is to decide whether a subset of $[n]$ contains some element $i \in [n]$, to the interval selection problem. The communication complexity of INDEX is well studied [7][8]: to achieve a nontrivial success probability, $\Omega(n)$ bits of memory are required. The reduction shows that any algorithm that uses $o(n)$ bits of memory cannot compute an estimate $\hat\alpha$ for which
$$\Pr\left[\left(\tfrac12 + c\right)\alpha(I) \le \hat\alpha \le \alpha(I)\right] \ge \tfrac23$$
holds, for any constant $c > 0$.

This means that the results presented in this work match the lower bounds up to constant factors among algorithms using $o(n)$ bits of space.

References

1. S. Cabello, P. Pérez-Lantero. Interval selection in the streaming model. WADS 2015. Lecture Notes in Computer Science, vol. 9214, Springer, 2015.
2. P. Indyk. A small approximately min-wise independent family of hash functions. Journal of Algorithms 38(1):84-90, 2001.
3. D. M. Kane, J. Nelson, D. P. Woodruff. An optimal algorithm for the distinct elements problem. PODS 2010, pp. 41-52, 2010.
4. M. Datar, S. Muthukrishnan. Estimating rarity and similarity over data stream windows. ESA 2002. Lecture Notes in Computer Science, vol. 2461, Springer, 2002.
5. Y. Emek, M. M. Halldórsson, A. Rosén. Space-constrained interval selection. ICALP 2012. Lecture Notes in Computer Science, vol. 7391, Springer, 2012.
6. D. S. Hochbaum, W. Maass. Approximation schemes for covering and packing problems in image processing and VLSI. Journal of the ACM 32(1):130-136, 1985.
7. T. S. Jayram, R. Kumar, D. Sivakumar. The one-way communication complexity of Hamming distance. Theory of Computing 4(6):129-135, 2008.
8. E. Kushilevitz, N. Nisan. Communication Complexity. Cambridge University Press, New York, NY, USA, 1997.


2 THE COMPUTABLY ENUMERABLE SUPERSETS OF AN R-MAXIMAL SET The structure of E has been the subject of much investigation over the past fty- ve years, s ON THE FILTER OF COMPUTABLY ENUMERABLE SUPERSETS OF AN R-MAXIMAL SET Steffen Lempp Andre Nies D. Reed Solomon Department of Mathematics University of Wisconsin Madison, WI 53706-1388 USA Department of

More information

ACO Comprehensive Exam October 14 and 15, 2013

ACO Comprehensive Exam October 14 and 15, 2013 1. Computability, Complexity and Algorithms (a) Let G be the complete graph on n vertices, and let c : V (G) V (G) [0, ) be a symmetric cost function. Consider the following closest point heuristic for

More information

Dominating Set Counting in Graph Classes

Dominating Set Counting in Graph Classes Dominating Set Counting in Graph Classes Shuji Kijima 1, Yoshio Okamoto 2, and Takeaki Uno 3 1 Graduate School of Information Science and Electrical Engineering, Kyushu University, Japan kijima@inf.kyushu-u.ac.jp

More information

Sorting Algorithms. We have already seen: Selection-sort Insertion-sort Heap-sort. We will see: Bubble-sort Merge-sort Quick-sort

Sorting Algorithms. We have already seen: Selection-sort Insertion-sort Heap-sort. We will see: Bubble-sort Merge-sort Quick-sort Sorting Algorithms We have already seen: Selection-sort Insertion-sort Heap-sort We will see: Bubble-sort Merge-sort Quick-sort We will show that: O(n log n) is optimal for comparison based sorting. Bubble-Sort

More information

2 RODNEY G. DOWNEY STEFFEN LEMPP Theorem. For any incomplete r.e. degree w, there is an incomplete r.e. degree a > w such that there is no r.e. degree

2 RODNEY G. DOWNEY STEFFEN LEMPP Theorem. For any incomplete r.e. degree w, there is an incomplete r.e. degree a > w such that there is no r.e. degree THERE IS NO PLUS-CAPPING DEGREE Rodney G. Downey Steffen Lempp Department of Mathematics, Victoria University of Wellington, Wellington, New Zealand downey@math.vuw.ac.nz Department of Mathematics, University

More information

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) Contents 1 Vector Spaces 1 1.1 The Formal Denition of a Vector Space.................................. 1 1.2 Subspaces...................................................

More information

Gap Embedding for Well-Quasi-Orderings 1

Gap Embedding for Well-Quasi-Orderings 1 WoLLIC 2003 Preliminary Version Gap Embedding for Well-Quasi-Orderings 1 Nachum Dershowitz 2 and Iddo Tzameret 3 School of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel Abstract Given a

More information

Lecture 15 - NP Completeness 1

Lecture 15 - NP Completeness 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) February 29, 2018 Lecture 15 - NP Completeness 1 In the last lecture we discussed how to provide

More information

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory Part V 7 Introduction: What are measures and why measurable sets Lebesgue Integration Theory Definition 7. (Preliminary). A measure on a set is a function :2 [ ] such that. () = 2. If { } = is a finite

More information

Basic counting techniques. Periklis A. Papakonstantinou Rutgers Business School

Basic counting techniques. Periklis A. Papakonstantinou Rutgers Business School Basic counting techniques Periklis A. Papakonstantinou Rutgers Business School i LECTURE NOTES IN Elementary counting methods Periklis A. Papakonstantinou MSIS, Rutgers Business School ALL RIGHTS RESERVED

More information

Optimal Tree-decomposition Balancing and Reachability on Low Treewidth Graphs

Optimal Tree-decomposition Balancing and Reachability on Low Treewidth Graphs Optimal Tree-decomposition Balancing and Reachability on Low Treewidth Graphs Krishnendu Chatterjee Rasmus Ibsen-Jensen Andreas Pavlogiannis IST Austria Abstract. We consider graphs with n nodes together

More information

An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees

An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees Francesc Rosselló 1, Gabriel Valiente 2 1 Department of Mathematics and Computer Science, Research Institute

More information

Lecture 4 February 2nd, 2017

Lecture 4 February 2nd, 2017 CS 224: Advanced Algorithms Spring 2017 Prof. Jelani Nelson Lecture 4 February 2nd, 2017 Scribe: Rohil Prasad 1 Overview In the last lecture we covered topics in hashing, including load balancing, k-wise

More information

Lecture Lecture 9 October 1, 2015

Lecture Lecture 9 October 1, 2015 CS 229r: Algorithms for Big Data Fall 2015 Lecture Lecture 9 October 1, 2015 Prof. Jelani Nelson Scribe: Rachit Singh 1 Overview In the last lecture we covered the distance to monotonicity (DTM) and longest

More information

EECS 229A Spring 2007 * * (a) By stationarity and the chain rule for entropy, we have

EECS 229A Spring 2007 * * (a) By stationarity and the chain rule for entropy, we have EECS 229A Spring 2007 * * Solutions to Homework 3 1. Problem 4.11 on pg. 93 of the text. Stationary processes (a) By stationarity and the chain rule for entropy, we have H(X 0 ) + H(X n X 0 ) = H(X 0,

More information

8 Priority Queues. 8 Priority Queues. Prim s Minimum Spanning Tree Algorithm. Dijkstra s Shortest Path Algorithm

8 Priority Queues. 8 Priority Queues. Prim s Minimum Spanning Tree Algorithm. Dijkstra s Shortest Path Algorithm 8 Priority Queues 8 Priority Queues A Priority Queue S is a dynamic set data structure that supports the following operations: S. build(x 1,..., x n ): Creates a data-structure that contains just the elements

More information

2. This exam consists of 15 questions. The rst nine questions are multiple choice Q10 requires two

2. This exam consists of 15 questions. The rst nine questions are multiple choice Q10 requires two CS{74 Combinatorics & Discrete Probability, Fall 96 Final Examination 2:30{3:30pm, 7 December Read these instructions carefully. This is a closed book exam. Calculators are permitted. 2. This exam consists

More information

Lecture 17: Trees and Merge Sort 10:00 AM, Oct 15, 2018

Lecture 17: Trees and Merge Sort 10:00 AM, Oct 15, 2018 CS17 Integrated Introduction to Computer Science Klein Contents Lecture 17: Trees and Merge Sort 10:00 AM, Oct 15, 2018 1 Tree definitions 1 2 Analysis of mergesort using a binary tree 1 3 Analysis of

More information

A General Lower Bound on the I/O-Complexity of Comparison-based Algorithms

A General Lower Bound on the I/O-Complexity of Comparison-based Algorithms A General Lower ound on the I/O-Complexity of Comparison-based Algorithms Lars Arge Mikael Knudsen Kirsten Larsent Aarhus University, Computer Science Department Ny Munkegade, DK-8000 Aarhus C. August

More information

Lecture 23: Alternation vs. Counting

Lecture 23: Alternation vs. Counting CS 710: Complexity Theory 4/13/010 Lecture 3: Alternation vs. Counting Instructor: Dieter van Melkebeek Scribe: Jeff Kinne & Mushfeq Khan We introduced counting complexity classes in the previous lecture

More information

CSE 202 Homework 4 Matthias Springer, A

CSE 202 Homework 4 Matthias Springer, A CSE 202 Homework 4 Matthias Springer, A99500782 1 Problem 2 Basic Idea PERFECT ASSEMBLY N P: a permutation P of s i S is a certificate that can be checked in polynomial time by ensuring that P = S, and

More information

Randomness and Computation March 13, Lecture 3

Randomness and Computation March 13, Lecture 3 0368.4163 Randomness and Computation March 13, 2009 Lecture 3 Lecturer: Ronitt Rubinfeld Scribe: Roza Pogalnikova and Yaron Orenstein Announcements Homework 1 is released, due 25/03. Lecture Plan 1. Do

More information

Optimal Color Range Reporting in One Dimension

Optimal Color Range Reporting in One Dimension Optimal Color Range Reporting in One Dimension Yakov Nekrich 1 and Jeffrey Scott Vitter 1 The University of Kansas. yakov.nekrich@googlemail.com, jsv@ku.edu Abstract. Color (or categorical) range reporting

More information

Streaming and communication complexity of Hamming distance

Streaming and communication complexity of Hamming distance Streaming and communication complexity of Hamming distance Tatiana Starikovskaya IRIF, Université Paris-Diderot (Joint work with Raphaël Clifford, ICALP 16) Approximate pattern matching Problem Pattern

More information

Lecture 21: Algebraic Computation Models

Lecture 21: Algebraic Computation Models princeton university cos 522: computational complexity Lecture 21: Algebraic Computation Models Lecturer: Sanjeev Arora Scribe:Loukas Georgiadis We think of numerical algorithms root-finding, gaussian

More information

Notes on induction proofs and recursive definitions

Notes on induction proofs and recursive definitions Notes on induction proofs and recursive definitions James Aspnes December 13, 2010 1 Simple induction Most of the proof techniques we ve talked about so far are only really useful for proving a property

More information

COMPLETION OF PARTIAL LATIN SQUARES

COMPLETION OF PARTIAL LATIN SQUARES COMPLETION OF PARTIAL LATIN SQUARES Benjamin Andrew Burton Honours Thesis Department of Mathematics The University of Queensland Supervisor: Dr Diane Donovan Submitted in 1996 Author s archive version

More information

1 Approximate Quantiles and Summaries

1 Approximate Quantiles and Summaries CS 598CSC: Algorithms for Big Data Lecture date: Sept 25, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri Suppose we have a stream a 1, a 2,..., a n of objects from an ordered universe. For simplicity

More information

Bloom Filters and Locality-Sensitive Hashing

Bloom Filters and Locality-Sensitive Hashing Randomized Algorithms, Summer 2016 Bloom Filters and Locality-Sensitive Hashing Instructor: Thomas Kesselheim and Kurt Mehlhorn 1 Notation Lecture 4 (6 pages) When e talk about the probability of an event,

More information

Big Data. Big data arises in many forms: Common themes:

Big Data. Big data arises in many forms: Common themes: Big Data Big data arises in many forms: Physical Measurements: from science (physics, astronomy) Medical data: genetic sequences, detailed time series Activity data: GPS location, social network activity

More information

Problem 5. Use mathematical induction to show that when n is an exact power of two, the solution of the recurrence

Problem 5. Use mathematical induction to show that when n is an exact power of two, the solution of the recurrence A. V. Gerbessiotis CS 610-102 Spring 2014 PS 1 Jan 27, 2014 No points For the remainder of the course Give an algorithm means: describe an algorithm, show that it works as claimed, analyze its worst-case

More information

On-line Bin-Stretching. Yossi Azar y Oded Regev z. Abstract. We are given a sequence of items that can be packed into m unit size bins.

On-line Bin-Stretching. Yossi Azar y Oded Regev z. Abstract. We are given a sequence of items that can be packed into m unit size bins. On-line Bin-Stretching Yossi Azar y Oded Regev z Abstract We are given a sequence of items that can be packed into m unit size bins. In the classical bin packing problem we x the size of the bins and try

More information

Dictionary: an abstract data type

Dictionary: an abstract data type 2-3 Trees 1 Dictionary: an abstract data type A container that maps keys to values Dictionary operations Insert Search Delete Several possible implementations Balanced search trees Hash tables 2 2-3 trees

More information

RMT 2013 Power Round Solutions February 2, 2013

RMT 2013 Power Round Solutions February 2, 2013 RMT 013 Power Round Solutions February, 013 1. (a) (i) {0, 5, 7, 10, 11, 1, 14} {n N 0 : n 15}. (ii) Yes, 5, 7, 11, 16 can be generated by a set of fewer than 4 elements. Specifically, it is generated

More information

Self-improving Algorithms for Coordinate-Wise Maxima and Convex Hulls

Self-improving Algorithms for Coordinate-Wise Maxima and Convex Hulls Self-improving Algorithms for Coordinate-Wise Maxima and Convex Hulls Kenneth L. Clarkson Wolfgang Mulzer C. Seshadhri November 1, 2012 Abstract Computing the coordinate-wise maxima and convex hull of

More information

Limitations of Algorithm Power

Limitations of Algorithm Power Limitations of Algorithm Power Objectives We now move into the third and final major theme for this course. 1. Tools for analyzing algorithms. 2. Design strategies for designing algorithms. 3. Identifying

More information

Online Interval Coloring and Variants

Online Interval Coloring and Variants Online Interval Coloring and Variants Leah Epstein 1, and Meital Levy 1 Department of Mathematics, University of Haifa, 31905 Haifa, Israel. Email: lea@math.haifa.ac.il School of Computer Science, Tel-Aviv

More information

DD2446 Complexity Theory: Problem Set 4

DD2446 Complexity Theory: Problem Set 4 DD2446 Complexity Theory: Problem Set 4 Due: Friday November 8, 2013, at 23:59. Submit your solutions as a PDF le by e-mail to jakobn at kth dot se with the subject line Problem set 4: your full name.

More information

1 Some loose ends from last time

1 Some loose ends from last time Cornell University, Fall 2010 CS 6820: Algorithms Lecture notes: Kruskal s and Borůvka s MST algorithms September 20, 2010 1 Some loose ends from last time 1.1 A lemma concerning greedy algorithms and

More information

Range-efficient computation of F 0 over massive data streams

Range-efficient computation of F 0 over massive data streams Range-efficient computation of F 0 over massive data streams A. Pavan Dept. of Computer Science Iowa State University pavan@cs.iastate.edu Srikanta Tirthapura Dept. of Elec. and Computer Engg. Iowa State

More information

CS 161: Design and Analysis of Algorithms

CS 161: Design and Analysis of Algorithms CS 161: Design and Analysis of Algorithms Greedy Algorithms 3: Minimum Spanning Trees/Scheduling Disjoint Sets, continued Analysis of Kruskal s Algorithm Interval Scheduling Disjoint Sets, Continued Each

More information

The Complexity of Constructing Evolutionary Trees Using Experiments

The Complexity of Constructing Evolutionary Trees Using Experiments The Complexity of Constructing Evolutionary Trees Using Experiments Gerth Stlting Brodal 1,, Rolf Fagerberg 1,, Christian N. S. Pedersen 1,, and Anna Östlin2, 1 BRICS, Department of Computer Science, University

More information

Randomized Algorithms III Min Cut

Randomized Algorithms III Min Cut Chapter 11 Randomized Algorithms III Min Cut CS 57: Algorithms, Fall 01 October 1, 01 11.1 Min Cut 11.1.1 Problem Definition 11. Min cut 11..0.1 Min cut G = V, E): undirected graph, n vertices, m edges.

More information

SMT 2013 Power Round Solutions February 2, 2013

SMT 2013 Power Round Solutions February 2, 2013 Introduction This Power Round is an exploration of numerical semigroups, mathematical structures which appear very naturally out of answers to simple questions. For example, suppose McDonald s sells Chicken

More information

Notes on Logarithmic Lower Bounds in the Cell Probe Model

Notes on Logarithmic Lower Bounds in the Cell Probe Model Notes on Logarithmic Lower Bounds in the Cell Probe Model Kevin Zatloukal November 10, 2010 1 Overview Paper is by Mihai Pâtraşcu and Erik Demaine. Both were at MIT at the time. (Mihai is now at AT&T Labs.)

More information

Lower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine)

Lower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine) Lower Bounds for Dynamic Connectivity (2004; Pǎtraşcu, Demaine) Mihai Pǎtraşcu, MIT, web.mit.edu/ mip/www/ Index terms: partial-sums problem, prefix sums, dynamic lower bounds Synonyms: dynamic trees 1

More information

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely JOURNAL OF COMPUTATIONAL BIOLOGY Volume 8, Number 1, 2001 Mary Ann Liebert, Inc. Pp. 69 78 Perfect Phylogenetic Networks with Recombination LUSHENG WANG, 1 KAIZHONG ZHANG, 2 and LOUXIN ZHANG 3 ABSTRACT

More information

Partitions and Covers

Partitions and Covers University of California, Los Angeles CS 289A Communication Complexity Instructor: Alexander Sherstov Scribe: Dong Wang Date: January 2, 2012 LECTURE 4 Partitions and Covers In previous lectures, we saw

More information

Handout 5. α a1 a n. }, where. xi if a i = 1 1 if a i = 0.

Handout 5. α a1 a n. }, where. xi if a i = 1 1 if a i = 0. Notes on Complexity Theory Last updated: October, 2005 Jonathan Katz Handout 5 1 An Improved Upper-Bound on Circuit Size Here we show the result promised in the previous lecture regarding an upper-bound

More information

Homework 6: Solutions Sid Banerjee Problem 1: (The Flajolet-Martin Counter) ORIE 4520: Stochastics at Scale Fall 2015

Homework 6: Solutions Sid Banerjee Problem 1: (The Flajolet-Martin Counter) ORIE 4520: Stochastics at Scale Fall 2015 Problem 1: (The Flajolet-Martin Counter) In class (and in the prelim!), we looked at an idealized algorithm for finding the number of distinct elements in a stream, where we sampled uniform random variables

More information

Sliding Windows with Limited Storage

Sliding Windows with Limited Storage Electronic Colloquium on Computational Complexity, Report No. 178 (2012) Sliding Windows with Limited Storage Paul Beame Computer Science and Engineering University of Washington Seattle, WA 98195-2350

More information

Randomized Algorithms. Lecture 4. Lecturer: Moni Naor Scribe by: Tamar Zondiner & Omer Tamuz Updated: November 25, 2010

Randomized Algorithms. Lecture 4. Lecturer: Moni Naor Scribe by: Tamar Zondiner & Omer Tamuz Updated: November 25, 2010 Randomized Algorithms Lecture 4 Lecturer: Moni Naor Scribe by: Tamar Zondiner & Omer Tamuz Updated: November 25, 2010 1 Pairwise independent hash functions In the previous lecture we encountered two families

More information

Efficient Reassembling of Graphs, Part 1: The Linear Case

Efficient Reassembling of Graphs, Part 1: The Linear Case Efficient Reassembling of Graphs, Part 1: The Linear Case Assaf Kfoury Boston University Saber Mirzaei Boston University Abstract The reassembling of a simple connected graph G = (V, E) is an abstraction

More information

Chapter 11. Approximation Algorithms. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved.

Chapter 11. Approximation Algorithms. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved. Chapter 11 Approximation Algorithms Slides by Kevin Wayne. Copyright @ 2005 Pearson-Addison Wesley. All rights reserved. 1 Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should

More information

The Inclusion Exclusion Principle and Its More General Version

The Inclusion Exclusion Principle and Its More General Version The Inclusion Exclusion Principle and Its More General Version Stewart Weiss June 28, 2009 1 Introduction The Inclusion-Exclusion Principle is typically seen in the context of combinatorics or probability

More information

On the Complexity of Budgeted Maximum Path Coverage on Trees

On the Complexity of Budgeted Maximum Path Coverage on Trees On the Complexity of Budgeted Maximum Path Coverage on Trees H.-C. Wirth An instance of the budgeted maximum coverage problem is given by a set of weighted ground elements and a cost weighted family of

More information

Advanced Analysis of Algorithms - Midterm (Solutions)

Advanced Analysis of Algorithms - Midterm (Solutions) Advanced Analysis of Algorithms - Midterm (Solutions) K. Subramani LCSEE, West Virginia University, Morgantown, WV {ksmani@csee.wvu.edu} 1 Problems 1. Solve the following recurrence using substitution:

More information

Tree sets. Reinhard Diestel

Tree sets. Reinhard Diestel 1 Tree sets Reinhard Diestel Abstract We study an abstract notion of tree structure which generalizes treedecompositions of graphs and matroids. Unlike tree-decompositions, which are too closely linked

More information

CS246 Final Exam, Winter 2011

CS246 Final Exam, Winter 2011 CS246 Final Exam, Winter 2011 1. Your name and student ID. Name:... Student ID:... 2. I agree to comply with Stanford Honor Code. Signature:... 3. There should be 17 numbered pages in this exam (including

More information

Lecture 4. 1 Circuit Complexity. Notes on Complexity Theory: Fall 2005 Last updated: September, Jonathan Katz

Lecture 4. 1 Circuit Complexity. Notes on Complexity Theory: Fall 2005 Last updated: September, Jonathan Katz Notes on Complexity Theory: Fall 2005 Last updated: September, 2005 Jonathan Katz Lecture 4 1 Circuit Complexity Circuits are directed, acyclic graphs where nodes are called gates and edges are called

More information

Counting and Constructing Minimal Spanning Trees. Perrin Wright. Department of Mathematics. Florida State University. Tallahassee, FL

Counting and Constructing Minimal Spanning Trees. Perrin Wright. Department of Mathematics. Florida State University. Tallahassee, FL Counting and Constructing Minimal Spanning Trees Perrin Wright Department of Mathematics Florida State University Tallahassee, FL 32306-3027 Abstract. We revisit the minimal spanning tree problem in order

More information