Streaming Algorithms for Extent Problems in High Dimensions

Size: px

Start display at page:

Download "Streaming Algorithms for Extent Problems in High Dimensions"

Clemence Elliott
6 years ago
Views:

1 Streaming Algorithms for Extent Problems in High Dimensions Pankaj K. Agarwal R. Sharathkumar Department of Computer Science Duke University {pankaj, Abstract We develop single-pass) streaming algorithms for maintaining extent measures of a stream S of n points in R d. We focus on designing streaming algorithms whose working space is polynomial in d polyd)) and sublinear in n. For the problems of computing diameter, width and minimum enclosing ball of S, we obtain lower bounds on the worst-case approximation ratio of any streaming algorithm that uses polyd) space. On the positive side, we introduce the notion of blurred ball cover and use it for answering approximate farthestpoint queries and maintaining approximate minimum enclosing ball and diameter of S. We describe a streaming algorithm for maintaining a blurred ball cover whose working space is linear in d and independent of n. 1 Introduction In many applications, data arrives rapidly as a stream of points and there is limited space to store the input. Algorithms in the streaming model have to work with one or few passes over the data, using small space. Motivated by a wide range of applications, including networking, databases and geographic information systems, there is extensive work on streaming algorithms. Often, it is impossible to compute exact solutions given the space constraints. Hence most of the work has focused on developing approximation algorithms and several novel techniques have been developed. See books [5, 7] for a comprehensive review. In this paper, we design streaming algorithms for several geometric problems in high dimensions. Problem statement. For a point x R d and a value r > 0, let Bx,r) denote a ball of radius r centered at x. For a ball B, let cb), rb) denote its center and radius Work on this paper is supported by NSF under grants CNS , CCF , CCF and DEB , by ARO grants W911NF and W911NF , by an NIH grant 1P50-GM , by a DOE grant OEG- P00A070505, and by a grant from the U.S. Israel Binational Science Foundation. respectively, and let 1 + ε)b denote the ε-expansion of B, i.e., BcB),1 + ε)rb)). Let S be a finite) set of points in R d. We use MEBS) to denote the smallest ball that contains S. For a parameter α > 1, a ball B is called an α- approximation of the minimum enclosing ball of S, or α-mebs) for brevity, if S B and rb) α rmebs)). A set C S is called an α-coresets) if S α MEBC). Let diams) denote any farthest pair diameteral pair) of points in S, and any pair of points r,s S such that, for every pair p,q S, pq α rs, is referred to as α-diams) α-diameteral pair). For any slab J, which is a region bounded by a pair h 1,h ) of parallel d 1)-dimensional hyperplanes, let dj) be the minimum distance between h 1 and h. Let widths) be any slab J such that S J and dj) is minimized. For α > 1, α-widths) is a slab J such that S J and dwidths)) dj ) α dwidths)). Finally, for a point x R d, an α-farthest-neighbor of x, α-fnx) is a point q S such that, for every p S, xp α xq. In this paper, we study the problems of maintaining α-mebs), α-coresets), α-widths), and α-diams), and of answering α-fnx) queries, in the streaming model. We assume that the input points arrive one by one, and the algorithm updates the information quickly using little space. Our goal is to design streaming algorithms whose working space and update time are polynomial in d and sub-linear in the size of S. Related work. There is extensive literature on fast algorithms for computing various extent measures, such as diameter, width, and smallest enclosing ball of a point set in low dimensions [, 14, 15, 17, 6, 8]. Agarwal et al. [1] have developed approximation algorithms for computing many extent measures of a set of n points in R d using coresets whose running time is On + 1/ε Od) ); see also [, 4, 11, 1]. Although these algorithms work well in small dimensions, the exponential dependency on d makes them unsuitable for higher dimensions. This has led to work on develop- 1

2 ing polynomial-time approximation algorithms in n, d and 1/ε) for different extent measures and other related problems. For example, Bădoiu and Clarkson [9] develop an elegant coreset-based algorithm that given a point set S R d computes a 1 + ε)-mebs) in time Ond/ε+1/ε 5 ). See [10, 1,, 4] for other similar results. Clarkson [16] presented a general framework that gives coreset-based algorithms for a number of geometric problems in high dimensions; see also [19]. Goel et al. [0] describe fast approximation algorithms for several proximity problems. For instance, they describe a data structure that answers -FNx) queries in Od) time after spending Ond) time in preprocessing. There is also some work on streaming algorithms for extent problems in high dimensions. Chan and Zarrabi- Zadeh [9] gave a streaming algorithm, which maintains a /)-MEBS), for a stream S R d, using Od) space. They also proved that any deterministic algorithm that stores only a single ball in its working space cannot maintain a better than α-mebs), for α 1+. Indyk [] proposed a streaming algorithm for maintaining α-diams) for α >, using Odn 1/α 1) log n) space. Recently Clarkson and Woodruff [18] described streaming algorithms for several problems in numerical algebra. Our results. We prove upper and lower bounds on the size of data structures for maintaining various extent measures under the streaming model. First, we present in Section, a data structure in the streaming model called the ε-blurred ball cover, for ε > 0, that maintains a subset K S of O1/ε )log1/ε)) points. Roughly speaking K is the union of a set of {K 1,...,K u }, u = O1/ε )log1/ε)), of subsets of S each of size O1/ε) so that S lies in the union of 1 + ε)mebk i ). The data structure can be updated in Od/ε )log1/ε)) amortized time. We then show in Section that K can be used to: i) compute a α-fnx), for any query x R d and for α = + ε; ii) maintain a α-diams) for α = + ε; iii) maintain a α-coresets), for α = + ε; iv) maintain a α-mebs) for α = 1+ + ε. Next, we show in Section 4 that any randomized streaming algorithm that maintains α-diams), α- widths), α-mebs), or α-coresets) with probability at least /, requires Ωmin S,expd 1/ ))) space for certain values of α. In particular, α < 1 /d 1/ ) for α-diams) and α-coresets), α d 1/ /8 for α- widths), and α < 1+ 1 /d 1/ ) for α-mebs). All lower bounds are obtained using known lower bounds on the one-round communication complexity of indexing [5]. Communication complexity has been widely used to prove lower bounds on the size of various data structures including in the streaming model [6, 7, 18]. Our algorithms for maintaining α-diams) and α- coresets) are close to optimal. Note the contrast between the lower and upper bounds for MEB a coreset based algorithm has a lower bound of, but we are able to circumvent this bound by, roughly speaking, maintaining a set of O1/ε )log1/ε)) balls and returning a ball that contains these balls. We can regard the centers of these balls as weighted points and thus we can get better results by maintaining a weighted weak coreset it is not a subset of input points. Although it is known that weak ε-nets are more powerful than ε-nets [1], we are unaware of similar results for coresets of MEB and other extent measures. Blurred Ball Cover This section defines the notion of blurred ball cover and describes an algorithm for maintaining such a cover in the streaming model. For a parameter 0 < ε 1, an ε- blurred ball cover of a set S of n points in R d, denoted by K = KS), is a sequence K 1,K,...,K u, where each K i S is a subset of O1/ε) points that satisfies the following three properties; let B i = MEBK i ) and K = i u K i. P1) For all 1 i < u, rb i+1 ) 1 + ε /8)rB i ); P) For all i < j, K i 1 + ε)b j ; P) For every p S, i u such that p 1 + ε)b i. Algorithm 1 describes a simple procedure called UpdateK,A), that given K := KS) and a set of points A R d, computes KS A). If we update K as each new point arrives, then A consists of a single point. However, as we will see below, it will be more efficient for some of our applications to update A in a batched mode. That is, newly arrived points are stored in a buffer A and when its size exceeds certain threshold, Update procedure is called to update K. Update relies on a procedure Approx-MEBZ,ε) that takes a set Z of points and a parameter 0 < ε 1 and returns a set G Z of O1/ε) points and B = MEBG) such that Z 1 + ε)b. Bădoiu and Clarkson [9] have described such a procedure with Od Z /ε + 1/ε 5 ) running time. For efficiency reasons, we explicitly maintain B i = MEBK i ) for every K i K. Update procedure. If there is a point in A that does not lie in the union of the ε-expansions of B i s, then the Update procedure invokes Approx-MEB on K A

3 B i c i c B i B q h c i c B q a) b) Figure 1: Proof of Lemma.. a) c i c εr i /, b) c i c > εr i /. with approximation ratio ε/. Let K K be the point set and B = MEBK ) be the ball returned by Approx-MEB. The Update procedure adds K to K and then deletes all K i s for which rk i ) εrk )/4. Algorithm 1 UpdateK, A) 1: if p A, i u, p 1 + ε)b i then : K,B := Approx-MEBK A),ε/) : K D := {K i rb i ) εrb )/4} 4: K := K \ K D ) K 5: end if Correctness. The proof of correctness relies on the following well-known observation; see e.g. [10]. Lemma.1. Let P be a set of points in R d and let B = MEBP). Then any closed halfspace that contains cb) also contains at least one point of P that is on B. Lemma.. For any i < u, rb i+1 ) 1+ε /8)rB i ). Proof. Update procedure adds K i+1 to K only if there is a point q A such that q 1 + ε)b i. Let B = MEB K {q}); rb i+1 ) rb ). Let r = rb ), c = cb ), c i = cb i ), and r i = rb i ). We prove that r > 1 + ε /8)r i. If c i c εr i / Figure 1a)), then r c q c i q c c i 1 + ε)r i εr i / 1 + ε /8)r i. On the other hand, if c c i > εr i / Figure 1b)), then let h be the hyperplane passing through c i with the direction c i c as its normal, and let h + be the halfspace, bounded by h, that does not contain c. By Lemma.1, there is a point q K i h + that is at a distance r i from c i. Therefore r q c c i c + q c i ) 1/ εr i /) + r i ) 1/ 1 + ε /8)r i. We now prove that P1) P) are maintained by Update. If for every p A, there is an i u such that p 1+ε)B i, then the set K does not change, and P1) P) continue to hold. Hence, assume that there is a p A such that p B i for all i u. By Lemma., property P1) continues to hold after each update. Note that if p 1 + ε)b i for all 1 i u, then Update computes a 1+ε/)-MEBK A). Since K is the only subset added to K, K 1 + ε)b and K is added at the end of the sequence, K i 1+ε)B. Thus property P) holds after each Update. Note that a prefix K D is deleted from K, so P) may be violated. However, the next two lemmas prove that P) is also satisfied. Lemma.. For i < j, cb i ) 1 + ε)b j. Proof. Suppose, on the contrary, that cb i ) 1+ε)B j. Let γ be a halfspace that contains cb i ) but γ 1 + ε)b j =. By Lemma.1, γ contains a point p K i, but p 1+ε)B j, which contradicts the fact K i 1+ε)B j by P)). Hence, cb i ) 1 + ε)b j. Lemma.4. For all K i K D, 1 + ε)b i 1 + ε)b. Proof. Let c = cb ), r = rb ), c i = cb i ), and r i = rb i ). Let K i K D, then r i εr /4. Since K i 1 + ε/)b, the proof of Lemma. implies cb i ) 1 + ε/)b. For any point x 1 + ε)b i, xc c c i + c i x c c i ε)r i 1 + ε/)r + ε1 + ε)r /4 1 + ε)r.

4 Therefore q 1+ε)B and thus 1+ε)B i 1+ε)B. Lemma.4 immediately implies that for any K i K D, if there is a point q 1+ε)B i then q 1+ε)B. Hence, for every q S, there is an i such that q 1 + ε)b i, implying P). Size and update time. Let r i = rb i ). Update ensures that r u /r 1 4/ε. By P1), r i ε /8)r i. Therefore u log 1+ε /84/ε) = O1/ε )log1/ε)). Hence K K i = O1/ε )log1/ε)). Note that Update spends Ou. A ) = Od A /ε )log1/ε)) time to test the condition in line 1. The time spent in computing K and B in line is O A + K )d/ε + 1/ε 5 )) or Od A /ε) + d/ε 4 )log1/ε) + 1/ε 5 )). If we update K after the arrival of each new point, then the update time is Od/ε 5 ). However, if we batch the updates and invoke Update only after O1/ε ) points have arrived, the total time spent will be Od/ε 5 )log1/ε)) which is the time taken by line 1 of the update procedure. The amortized time for Update will then be Od/ε )log1/ε)). We thus conclude the following. Theorem.1. For any 0 < ε 1, an ε-blurred ball cover of size Od/ε )log1/ε)) of a stream of points can be maintained in Od/ε )log1/ε)) amortized time. Applications of Blurred Ball Cover We now describe streaming algorithms for answering + ε)-fnx) queries and for maintaining + ε)- diams), + ε)-mebs) and 1+ + ε)-mebs) using the blurred ball cover. We use the following simple inequality repeatedly. For any x, y R,.1) x + y) x + y ) 1/. Farthest-neighbor queries. We maintain an ε- blurred ball cover K and update it in the batched mode. Let A be the set of newly arrived points in S that have not been processed yet. Set Q = K A; Q = O1/ε )log1/ε)). For any query point x R d, we return the point of Q that is farthest from x. We claim that for any x R d, max xp + ε)max qx. p S q Q by P), there is an i u such that p 1 + ε)b i. Assuming c i = cb i ) and r i = rb i ),.) xp xc i + c i p xc i ε)r i. Let h be the hyperplane passing through c i and normal to xc i and let h + be the halfspace, bounded by h, that does not contain x. By Lemma.1, there is a point z K i such that zc i = r i. Since xc i z is obtuse,.) xz xc i + c i z ) 1/ xc i + r i ) 1/. By combining.) and.) and using.1), xp xc i + r i ) 1/ + εr i + ε) xz + ε) xq. We conclude the following. Theorem.1. For a stream S of points in R d and a parameter 0 < ε 1, there is a data structure of size Od/ε )log1/ε)) that answers + ε)-fnx) in time Od/ε )log1/ε)). The amortized update time is Od/ε )log1/ε)). Diameter. For a point set S, we maintain a α-diams), for α = + ε, using the above data structure for answering α-fnx) queries. Let p,q be the current α- diameteral pair maintained by the algorithm. When a new point z arrives, we compute the w = α-fnz). If wz > pq, we replace p,q) with w,z). We conclude the following. Theorem.. For a stream S of n points in R d, there is a data structure of size Od/ε )log1/ε)) that maintains a + ε)-diams). The amortized update time of this structure is Od/ε )log1/ε)). Coreset for minimum enclosing ball. For a stream S of points, we maintain an ε-blurred ball cover K and update it in the batched mode. Let A be the set of points not processed by Update and let Q = K A. Let B = MEBQ), c = cb) and r = rb). We argue that for every K i K, 1 + ε)b i + ε)b. Lemma.1. For all i u, 1 + ε)b i + ε)b. Let p = arg max p S px and q = arg max q Q qx. If p A then p = q and the claim is obviously true. If p S \A, i.e., p has been processed by Update, then, 4 Proof. For any ball B i in the blurred ball cover, let c i = cb i ) and r i = rb i ). Observe that K i B. Let t = arg max t 1+ε)Bi tc. t c = cc i ε)r i.

5 By Lemma.1, there is a point p K i such that pc cc i + ri. But pc r. Hence, t c pc cc i ε)r i cci + ri + ε, or t c +ε) pc +ε)r. Thus 1+ε)B i + ε)b. Lemma.1 in conjunction with property P) of blurred ball cover implies that S + ε)b. Thus Q is a + ε)-coresets). We conclude the following. Theorem.. For a stream S of points in R d and a parameter 0 < ε 1, there is a data structure of size Od/ε )log1/ε)) that maintains a +ε)-coresets) of minimum enclosing ball. The amortized update time of this structure is Od/ε )log1/ε)). Minimum enclosing ball. For a stream S of points, we maintain a ε/9)-blurred ball cover K of S. K is updated whenever a new point arrives. Let B = {B 1,...,B u }. Let B = MEBB). We return 1 + ε/)b which can be computed in time O1/ε 5 ) []. Hence the total update time is O1/ε 5 ). Let ˆr = rmebs)). Lemma.. rb 1 + ) ) + ε/ ˆr. Proof. Let K = K i B i B and B = {B i K i K }. Note that any subsequence of K, and hence K, satisfies properties P1) and P) of blurred ball cover. Let B t be the largest ball in B. Let c t = cb t ), r t = rb t ),c = cb ),r = rb ) and let k = r /r t. Observe that c t c = r r t = k 1)r t. Let h be a hyperplane passing through c with a normal parallel to c t c. Let h + be the halfspace bounded by h and that does not contain c t. By Lemma.1, there is a point b i B B i, for some i, such that b i c = r = kr t. Since c t c b i is obtuse, we have.4) c t b i k 1) rt + k rt. Since the proof of Lemma.1 requires only P) of blurred ball cover, it holds for any subsequence of K. In particular, it holds for K. Hence, for all B i B, B i + ε/9)b t, implying that.5) c t b i + ε/9)r t. Combining.4) and.5), we have k 1) r t + k r t + ε/9) r t 1 + ε/9) r t. 5 Solving this quadratic equation, we can deduce that, k 1+ + ε/. Thus r 1 + ) kr t kˆr + ε/ ˆr. By P), S i u1 + ε/9)b i 1 + ε/9)b. By Lemma., we have r 1 + ) + ε/ ˆr. Thus 1 + ε/)r implying that 1 + ε/)b 1 + ) + ε/ 1 + ) + ε ˆr, is indeed a MEBS). We conclude the following. 1 + ε/)ˆr 1+ ) + ε - Theorem.4. Given a stream S of points in R d, there is a data structure of size Od/ε )log1/ε)) that maintains a +1 + ε)-mebs). The worst-case) update time of this structure is Od/ε 5 ). Remark. A slightly better bound on r, the radius of the ball returned by the above algorithm can be obtained using a more involved argument, While, from the lower bounds on α-mebs) in Section 4 it follows that r > 1+, the following example shows that one cannot hope to prove r = 1+ + ε. Let the input be a stream S = p 1,p,p,p 4,..., where p 1 = 1 1,, ), p = 1 1,, ), p = 1, 1,0), p 4 = 1,0,0), and p 5,p 6,..., are points on a unit sphere in R. When the points in S arrive in this order, after the arrival of p 4, the blurred ball cover will consist of K = {{p 1,p }, {p 1,p,p }, {p 1,p,p,p 4 }}. The three balls {B 1,B,B } of K are described by ) ) 1 1 B 1 = B,,0,, ) ) 1 B = B,0,0 1,, B = B0,0,0),1).

6 Let B = MEBB 1 B ) with rb ) = 1 + )/ and cb ) = 1/ ) 1/),0,0). Let q be the farthest point of B from cb ). qcb ) = / > 1+. We evaluate B to be θ-mebs) where θ Ct,/d 1/ ) C t,/d 1/ ). Let t resp. t ) be a point on the boundary of Cs,/d 1/ ) resp. C s,/d 1/ )). Then st st st. A simple trignometric calculation shows that see Figure a)) 4 Lower Bounds In this section we show that any randomized streaming algorithm that maintains α-diams), α-widths), α- MEBS), or α-coresets) with probability at least /, requires Ωmin S,expd 1/ ))) space for certain values of α. In particular, α < 1 /d 1/ ) for α-diams) and α-coresets), α d 1/ /8 for α-widths), and α < 1+ 1 /d 1/ ) for α-mebs). Let S d 1 denote d 1)-dimensional unit sphere centered at origin. We show how to sample points from S d 1 which will be crucial for proving our lower bounds. Sampling in S d 1. Let u S d 1 and ε 0,1). Let H be the hyperplane x,u = ε, i.e., H is the hyperplane that is at distance ε from the origin and normal to the vector u. H divides S d 1 into two spherical regions. We refer to the smaller spherical region as a spherical cap and denote it by Cu,ε). We define its measure to be µu,ε) = SACu,ε)) SAS d 1 ), where SAX) is the surface area of X. known [8] that 4.6) µu,ε) exp dε /). It is well Lemma 4.1. There is a centrally symmetric point set K S d 1 of size Ωexpd 1/ )) such that for any pair of distinct points p,q K if p q, then 1 /d 1/ ) pq 1 + /d 1/ ). Proof. The set K is constructed incrementally. We maintain the invariant that if p,q K and q p then q Cp,/d 1/ ). We also maintain a centrally symmetric region F = S d 1 \ q K Cq,/d1/ ). Initially K = and F = S d 1. If F, we choose an antipodal pair of points p, p from F. By construction, p, p Cq,/d 1/ ) for any point q K. Moreover, by symmetry, no point in K lies in Cp,/d 1/ ) C p,/d 1/ ). We add p and p to K. We set F = F \ { Cp,/d 1/ ) C p,/d 1/ ) } to ensure that the invariant holds after adding p and p. The algorithm terminates when F =. By 4.6), µp,/d 1/ )+µ p,/d 1/ ) exp d 1/ ), therefore the above algorithm performs Ωexpd 1/ )) steps before it stops. Hence K = Ωexpd 1/ )). Let s,t K be such that s t, t. By construction, s 6 st st 1 + /d 1/ ) + 1 4/d / ) 1 + /d 1/ ) 1/ 1 + /d 1/ ). Similarly one can show st st 1 /d 1/ ). ) 1/ Diameter. The problem Index is defined as follows. Let Alice and Bob be two players. Suppose Alice has a binary string σ {0,1} k and Bob has an index i [1 : k]. For any j [1 : k], let σ j be the bit that appears in the jth position of σ. Alice sends a message to Bob after which Bob needs to determine σ i with probability at least /. It is well-known that the length of any message sent by Alice that helps Bob determine σ i with probability at least / is Ωk) [5]. We choose the smallest d so that k expd 1/ ). Hence k = Θexpd 1/ )). Let A be a streaming algorithm that maintains α-diams) for α 1 /d 1/ ) with probability at least /. We choose expd 1/ ) points on S d 1 using Lemma 4.1. Let L be the subset of at least k of these points that lie on the hemisphere x 1 0. We sort L in lexicographic order, which induces a map ϕ : [1 : k] L. The map ϕ is independent of the input string σ and the index i. Hence we can assume that both Alice and Bob know ϕ without communicating any bits. For any σ {0,1} k, let ϕσ) = {ϕx) σ x = 1}. Alice adds ϕσ) as input to A in an arbitrary order. Then, Alice communicates the working space of A to Bob. In order to determine the σ i, Bob adds as input, ϕi) to A. Let p,q S be the pair of points returned by A at the end of the protocol. If p = q, Bob determines σ i = 1. Else, Bob determines σ i = 0. The following lemma shows the correctness of this protocol. Lemma 4.. For any σ {0,1} k and i [1 : k]. Let S = ϕσ) { ϕσ i )} and α 1 /d 1/ ). Every α-diams) pair is an antipodal pair if and only if σ i = 1. Proof. Since all points in ϕσ) lie in one hemisphere, there is an antipodal pair of points in S if and only if ϕi) ϕσ), i.e. σ i = 1. If σ i = 0, then ϕi) ϕσ)

7 s t t t p 1 ε ε ϕi) ϕi) s a) q i 1 + ǫ) b) Figure : a) Proof of Lemma 1, b) Lower bound for α-mebs); here ε = /d 1/. and A does not return an antipodal pair of points. On the other hand, suppose σ i = 1 and suppose there is an α-diameteral pair p,q S such that p q. Since {ϕσ i ), ϕσ i )} S, diams). From Lemma 4.1, it follows that pq 1 + /d 1/ ), i.e., diams) > α pq for α 1 /d 1/ ). Thus p,q is not an α- diameteral pair implying that α-diams) is an antipodal pair of points. Since A returns an α-diams) with probablity at least /, Bob determines σ i correctly with the same probability. Since the communication complexity of any randomized algorithm for the indexing problem is Ωk) = Ωexpd 1/ )), it follows that the work space of A is Ωexpd 1/ )). We thus conclude the following. Theorem 4.1. Any streaming algorithm that maintains an α-diams) of a set S of n points in R d, for α < 1 /d 1/ ), with probability at least / requires Ωmin { n,expd 1/ ) } ) bits of storage. Minimum enclosing ball. Let A be an algorithm that maintains an α-mebs), for α 1+ 1 /d 1/ ) with probability at least /. The map ϕ is defined as above. Alice passes points in ϕσ) as input to A in an arbitrary order and communicates the working space of A to Bob. For index i, let q i be the point that is in the direction ϕi) and at distance 1 + /d 1/ ) resp /d 1/ )) from ϕi)resp. ϕi)); see Figure b). Bob adds q i as input to A. If A returns a ball of radius at least ), then Bob declares σ i = 1. Otherwise, Bob declares σ i = 0. Let S = {ϕσ) {q i }}. Note that ϕi)q i = /d 1/ ). Hence if σ i = 1, then ϕi),q i S and hence the radius of any ball containing S must be at least /d 1/ ) from ϕi). Since, q i ϕi)) = 1 + /d 1/ ), the radius of MEBS) is at most 1+/d 1/ ). In this case, the radius of any α-mebs) is at most α rmebs) ), for α < 1+ 1 /d 1/ ). Since A outputs a correct α-mebs) with probability at least /, Bob can determine σ i with the same probability. We can thus conclude that the workspace of A is Ωexpd 1/ )) and obtain the following. Theorem 4.. Any streaming algorithm that maintains an α-mebs) of a set S of n points in R d, for α < 1+ 1 /d 1/ ), with probability at least / requires Ωmin { n,expd 1/ ) } ) bits of storage. Coreset for minimum enclosing ball. The reduction to α-mebs) can be modified to obtain lower bounds on α-coreset. Let A be an algorithm that maintains a α-coresets) for α 1 /d 1/ ) with probability /. Let ϕ be a map as defined previously. Alice passes points in ϕσ) as input to A in an arbitrary order and communicates the working space of A to Bob. For index i, q i is the point as defined above; see Figure b). Let C be α-coreset maintained by A. For index i, Bob adds q i as input to A and checks whether ϕi) C. If ϕi) C, Bob reports σ i = 1. Otherwise, Bob reports σ i = 0. The correctness of this reduction follows from the following lemma. Lemma 4.. If C is an α-coresetϕσ) {q i }), for α < 1 /d 1/ ) then, i) q i C, and ii) ϕi) C if and only if σ i = 1. ϕi)q i / = /d 1/ ))/ > On the other hand, if σ i = 0, then ϕi) S. By Lemma 4.1, all points in ϕσ) are within distance 1+ 7 Proof. Let S = ϕσ) {q i }. First, we prove i). Suppose, on the contrary, that q i C and hence C ϕσ). Let B d be the unit ball centered at origin. Let β be MEBC). Since C S d 1, rβ) 1 and

8 cβ) B d. Observe that, for any point t B d, tq i 1 + /d 1/ ). Hence, cβ)q i 1 + /d 1/ ) 1 + /d 1/ )rβ) > α rβ), leading to a contradiction that C is an α-coresets). Now, we prove ii). If σ i = 0, then ϕi) S and C S implies ϕi) C. On the other hand, suppose σ i = 1 but ϕi) C. Let β = MEBC). Observe that all points in C are within a distance 1+/d 1/ ) from ϕi). Hence rβ) 1 + /d 1/ ). It follows from i) that the center cβ) β, where β = Bq i, 1 + /d 1/ )). For any point p β, pϕi), therefore cβ)ϕi) 1 /d 1/ ) 1 + /d 1/ ) α rβ), for α < 1 /d 1/ ), leading to a contradiction that C is an α-coresets). Hence, we conclude the following. Theorem 4.. Any streaming algorithm that maintains an α-coresets) of a set S of n points in R d, for α < 1 /d 1/ ), with probability at least / requires Ωmin { n,expd 1/ ) } ) bits of storage. Width. Let B be a ball centered at origin with rb) = 1/. For any vector u S d 1, let h u be a hyperplane passing through the origin and normal to u, i.e., x,u = 0. Let N u h u be a set of d points such that convn u { u,u}) contains B; see Figure. For any σ {0,1} k, let ϕσ) = { p p ϕσ)}. v v 1 u u Figure : N u = {v 1,v,v }, B convn u { u,u}). B v h u to A in an arbitrary order and then communicates the working space of A to Bob. For index i, Bob adds N ϕi) to A. If A reports a slab J such that dj) is at least 1, then Bob reports σ i = 1 and otherwise Bob reports σ i = 0. The correctness of the reduction follows from the following observation. Let S = {ϕσ) ϕσ) N ϕi) }. If σ i = 1, then {N ϕi) { ϕi),ϕi)}} S. By construction of N ϕi), B convn ϕi) {ϕi), ϕi)}). Since rb) = 1/, dj) dwidths)) 1. On the other hand, if σ i = 0, then ϕi),ϕi)) S. Let W be a slab bounded by the two hyperplanes x,ϕi) = /d 1/ and x,ϕi) = /d 1/. By Lemma 4.1 and the fact that N ϕi) h ϕi) W, we have S W and dwidths)) 4/d 1/. For α < d 1/ /8, dj) α dwidths)) α 4/d 1/ 1/. Hence, we conclude the following. Theorem 4.4. Any streaming algorithm that maintains an α-widths) of a set of n points in R d, for α < d 1/ /8, with probability at least /, requires Ωmin { n,expd 1/ ) } ) bits of storage. 5 Conclusion In this paper, we design streaming algorithms for maintaining several extent measures on high-dimensional point sets. Our approach is based on the notion of blurred ball cover that, for a stream S of points, maintains a subset K S. K can be interpreted as the union of a set of {K 1,...,K u }, u = O1/ε )log1/ε)), of subsets of S each of size O1/ε) so that S lies in the union of 1+ε)MEBK i ). We provide a simple streaming algorithm for maintaing blurred ball cover. We then use blurred ball cover to provide streaming algorithms for various extent measures. We conclude by stating related problems. Is there a fully dynamic data structure, whose size is linear in n and polyd,1/ε), that maintains a 1 + ε)-mebs)? Can the concept of blurred ball cover be generalized for maintaining approximate smallest encosing convex shape of a high dimensional point set, e.g., minimum enclosing ellipsoids, under the streaming model? The Index problem can be reduced to α-widths) as follows. Let A be a streaming algorithm that maintains α-widths), for α < d 1/ /8, with probability at least /. Alice passes points in {ϕσ) ϕσ)} 8 References [1] P. K. Agarwal, S. Har-Peled, and K. R. Varadarajan, Approximating extent measures of points, J. ACM, ),

9 [] P. K. Agarwal, S. Har-Peled, and K. R. Varadarajan, Geometric approximation via coresets, in: Combinatorial and Computational Geometry J. Goodman, J. Pach, and E.Welzl, eds.), Cambridge University Press, 005, pp [] P. K. Agarwal, J. Matoušek, and S. Suri, Farthest neighbors, maximum spanning trees and related problems in higher dimensions, Comput. Geom. Theory Appl., 1 199), [4] P. K. Agarwal and H. Yu, A space-optimal data-stream algorithm for coresets in the plane, Proc. rd Annual Sympos. Comput. Geom., 007, pp [5] C. Aggarwal, Data Streams: Models and Algorithms, Springer, 007. [6] N. Alon, Y. Matias, and M. Szegedy, The space complexity of approximating the frequency moments, J. Comput. Syst. Sci., ), [7] A. Andoni, D. Croitoru, and M. Patrascu, Hardness of nearest neighbor under L-infinity, Proc. 49th Annual IEEE Sympos. Foundations of Comp. Sc., 008, pp [8] K. Ball, An elementary introduction to modern convex geometry, in Flavors of Geometry, ), [9] M. Bădoiu and K. L. Clarkson, Optimal core-sets for balls, Comput. Geom. Theory Appl., ), 14. [10] M. Bădoiu, S. Har-Peled, and P. Indyk, Approximate clustering via core-sets, Proc. 4th Annual ACM Sympos. Theory Comput., 00, pp [11] T. M. Chan, Approximating the diameter, width, smallest enclosing cylinder, and minimum-width annulus, Intl. J. Comput. Geom. Appl., 1 00), [1] T. M. Chan, Faster core-set constructions and datastream algorithms in fixed dimensions, Comput. Geom. Theory Appl., 5 006), 0 5. [1] B. Chazelle, H. Edelsbrunner, M. Grigni, L. Guibas, M.Sharir, and E. Welzl, Improved bounds on weak ε-nets for convex sets, Discrete Comput. Geom., ), [14] B. Chazelle and J. Matoušek, On linear-time deterministic algorithms for optimization problems in fixed dimension, J. Algorithms, ), [15] K. L. Clarkson, Las Vegas algorithms for linear and integer programming when the dimension is small, J. ACM, ), [16] K. L. Clarkson, Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm, Proc. 19th Annual ACM-SIAM Sympos. Discrete Algorithms, 008, pp [17] K. L. Clarkson and P. W. Shor, Applications of random sampling in computational geometry, II., Discrete Comput. Geom., 1987), 195. [18] K. L. Clarkson and D. P. Woodruff, Numerical linear algebra in the streaming model, Proc. 41st Annual ACM Sympos. Theory of Comput., 009, pp [19] B. Gärtner and M. Jaggi, Coresets for polytope distance, Proc. 5th Annual Sympos. Comput. Geom., 009, pp. 4. [0] A. Goel, P. Indyk, and K. Varadarajan, Reductions among high dimensional proximity problems, Proc. 1th Annual ACM-SIAM Sympos. Discrete Algorithms, 001, pp [1] S. Har-Peled and K. R. Varadarajan, Projective clustering in high dimensions using core-sets, In Proc. 18th Annual Sympos. Comput. Geom, 00, pp [] P. Indyk, Better algorithms for high-dimensional proximity problems via asymmetric embeddings, Proc. 14th Annual ACM-SIAM Sympos. Discrete Algorithms, 00, pp [] P. Kumar, J. S. B. Mitchell, and E. A. Yildirim, Approximate minimum enclosing balls in high dimensions using core-sets, J. Exp. Algorithmics, 8 00), 1.1. [4] P. Kumar and E. A. Yildirim, Minimum-Volume Enclosing Ellipsoids and Core Sets, J. Optimization Theory and Appl., ), 1 1. [5] E. Kushilevitz and N. Nisan, Communication Complexity, Cambridge University Press, [6] J. Matoušek, M. Sharir, and E. Welzl, A subexponential bound for linear programming, Algorithmica, ), [7] S. Muthukrishnan, Data Streams: Algorithms and Applications, Now Publishers Inc., 005. [8] E. A. Ramos, An Optimal Deterministic Algorithm for Computing the Diameter of a three-dimensional point set, Discrete Comput. Geom., 6 001), 44. [9] H. Zarrabi-Zadeh and T. Chan, A simple streaming algorithm for minimum enclosing balls, Proc. 18th Annual Canadian Conf. Comput. Geom., 006, pp

Geometric Optimization Problems over Sliding Windows

Geometric Optimization Problems over Sliding Windows Timothy M. Chan and Bashir S. Sadjad School of Computer Science University of Waterloo Waterloo, Ontario, N2L 3G1, Canada {tmchan,bssadjad}@uwaterloo.ca