High Dimensional Clustering with r-nets

Size: px

Start display at page:

Download "High Dimensional Clustering with r-nets"

Christal Stevenson
5 years ago
Views:

1 High Dimensional Clusteing with -nets Geogia Avaikioti, Alain Ryse, Yuyi Wang, Roge Wattenhofe ETH Zuich, Switzeland Abstact Clusteing, a fundamental task in data science and machine leaning, goups a set of objects in such a way that objects in the same cluste ae close to each othe than to those in othe clustes. In this pape, we conside a well-known stuctue, so-called -nets, which igoously captues the popeties of clusteing. We devise algoithms that impove the untime of appoximating -nets in high-dimensional spaces with l 1 and l 2 metics fom Õ(dn2 Θ( ) ) to Õ(dn + n2 α ), whee α = Ω( 1/3 /log(1/)). These algoithms ae also used to impove a famewok that povides appoximate solutions to othe high dimensional distance poblems. Using this famewok, seveal impotant elated poblems can also be solved efficiently, e.g., (1 + )-appoximate kth-neaest neighbo distance, (4 + )-appoximate Min-Max clusteing, (4+)-appoximate k-cente clusteing. In addition, we build an algoithm that (1 + )-appoximates geedy pemutations in time Õ((dn + n2 α ) log Φ) whee Φ is the spead of the input. This algoithm is used to (2 + )-appoximate k-cente with the same time complexity. Intoduction Clusteing aims at gouping togethe simila objects, whee each object is often epesented as a point in a high dimensional space. Clusteing is consideed to be a conestone poblem in data science and machine leaning, and as a consequence thee exist multiple clusteing vaiants. Fo instance, while each cluste may just be epesented as a set of points, it is often advantageous to select one point of the data set as a pototype fo each cluste. A significant fomal epesentation of such a pototype clusteing is known as -nets. Given a lage set of n data points in d-dimensional space, an -net is a subset (the pototypes) of these data points. This subset needs to fulfill two popeties: Fist, balls of adius aound each of the pototypes need to contain evey point of the whole data set (coveing). Second, we must ensue that the pototypes ae well sepaated, i.e., no ball contains the cente of any othe ball (packing). Appoximate -nets lift the coveing constaint a tiny bit by allowing balls to have a slightly lage adius than, while peseving the packing popety, i.e., any two pototypes still need to have at least distance. Copyight c 2019, Association fo the Advancement of Atificial Intelligence ( All ights eseved. Thoughout this pape, we assume data sets to be lage and high dimensional. We theefoe assume the numbe of featues d of each object to be non-constant. This leads to inteesting and impotant poblems, as this assumption foces us to think about algoithms whose untime is sub-exponential (pefeably linea) in the numbe of featues d. In addition, we want ou untime to be sub-quadatic in the size n of ou data. In this pape we lay theoetical goundwok, by showing impoved algoithms on the appoximate -net poblem and applications theeof. Related Wok Thee is not a unique best clusteing citeion, hence many methods (Estivill-Casto 2002) ae poposed to solve the clusteing poblem fo diffeent applications (e.g., (Sibson 1973; Defays 1977; Lloyd 1982; Kiegel et al. 2011)), which makes it difficult to systematically analyze clusteing algoithms. In ou pape we will make use of so-called polynomial theshold functions (PTF), a poweful tool developed by (Alman, Chan, and Williams 2016). PTFs ae distibutions of polynomials that can efficiently evaluate cetain types of Boolean fomulas with some pobability. They ae mainly used to solve poblems in cicuit theoy, but wee also used to develop new algoithms fo othe poblems such as appoximate all neaest neighbos o appoximate minimum spanning tee in Hamming, l 1 and l 2 spaces. In the following, we employ this method to develop an algoithm that computes appoximate -nets. The algoithmic famewok Net & Pune, was developed by (Ha-Peled and Raichel 2015). It is able to solve so called nice distance poblems, when povided with a suitable data stuctue fo the poblem. These data stuctues ae often constucted by exploiting -nets. A majo dawback of the famewok is its estiction to a constant numbe of featues. Consequentially, this famewok was late extended by (Avaikioti et al. 2017) to also solve highe dimensional cases. The algoithm, constucted in this pape, yields an immediate impovement on this famewok, as the constuction of the famewok is based aound appoximate -nets. We also pesent vaious of the peviously mentioned data stuctues that we plug into the famewok to solve high dimensional distance optimization poblems. Recent wok by (Eppstein, Ha-Peled, and Sidiopoulos

2 2015) suggests a way of constucting appoximate geedy pemutations with appoximate -nets. Geedy pemutations imply an odeing of the data, which povide a solution to 2-appoximate k-cente clusteing as shown by (Gonzalez 1985). We pesent a simila constuction, by applying appoximate geedy pemutations. An appoach on hieachical clusteing was pesented by (Dasgupta 2002). They constuct an algoithm based on futhest fist tavesal, which is essentially building a geedy pemutation and then tavesing the pemutation in ode. In (Fen and Bodley 2003) they pesent how andom pojections, in pactice, can be applied to educe the dimension of given data. We late employ a simila appoach, namely andom pojections to lines, to educe the appoximate -net poblem with l 1 metics to a low-dimensional subspace. Ou Contibution This pape pesents new theoetical esults on the constuction of (1 + )-appoximate -nets, impoving the pevious uppe bound of Õ(dn2 Θ( ) ) by (Avaikioti et al. 2017). We denote n as the numbe of data points, d the dimension of the data and α = Ω( / log( 1 )) fo an abitay eo paamete. The algoithm builds appoximate -nets in Hamming, l 1 and l 2 spaces, unning in Õ(n2 α +n 1.7+α d) 1 time in both Hamming and Euclidean space. We also modify ou algoithm to yield an impovement on the Net & Pune famewok of (Avaikioti et al. 2017). Supplying the famewok with cetain data stuctues, which ae ceated using appoximate -nets, we deive new algoithms with impoved untime on (1 + )-appoximate k-smallest neaest neighbo distance, (4 + )-appoximate Min-Max Clusteing, intoduced in (Ha-Peled and Raichel 2015), and (4 + )-appoximate k-cente clusteing. These algoithms un in Õ(dn + n2 α ) fo data sets in l 1 o l 2 spaces. With the exception of appoximate k-smallest neaest neighbo, this is, to ou knowledge, the fist time this famewok is used to solve these poblems in high dimensional spaces. We late also design a new algoithm to (2 + )-appoximate k-cente clusteing, by deiving an impoved vesion of the algoithm fo (1 + )-appoximate geedy pemutations in (Eppstein, Ha-Peled, and Sidiopoulos 2015). Both of these algoithms have a untime of Õ((dn + n2 α ) log Φ), whee Φ denotes the spead of the data. We define the spead of a dataset as the faction of the diamete ove the shotest distance of the gaph. The omitted poofs can be found in the appendix. Appoximate -nets In this section, we pesent an algoithm that builds appoximate -nets in l 1 and l 2 spaces. To that end, we fist deive an algoithm, that constucts appoximate -nets in Hamming space. We late show how to educe the poblem fom l 1 o l 2 to Hamming space. 1 The Õ notation thoughout the pape hides logaithmic factos in n and polynomial tems in 1 Appoximate -net in Hamming Space Building appoximate -nets in Euclidean space is computationally expensive. Theefoe, we initially estict ouselves to datapoints on the vetices of a high dimensional hypecube. The distance between any two datapoints is then measued by the Hamming distance. In the following, we define the notion of appoximate -nets in this metic space, whee the eo is additive instead of multiplicative. Definition 1. Given a point set X {0, 1} d, a adius R, an appoximation paamete > 0 and the Hamming distance denoted as 1, an appoximate -net of X with additive eo is as subset C X such that the following popeties hold: 1. (packing) Fo evey p, q C, p q, it holds that p q 1 2. (coveing) Fo evey p X, thee exists a q C, s. t. p q 1 + d (additive eo) To constuct appoximate -net we employ Pobabilistic Polynomial Theshold Functions, a tool intoduced in (Alman, Chan, and Williams 2016). To effectively apply this technique, we equie a spase dataset, meaning that we assume that most of the points ae futhe than fom each othe. To that end, we pesent a technique that spasifies the data in advance without losing meaningful data fo ou poblem. Spasification To spasify ou data, we apply bute foce to build pat of the appoximate -net. Intuitively, we andomly pick a cente point fom ou dataset and then emove evey point that is close then + d fom the cente, by checking evey point of the dataset. This technique was oiginally intoduced in (Avaikioti et al. 2017). The poof of Theoem 1 closely follows this wok. Theoem 1. Given X {0, 1} d, X = n, > 0, the Hamming distance which we denote as 1 and some distance R, we can compute a set X X with P [Y n 1.7 ] 1 n 0.2 and a patial -net C of X \ X, whee Y := {{i, j} x i, x j X, x i x j 1 + d} the numbe of points with close distance to each othe, in time O(dn 1.5 ). Distance Matix Next we intoduce a tool, called distance matix, to appoximate -nets. To constuct a distance matix, we patition the dataset into disjoint sets of equal size. The ows of the matix coespond to patitions and the columns to points of the dataset. Each enty holds a value which indicates if any of the points in a patition (ow) is at most + d close to a data point (column). We use Pobabilistic Polynomial Theshold Functions, fomally defined below, to constuct a matix with such indicato values. Definition 2 ((Alman, Chan, and Williams 2016)). If f is a Boolean function on n vaiables, and R is a ing, a pobabilistic polynomial fo f with eo 1 s and degee d is a distibution D of degee-d polynomials ove R such that {0, 1} n, P p D [p(x) = f(x)] 1 1 s.

3 The main building block to constuct the distance matix is Theoem 2, which uses the fact that each enty of the distance matix can be expessed as a Boolean fomula. Theoem 2 ((Alman, Chan, and Williams 2016)). Given d, s, t,, we can constuct a pobabilistic polynomial P : {0, 1} ns R of degee at most := O(( 1 ) 1 3 log(s)) with at most s (n ), such that: s 1. If i=1 [ n j=1 x ij t] is false, then P (x 11,..., x 1n,..., x s1,..., x sn ) s with pobability at least 2 3 ; s 2. If i=1 [ n j=1 x ij t + n] is tue, then P (x 11,..., x 1n,..., x s1,..., x sn ) > 2s with pobability at least 2 3. Befoe we show how to constuct the distance matix fo a given dataset, we cite the following Lemma by (Coppesmith 1982), on ectangula matix multiplication. Lemma 1 ((Coppesmith 1982)). Fo all sufficiently lage N, and α.172, multiplication of an N N α matix with an N α N matix can be done in N 2 poly(log N) aithmetic opeations, ove any field with O(2 poly(logn) ) elements. 2 Next, we pesent how to build the distance matix, combining fast matix multiplication and Pobabilistic Polynomial Theshold Functions. Theoem 3. Let X be a set of n points in {0, 1} d, a adius R, some log6 (d log n), α = Ω( 1 3 d log( log n )) and let 1 denote the Hamming distance. Thee exists an algoithm that computes, with high pobability, a n 1 α n matix W and a patition S 1,..., S n 1 α of X that satisfies the following popeties: 1. Fo all i [n 1 α ] 3 and j [n], if min p Si x j p 1 then W i,j > 2 S i. 2. Fo all k [n 1 α ] and j [n], if min p Si x j p 1 > + d, then W i,j S i The algoithm uns in Õ(n 2 α ). Building a Net Now, we pesent how we can build an appoximate -net fo a data set, as in (Avaikioti et al. 2017): we fist employ the spasification technique and then build the distance matix in the spase dataset whee we can seach efficiently. The unning time of building an appoximate - net is dominated by the time complexity of the constuction of the distance matix. Theoem 4. Given X {0, 1} d with X = n, some distance R and some log6 (d log n), we can compute a set C that contains the centes of an appoximate -net with additive eo at most with high pobability in time Õ(n 2 α + dn 1.7+α ), whee α = Ω( d log( )). log n Poof. We apply Theoem 1 to the set X with adius and eo. This esults in the emaining points X, a patial 2 A poof can be found in the Appendix of (Williams 2014) 3 By [k] we denote the set {1, 2,..., k} appoximate -net C fo X \ X and P [Y n 1.7 ] 1 n 0.2, whee Y := {{i, j} x i, x j X, x i x j 1 + d}, in time O(n 1.5 d). We then apply Theoem 3 to assemble the distance matix W and the patition S 1,..., S n 1 α on inputs X, and. If we encounte moe then n 1.7 enties W ij whee W ij > 2 S i, we estat the algoithm. Since P [Y n 1.7 ] 1 n 0.2, with high pobability we pass this stage in a constant numbe of uns. Next, we iteate top down ove evey column of W. Fo evey column j, fist check if x j is aleady deleted. If this is the case we diectly skip to the next column. Othewise set C = C {x j } and delete x j. Fo evey enty in that column such that W i,j > 2 S i, we then delete evey x S i whee x j x 1 + d. As we iteate ove evey column of W, which coespond to evey point of X, the net fulfills coveing. Since points that ae added as centes wee not coveed within of a cente by pevious iteations, C also fulfills packing. Thus C now contains the net points of an appoximate -net of X. By Theoem 3, building the distance matix takes Õ(n 2 α ) time and iteating ove evey enty of W takes Õ(n 2 α ) time as well. Fo at most n 1.7 of these enties, we check the distance between points in a set S i and the point of the cuent column which takes anothe O(n 1.7+α d). The untime is thus as stated. Appoximate -nets in Euclidean Space In this subsection, we educe the poblem of computing appoximate -nets fom Euclidean to Hamming space. Then, we apply Theoem 4 to compute appoximate -nets in Euclidean space. We distinguish between l 1 and l 2 metics. Specifically, we fist show how to map the dataset fom R d space with l 1 metic to Hamming space. Then, we pesent a eduction fom l 2 to l 1 Euclidean space. Note that the eo in the oiginal Euclidean space is multiplicative, while in Hamming space additive. Although the poofs of both Theoem 5 and 6 use the same mappings as in (Alman, Chan, and Williams 2016), the poblem of computing -nets is diffeent and moe geneal than finding the closest pai, thus we cannot diectly cite thei esults. l 1 case To educe the poblem of computing appoximate -nets fom the Euclidean to the Hamming space (with l 1 metic) we use Locality Sensitive Hashing (LSH), fomally defined below. Definition 3. Let, c R and p 1, p 2 [0, 1] whee p 1 > p 2. A distibution D of hash functions is called (, c, p 1, p 2 )-sensitive, if, fo a metic space V unde nom, a hash function h andomly dawn fom D satisfies the following conditions fo any points x, y V : 1. x y P [h(x) = h(y)] p 1 2. x y c P [h(x) = h(y)] p 2 We call hashing methods that exhibit these popeties locality sensitive hash functions (LSH). Now, we show how to compute appoximate -nets in Euclidean space unde the l 1 nom, by employing a specific instance of LSH functions.

4 Theoem 5. Fo a set of input points X R d, some adius R and some eo log6 (log n), with high pobability, we can constuct a (1 + )-appoximate -net un- de l 1 euclidean nom in time Õ(dn + n2 α ) whee α = Ω( / log( 1 )). Poof. The following inequalities, ae the same as the ones deived in (Alman, Chan, and Williams 2016) fo thei algoithm that finds all neaest/futhest neighbos of a set. Fist apply a vaiant of locality sensitive hashing, to map points fom l 1 to Hamming space. Fo each point p X and i {1,..., k}, k = O( 2 log n), we define hash functions h i = {h i1 (p),..., h id (p)}, whee h ij = paij +b ij, a ij {1,..., d} and b ij [0, 2) sampled independently unifomly at andom. Fo each value h i (p), define f i (p) = 0 with pobability 1 2 o f i(p) = 1 with pobability 1 2. We then define a new point in Hamming space as f(p) = (f 1 (p),..., f k (p)). We have that fo any p, q X, P [h ij (p) h ij (q)] = 1 d pa qa d a=1 min{ 2, 1} and P [f i (p) f i (q)] = P [f i (p) f i (q) h i (p) = h i (q)]p [h i (p) = h i (q)] + P [f i (p) f i (q) h i (p) h i (q)]p [h i (p) h i (q)] = P [h i(p) h i (q)] = 1 2 P [ d j=1 h ij(p) h ij (q)] = 1 2 (1 d j=1 P [h ij(p) = h ij (q)]) = 1 2 (1 d j=1 (1 P [h ij(p) h ij (q)])) Thus, the following popeties hold fo any p, q X: 1. If p q then P [h ij (p) h ij (q)] 1 2d P [f i (p) f i (q)] (1 (1 2d )d ) := α 0 2 and thus 2. If p q (1 + ) then P [h ij (p) h ij (q)] 1+ 2d and thus P [f i (p) f i (q)] (1 (1 2d )d ) := α 1 Then it follows that α 1 α 0 = Ω(). By applying a Chenoff bound, we deive the following: 1. If p q then E[ f(p) f(q) ] = k i=1 P [ f i(p) f i (q) ] kα 0 and thus P [ f(p) f(q) α 0 k + O( k log n) := A 0 ] 1 O( 1 n ) 2. If p q (1 + ) then E[ f(p) f(q) ] = k i=1 P [ f i(p) f i (q) ] kα 1 and thus P [ f(p) f(q) α 1 k O( k log n) := A 1 ] 1 O( 1 n ) As we know that α 1 α 0 = Ω() it is easy to see that A 1 A 0 = k(α 1 α 0 ) O( k log n) = Ω(k). Fo the new set of points X := f(x), we constuct an appoximate -net with additive eo Ω(), which yields the cente points of an appoximate -net of the oiginal points with multiplicative eo (1 + ). We hence apply Theoem 4 on inputs X = X, d = k, = Ω() and = A 0. This gives us the centes C of an appoximate -net fo X in time Õ(n2 α + n 1.7+α ) whee α = Ω( / log( 1 )). The points that get mapped to the net points in C ae then the centes of a (1 + )-appoximate -net of the points in X unde l 1 metics with high pobability. Applying this vesion of locality sensitive hashing to X takes O(dkn) = Õ(dn) time, which leads to the untime as stated. l 2 case Fistly, we educe the dimension of the dataset using the Fast Johnson Lindenstauss Tansfom (Johnson and Schechtman 1982; Johnson and Lindenstauss 1984). Specifically, we use a vaiant that allows us to map l 2 points to l 1 points, while peseving a slightly petubed all pai distance unde the espective nom, as fo example seen in (Matoušek 2008). Thus, we can constuct appoximate -nets in the geneal Euclidean space, as fomally stated below. Theoem 6. Fo set of input points X R d, some adius R, some eo log6 (log n), with high pobability, we can constuct a (1 + )-appoximate -net unde l 2 euclidean nom in time Õ(dn1.7+α + n 2 α ) whee α = Ω( / log( 1 )). Applications In the following section, we pesent applications fo the algoithms we pesented in the pevious section. To that end, we exhibit an impovement on a famewok called Net & Pune. Net & Pune was invented by (Ha-Peled and Raichel 2015) fo low dimensional applications. An extended vesion of the famewok, that is efficient in highe dimensional datasets, was late pesented by (Avaikioti et al. 2017). In what follows, we apply the appoximate -net algoithm to immediately impove the high dimensional famewok. We then pesent vaious applications, that depend on appoximate -nets and the famewok. Net & Pune Famewok Net & Pune mainly consists of two algoithms, AppxNet and DelFa, which ae altenatively called by the famewok, and a data stuctue that is specific to the poblem we want to solve. When supplied with these, the famewok etuns an inteval with constant spead, which is guaanteed to contain the optimal solution to the objective of the desied poblem. To impove the famewok, we fist impove these two algoithms. AppxNet computes an appoximate -net fo a given point set and DelFa deletes points the isolated points, i.e. the points that do not contain any othe point in a ball of adius aound them. As an impovement to AppxNet, we efe to Theoem 5 and Theoem 6. We now pesent an algoithm, that yields an impoved vesion of DelFa: Theoem 7. Fo a set of points X, some eo log6 (log n), a adius Rd and the nom, that denotes the l 1 o l 2 nom, we can constuct an algoithm DelFa that outputs, with high pobability, a set F, whee the following holds: 1. If fo any point p X it holds that q X, q p, p q > (1 + ) then p F 2. If fo any point p X it holds that q X, q p, p q then p F We do this in time Õ(dn + n2 α ), whee α = Ω( log(1/) ).

5 Poof. We pove the Theoem fo the l 1 metic. Fo an l 2 instance we can simply apply the mapping of Theoem 6 and the poof holds. Initially, we map the points to Hamming space, applying the techniques descibed in Theoem 5. Duing the bute foce pat of the algoithm we do the following: we delete each point that is coveed by a cente and then we add both the point and the cente to set F. We do not add centes to F that do not cove any othe points. Late, when tavesing the distance matix, we check each enty that indicates if the patition contains a close point. We calculate the distances between the cuent point and all points of the patition. We add in set F, and then delete, the points that ae actually close. We ignoe points, whee evey enty in its column indicate no close points. In the end, we etun the set F. The unning time of the algoithm is the same as in Theoem 6, since deleting a point afte adding it to set F takes constant time. The Net & Pune famewok allows us to solve vaious so called nice distance poblems. As pesented by (Ha-Peled and Raichel 2015), the poblems solved in the subsections to come, ae all poven to be of such kind. One of the popeties of such poblems is, that thee needs to exist a so called (1 + )-decide fo that poblem. In the following, we denote a fomal definition of such decides. Definition 4 ((Avaikioti et al. 2017)). Given a function f : X R, we call a decide pocedue a (1 + )-decide fo f, if fo any x X and > 0, decide f (, X) etuns one of the following: (i) f(x) [β, (1 + )β] fo some eal β, (ii) f(x) <, o (iii) f(x) >. Even though (Ha-Peled and Raichel 2015) pesented a decide fo each of the poblems that follow, the extended famewok by (Avaikioti et al. 2017) equies decides to be efficient, as othewise the famewoks untime does not hold. This is a good oppotunity to apply Theoem 5 and Theoem 6 fom the pevious section. In the following sections, we employ the theoem below to find constant spead intevals. These contain the solutions to nice distance poblems. We then apply decides to appoximate the solution of the poblem. Theoem 8 ((Avaikioti et al. 2017)). Fo c 64, the Net & Pune algoithm computes in O(dn ) time a constant spead inteval containing the optimal value f(x), with pobability 1 o(1). kth-smallest Neaest Neighbo Distance When having a set of points in high dimensional space, we may be inteested in finding the k-smallest neaest neighbo distance. This means, when looking at the set of distances to the neaest neighbo of each point, finding the kth-smallest of these. Computing this with a naive algoithm takes O(dn 2 ), which is not suitable in high dimensional space. Altenatively, we ae able to build a (1 + )- decide and then apply the Net & Pune famewok to solve the poblem. This has peviously been done by (Avaikioti et al. 2017). Theoem 5 and Theoem 6 yield immediate impovement on the untime of the decide, as it is built with help of DelFa. We thus omit the poof of the following Theoem hee. The poof can be found in the supplementay mateial. Theoem 9. Fo a set of points X, 4 log6 (log n) and the nom, that denotes the l 1 o l 2 nom, with high pobability, we can find the (1 + )-appoximate k-smallest neaest neighbo distance of X in time Õ(dn+n2 α ), whee α = Ω( log(1/) ). Min-Max Clusteing To undestand the following poblem, we fist define Upwad Closed Set Systems and Sketchable Families, as intoduced in (Ha-Peled and Raichel 2015). Definition 5 (Upwad Closed Set System & Sketchable Families (Ha-Peled and Raichel 2015)). Let P be a finite gound set of elements, and let F be a family of subsets of P. Then (P, F) is an upwad closed set system if fo any X F and any Y P, such that X Y, we have that Y F. Such a set system is a sketchable family, if fo any set S P thee exists a constant size sketch sk(s) such that the following hold. 1. Fo any S, T P that ae disjoint, sk(s T ) can be computed fom sk(s) and sk(t ) in O(1) time. We assume the sketch of a singleton can be computed in O(1) time, and as such the sketch of a set S P can be computed in O( S ). 2. Thee is a membeship oacle fo the set system based on the sketch. That is, thee is a pocedue oac such that given the sketch of a subset sk(s), oac etuns whethe S F o not, in O(1) time. Min-Max Clusteing is a method of clusteing sets of the Upwad Closed Set Systems within Sketchable Families unde some cost function. The following is a fomal definition of Min-Max clusteing, as povided by (Ha-Peled and Raichel 2015). Definition 6 (Min-Max Clusteing (Ha-Peled and Raichel 2015)). We ae given a sketchable family (P, F), and a cost function g : 2 P R +. We ae inteested in finding disjoint sets S 1,..., S m F, such that (i) m i=1 S i = P, and (ii) max i g(s i ) is minimized. We will efe to the patition ealizing the minimum as the optimal clusteing of P. We late esot to the following Lemma when building a (1 + )-decide fo a concete instance of Min-Max Clusteing. Lemma 2. Given a set of n points X R d, a adius R, some eo paamete log6 (log n), the nom, that denotes the l 1 o l 2 nom, and a set C X s.t. x, y C, x y 2(1+), with high pobability, we can etun sets P i, such that c i C, x P i, c i x (1 + ) and x X B (c i ), whee B (c i ) = {x : x R d, x c i } we have that x P i in time Õ(dn + n2 α ), whee α = Ω( log(1/) ). Poof. We educe the poblem to Hamming space with eo and adius, applying the same techniques as in Theoem

6 5 fo l 1 points o Theoem 6 fo points in l 2. Afte this eduction, we get a set of new points X and a new adius. We apply bute foce on X to get pat of the solution. We andomly choose a point of c i C and then iteate ove evey point in x X. We check if x c i + k, fo k = ( 2 log n) the dimension in Hamming space. Fo evey point whee this holds, we the oiginal point into the set P i and then delete x fom X. We do this n times which takes Õ(n1.5 ) time in total. Without loss of geneality, we now assume that C > n, as othewise we would be done at this point. With a simila agument as in Theoem 1, we ague that, afte using bute foce, with pobability at least 1 n 0.2, {(c, x) c C, x X, x c + k} n 1.7. As in Theoem 4, we then build the distance matix W of the emaining points in X. We iteate ove evey column coesponding to emaining cente points c j. Fo evey enty W i,j > 2 S i, we add oiginal vesion of evey point x S i such that x c j + k to P j. This takes time Õ(n2 α + n 1.7 ). It then holds that c i C, x P i, c i x (1 + ), as we only added points to P i s, whee this popety holds. It also holds that x X B (c i ), as evey point is only within the ball of one single cente, because of the constaint on C. As we only do bute foce and then build a distance matix which we iteate though in a simila fashion as in Theoem 4, the untime is as stated. The poof of the following Theoem descibes how to utilize the above Lemma to build a decide. The famewok then allows us to solve a concete instance of Min-Max Clusteing. A simila decide was built by (Ha-Peled and Raichel 2015), to solve the same poblem in low dimensional space. Theoem 10. Let P R d, let (P, F) be a sketchable family and let be the nom, that denotes the l 1 o l 2 nom. Fo a set W F, let min (W ) be the smallest adius, such that a ball centeed at a point of W encloses the whole set. We can then, fo 4 log6 (log n), (4 + )-appoximate the min-max clusteing of P with min (W ) as the cost function, with high pobability, in time Õ(dn + n2 α ), whee α = Ω( log(1/) ). Specifically, one can cove P by a set of balls and assign each point of P to a ball containing that point, such that the set of assigned point of each ball is in F and the maximum adius of these balls is a (4+)-appoximate of the minimum of the maximal adius used by any such cove. Poof. Fist notice that, when building a (1+)-appoximate (4 opt (1 + ))-net, whee opt is the adius of the optimal clusteing P opt of P, the following popeties hold. Let W i P opt be the cluste that contains cente c i of the net. It then holds that diam(w i ) 2 opt. Also any two cente points of the net have distance at least (4 opt (1 + )) fom each othe, thus thee ae no i j such that W i = W j. Now define C i as the set of points that ae contained in a ball of adius 2 opt aound cente c i, hence C i = P B 2opt (c i ). It then holds that W i C i and since W i P opt, we know that W i F. Thus C i F by definition of upwad closed set systems. This obsevation allows us to build a decide fo f(p, F), which is the function etuning the optimal solution to the objective of the clusteing. Fist, we build a (1 + 4 )- appoximate (4(1 + 4 ))-net. We then apply Lemma 2 on inputs X = P, = 2, = 4 and C = C, whee C is the set of cente points of the net. It is easy to see, that the needed popety on C is met,as it contains the cente points of a (4(1 + 4 ))-net. The sets P i, that get etuned by Lemma 2, ae then supesets of C i, if opt. Fom the definition of sketchable families we know that, fo evey P i, we ae able to decide if P i F in O(n). Assume now that thee exists a P i which is not in F. P i is thus not a supeset of C i, and we etun < opt. Othewise, we know that i, C i P i and thus opt. Now its left to decide if opt is within some constant spead inteval. To that end, we epeat the pocess above, but fo a slightly smalle net, say a (1 + 4 )-appoximate 4-net. If all of the P i s fo this new net ae in F, we know that the oiginal was to big and we etun f(p, F) <. Othewise, because we applied Lemma 2 to adius 2 (1+ 4 ) and found that balls of that adius centeed at c i ae not in F, we know that the optimal value is at least (1+ 4 ), dew to ou obsevation about the diamete of the clustes in the beginning. We also know that opt can be as big as 4(1 + 4 )2 as we ae able to cove the whole space with balls of such adius and subsets of these ae in F. Theefo, we etun the inteval, 4(1+ 4 )3 1+ ]. Plugging this into the famewok thus 4 povides us with a constant spead inteval, which contains [ 1+ 4 the solution. By seaching the inteval the same way as in Theoem 9, we end up with an inteval [ 1+, 4( )3 1+ ]. 4 We etun, which is a (4+)-appoximate solution since 1+ 4 the eal solution may be up to a 4(1 + 4 )3 -facto off and 4(1 + 4 )3 = 4(( 4 )3 + 3( 4 ) ) (4 + ). In the wost case, the decide builds an appoximate -net twice and also calls Lemma 2 twice. Applying the famewok with that decide and seaching the etuned inteval thus esults in Õ(dn + n2 α ). k-cente The k-cente clusteing is tightly coupled to the poblem of building -nets. Fo a set of high dimensional points, we want to find k clustes, that minimize the maximum diamete of any of these. Fo any > 0, computing a (2 ) appoximate k-cente clusteing in polynomial time has been shown to be impossible except P = N P (Hsu and Nemhause 1979). We thus focus on computing (2 + )-appoximates of the optimal solution. In the following we pesent two appoaches to this. Fist, we build a decide, such that we ae able to employ the famewok, which povides us with a (4 + )-appoximate k-cente clusteing. We then exhibit a diffeent appoach to the poblem. Instead of elying on the famewok, we deive an algoithm that computes appoximate geedy pemutations. We then pesent a way of educing the computation of a (2 + )-appoximate k-cente clusteing to building appoximate geedy pemutations. The dawback of this appoach is, that the untime has a logaithmic dependency on the spead of the data.

7 (4 + ) appoximate k-cente As in pevious subsections, we design a decide which then gets called by the famewok. The constuction is simila to (Ha-Peled and Raichel 2015), whee they constuct a decide to (4+)-appoximate k-cente clusteing in lowe dimensions. We fist poof the following Lemma, which is going to be useful late. Lemma 3. Thee ae the following elations between a set C, which contains the net points of a (1 + )-appoximate -net on a set of points X, and the function f(x, k), which etuns the optimal clusteing adius fo the k-cente poblem on the set X. 1. If C k then f(x) < (1 + 2) 2. If C > k then 2f(X) Theoem 11. Fo a set of n points X R d, some intege k, n k > 0, some eo paamete 32 log6 (log n) and the nom, that denotes the l 1 o l 2 nom, with high pobability, we etun a (4 + )-appoximate k-cente clusteing in time Õ(dn + n2 α ), whee α = Ω( log(1/) ). Poof. In the following we pesent the decide that we plug into the famewok. Fist we ceate a ( )-appoximate 1+ -net and check if we get k o less cente points. If we 16 do, then due to Lemma 3 we can safely etun f(x, k) <. If we do not, we ceate a (1+ 32 )-appoximate (2(1+ 16 ))- net and check if we have at most k centes in this net. In that case, due to Lemma 3 we know that 2(1+ 16 ) f(x, k) < 2( )2 and we etun this inteval. Othewise, we have moe than k-centes. Thus we know fom Lemma 3 that ( ) f(x, k) and we etun f(x, k) >. We theefo satisfy the popeties of a (1 + )-decide and apply it to the famewok to compute a constant spead inteval containing the exact solution. As in the pevious subsections, we slice the inteval and do binay seach using the decide. If we find an inteval 2(1+ 16 ) f(x, k) < 2( )2, we etun 2( )2 which is a (4 + ) appoximation, as it might miss the eal solution up to a facto of 4( )3 = It is easy to see that the decide uns in time Õ(dn+n2 α ) and as in the pevious subsection, the seach takes O(1/ 2 ) iteations. Theefo, the untime is as stated. (2 + ) appoximate k-cente with dependency on the spead Anothe way to appoach the appoximate k-cente clusteing, is given by (Gonzalez 1985). Thee they constuct a 2-appoximate k-cente clusteing by using geedy pemutations. A geedy pemutation of a point set is an odeing, such that the i-th point is the futhest fom all pevious points in the pemutation. In (Eppstein, Ha-Peled, and Sidiopoulos 2015) they descibe a way of constucting an appoximate geedy pemutation by building appoximate -nets. In the following, we pesent a way to impove this constuction by applying the appoximate -net algoithm fom the pevious section. We then pesent how to exploit appoximate geedy pemutations to ceate (2 + )- appoximate k-cente clusteings. Unfotunately, building the geedy pemutation has a logaithmic untime dependency on the spead, which is the atio of the biggest to the smallest distance within the point set. Theefoe, the algoithm is only useful fo data, whee the spead is in poly(n). Appoximate geedy pemutation A geedy pemutation is an odeed set Π of the input points, such that the point π i is the futhest point in V fom the set {π j } i 1 j=1. The following is a fomal definition of appoximate geedy pemutations, as descibed in (Eppstein, Ha-Peled, and Sidiopoulos 2015). Definition 7. A Pemutation Π is a (1 + )-geedy pemutation on n points on metic space (V, d), if thee exists a sequence n s.t. 1. The maximum distance of a point in V fom {π j } i j=1 is in the ange [ i, (1 + ) i ] 2. The distance between any two points u, v {π j } i j=1 is at least i We now poof the following Lemma, which helps us build the appoximate geedy pemutation late. Lemma 4. Fo a set of n points X R d, the nom, that denotes the l 1 o l 2 nom, a set of points C X, such that x, y C, x y, some eo log6 (log n) and a adius R, with high pobability, we can compute a set F, such that y F, c C, x y. We do this in time Õ(dn + n2 α ), whee α = Ω( log(1/) ). Poof. We poceed simila as when building the appoximate -net. We fist educe the poblem to Hamming space with additive eo as in Theoem 5 fo points in l 1 o Theoem 6 fo l 2 points. We then aive at the mapped point set X and adius. Next, we apply a slightly modified vesion of Theoem 1. Instead of andomly choosing a point fom X, we andomly choose a point c C. We then iteate ove evey point in x X and check if x c + k fo k = ( 2 log n), the dimension in Hamming space. Fo evey point x whee this holds, we delete x fom X as well as fom the oiginal set X. We do this n times, which takes Õ(n1.5 ) time in total. We now assume, without loss of geneality, that C > n, as othewise we would be done at this point. By applying a simila agument as in Theoem 1, it holds that with pobability at least 1 n 0.2, {(c, x) c C, x X, x c + k} n 1.7. As in Theoem 4, we now build the distance matix W of the emaining points in X. We then iteate ove evey column coesponding to the emaining cente points c j and, fo evey enty W i,j > 2 S i, delete the oiginal vesion of evey point x S i such that x c +k fom X. This takes time Õ(n2 α +n 1.7 ). The point set X then, by constuction, only contains points which ae futhe then fom any point in C. The algoithm fo building the geedy pemutation is vey simila to the algoithm pesented in (Eppstein, Ha-Peled, and Sidiopoulos 2015). We build sequences of appoximate -nets, stating with a adius that is an appoximation of the

8 maximum distance within the point set. Then, we consecutively build appoximate -nets fo smalle and smalle, while keeping centes of peviously computed appoximate -nets. Putting the cente points into a list in the ode they get computed esults in an appoximate geedy pemutation. Fo a full poof outline efe to the supplementay mateial. Theoem 12. Fo point set X R d with X = n, 4 log 6 (log n) and the nom, that denotes the l 1 o l 2 nom, with high pobability, we can compute a (1+)-appoximate geedy computation in time Õ((dn + n2 α ) log Φ), whee α = Ω( 1 3 log(1/) ) and Φ = max x,y X,x y x y min x,y X,x y x y is the spead of data X. k-cente with appoximate geedy pemutation In (Gonzalez 1985), Gonzales poved that an exact geedy pemutation leads to a 2 appoximation of the solution fo the k-cente objective, if we take the fist k elements out of the pemutation and declae them as cluste centes. The maximum adius of a cluste, is then the minimum distance of the k + 1-th point in the pemutation to the one of the fist k points. With a (1 + )-appoximate geedy pemutation we can then deive a (2 + )-appoximate solution fo the k-cente poblem, since fo evey element π i and evey i as in the definition of the appoximate geedy pemutation, we know that, in metic space (V, ), it holds that i max u V min j {1,...,i} π j u (1 + ) i. Thus the adius used, if we take the fist k elements of the appoximate geedy pemutation as cluste centes, is at most a 1 + facto lage than the adius we would use by taking the fist k elements of the exact geedy pemutation, which in tun is at most a 2 facto lage than the exact k-cente clusteing adius. Conclusion & Futue Wok Ou wok has lead to inteesting impovement on the constuction time of appoximate -nets and applications theeof. We wish to highlight the following open poblems. Fist, can we find a lowe bound to the constuction time of -nets? This would also tell us moe about the limits of the Net & Pune famewok. Second, can we get id of the spead dependency on the appoximate geedy pemutation algoithm, as this would make the algoithm suitable fo much moe geneal data sets? Ou wok seems to suggest that this is tightly coupled to finding all neaest neighbos. Acknowledgments We thank Ioannis Psaos fo the helpful discussions. Yuyi Wang is patially suppoted by X-Ode Lab. Refeences Alman, J.; Chan, T. M.; and Williams, R Polynomial epesentations of theshold functions and algoithmic applications. In 2016 IEEE 57th Annual Symposium on Foundations of Compute Science (FOCS), IEEE. Avaikioti, G.; Emiis, I. Z.; Kavouas, L.; and Psaos, I High-dimensional appoximate -nets. In Poceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discete Algoithms, SIAM. Coppesmith, D Rapid multiplication of ectangula matices. SIAM J. Comput. 11: Dasgupta, S Pefomance guaantees fo hieachical clusteing. In Kivinen, J., and Sloan, R. H., eds., Computational Leaning Theoy, Belin, Heidelbeg: Spinge Belin Heidelbeg. Defays, D An efficient algoithm fo a complete link method. The Compute Jounal 20(4): Eppstein, D.; Ha-Peled, S.; and Sidiopoulos, A Appoximate geedy clusteing and distance selection fo gaph metics. CoRR abs/ Estivill-Casto, V Why so many clusteing algoithms: a position pape. ACM SIGKDD exploations newslette 4(1): Fen, X. Z., and Bodley, C. E Random pojection fo high dimensional data clusteing: A cluste ensemble appoach. In Poceedings of the 20th intenational confeence on machine leaning (ICML-03), Gonzalez, T. F Clusteing to minimize the maximum intecluste distance. Theoetical Compute Science 38: Ha-Peled, S., and Raichel, B Net and pune: A linea time algoithm fo euclidean distance poblems. Jounal of the ACM (JACM) 62(6):44. Hsu, W.-L., and Nemhause, G. L Easy and had bottleneck location poblems. Discete Applied Mathematics 1(3): Johnson, W. B., and Lindenstauss, J Extensions of lipschitz mappings into a hilbet space. Contempoay mathematics 26( ):1. Johnson, W. B., and Schechtman, G Embedding lp m into l1 n. Acta Mathematica 149(1): Kiegel, H.-P.; Köge, P.; Sande, J.; and Zimek, A Density-based clusteing. Wiley Intedisciplinay Reviews: Data Mining and Knowledge Discovey 1(3): Lloyd, S Least squaes quantization in pcm. IEEE tansactions on infomation theoy 28(2): Matoušek, J On vaiants of the johnson-lindenstauss lemma. Random Stuct. Algoithms 33(2): Sibson, R Slink: an optimally efficient algoithm fo the single-link cluste method. The compute jounal 16(1): Williams, R New algoithms and lowe bounds fo cicuits with linea theshold gates. In Poceedings of the foty-sixth annual ACM symposium on Theoy of computing, ACM.

9 Poof of Theoem 1 Theoem 1. Given X {0, 1} d, X = n, > 0, the Hamming distance which we denote as 1 and some distance R, we can compute a set X X with P [Y n 1.7 ] 1 n 0.2 and a patial -net C of X \ X, whee Y := {{i, j} x i, x j X, x i x j 1 + d} the numbe of points with close distance to each othe, in time O(dn 1.5 ). Poof. We ceate a copy of X and call it X. Afte that, we epeat the following n times: Choose a point x i X unifomly at andom, delete it fom X and add it to C. Then check fo each x X, if x x i 1 + d. If so, delete x fom X as well. We do this in O(dn 1.5 ) time. Let Y := {{i, j} x i, x j X, x i x j 1 + d}. We now pove P [Y n 1.7 ] 1 n 0.2 by doing a case distinction. Let A i be the numbe of points with small distance to a andomly chosen point p in the i-th iteation. Now fist assume that E[A i ] > 2n 0.5 ; i {1,..., n}. Thus the numbe of points to be deleted in each iteation is at least 2n in expectation which esults in moe then n deleted points afte n iteations. Theefo, if ou assumption holds, we get X = 0 afte at most n iteations. P [Y n 1.7 ] 1 n 0.2 then holds as Y = 0. Next assume that i {1,..., n}, E[X i ] 2n 0.5. If we each such an i, we have at most 2n 0.5 small distances left between a andom point of X and all the points within X in expectation. Afte all iteations, the numbe of small distances is theefo no moe then 2n 1.5 in expectation. Thus, by Makov s inequality: P [Y n 1.7 ] = 1 P [Y n 1.7 ] 1 O(n 0.2 ). (1) We now pove coectness of the patial appoximate -net C. Fo evey point p X \ X thee will be a net point in C that has distance at most + d fom p, as we only emoved points fom X which satisfy this popety. Fo evey two points p, q C we have p q 1 >, because if the distance was less o equal to, eithe p would have been deleted in the iteation of q o vice vesa (whateve point came fist). This concludes the poof. Poof of Theoem 3 Theoem 3. Let X be a set of n points in {0, 1} d, a adius R, some log6 (d log n), α = Ω( 1 3 d log( log n )) and let 1 denote the Hamming distance. Thee exists an algoithm that computes, with high pobability, a n 1 α n matix W and a patition S 1,..., S n 1 α of X that satisfies the following popeties: 1. Fo all i [n 1 α ] 4 and j [n], if min p Si x j p 1 then W i,j > 2 S i. 2. Fo all k [n 1 α ] and j [n], if min p Si x j p 1 > + d, then W i,j S i 4 By [k] we denote the set {1, 2,..., k} The algoithm uns in Õ(n 2 α ). Poof. We constuct an algoithm that is simila to the one used fo neaest/futhest neighbo seach in Hamming space, pesented in (Alman, Chan, and Williams 2016). We fist ceate a andom patition of X into disjoint sets S 1,..., S n 1 α, each of size s := n α. Fo evey such S i and evey point q X we then want to test, if at least one point is within ( + d) distance of q o not. This can be expessed as a Boolean fomula in the following way: F (S i, q) := [min p q 1 + d] p S i = d (p j q j + (1 p j )(1 q j )) d ( + d)] [ p S i j=1 = [ p S i j=1 d (p j 0.5)(2q j 1) d ( + d) + 0.5] Applying Theoem 2, we constuct a pobabilistic PTF to expess F (S i, q). We then give a bound on the maximum numbe of monomials, accoding to Theoem 2: ( ) O(d) s O(( 1 ) n α O( 1 3 log(s)) n α n O(( α d ) log( 1/3 ( α α log n )) (n 1 α ) 0.1 d α ) log n )O( log n) 1/3 As stated in (Alman, Chan, and Williams 2016), this bound also holds fo the constuction time of the polynomial. Next we sample a polynomial f fom the pobabilistic PTF fo F (S i, q). As pesented by (Alman, Chan, and Williams 2016), we ae able to do this in O(n log(d) log(nd)) time. We then split f into two vectos φ(s i ) and ψ(q) of (n 1 α ) 0.1 dimensions ove R s.t. thei dot poduct esults in the evaluation of the coesponding polynomial. We ae able to do this as the polynomial P (x 11,...x Si d) has paametes of the fom x ij = (p j 0.5)(2q j 1). This educes the poblem of evaluating n 2 α many polynomials to multiplying a matix A := n s ( n s )0.1, whee the i-th ow of A consists of φ(s i ) T, with a matix B := ( n s )0.1 n, whee the i-th column of B consists of ψ(x i ). We futhe educe the multiplication, by splitting B into s matices of size ( n s )0.1 n s. By Lemma 1 we ae able to do each of these multiplications in Õ(( n s )2 ) aithmetic opeations ove an appopiate field. The total time of the multiplications is n2 then Õ( s ) = Õ(n2 α ) as we do s matix multiplications. We then eassemble each of the s matices by placing them next to each othe, such that the j-th column coesponds to the point q j X. This leads to the matix W whee 1. W ij > 2 S i if d (p j 0.5)(2q j 1) d ( + d) d] [ p S i j=1 = [min p S i p q 1 ]

10 2. W ij S i if [ p S i j=1 d (p j 0.5)(2q j 1) < d ( + d) + 0.5] = [min p S i p q 1 > + d] By Theoem 2, the eo pobability of each enty is 1 3 which can be loweed to 1 n 3 by epeating O(log n) times and taking majoities. The oveall untime is then Õ(n2 α ). Poof of Theoem 6 Theoem 6. Fo set of input points X R d, some adius R, some eo log6 (log n), with high pobability, we can constuct a (1 + )-appoximate -net unde l 2 euclidean nom in time Õ(dn1.7+α + n 2 α ) whee α = Ω( / log( 1 )). Poof. We define a mapping fom l 2 to l 1. Evey x X gets mapped to the vecto f(x) = (f 1 (x),..., f k (x)) whee k = ( 2 log n) and f i (x) = d j=1 σ ijx j. The coefficients σ ij s ae independent nomally distibuted andom vaiables with mean 0 and vaiance 1. As pesented in (Matoušek 2008), it holds that fo any two points x, y X, (1 ) x y 2 C f(x) f(y) 1 (1 + ) x y 2 with pobability 1 O( 1 n ) fo some constant C. The cost of applying the mapping is O(dkn). We then employ Theoem 5 on the new set of points, to get a (1+)-appoximate -net in the time stated. Poof of Theoem 9 Theoem 9. Fo a set of points X, 4 log6 (log n) and the nom, that denotes the l 1 o l 2 nom, with high pobability, we can find the (1 + )-appoximate k-smallest neaest neighbo distance of X in time Õ(dn+n2 α ), whee α = Ω( log(1/) ). Poof. Fist we descibe the decide, which is basically the same as in (Avaikioti et al. 2017), except that we plug in the new algoithm fo DelFa. The decide then woks as follows: We fist call DelFa on the set X with adius /(1+ 4 ) and eo /4 to get a set W 1. Then we call DelFa on X again but this time with adius and eo /4 to get anothe set W 2. If it then holds that W 1 k, we know that when dawing balls of at most adius aound each point, at least k of the points have thei neaest neighbo within thei ball. This means, that is to big and we output f(x, k) <. Simila, if W 2 < k, we know that even if we daw balls aound all the points with at least adius, not even k points have thei neaest neighbo inside thei ball which implies that is to small and we output f(x, k) >. Finally, if we have that W 1 < k and W 2 k, we know that the exact k- neaest neighbo has to be in the ange [/(1 + 4 ), (1 + 4 )] and we output that inteval. As this satisfies the definition of a (1 + )-decide, we plug it into the famewok and get a constant spead inteval [x, y] which contains the exact solution. We then use the decide again to (1 + )-appoximate the exact solution. We slice the inteval into pieces x, (1 + )x, (1 + ) 2 x,..., y and do binay seach on those slices, by applying the decide. If we hit an whee the decide gives us an inteval [/(1 + 4 ), (1 + 4 )], we etun (1 + 4 ) and ae done. The optimal solution might then be /(1 + 4 ), which is a (1 + 4 )2 facto smalle then what we etun. This is fine as (1 + 4 )2 = 1 + /2 + 2 / and what we etun is thus a (1 + )-appoximate as desied. While seaching we make O(1/ log(1 + )) = O(1/ 2 ) calls to the decide. The seach thus ends up having a untime of Õ(dn + n2 α ). Poof of Lemma 3 Lemma 3. Thee ae the following elations between a set C, which contains the net points of a (1 + )-appoximate -net on a set of points X, and the function f(x, k), which etuns the optimal clusteing adius fo the k-cente poblem on the set X. 1. If C k then f(x) < (1 + 2) 2. If C > k then 2f(X) Poof. Fo the fist popety we ceate a (1+)-appoximate -net of X. Due to the coveing popety of appoximate - nets, evey point in X is within (1 + ) of a cente point and thus (1 + 2) is not an optimal adius fo the k-cente clusteing. Fo the second popety note that an appoximate -net with moe then k centes contains at least k + 1 centes. These ae at least fom each othe due to the packing popety. Thus k centes with a adius of < /2 would not be able to cove all of the k + 1 centes fom the appoximate -net and hence 2f(X). Poof of Theoem 12 Theoem 12. Fo point set X R d with X = n, 4 log 6 (log n) and the nom, that denotes the l 1 o l 2 nom, with high pobability, we can compute a (1+)-appoximate geedy computation in time Õ((dn + n2 α ) log Φ), whee α = Ω( 1 3 log(1/) ) and Φ = max x,y X,x y x y min x,y X,x y x y is the spead of data X. Poof. In (Eppstein, Ha-Peled, and Sidiopoulos 2015) they pesented a way to compute a (1 + )-appoximate geedy pemutation. In the following, when talking about building an appoximate -net we efe to Theoem 5 fo Euclidean points with l 1 metic o Theoem 6 fo points in l 2 metic space. We choose a point p at andom and seach fo the futhest neighbo of that point., which is the distance to the futhest neighbo, is then a 2-appoximation to the maximum distance within X by the tiangle inequality. We thus know that max x,y X,x y x y 1 2 max x,y X,x y x y. Next we define a sequence of adiuses i = fo i {1,.., M := (1+ 4 )i 1 log 1+ 4 Φ + 2} whee Φ :=

Stanford University CS259Q: Quantum Computing Handout 8 Luca Trevisan October 18, 2012

Stanford University CS259Q: Quantum Computing Handout 8 Luca Trevisan October 18, 2012 Stanfod Univesity CS59Q: Quantum Computing Handout 8 Luca Tevisan Octobe 8, 0 Lectue 8 In which we use the quantum Fouie tansfom to solve the peiod-finding poblem. The Peiod Finding Poblem Let f : {0,...,