Approximating and Testing k-histogram Distributions in Sub-linear time
Electronic Colloquium on Computational Complexity, Report No. 171 (2011)

Approximating and Testing k-histogram Distributions in Sub-linear Time

Piotr Indyk*    Reut Levi†    Ronitt Rubinfeld‡

November 9, 2011

Abstract

A discrete distribution p over [n] is a k-histogram if its probability distribution function can be represented as a piece-wise constant function with k pieces. Such a function is represented by a list of k intervals and k corresponding values. We consider the following problem: given a collection of samples from a distribution p, find a k-histogram that approximately minimizes the ℓ2 distance to the distribution p. We give time- and sample-efficient algorithms for this problem. We further provide algorithms that distinguish distributions that have the property of being a k-histogram from distributions that are ε-far from any k-histogram, in the ℓ1 distance and the ℓ2 distance respectively.

* CSAIL, MIT, Cambridge MA. E-mail: indyk@theory.lcs.mit.edu. This material is based upon work supported by a David and Lucille Packard Fellowship, by MADALGO (Center for Massive Data Algorithmics, funded by the Danish National Research Association) and by an NSF CCF grant.
† School of Computer Science, Tel Aviv University. E-mail: reuti.levi@gmail.com. Research supported by the Israel Science Foundation grant nos. 1147/09 and 246/08.
‡ CSAIL, MIT, Cambridge MA 02139 and the Blavatnik School of Computer Science, Tel Aviv University. E-mail: ronitt@csail.mit.edu. Research supported by NSF grants, Marie Curie Reintegration grant PIRG03-GA, and the Israel Science Foundation grant nos. 1147/09 and 1675/09.
1 Introduction

The ubiquity of massive data sets is a phenomenon that began over a decade ago, and is becoming more and more pervasive. As a result, there has recently been significant interest in constructing succinct representations of the data. Ideally, such representations should take little space and computation time to operate on, while approximately preserving the desired properties of the data.

One of the most natural and useful succinct representations of data is the histogram. For a data set D whose elements come from the universe [n], a k-histogram H is a piecewise constant function defined over [n] consisting of k pieces. Note that a k-histogram can be described using O(k) numbers. A good k-histogram is one for which (a) the value H(i) is a good approximation of the total number of times an element i occurs in the data set (denoted by P(i)), and (b) the value of k is small.

Histograms are a popular and flexible way to approximate the distribution of data attributes (e.g., employees' age or salary) in databases. They can be used for data visualization, analysis, and approximate query answering. As a result, computing and maintaining histograms of the data has attracted a substantial amount of interest in databases and beyond; see e.g. [GMP97, JPK+98, GKS06, CMN98, TGIK02, GGI+02], or the survey [Ioa03].

A popular criterion for fitting a histogram to a distribution P is the least-squares criterion. Specifically, the goal is to find H that minimizes the ℓ2 norm ‖P − H‖₂. Such histograms are often called v-optimal histograms, with "v" standing for variance. There has been a substantial amount of work on algorithms, approximate or exact, that compute the optimal k-histogram H given P and k [JPK+98, GKS06]. However, since these algorithms need to read the whole input to compute H, their running times are at least linear in n. A more efficient way to construct data histograms is to use random samples from the data set D. There have been some results on this front as well [CMN98, GMP97].
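As a concrete baseline, the v-optimal fitting problem just described has a classical dynamic-programming solution that reads all of P; the sketch below (illustrative, 0-indexed, hypothetical function name) computes the optimal squared-ℓ2 error of a k-piece fit in O(n²k) time, which is exactly the at-least-linear-in-n cost that the sub-linear algorithms of this paper avoid.

```python
def best_k_histogram_l2(p, k):
    """Exact v-optimal (least-squares) k-histogram error for a vector p over [n],
    via the classical O(n^2 k) dynamic program. The optimal constant on an
    interval is its mean, so the squared error of the best constant on p[a:b]
    is sum(p_i^2) - (sum p_i)^2 / (b - a)."""
    n = len(p)
    # prefix sums of p and p^2 for O(1) interval-cost queries
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(p):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(a, b):
        # squared l2 error of the best single constant on p[a:b]
        w = s[b] - s[a]
        return (s2[b] - s2[a]) - w * w / (b - a)

    INF = float("inf")
    # dp[j][i] = minimum error of covering the prefix p[:i] with j pieces
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            dp[j][i] = min(dp[j - 1][a] + cost(a, i) for a in range(i))
    return dp[k][n]
```

For k = 1 the optimal piece is the mean, so the returned error is the total squared deviation from the mean; two exactly-constant blocks are fit with zero error when k = 2.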
However, they have been restricted to so-called equi-depth histograms (which are essentially approximate quantiles of the data distribution) or so-called compressed histograms. Although the names by which they are referred to sound similar, both of these representations are quite different from the representations considered in this paper. We are not aware of any work on constructing v-optimal histograms from random samples with provable guarantees.

The problem of constructing an approximate histogram from random samples can be formulated in the framework of distribution property testing and estimation [Rub06, Ron08]. In this framework, an algorithm is given access to i.i.d. samples from an unknown probability distribution p, and its goal is to characterize or estimate various properties of p. In our case we define p = P/‖P‖₁. Then choosing a random element from the data set D corresponds to choosing i ∈ [n] according to the distribution p. In this paper we propose several algorithms for constructing, and testing for the existence of, good histograms approximating a given distribution p.

1.1 Histogram taxonomy

Formally, a histogram is a function H : [n] → [0, 1] that is defined by a sequence of intervals I_1, ..., I_k and a corresponding sequence of values v_1, ..., v_k. For t ∈ [n], H(t) represents an estimate of p(t). We consider the following classes of histograms (see [TGIK02] for a full list of classes):

1. Tiling histograms: the intervals form a tiling of [n], i.e., they are disjoint and cover the whole domain. For any t we have H(t) = v_i, where t ∈ I_i. In practice we represent a tiling k-histogram as a sequence {(I_1, v_1), ..., (I_k, v_k)}.

2. Priority histograms: the intervals can overlap. For any t we have H(t) = v_i, where i is the largest index such that t ∈ I_i; if none exists, H(t) = 0. In practice we represent a priority k-histogram as
{(I_1, v_1, r_1), ..., (I_k, v_k, r_k)}, where r_1, ..., r_k correspond to the priorities of the intervals.

Note that if a function has a tiling k-histogram representation then it has a priority k-histogram representation. Conversely, if it has a priority k-histogram representation then it has a tiling 2k-histogram representation.

1.2 Results

The following algorithms receive as input a distribution p over [n], an accuracy parameter ε and an integer k.

1. In Section 3, we describe an algorithm which outputs a priority k-histogram that is closest to p in the ℓ2 distance, up to an ε-additive error. The algorithm is a greedy algorithm: at each step it enumerates over all possible intervals and adds the interval which minimizes the approximated ℓ2 distance. The sample complexity of the algorithm is Õ((k/ε²)² · ln n) and the running time is Õ((k/ε²)² · n²). We then improve the running time substantially, to Õ((k/ε²)² · ln n), by enumerating over a partial set of intervals.

2. In Section 4, we provide a testing algorithm for the property of being a tiling k-histogram with respect to the ℓ1 norm. The sample complexity of the algorithm is Õ(√(kn)/ε⁵). We provide a similar test for the ℓ2 norm that has sample complexity O(ln² n/ε⁴). We prove that testing whether a distribution is a tiling k-histogram in the ℓ1 norm requires Ω(√(kn)) samples, for every k ≤ 1/ε.

1.3 Related Work

Our formulation of the problem falls within the framework of property testing [RS96, GGR98, BFR+00]. Properties of single distributions and of pairs of distributions have been studied quite extensively in the past; see [BFR+10, BFF+01, AAK+07, BDKR05, GMP97, BKR04, RRSS09, Val08, VV11]. One question that has received much attention in property testing is to determine whether or not two distributions are similar. A problem referred to as Identity Testing assumes that the algorithm is given access to samples of a distribution p and an explicit description of a distribution q. The goal is to distinguish a pair of distributions that are identical from a pair of distributions that are far from each other.
A special case of Identity Testing is Uniformity Testing, where the fixed distribution q is the uniform distribution. A uniform distribution can be represented by a tiling 1-histogram, and therefore the study of uniformity testing is closely related to our study. Goldreich and Ron [GR00] study Uniformity Testing in the context of approximating graph expansion. They show that counting pairwise collisions in a sample can be used to approximate the ℓ2 norm of the probability distribution from which the sample was drawn. Several more recent works, including this one, make use of this technical tool. Batu et al. [BFR+10] note that running the [GR00] algorithm with Õ(√n) samples yields an algorithm for uniformity testing in the ℓ1 norm. Paninski [Pan08] gives an optimal algorithm in this setting that takes a sample of size O(√n), and proves a matching lower bound of Ω(√n). Valiant [Val08] shows that a tolerant tester for uniformity with constant precision would require n^{1−o(1)} samples.

Several works in property testing of distributions approximate the distribution by a small histogram distribution and use this representation in an essential way in their algorithms [BKR04, BFF+01]. Histograms have been the subject of extensive research in the data-stream literature; see [TGIK02, GGI+02] and the references therein. Our algorithm in Section 3 is inspired by a streaming algorithm in [TGIK02].
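The [GR00] collision-counting primitive referred to above can be sketched as follows (illustrative function name): the fraction of colliding pairs among the samples is an unbiased estimate of ‖p‖₂² = Σᵢ pᵢ², which equals 1/n exactly when p is uniform over [n].

```python
def collision_l2_estimate(samples):
    """Estimate the squared l2-norm ||p||_2^2 of the distribution the samples
    were drawn from, by the fraction of colliding pairs ([GR00]-style):
    among C(m, 2) pairs of samples, each pair collides with probability
    sum_i p_i^2."""
    m = len(samples)
    counts = {}
    for x in samples:
        counts[x] = counts.get(x, 0) + 1
    # number of colliding (unordered) pairs in the sample
    coll = sum(c * (c - 1) // 2 for c in counts.values())
    return coll / (m * (m - 1) / 2)
```

A point mass gives estimate 1 (every pair collides), and a sample with all-distinct elements gives estimate 0; for m samples from the uniform distribution over [n] the estimate concentrates around 1/n as m grows.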
2 Preliminaries

Denote by D_n the set of all discrete distributions over [n]. A property of discrete distributions is a subset P ⊆ D_n. We say that a distribution p ∈ D_n is ε-far from p′ ∈ D_n in the ℓ1 distance (ℓ2 distance) if ‖p − p′‖₁ > ε (respectively, ‖p − p′‖₂ > ε). We say that an algorithm A is a testing algorithm for the property P if, given an accuracy parameter ε and a distribution p:

1. if p ∈ P, then A accepts p with probability at least 2/3;
2. if p is ε-far (according to the specified distance measure) from every distribution in P, then A rejects p with probability at least 2/3.

Let p ∈ D_n; for every i ∈ [n], denote by p_i the probability of the i-th element. For every I ⊆ [n], let p(I) denote the weight of I, i.e. Σ_{i∈I} p_i. For every I ⊆ [n] such that p(I) ≠ 0, let p_I denote the distribution p restricted to I, i.e. p_I(i) = p_i/p(I). Call an interval I flat if p_I is uniform or p(I) = 0. Given a set S of samples from p, denote by S_I the samples that fall in the interval I. Define the observed collision probability of an interval I as coll(S_I)/C(|S_I|, 2), where coll(S_I) := Σ_{i∈I} C(occ(i, S_I), 2) and occ(i, S_I) is the number of occurrences of i in S_I.

In [GR00], in the proof of Lemma 1, it was shown that

  E[ coll(S_I)/C(|S_I|, 2) ] = ‖p_I‖₂²,   (1)

and that

  Pr[ |coll(S_I)/C(|S_I|, 2) − ‖p_I‖₂²| > δ·‖p_I‖₂² ] < 4/(δ²·√|S_I|·‖p_I‖₂).   (2)

In particular, since ‖p_I‖₂ ≤ 1, we also have that

  Pr[ |coll(S_I)/C(|S_I|, 2) − ‖p_I‖₂²| > ε ] < 4/(ε²·√|S_I|).   (3)

In a similar fashion we prove the following lemma.

Lemma 1 (Based on [GR00]) If we take m = 24/ε² samples S then, for every interval I,

  Pr[ |coll(S_I)/C(m, 2) − Σ_{i∈I} p_i²| ≤ ε·p(I) ] > 3/4.   (4)

Proof: For every i < j, define an indicator variable C_{i,j} such that C_{i,j} = 1 if the i-th sample is equal to the j-th sample and lies in the interval I. For every i < j, µ := E[C_{i,j}] = Σ_{i∈I} p_i². Let P = {(i, j) : 1 ≤ i < j ≤ m}. By Chebyshev's inequality:

  Pr[ | Σ_{(i,j)∈P} C_{i,j}/|P| − Σ_{i∈I} p_i² | > ε·p(I) ] ≤ Var[ Σ_{(i,j)∈P} C_{i,j} ] / (ε²·p(I)²·|P|²).
From [GR00] we know that

  Var[ Σ_{(i,j)∈P} C_{i,j} ] ≤ |P|·µ + |P|^{3/2}·µ^{3/2},   (5)

and since µ ≤ p(I)² we have Var[Σ_{(i,j)∈P} C_{i,j}] ≤ p(I)²·(|P| + |P|^{3/2}·µ^{1/2}); thus

  Pr[ | Σ_{(i,j)∈P} C_{i,j}/|P| − Σ_{i∈I} p_i² | > ε·p(I) ] < (|P| + |P|^{3/2}·µ^{1/2}) / (ε²·|P|²)   (6)
    ≤ 2/(ε²·|P|^{1/2}) ≤ 1/4,   (7)

where the last inequality uses µ ≤ 1 and |P| = C(m, 2) ≥ (m/2)² for m = 24/ε².

3 Near-optimal Priority k-histogram

In this section we give an algorithm that, given p ∈ D_n, outputs a priority k-histogram which is close in the ℓ2 distance to an optimal tiling k-histogram describing p. The algorithm, which is based on a sketching algorithm in [TGIK02], takes a greedy strategy. Initially the algorithm starts with an empty priority histogram. It then proceeds by performing 2k·ln(1/ε) iterations, where in each iteration it goes over all O(n²) possible intervals and adds the best one, i.e. the interval I ⊆ [n] which minimizes the approximated ℓ2 distance between p and H when added to the currently constructed priority histogram H. The algorithm has an efficient sample complexity, with only a logarithmic dependence on n, but the running time has a polynomial dependence on n. This polynomial dependence is due to the exhaustive search for the interval which minimizes the distance between p and H.

Theorem 1 Let p ∈ D_n be the distribution and let H̄ be the tiling k-histogram which minimizes ‖p − H̄‖₂. The priority histogram H′ reported by Algorithm 1 satisfies ‖p − H′‖₂² ≤ ‖p − H̄‖₂² + 5ε. The sample complexity of Algorithm 1 is Õ((k/ε²)² · ln n). The running time complexity of Algorithm 1 is Õ((k/ε²)² · n²).

Proof: By Chernoff's bound and a union bound over the intervals in [n], with high constant probability, for every I,

  |y_I − p(I)| ≤ ξ.   (8)

By Lemma 1 and Chernoff's bound, with high constant probability, for every I,

  |z_I − Σ_{i∈I} p_i²| ≤ ξ·p(I).   (9)

Henceforth we assume that the estimates obtained by the algorithm are good. It is clear that we can transform any priority histogram H into any tiling k-histogram H̄ by adding the k intervals of H̄ to H with the highest priority. This implies that there exists an interval J and a value y_J such that adding them to H as described in Algorithm 1 decreases the error in the following way:

  ‖p − H_{J,y_J}‖₂² − ‖p − H̄‖₂² ≤ (1 − 1/(2k)) · ( ‖p − H‖₂² − ‖p − H̄‖₂² ),   (10)
Algorithm 1: Greedy algorithm for a priority k-histogram
1  Obtain l = 2·ln(12n)/ξ² samples, S, from p, where ξ = ε²/(2k·ln(1/ε));
2  For each interval I ⊆ [n] set y_I := |S_I|/l;
3  Obtain r = 2·ln(6n²) sets of samples, S¹, ..., S^r, each of size m = 24/ξ², from p;
4  For each interval I ⊆ [n] set z_I := median( coll(S¹_I)/C(m,2), ..., coll(S^r_I)/C(m,2) );
5  Initialize the priority histogram H to empty;
6  for i := 1 to 2k·ln(1/ε) do
7      foreach interval J ⊆ [n] do
8          Create H_{J,y_J}, obtained by:
               adding the interval J to H with the value y_J;
               recomputing the interval to the left (right) of J, I_L (I_R), so that it does not intersect J;
               adding I_L (I_R) with the value y_{I_L} (y_{I_R}) to H;
           c_J := Σ_{I ∈ H_{J,y_J}} ( z_I − y_I²/|I| );
9      Let J_min be the interval with the smallest value of c_J;
10     Update H to be H_{J_min, y_{J_min}};
11 return H;

where H_{J,y_J} is defined in Algorithm 1. Next, we would like to write the distance between H_{J,y_J} and p as a function of Σ_{i∈I} p_i² and p(I), for I ∈ H_{J,y_J}. We note that the value of x that minimizes the sum Σ_{i∈I} (p_i − x)² is x = p(I)/|I|; therefore

  ‖p − H_{J,y_J}‖₂² = Σ_{I∈H_{J,y_J}} Σ_{i∈I} (p_i − p(I)/|I|)²   (11)
    = Σ_{I∈H_{J,y_J}} ( Σ_{i∈I} p_i² − 2·(p(I)/|I|)·Σ_{i∈I} p_i + |I|·(p(I)/|I|)² )   (12)
    = Σ_{I∈H_{J,y_J}} ( Σ_{i∈I} p_i² − p(I)²/|I| ).   (13)

Since c_J = Σ_{I∈H_{J,y_J}} (z_I − y_I²/|I|), by applying the triangle inequality twice we get that

  | c_J − Σ_{I∈H_{J,y_J}} ( Σ_{i∈I} p_i² − p(I)²/|I| ) |
    ≤ Σ_{I∈H_{J,y_J}} | z_I − Σ_{i∈I} p_i² | + Σ_{I∈H_{J,y_J}} | y_I² − p(I)² | / |I|.   (14), (15)
So after reordering the terms in Equation (15) we obtain that

  c_J ≤ ‖p − H_{J,y_J}‖₂² + Σ_{I∈H_{J,y_J}} | z_I − Σ_{i∈I} p_i² | + Σ_{I∈H_{J,y_J}} | y_I² − p(I)² | / |I|.   (16)

From the fact that |y_I² − p(I)²| = |y_I − p(I)|·(y_I + p(I)) and Equation (8), it follows that

  |y_I² − p(I)²| ≤ ξ·(ξ + 2·p(I)).   (17)

Therefore we obtain from Equations (9), (13), (16) and (17) that

  c_J ≤ ‖p − H_{J,y_J}‖₂² + Σ_{I} ξ·p(I) + Σ_{I} ξ·(ξ + 2·p(I))/|I|   (18)
    ≤ ‖p − H_{J,y_J}‖₂² + 3ξ + |{I ∈ H_{J,y_J}}|·ξ².   (19)

Since the algorithm calculates c_J for every interval J, we derive from Equations (10) and (19) that at the q-th step

  ‖p − H_{J_min, y_{J_min}}‖₂² − ‖p − H̄‖₂² ≤ (1 − 1/(2k)) · ( ‖p − H‖₂² − ‖p − H̄‖₂² ) + 3ξ + 2q·ξ².   (20)

So for the histogram H obtained by the algorithm after q steps we have ‖p − H‖₂² − ‖p − H̄‖₂² ≤ (1 − 1/(2k))^q + 3qξ + 2q²ξ². Setting q = 2k·ln(1/ε), we obtain that ‖p − H‖₂² ≤ ‖p − H̄‖₂² + 5ε, as desired (since (1 − 1/(2k))^{2k·ln(1/ε)} ≤ ε, 3qξ = 3ε² and 2q²ξ² = 2ε⁴).

3.1 Improving the Running Time

We now turn to improving the running time complexity to match the sample complexity. Instead of going over all possible intervals in [n] in search of an interval I ⊆ [n] to add to the constructed priority histogram H, we search for I over a much smaller subset of intervals; in particular, only those intervals whose endpoints are samples or neighbors of samples. In Lemma 2 we prove that if we decrease the value a histogram H assigns to an interval I, then the square of the distance between H and p in the ℓ2 norm can grow by at most 2·p(I). The lemma implies that we can treat light-weight intervals as atomic components in our search, because they do not affect the distance between H and p by much. While the running time is reduced significantly, we prove that the histogram this algorithm outputs is still close to optimal.

Lemma 2 Let p ∈ D_n and let I be an interval in [n]. For 0 ≤ β₂ < β₁ ≤ 1,

  Σ_{i∈I} (p_i − β₂)² − Σ_{i∈I} (p_i − β₁)² ≤ 2·p(I).

Theorem 2 Let p and H̄ be as in Theorem 1. There is an algorithm that outputs a priority histogram H′ that satisfies ‖p − H′‖₂² ≤ ‖p − H̄‖₂² + 8ε. The sample complexity and the running time complexity of the algorithm are Õ((k/ε²)² · ln n).

Proof: In the improved algorithm, as in Algorithm 1, we take l = 2·ln(12n)/ξ² samples, T. Instead of going over all J ⊆ [n] in Step 7, we consider only a small subset of intervals as candidates. We denote this subset of intervals by T̄.
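The construction of this candidate set can be sketched as follows (0-indexed domain, illustrative names): endpoints are restricted to the sampled elements and their immediate neighbours, so any interval with an endpoint outside this set received no samples there and is, with high probability, light.

```python
def candidate_intervals(sample_points, n):
    """Candidate intervals for the fast greedy search: every interval whose
    two endpoints are sampled elements or neighbours (distance one away,
    clipped to the domain 0..n-1) of sampled elements."""
    pts = set()
    for i in sample_points:
        pts.update({max(i - 1, 0), i, min(i + 1, n - 1)})
    pts = sorted(pts)
    # all intervals [a, b] with a <= b and both endpoints in the point set
    return [(a, b) for ai, a in enumerate(pts) for b in pts[ai:]]
```

With l sample points the point set has at most 3l elements, so the number of candidate intervals is quadratic in the number of samples rather than in n.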
Let T̃ be the set of all elements in T and those that are at distance one away, i.e.

  T̃ = { min{i+1, n}, i, max{i−1, 1} : i ∈ T }.

Then T̄ is the set of all intervals between pairs of elements in T̃, i.e. [a, b] ∈ T̄ if and only if a ≤ b and a, b ∈ T̃. Thus the size of T̄ is bounded above by (3l+1)². Therefore we decrease the number of iterations in Step 7 from n² to at most (3l+1)².

It is easy to see that intervals which are not in T̄ have small weight. Formally, let I be an interval such that p(I) > ξ. The probability that I has no hits after taking l samples is at most (1 − ξ)^l < 1/n². Therefore, by a union bound over all the intervals in [n], with high constant probability, every interval which has no hits after taking l samples has weight at most ξ.

Next we see why in Step 7 we can ignore intervals which have small weight. Consider a single run of the loop in Step 7 in Algorithm 1. Let H be the histogram constructed by the algorithm so far, and let J_min be the interval added to H at the end of the run. We shall see that there is an interval J ∈ T̄ such that

  ‖p − H_{J, p(J)/|J|}‖₂² − ‖p − H_{J_min, y_{J_min}}‖₂² ≤ 4ξ.   (22)

Denote the endpoints of J_min by a and b, where a < b. Let I₁ = [a₁, b₁] be the largest interval in T̄ such that I₁ ⊆ J_min, and let I₂ = [a₂, b₂] be the smallest interval in T̄ such that J_min ⊆ I₂. Therefore, for every interval J = [x, y] where x ∈ {a₁, a₂} and y ∈ {b₁, b₂}, we have Σ_{i ∈ J △ J_min} p_i ≤ 2ξ, where J △ J_min is the symmetric difference of J and J_min.

Let β₁ (β₂) be the value assigned to i ∈ [a₂, a₁] (i ∈ [b₁, b₂]) by H_{J_min, y_{J_min}}, respectively. Notice that the algorithm only assigns values to intervals in T̄, therefore β₁ and β₂ are well defined. Take J as follows. If β₁ > y_J then take the start-point of J to be a₁, otherwise take it to be a₂. If β₂ > y_J then take the end-point of J to be b₁, otherwise take it to be b₂. By Lemma 2 it follows that

  ‖p − H_{J, y_{J_min}}‖₂² − ‖p − H_{J_min, y_{J_min}}‖₂² ≤ Σ_{i ∈ J △ J_min} 2·p_i ≤ 4ξ.   (23)
Thus, we obtain Equation (22) from the fact that

  ‖p − H_{J, p(J)/|J|}‖₂² = min_δ ‖p − H_{J,δ}‖₂².   (24)

Thus, by calculations similar to those in the proof of Theorem 1, after q steps,

  ‖p − H‖₂² − ‖p − H̄‖₂² ≤ (1 − 1/(2k))^q + 3qξ + 2q²ξ² + 4qξ.

Setting q = 2k·ln(1/ε), we obtain that ‖p − H‖₂² ≤ ‖p − H̄‖₂² + 8ε.

Proof of Lemma 2:

  Σ_{i∈I} (p_i − β₂)² − Σ_{i∈I} (p_i − β₁)²   (25)
    = Σ_{i∈I} ( 2·p_i·(β₁ − β₂) + β₂² − β₁² )   (26)
    ≤ 2·p(I).   (27)

4 Testing whether a Distribution is a Tiling k-histogram

In this section we provide testing algorithms for the property of being a tiling k-histogram. The testing algorithms attempt to partition [n] into k intervals which are flat according to p (recall that an interval is flat if it has a uniform conditional distribution or no weight). If they fail to do so, then they reject p. Intervals that are close to being flat can be detected because either they have light weight, in which case they can be
found via sampling, or they are not light, in which case they have a small ℓ2 norm. A small ℓ2 norm can in turn be detected via estimates of the collision probability. Thus an interval that has overall a small number of samples, or alternatively a small number of pairwise collisions, is considered by the algorithm to be a flat interval. The search for the boundaries of the flat intervals is performed in a manner similar to binary search for a value. The efficiency of our testing algorithms is stated in the following theorems:

Theorem 3 Algorithm 2 is a testing algorithm for the property of being a tiling k-histogram in the ℓ2 distance measure. The sample complexity of the algorithm is O(ln² n / ε⁴). The running time complexity of the algorithm is O(k·ln³ n / ε⁴).

Theorem 4 There exists a testing algorithm for the property of being a tiling k-histogram in the ℓ1 distance measure. The sample complexity of the algorithm is Õ(√(kn)/ε⁵). The running time complexity of the algorithm is Õ(k·√(kn)/ε⁵).

Algorithm 2: Test Tiling k-histogram
1  Obtain r = 16·ln(6n²) sets of samples, S¹, ..., S^r, each of size m = 64·ln n/ε⁴, from p;
2  Set previous := 1, low := 1, high := n;
3  for i := 1 to k do
4      while high ≥ low do
5          mid := low + ⌊(high − low)/2⌋;
6          if testflatness-ℓ2([previous, mid], S¹, ..., S^r, ε) then
7              low := mid + 1;
8          else
9              high := mid − 1;
10     previous := low;
11     high := n;
12 if previous = n + 1 then return ACCEPT;
13 return REJECT;

Algorithm 3: testflatness-ℓ2(I, S¹, ..., S^r, ε)
1  For each i ∈ [r] set p̂_i(I) := |S^i_I|/m;
2  If there exists i ∈ [r] such that |S^i_I| < ε²·m/2 then return ACCEPT;
3  If median( coll(S¹_I)/C(|S¹_I|,2), ..., coll(S^r_I)/C(|S^r_I|,2) ) ≤ 1/|I| + max_i{ ε²/(2·p̂_i(I)) } then return ACCEPT;
4  return REJECT;

Proof of Theorem 3: Let I be an interval in [n]. We first show that

  Pr[ | median{ coll(S¹_I)/C(|S¹_I|,2), ..., coll(S^r_I)/C(|S^r_I|,2) } − ‖p_I‖₂² | ≤ max_i{ ε²/(2·p̂_i(I)) } ] > 1 − 1/(6n²).   (28)
Recall that p̂_i(I) = |S^i_I|/m; hence, due to the facts that m = 64·ln n/ε⁴ and |S^i_I| ≥ ε²·m/2, we get that √|S^i_I| ≥ 64·p̂_i(I)²/ε⁴. By Equation (3), for each i ∈ [r],

  Pr[ | coll(S^i_I)/C(|S^i_I|,2) − ‖p_I‖₂² | ≤ ε²/(2·p̂_i(I)) ] > 3/4.   (29)

Since each estimate coll(S^i_I)/C(|S^i_I|,2) is close to ‖p_I‖₂² with high constant probability, we get from Chernoff's bound that, for r = 16·ln(6n²), the median of the r results is close to ‖p_I‖₂² with very high probability, as stated in Equation (28). By a union bound over all the intervals in [n], with high constant probability the following holds for every one of the at most n² intervals I in [n]:

  | median{ coll(S¹_I)/C(|S¹_I|,2), ..., coll(S^r_I)/C(|S^r_I|,2) } − ‖p_I‖₂² | ≤ max_i{ ε²/(2·p̂_i(I)) }.   (30)

So henceforth we assume that this is the case.

Assume the algorithm rejects. This implies that there are at least k distinct intervals such that for each interval the test testflatness-ℓ2 returned REJECT. For each of these intervals I we have p(I) ≠ 0 and median{ coll(S¹_I)/C(|S¹_I|,2), ..., coll(S^r_I)/C(|S^r_I|,2) } > 1/|I| + max_i{ ε²/(2·p̂_i(I)) }. In this case ‖p_I‖₂² > 1/|I|, so I is not flat and contains at least one bucket boundary. Thus there are at least k internal bucket boundaries, and therefore p is not a tiling k-histogram.

Assume the algorithm accepts p. Then there is a partition of [n] into k intervals, Ī, such that for each interval I ∈ Ī, testflatness-ℓ2 returned ACCEPT. Define p′ to be p(I)/|I| on each of the intervals obtained by the algorithm. For every I ∈ Ī: if there exists i ∈ [r] such that |S^i_I| < ε²·m/2, then by Fact 1 below, p(I) < ε². Therefore, from the fact that Σ_{i∈I} (p_i − x)² is minimized by x = p(I)/|I| and the Cauchy-Schwarz inequality, we get that

  Σ_{i∈I} (p_i − p(I)/|I|)² ≤ Σ_{i∈I} p_i²   (31)
    ≤ p(I)² ≤ ε²·p(I).   (32)

Otherwise, if |S^i_I| ≥ ε²·m/2 for every i ∈ [r], then by the second item in Fact 1, p(I) > ε²/4. By the first item in Fact 1, p̂_i(I) ≥ p(I)/2, and therefore

  median{ coll(S¹_I)/C(|S¹_I|,2), ..., coll(S^r_I)/C(|S^r_I|,2) } ≤ 1/|I| + ε²/p(I).   (33)

This implies that ‖p_I‖₂² ≤ 1/|I| + 2ε²/p(I). Thus ‖p_I − u_I‖₂² ≤ 2ε²/p(I), since ‖p_I − u_I‖₂² = ‖p_I‖₂² − 1/|I|, and we get that Σ_{i∈I} (p_i − p(I)/|I|)² ≤ 2ε²·p(I). Hence Σ_{I∈Ī} Σ_{i∈I} (p_i − p(I)/|I|)² ≤ 2ε², and thus p′ is √2·ε-close to p in the ℓ2 norm; rescaling ε by a constant proves the theorem.
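The interval search of Algorithm 2 can be sketched as follows (illustrative names; the collision-based flatness test of Algorithm 3 is abstracted into an oracle is_flat(lo, hi)): k rounds of binary search try to cover [1, n] by at most k accepted intervals.

```python
def accepts_as_k_histogram(is_flat, n, k):
    """Skeleton of the tester's interval search: greedily binary-search,
    k times, for the longest prefix [prev, mid] that the flatness oracle
    accepts, then start the next interval right after it. Accept iff the
    k intervals cover all of [1, n]. A true tiling k-histogram (with a
    sound oracle) always passes; a distribution far from every k-histogram
    forces some interval to fail."""
    prev = 1
    for _ in range(k):
        lo, hi = prev, n
        while lo <= hi:
            mid = (lo + hi) // 2
            if is_flat(prev, mid):
                lo = mid + 1   # [prev, mid] looks flat; try to extend
            else:
                hi = mid - 1   # too long; shrink
        prev = lo              # lo is one past the longest accepted prefix
        if prev > n:
            return True        # all of [1, n] covered by <= k flat pieces
    return prev > n
```

For example, with an idealized oracle for a distribution that is flat exactly on [1, 5] and on [6, 10], the search succeeds with k = 2 but fails with k = 1.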
Fact 1 If we take m = 48·ln(n²·γ)/ε² samples, S, then with probability greater than 1 − 1/γ:

1. For any I such that p(I) ≥ ε²/4: p(I)/2 ≤ |S_I|/m ≤ 3·p(I)/2.
2. For any I such that |S_I|/m ≥ ε²/2: p(I) > ε²/4.
3. For any I such that |S_I|/m < ε²/2: p(I) < ε².

Proof: Fix I. If p(I) ≥ ε²/4 then, by Chernoff's bound, with probability greater than 1 − e^{−ε²·m/48},

  p(I)/2 ≤ |S_I|/m ≤ 3·p(I)/2.   (34)

In particular, if p(I) = ε²/4 then |S_I|/m ≤ 3ε²/8; thus if |S_I|/m ≥ ε²/2 > 3ε²/8 then p(I) > ε²/4. If |S_I|/m < ε²/2 then either p(I) ≤ ε²/4 < ε², or p(I) > ε²/4 but then p(I) ≤ 2·|S_I|/m < ε². By the union bound, with probability greater than 1 − n²·e^{−ε²·m/48} > 1 − 1/γ, the above is true for every I.

Algorithm 4: testflatness-ℓ1(I, S¹, ..., S^r, ε)
1  If there exists i ∈ [r] such that |S^i_I| < 16³/ε⁴ then return ACCEPT;
2  If median( coll(S¹_I)/C(|S¹_I|,2), ..., coll(S^r_I)/C(|S^r_I|,2) ) ≤ (1 + ε²/4)/|I| then return ACCEPT;
3  return REJECT;

Proof of Theorem 4: Apply Algorithm 2 with the following changes: take each set of samples S^i to be of size m = 2¹³·√(kn)/ε⁵, and replace testflatness-ℓ2 with testflatness-ℓ1. By Equation (2),

  Pr[ | coll(S_I)/C(|S_I|,2) − ‖p_I‖₂² | > δ·‖p_I‖₂² ] < 4/(δ²·√|S_I|·‖p_I‖₂).   (35)

Thus, if S_I is such that √|S_I| ≥ 16/(δ²·‖p_I‖₂), then

  Pr[ | coll(S_I)/C(|S_I|,2) − ‖p_I‖₂² | ≤ δ·‖p_I‖₂² ] > 3/4.   (36)

By the additive Chernoff bound and the union bound, for r = 16·ln(6n²) and δ = ε²/16, with high constant probability, for every interval I that passes Step 1 in Algorithm 4 it holds that |coll(S_I)/C(|S_I|,2) − ‖p_I‖₂²| ≤ δ·‖p_I‖₂² (the total number of intervals in [n] is less than n²). So from this point on we assume that the algorithm obtains a δ-multiplicative approximation of ‖p_I‖₂² for every I that passes Step 1.

Assume the algorithm rejects p. Then there are at least k distinct intervals such that for each interval the test testflatness-ℓ1 returned REJECT. By our assumption, each of these intervals is not flat and thus contains at least one bucket boundary. Thus there are at least k internal bucket boundaries, and therefore p is not a tiling k-histogram.

Assume the algorithm accepts p. Then there is a partition of [n] into k intervals, Ī, such that for each interval I ∈ Ī, testflatness-ℓ1 returned ACCEPT. Define p′ to be p(I)/|I| on each of the intervals obtained by the algorithm. For
any interval I for which testflatness-ℓ1 returned ACCEPT at Step 2 (and which passed Step 1), it holds that ‖p_I − u_I‖₁ < ε/2, and thus Σ_{i∈I} |p_i − p(I)/|I|| ≤ ε·p(I)/2. Denote by L the set of intervals for which testflatness-ℓ1 returned ACCEPT at Step 1. By Chernoff's bound, for every I ∈ L, with probability greater than 1 − e^{−ε²·m/(32k)}, either p(I) ≤ ε²/(4k) or p(I) ≤ 2·16³/(ε⁴·m). Hence, with probability greater than 1 − n²·r·e^{−ε²·m/(32k)} > 1 − 1/6, the total weight of the intervals in L satisfies

  Σ_{I∈L} p(I) ≤ Σ_{I∈L} max{ 2·16³/(ε⁴·m), ε²/(4k) } ≤ ε²/4 + ε·√(k/n) ≤ ε/2,   (37)

where the last inequality follows from the fact that |L| ≤ k. Therefore, p′ is ε-close to p in the ℓ1 norm.

4.1 Lower Bound

We prove that for every k ≤ 1/ε, the upper bound in Theorem 4 is tight in terms of the dependence on k and n. We note that for k = n, testing the tiling k-histogram property is trivial, i.e. every distribution is a tiling n-histogram. Hence, we cannot expect to have a lower bound for every k. We also note that the testing lower bound is also an approximation lower bound.

Theorem 5 Given a distribution D, testing whether D is a tiling k-histogram in the ℓ1 norm requires Ω(√(kn)) samples, for every k ≤ 1/ε.

Proof: Divide [n] into k intervals of equal size (up to ±1). In the YES instance the total probability of each interval alternates between 0 and 2/k, and within each interval the elements have equal probability. The NO instance is defined similarly, with one exception: randomly pick one of the intervals that have total probability 2/k, call it I, and within I randomly pick half of the elements to have probability 0 and the other half of the elements to have twice the probability of the corresponding elements in the YES instance. In the proof of the lower bound for testing uniformity, it is shown that distinguishing a uniform distribution from a distribution that is uniform on a random half of the elements and has 0 weight on the other half requires Ω(√n) samples. Since the number of elements in I is Θ(n/k), by a similar argument we know that at least Ω(√(n/k)) samples are required from I in order to distinguish the YES instance from the NO instance.
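The pair of hard instances used in this proof can be sketched as follows (illustrative names; each function returns the probability vector over [n], assuming for simplicity that k divides n and k is even):

```python
import random

def yes_instance(n, k):
    """Hard YES instance: k equal-size intervals whose total weights
    alternate between 0 and 2/k, with equal probability inside each
    weighted interval (so this is a tiling k-histogram)."""
    p = [0.0] * n
    size = n // k
    for j in range(0, k, 2):  # every other interval carries weight 2/k
        for t in range(j * size, (j + 1) * size):
            p[t] = 2.0 / (k * size)
    return p

def no_instance(n, k, rng=random):
    """NO instance: identical, except that inside one randomly chosen
    weighted interval, a random half of the elements get weight 0 and
    the other half get twice their YES-instance weight."""
    p = yes_instance(n, k)
    size = n // k
    j = rng.choice(range(0, k, 2))
    idx = list(range(j * size, (j + 1) * size))
    rng.shuffle(idx)
    for t in idx[: size // 2]:
        p[t] = 0.0
    for t in idx[size // 2:]:
        p[t] *= 2.0
    return p
```

Both instances are proper distributions with the same interval weights, so they can only be told apart by looking inside the perturbed interval, which is the source of the Ω(√(n/k))-samples-from-I argument.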
From the fact that the total probability of I is Θ(1/k), we know that in order to obtain Θ(√(n/k)) hits in I we are required to take a total number of samples of order √(nk); thus we obtain a lower bound of Ω(√(nk)).

References

[AAK+07] N. Alon, A. Andoni, T. Kaufman, K. Matulef, R. Rubinfeld, and N. Xie. Testing k-wise and almost k-wise independence. In Proceedings of the Thirty-Ninth Annual ACM Symposium on the Theory of Computing (STOC), 2007.

[BDKR05] T. Batu, S. Dasgupta, R. Kumar, and R. Rubinfeld. The complexity of approximating the entropy. SIAM Journal on Computing, 35(1):132-150, 2005.

[BFF+01] T. Batu, L. Fortnow, E. Fischer, R. Kumar, R. Rubinfeld, and P. White. Testing random variables for independence and identity. In Proceedings of the Forty-Second Annual Symposium on Foundations of Computer Science (FOCS), 2001.
[BFR+00] T. Batu, L. Fortnow, R. Rubinfeld, W. D. Smith, and P. White. Testing that distributions are close. In Proceedings of the Forty-First Annual Symposium on Foundations of Computer Science (FOCS), pages 259-269, Los Alamitos, CA, USA, 2000. IEEE Computer Society.

[BFR+10] T. Batu, L. Fortnow, R. Rubinfeld, W. D. Smith, and P. White. Testing closeness of discrete distributions. CoRR, 2010. This is a long version of [BFR+00].

[BKR04] T. Batu, R. Kumar, and R. Rubinfeld. Sublinear algorithms for testing monotone and unimodal distributions. In Proceedings of the Thirty-Sixth Annual ACM Symposium on the Theory of Computing (STOC), 2004.

[CMN98] S. Chaudhuri, R. Motwani, and V. Narasayya. Random sampling for histogram construction: how much is enough? SIGMOD, 1998.

[GGI+02] A. Gilbert, S. Guha, P. Indyk, Y. Kotidis, M. Muthukrishnan, and M. Strauss. Fast, small-space algorithms for approximate histogram maintenance. STOC, 2002.

[GGR98] O. Goldreich, S. Goldwasser, and D. Ron. Property testing and its connection to learning and approximation. Journal of the ACM, 45(4), 1998.

[GKS06] S. Guha, N. Koudas, and K. Shim. Approximation and streaming algorithms for histogram construction problems. ACM Transactions on Database Systems (TODS), 31(1), 2006.

[GMP97] P. B. Gibbons, Y. Matias, and V. Poosala. Fast incremental maintenance of approximate histograms. VLDB, 1997.

[GR00] O. Goldreich and D. Ron. On testing expansion in bounded-degree graphs. Electronic Colloquium on Computational Complexity, 7(20), 2000.

[Ioa03] Y. Ioannidis. The history of histograms (abridged). VLDB, 2003.

[JPK+98] H. V. Jagadish, V. Poosala, N. Koudas, K. Sevcik, S. Muthukrishnan, and T. Suel. Optimal histograms with quality guarantees. VLDB, 1998.

[Pan08] L. Paninski. Testing for uniformity given very sparsely-sampled discrete data. IEEE Transactions on Information Theory, 54(10), 2008.

[Ron08] D. Ron. Property testing: A learning theory perspective. Foundations and Trends in Machine Learning, 1(3):307-402, 2008.

[RRSS09] S. Raskhodnikova, D. Ron, A.
Shpilka, and A. Smith. Strong lower bounds for approximating distribution support size and the distinct elements problem. SIAM Journal on Computing, 39(3):813-842, 2009.

[RS96] R. Rubinfeld and M. Sudan. Robust characterization of polynomials with applications to program testing. SIAM Journal on Computing, 25(2):252-271, 1996.

[Rub06] R. Rubinfeld. Sublinear time algorithms. In Proc. International Congress of Mathematicians, volume 3, 2006.

[TGIK02] Nitin Thaper, Sudipto Guha, Piotr Indyk, and Nick Koudas. Dynamic multidimensional histograms. In SIGMOD Conference, 2002.

[Val08] P. Valiant. Testing symmetric properties of distributions. In Proceedings of the Fortieth Annual ACM Symposium on the Theory of Computing (STOC), 2008.

[VV11] G. Valiant and P. Valiant. Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. In Proceedings of the Forty-Third Annual ACM Symposium on the Theory of Computing (STOC), 2011. See also the ECCC technical report versions.
More informationA Theoretical Analysis of a Warm Start Technique
A Theoretical Analysis of a War Start Technique Martin A. Zinkevich Yahoo! Labs 701 First Avenue Sunnyvale, CA Abstract Batch gradient descent looks at every data point for every step, which is wasteful
More informationBipartite subgraphs and the smallest eigenvalue
Bipartite subgraphs and the sallest eigenvalue Noga Alon Benny Sudaov Abstract Two results dealing with the relation between the sallest eigenvalue of a graph and its bipartite subgraphs are obtained.
More informationConvex Programming for Scheduling Unrelated Parallel Machines
Convex Prograing for Scheduling Unrelated Parallel Machines Yossi Azar Air Epstein Abstract We consider the classical proble of scheduling parallel unrelated achines. Each job is to be processed by exactly
More informatione-companion ONLY AVAILABLE IN ELECTRONIC FORM
OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer
More informationThis model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.
CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when
More informationComputable Shell Decomposition Bounds
Coputable Shell Decoposition Bounds John Langford TTI-Chicago jcl@cs.cu.edu David McAllester TTI-Chicago dac@autoreason.co Editor: Leslie Pack Kaelbling and David Cohn Abstract Haussler, Kearns, Seung
More informationASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical
IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE227C (Spring 2018): Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee227c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee227c@berkeley.edu October
More informationLecture 9 November 23, 2015
CSC244: Discrepancy Theory in Coputer Science Fall 25 Aleksandar Nikolov Lecture 9 Noveber 23, 25 Scribe: Nick Spooner Properties of γ 2 Recall that γ 2 (A) is defined for A R n as follows: γ 2 (A) = in{r(u)
More informationList Scheduling and LPT Oliver Braun (09/05/2017)
List Scheduling and LPT Oliver Braun (09/05/207) We investigate the classical scheduling proble P ax where a set of n independent jobs has to be processed on 2 parallel and identical processors (achines)
More informationBlock designs and statistics
Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent
More informationQuantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search
Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths
More informationComputational and Statistical Learning Theory
Coputational and Statistical Learning Theory Proble sets 5 and 6 Due: Noveber th Please send your solutions to learning-subissions@ttic.edu Notations/Definitions Recall the definition of saple based Radeacher
More informationarxiv: v1 [cs.ds] 3 Feb 2014
arxiv:40.043v [cs.ds] 3 Feb 04 A Bound on the Expected Optiality of Rando Feasible Solutions to Cobinatorial Optiization Probles Evan A. Sultani The Johns Hopins University APL evan@sultani.co http://www.sultani.co/
More informationImproved Guarantees for Agnostic Learning of Disjunctions
Iproved Guarantees for Agnostic Learning of Disjunctions Pranjal Awasthi Carnegie Mellon University pawasthi@cs.cu.edu Avri Blu Carnegie Mellon University avri@cs.cu.edu Or Sheffet Carnegie Mellon University
More informationTight Information-Theoretic Lower Bounds for Welfare Maximization in Combinatorial Auctions
Tight Inforation-Theoretic Lower Bounds for Welfare Maxiization in Cobinatorial Auctions Vahab Mirrokni Jan Vondrák Theory Group, Microsoft Dept of Matheatics Research Princeton University Redond, WA 9805
More informationModel Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon
Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential
More informationLogLog-Beta and More: A New Algorithm for Cardinality Estimation Based on LogLog Counting
LogLog-Beta and More: A New Algorith for Cardinality Estiation Based on LogLog Counting Jason Qin, Denys Ki, Yuei Tung The AOLP Core Data Service, AOL, 22000 AOL Way Dulles, VA 20163 E-ail: jasonqin@teaaolco
More informationAlgorithms for parallel processor scheduling with distinct due windows and unit-time jobs
BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES Vol. 57, No. 3, 2009 Algoriths for parallel processor scheduling with distinct due windows and unit-tie obs A. JANIAK 1, W.A. JANIAK 2, and
More informationComputable Shell Decomposition Bounds
Journal of Machine Learning Research 5 (2004) 529-547 Subitted 1/03; Revised 8/03; Published 5/04 Coputable Shell Decoposition Bounds John Langford David McAllester Toyota Technology Institute at Chicago
More informationUnderstanding Machine Learning Solution Manual
Understanding Machine Learning Solution Manual Written by Alon Gonen Edited by Dana Rubinstein Noveber 17, 2014 2 Gentle Start 1. Given S = ((x i, y i )), define the ultivariate polynoial p S (x) = i []:y
More informationFixed-to-Variable Length Distribution Matching
Fixed-to-Variable Length Distribution Matching Rana Ali Ajad and Georg Böcherer Institute for Counications Engineering Technische Universität München, Gerany Eail: raa2463@gail.co,georg.boecherer@tu.de
More information3.8 Three Types of Convergence
3.8 Three Types of Convergence 3.8 Three Types of Convergence 93 Suppose that we are given a sequence functions {f k } k N on a set X and another function f on X. What does it ean for f k to converge to
More informationStochastic Subgradient Methods
Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods
More informationHomework 3 Solutions CSE 101 Summer 2017
Hoework 3 Solutions CSE 0 Suer 207. Scheduling algoriths The following n = 2 jobs with given processing ties have to be scheduled on = 3 parallel and identical processors with the objective of iniizing
More informationResearch Article On the Isolated Vertices and Connectivity in Random Intersection Graphs
International Cobinatorics Volue 2011, Article ID 872703, 9 pages doi:10.1155/2011/872703 Research Article On the Isolated Vertices and Connectivity in Rando Intersection Graphs Yilun Shang Institute for
More informationLecture 20 November 7, 2013
CS 229r: Algoriths for Big Data Fall 2013 Prof. Jelani Nelson Lecture 20 Noveber 7, 2013 Scribe: Yun Willia Yu 1 Introduction Today we re going to go through the analysis of atrix copletion. First though,
More informationOn Constant Power Water-filling
On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives
More informationRandomized Recovery for Boolean Compressed Sensing
Randoized Recovery for Boolean Copressed Sensing Mitra Fatei and Martin Vetterli Laboratory of Audiovisual Counication École Polytechnique Fédéral de Lausanne (EPFL) Eail: {itra.fatei, artin.vetterli}@epfl.ch
More informationNecessity of low effective dimension
Necessity of low effective diension Art B. Owen Stanford University October 2002, Orig: July 2002 Abstract Practitioners have long noticed that quasi-monte Carlo ethods work very well on functions that
More informationRandomized Accuracy-Aware Program Transformations For Efficient Approximate Computations
Randoized Accuracy-Aware Progra Transforations For Efficient Approxiate Coputations Zeyuan Allen Zhu Sasa Misailovic Jonathan A. Kelner Martin Rinard MIT CSAIL zeyuan@csail.it.edu isailo@it.edu kelner@it.edu
More information1 Generalization bounds based on Rademacher complexity
COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #0 Scribe: Suqi Liu March 07, 08 Last tie we started proving this very general result about how quickly the epirical average converges
More informationComputational and Statistical Learning Theory
Coputational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 2: PAC Learning and VC Theory I Fro Adversarial Online to Statistical Three reasons to ove fro worst-case deterinistic
More informationHandout 7. and Pr [M(x) = χ L (x) M(x) =? ] = 1.
Notes on Coplexity Theory Last updated: October, 2005 Jonathan Katz Handout 7 1 More on Randoized Coplexity Classes Reinder: so far we have seen RP,coRP, and BPP. We introduce two ore tie-bounded randoized
More informationPolygonal Designs: Existence and Construction
Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G
More informationOptimal Jamming Over Additive Noise: Vector Source-Channel Case
Fifty-first Annual Allerton Conference Allerton House, UIUC, Illinois, USA October 2-3, 2013 Optial Jaing Over Additive Noise: Vector Source-Channel Case Erah Akyol and Kenneth Rose Abstract This paper
More informationExact tensor completion with sum-of-squares
Proceedings of Machine Learning Research vol 65:1 54, 2017 30th Annual Conference on Learning Theory Exact tensor copletion with su-of-squares Aaron Potechin Institute for Advanced Study, Princeton David
More informationNote on generating all subsets of a finite set with disjoint unions
Note on generating all subsets of a finite set with disjoint unions David Ellis e-ail: dce27@ca.ac.uk Subitted: Dec 2, 2008; Accepted: May 12, 2009; Published: May 20, 2009 Matheatics Subject Classification:
More informationMixed Robust/Average Submodular Partitioning
Mixed Robust/Average Subodular Partitioning Kai Wei 1 Rishabh Iyer 1 Shengjie Wang 2 Wenruo Bai 1 Jeff Biles 1 1 Departent of Electrical Engineering, University of Washington 2 Departent of Coputer Science,
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE7C (Spring 018: Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee7c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee7c@berkeley.edu October 15,
More informationHamming Compressed Sensing
Haing Copressed Sensing Tianyi Zhou, and Dacheng Tao, Meber, IEEE Abstract arxiv:.73v2 [cs.it] Oct 2 Copressed sensing CS and -bit CS cannot directly recover quantized signals and require tie consuing
More informationIntelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines
Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes
More informationA note on the multiplication of sparse matrices
Cent. Eur. J. Cop. Sci. 41) 2014 1-11 DOI: 10.2478/s13537-014-0201-x Central European Journal of Coputer Science A note on the ultiplication of sparse atrices Research Article Keivan Borna 12, Sohrab Aboozarkhani
More informationWhen Short Runs Beat Long Runs
When Short Runs Beat Long Runs Sean Luke George Mason University http://www.cs.gu.edu/ sean/ Abstract What will yield the best results: doing one run n generations long or doing runs n/ generations long
More informationWeighted- 1 minimization with multiple weighting sets
Weighted- 1 iniization with ultiple weighting sets Hassan Mansour a,b and Özgür Yılaza a Matheatics Departent, University of British Colubia, Vancouver - BC, Canada; b Coputer Science Departent, University
More informationarxiv: v3 [cs.lg] 7 Jan 2016
Efficient and Parsionious Agnostic Active Learning Tzu-Kuo Huang Alekh Agarwal Daniel J. Hsu tkhuang@icrosoft.co alekha@icrosoft.co djhsu@cs.colubia.edu John Langford Robert E. Schapire jcl@icrosoft.co
More informationSharp Time Data Tradeoffs for Linear Inverse Problems
Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used
More informationProbability Distributions
Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples
More informationThe Transactional Nature of Quantum Information
The Transactional Nature of Quantu Inforation Subhash Kak Departent of Coputer Science Oklahoa State University Stillwater, OK 7478 ABSTRACT Inforation, in its counications sense, is a transactional property.
More informationCombining Classifiers
Cobining Classifiers Generic ethods of generating and cobining ultiple classifiers Bagging Boosting References: Duda, Hart & Stork, pg 475-480. Hastie, Tibsharini, Friedan, pg 246-256 and Chapter 10. http://www.boosting.org/
More informationKernel Methods and Support Vector Machines
Intelligent Systes: Reasoning and Recognition Jaes L. Crowley ENSIAG 2 / osig 1 Second Seester 2012/2013 Lesson 20 2 ay 2013 Kernel ethods and Support Vector achines Contents Kernel Functions...2 Quadratic
More informationFundamental Limits of Database Alignment
Fundaental Liits of Database Alignent Daniel Cullina Dept of Electrical Engineering Princeton University dcullina@princetonedu Prateek Mittal Dept of Electrical Engineering Princeton University pittal@princetonedu
More informationSupport Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization
Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering
More informationOn Conditions for Linearity of Optimal Estimation
On Conditions for Linearity of Optial Estiation Erah Akyol, Kuar Viswanatha and Kenneth Rose {eakyol, kuar, rose}@ece.ucsb.edu Departent of Electrical and Coputer Engineering University of California at
More information1 Identical Parallel Machines
FB3: Matheatik/Inforatik Dr. Syaantak Das Winter 2017/18 Optiizing under Uncertainty Lecture Notes 3: Scheduling to Miniize Makespan In any standard scheduling proble, we are given a set of jobs J = {j
More informationNew Slack-Monotonic Schedulability Analysis of Real-Time Tasks on Multiprocessors
New Slack-Monotonic Schedulability Analysis of Real-Tie Tasks on Multiprocessors Risat Mahud Pathan and Jan Jonsson Chalers University of Technology SE-41 96, Göteborg, Sweden {risat, janjo}@chalers.se
More informationDetection and Estimation Theory
ESE 54 Detection and Estiation Theory Joseph A. O Sullivan Sauel C. Sachs Professor Electronic Systes and Signals Research Laboratory Electrical and Systes Engineering Washington University 11 Urbauer
More informationPattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition
Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition
More informationNonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy
Storage Capacity and Dynaics of Nononotonic Networks Bruno Crespi a and Ignazio Lazzizzera b a. IRST, I-38050 Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I-38050 Povo (Trento) Italy INFN Gruppo
More informationKeywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution
Testing approxiate norality of an estiator using the estiated MSE and bias with an application to the shape paraeter of the generalized Pareto distribution J. Martin van Zyl Abstract In this work the norality
More informationAsynchronous Gossip Algorithms for Stochastic Optimization
Asynchronous Gossip Algoriths for Stochastic Optiization S. Sundhar Ra ECE Dept. University of Illinois Urbana, IL 680 ssrini@illinois.edu A. Nedić IESE Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu
More informationOn the Use of A Priori Information for Sparse Signal Approximations
ITS TECHNICAL REPORT NO. 3/4 On the Use of A Priori Inforation for Sparse Signal Approxiations Oscar Divorra Escoda, Lorenzo Granai and Pierre Vandergheynst Signal Processing Institute ITS) Ecole Polytechnique
More informationTesting equivalence between distributions using conditional samples
Testing equivalence between distributions using conditional samples (Extended Abstract) Clément Canonne Dana Ron Rocco A. Servedio July 5, 2013 Abstract We study a recently introduced framework [7, 8]
More informationNew Bounds for Learning Intervals with Implications for Semi-Supervised Learning
JMLR: Workshop and Conference Proceedings vol (1) 1 15 New Bounds for Learning Intervals with Iplications for Sei-Supervised Learning David P. Helbold dph@soe.ucsc.edu Departent of Coputer Science, University
More informationEstimating properties of distributions. Ronitt Rubinfeld MIT and Tel Aviv University
Estimating properties of distributions Ronitt Rubinfeld MIT and Tel Aviv University Distributions are everywhere What properties do your distributions have? Play the lottery? Is it independent? Is it uniform?
More informationA Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science
A Better Algorith For an Ancient Scheduling Proble David R. Karger Steven J. Phillips Eric Torng Departent of Coputer Science Stanford University Stanford, CA 9435-4 Abstract One of the oldest and siplest
More informationNon-Parametric Non-Line-of-Sight Identification 1
Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,
More informationIn this chapter, we consider several graph-theoretic and probabilistic models
THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions
More information1 Bounding the Margin
COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost
More informationThe Weierstrass Approximation Theorem
36 The Weierstrass Approxiation Theore Recall that the fundaental idea underlying the construction of the real nubers is approxiation by the sipler rational nubers. Firstly, nubers are often deterined
More informationTail estimates for norms of sums of log-concave random vectors
Tail estiates for nors of sus of log-concave rando vectors Rados law Adaczak Rafa l Lata la Alexander E. Litvak Alain Pajor Nicole Toczak-Jaegerann Abstract We establish new tail estiates for order statistics
More informationA Pass-Efficient Algorithm for Clustering Census Data
A Pass-Efficient Algorithm for Clustering Census Data Kevin Chang Yale University Ravi Kannan Yale University Abstract We present a number of streaming algorithms for a basic clustering problem for massive
More informationCSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13
CSE55: Randoied Algoriths and obabilistic Analysis May 6, Lecture Lecturer: Anna Karlin Scribe: Noah Siegel, Jonathan Shi Rando walks and Markov chains This lecture discusses Markov chains, which capture
More informationFeature Extraction Techniques
Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that
More informationThe Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate
The Siplex Method is Strongly Polynoial for the Markov Decision Proble with a Fixed Discount Rate Yinyu Ye April 20, 2010 Abstract In this note we prove that the classic siplex ethod with the ost-negativereduced-cost
More informationConstant-Space String-Matching. in Sublinear Average Time. (Extended Abstract) Wojciech Rytter z. Warsaw University. and. University of Liverpool
Constant-Space String-Matching in Sublinear Average Tie (Extended Abstract) Maxie Crocheore Universite de Marne-la-Vallee Leszek Gasieniec y Max-Planck Institut fur Inforatik Wojciech Rytter z Warsaw University
More information1 Distribution Property Testing
ECE 6980 An Algorithmic and Information-Theoretic Toolbox for Massive Data Instructor: Jayadev Acharya Lecture #10-11 Scribe: JA 27, 29 September, 2016 Please send errors to acharya@cornell.edu 1 Distribution
More informationPrerequisites. We recall: Theorem 2 A subset of a countably innite set is countable.
Prerequisites 1 Set Theory We recall the basic facts about countable and uncountable sets, union and intersection of sets and iages and preiages of functions. 1.1 Countable and uncountable sets We can
More informationBirthday Paradox Calculations and Approximation
Birthday Paradox Calculations and Approxiation Joshua E. Hill InfoGard Laboratories -March- v. Birthday Proble In the birthday proble, we have a group of n randoly selected people. If we assue that birthdays
More informationAdaptive Stabilization of a Class of Nonlinear Systems With Nonparametric Uncertainty
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 46, NO. 11, NOVEMBER 2001 1821 Adaptive Stabilization of a Class of Nonlinear Systes With Nonparaetric Uncertainty Aleander V. Roup and Dennis S. Bernstein
More informationUsing EM To Estimate A Probablity Density With A Mixture Of Gaussians
Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points
More informationUsing a De-Convolution Window for Operating Modal Analysis
Using a De-Convolution Window for Operating Modal Analysis Brian Schwarz Vibrant Technology, Inc. Scotts Valley, CA Mark Richardson Vibrant Technology, Inc. Scotts Valley, CA Abstract Operating Modal Analysis
More informationBest Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence
Best Ar Identification: A Unified Approach to Fixed Budget and Fixed Confidence Victor Gabillon Mohaad Ghavazadeh Alessandro Lazaric INRIA Lille - Nord Europe, Tea SequeL {victor.gabillon,ohaad.ghavazadeh,alessandro.lazaric}@inria.fr
More informationDesign of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding
IEEE TRANSACTIONS ON INFORMATION THEORY (SUBMITTED PAPER) 1 Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding Lai Wei, Student Meber, IEEE, David G. M. Mitchell, Meber, IEEE, Thoas
More information