Improved Guarantees for Agnostic Learning of Disjunctions

Pranjal Awasthi, Avrim Blum, Or Sheffet (Carnegie Mellon University)

Abstract

Given some arbitrary distribution D over {0,1}^n and arbitrary target function c*, the problem of agnostic learning of disjunctions is to achieve an error rate comparable to the error OPT_disj of the best disjunction with respect to (D, c*). Achieving error O(n · OPT_disj) + ε is trivial, and Winnow [13] achieves error O(r · OPT_disj) + ε, where r is the number of relevant variables in the best disjunction. In recent work, Peleg [14] shows how to achieve a bound of Õ(√n · OPT_disj) + ε in polynomial time. In this paper we improve on Peleg's bound, giving a polynomial-time algorithm achieving a bound of O(n^{1/3+α} · OPT_disj) + ε for any constant α > 0. The heart of the algorithm is a method for weak-learning when OPT_disj = O(1/n^{1/3+α}), which can then be fed into existing agnostic boosting procedures to achieve the desired guarantee.

1 Introduction

Learning disjunctions (or conjunctions) over {0,1}^n in the PAC model is a well-studied and easy problem. The simple list-and-cross-off algorithm runs in linear time per example and requires only O(n/ε) examples to achieve error ε (ignoring the logarithmic dependence on the confidence term δ). The similarly efficient Winnow algorithm [13] requires only O((r log n)/ε) examples to learn well when the target function is a disjunction of size r. However, when the data is only mostly consistent with a disjunction, the problem becomes substantially harder. In this agnostic setting, our goal is to produce a hypothesis h whose error rate err_D(h) = Pr_D(h(x) ≠ c*(x)) satisfies err_D(h) ≤ c · OPT_disj + ε, where OPT_disj is the error rate of the best disjunction and c is as small as possible. For example, while Winnow performs well as a function of the number of attribute errors of the best disjunction¹ [12, 1], this can be a factor O(r) worse than the number of mistakes of the best disjunction.

Recently, Feldman [3] has shown that for any constant ε > 0, determining whether the best disjunction for a given dataset S has error at most ε or error at least 1/2 − ε is NP-hard. Even more recently, Feldman et al. [5] extend this hardness result to the problem of agnostically learning disjunctions by the hypothesis class of halfspaces. Thus, these results show that the problem of finding a disjunction (or linear separator) of error at most 1/2 − ε, given that the error OPT_disj of the best disjunction is at most ε, is computationally hard for any constant ε > 0.

Given these hardness results, it is natural to consider what kinds of learning guarantees can be achieved. If the error OPT_disj of the best disjunction is O(1/n) then learning is essentially equivalent to the noise-free case. Peleg [14] shows how to improve this to a bound of Õ(1/√n). In particular, on any given dataset S, his algorithm produces a disjunction of error rate on S at most Õ(√n · OPT_disj(S)).²

¹ The minimum number of variables that would need to be flipped in order to make the data perfectly consistent with a disjunction. This is essentially the same as its hinge loss.
² His results are for the Red-Blue Set-Cover Problem [2], which is equivalent to the problem of approximating the best disjunction, except that positive examples must be classified correctly (i.e., the goal is to approximate the minimum number of mistakes on negatives subject to correctly classifying the positives). The extension to allowing for two-sided error, however, is immediate.
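As background (not part of this paper's contribution), the list-and-cross-off algorithm mentioned above can be rendered in a few lines. The Python sketch below assumes a monotone disjunction over {0,1}^n and uses illustrative names.

```python
def learn_monotone_disjunction(examples):
    """List-and-cross-off: PAC-learn a monotone disjunction from consistent data.
    examples: list of (x, y) with x a 0/1 sequence and y in {+1, -1}.
    Returns the set of variable indices kept in the hypothesis."""
    n = len(examples[0][0])
    candidates = set(range(n))               # start with every variable
    for x, y in examples:
        if y == -1:
            # A negative example cannot set any relevant variable to 1,
            # so every variable it sets to 1 can be crossed off.
            candidates -= {i for i in range(n) if x[i] == 1}
    return candidates

def predict(candidates, x):
    # The hypothesis is the OR of the surviving variables.
    return +1 if any(x[i] == 1 for i in candidates) else -1
```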

In this work, we improve on the result of Peleg [14], achieving a bound of O(n^{1/3+α} · OPT_disj) + ε for any constant α > 0, though our algorithm is not a proper learner (it does not produce a disjunction as its output).³ In particular, our main result is an algorithm for weak-learning under an arbitrary distribution D, under the assumption that the optimal disjunction has error rate O(1/n^{1/3+α}), which we can then feed into the boosting procedures of [7, 9] or the recent ABoostDI booster of [4] to achieve the claimed guarantee. Note that our guarantee holds for any distribution over {0,1}^n. In contrast, most recent work on agnostic learning has been for the case of uniform or other nice distributions [8, 10, 11].

1.1 Our Results

We present a learning algorithm whose error rate is an O(n^{1/3+α}) approximation to that of the best disjunction, for any α > 0. Formally, we prove the following theorem.

Theorem 1. There exists an algorithm that for an arbitrary distribution D over {0,1}^n and arbitrary target function c*: {0,1}^n → {−1, +1}, for every constant α > 0 and every ε, δ > 0, runs in time polynomial in 1/ε, log(1/δ), and n, uses poly(1/ε, log(1/δ), n) random examples from D, and outputs a hypothesis h such that, with probability > 1 − δ,

err_D(h) ≤ O(n^{1/3+α} · OPT_disj) + ε,

where OPT_disj = min_{f ∈ DISJUNCTIONS} err_D(f).

The proof of Theorem 1 is based on finding a weak learner under the assumption that OPT = OPT_disj is O(n^{−(1/3+α)}). In particular, we show:

Theorem 2. There exists an algorithm with the following property. For every distribution D over {0,1}^n and every target function c* such that OPT < n^{−1/3−α} for some constant α > 0, for every δ > 0, the algorithm runs in time t(δ, n), uses m(δ, n) random samples drawn from D, and outputs a hypothesis h such that, with probability > 1 − δ, err_D(h) ≤ 1/2 − γ, where t and m are polynomials in n and 1/δ, and γ = Ω(n^{−2}).

The high-level idea of the algorithm and proof for Theorem 2 is as follows. First, we can assume the target function is balanced (nearly equal probability mass on positive and negative examples) and that similarly no individual variable is noticeably correlated with the target, else weak-learning is immediate. So, for each variable i, the probability mass of positive examples with x_i = 1 is approximately equal to the fraction of negative examples with x_i = 1. Let c_opt denote the (unknown) optimal disjunction, which we may assume is monotone by including negated variables as additional features. Let r denote the number of relevant variables, i.e., the number of variables in c_opt. Also, assume for this discussion that we know the value of OPT = err_D(c_opt). Call an example x good if c*(x) = c_opt(x) and bad otherwise.

Now, since the only negative examples that can have a relevant x_i set to 1 are the bad negatives, this means that for relevant variables i, Pr_{x∼D}(x_i = 1 ∧ c*(x) = −1) = O(OPT). Therefore, Pr_{x∼D}(x_i = 1 ∧ c*(x) = +1) = O(OPT), and so Pr_{x∼D}(x_i = 1) = O(OPT) as well. This means that by estimating Pr_{x∼D}(x_i = 1) for each variable i, we can remove all variables of density ω(OPT) from the system, knowing they are irrelevant.

At this point, we have nearly all the ingredients for the Õ(√n) bound of Peleg [14]. In particular, since all variables have density O(OPT), the average number of variables set to 1 per example is O(OPT · n). Let S′ be the set of examples whose density is at most twice the average (so Pr(S′) ≥ 1/2); we now claim that if OPT = o(1/√n), then either S′ is unbalanced or else some variable x_i must have noticeable correlation with the target over examples in S′.
In particular, since positive examples must have on average at least 1 − O(OPT) relevant variables set to 1, and the good negative examples have zero relevant variables set to 1, the only way for S′ to be balanced and have no relevant variable with noticeable correlation is for the bad negative examples to have, on average, Ω(1/OPT) relevant variables set to 1. But this is not possible, since all examples in S′ have only O(OPT · n) variables set to 1, and 1/OPT ≫ OPT · n for OPT = o(1/√n). So, some hypothesis of the form "if x ∉ S′ then flip a fair coin, else predict x_i" must be a weak learner.

³ This bound hides a low-order term of (log n)^{1/α}. Solving for equality yields α = √(log log n / log n) and a bound of O(n^{1/3+o(1)}).
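The overview above translates into a short procedure. The following Python sketch is a hedged rendering of it: the constants 10 and 2, and the advantage cutoff, are illustrative placeholders rather than the quantities from the analysis, and `weak_learn_sketch` is not the paper's Algorithm 2.

```python
import random

def weak_learn_sketch(examples, opt_guess, adv=0.01):
    """Hedged sketch of the overview's weak learner.
    examples: list of (x, y) with x a 0/1 tuple and y in {+1, -1}.
    opt_guess: guessed error rate of the best disjunction.
    Returns a predictor h(x) -> {+1, -1}, or None if no weak advantage is found."""
    m = len(examples)
    n = len(examples[0][0])

    # Remove variables whose density is much larger than OPT: they cannot be relevant.
    keep = [i for i in range(n)
            if sum(x[i] for x, _ in examples) <= 10 * opt_guess * m]

    # Keep only examples with at most twice the average number of 1s
    # over the surviving variables.
    weight = lambda x: sum(x[i] for i in keep)
    avg = sum(weight(x) for x, _ in examples) / m
    in_S = lambda x: weight(x) <= 2 * avg
    S = [(x, y) for x, y in examples if in_S(x)]

    # Candidate base predictors on S: the two constants and single variables.
    bases = [lambda x: +1, lambda x: -1]
    bases += [(lambda x, i=i: +1 if x[i] == 1 else -1) for i in keep]

    for base in bases:
        # Advantage over random guessing when we flip a fair coin outside S.
        gain = (sum(1 for x, y in S if base(x) == y) - len(S) / 2) / m
        if abs(gain) >= adv:
            sign = +1 if gain > 0 else -1
            def h(x, base=base, sign=sign):
                return sign * base(x) if in_S(x) else random.choice((+1, -1))
            return h
    return None
```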

To improve over the Õ(√n) bound of [14], we do the following. Assume all variables have nearly the same density and all examples have nearly the same density as well. This is not without loss of generality (and the general case adds additional complications that must be addressed), but it simplifies the picture for this high-level sketch. Now, if no individual variable or its complement is a weak predictor, by the above analysis it must be the case that the bad negative examples on average have a substantial number of variables set to 1 in the relevant region (essentially so that the total hinge-loss, i.e., the number of attribute errors, is Ω(m)). Suppose now that one guesses such a bad negative example e and focuses on only those n′ variables set to 1 by e. The disjunction c_opt restricted to this set may now make many mistakes on positive examples (the substantial number of variables set to 1 in the relevant region in e may still be a small fraction of the relevant region). On the other hand, because we have restricted to a relatively small number of variables n′, the average density of examples as a function of n′ has dropped significantly.⁴ As a result, suppose we again discard all examples with a number of 1s in these n′ variables substantially larger than the average. Then, on the remainder, the hinge-loss (attribute errors) caused by the bad negative examples is substantially reduced. This more than makes up for the additional error on positive examples. In particular, we show one can argue that for some bad negative example e, if one performs the above procedure, then with respect to the remaining subset of examples, some variable must be a weak predictor. In the end, the final hypothesis is defined by an example e, a threshold θ, and a variable i, and is of the form "if x · e ≥ θ then flip a coin, else predict x_i". The algorithm then simply searches over all such triples. In the general case (when the variables and the examples do not all have the same density), this is preceded by a preprocessing step that groups variables and examples into a number of buckets and then runs the above algorithm on each bucket.

The rest of the paper is organized as follows. We start off with notation and definitions in Section 2. In Section 3 we prove Theorem 1. We achieve this in two steps: first we show how to get a weak learner for the special case that the examples and variables are fairly homogeneous (all variables set to 1 roughly the same number of times, and all examples with roughly the same number of variables set to 1; actually a somewhat weaker condition than this). We then show how to reduce a general instance to this special case. In Section 3.3 we use existing boosting algorithms combined with this weak learner to prove our main result. Finally, we discuss conclusions and future directions in Section 4.

2 Notation and Preliminaries

Let X = {0,1}^n and let D be the data distribution over X. We have a labeling function c*: X → {−1, +1}, and use c_opt to denote the disjunction of least error with respect to c*. Without loss of generality we may assume c_opt is monotone, and we denote the error rate of c_opt as OPT_disj or simply OPT. That is, OPT = Pr_{x∼D}[c_opt(x) ≠ c*(x)]. For the rest of the paper, we will assume that OPT = Ω(1/√n) (otherwise we can use Peleg's algorithm described in the previous section). We will also assume that we know the value of OPT.⁵ We use r to denote the number of variables in c_opt and we call these the relevant variables. We will call the examples on which c* and c_opt agree good, and those on which they disagree bad.
The examples causing the most difficulty will be the bad negative examples, which can potentially satisfy many relevant variables, thus incurring up to r attribute errors (hinge-loss) and yet be labeled negative.

We assume that the algorithm gets as input 2m examples, out of which m⁺ are positive examples (their label is +1) and m⁻ are negative (their label is −1). We can assume for the goal of weak-learning that m⁺, m⁻ = m(1 ± o(1)), else we have an immediate weak predictor. Let m⁺_bad denote the number of bad positive examples, i.e., positives that do not satisfy c_opt, and let m⁻_bad denote the number of bad negative examples, i.e., negatives that do satisfy c_opt. For convenience of notation (losing at most a factor of 2 in our guarantee) we assume that the error rate of c_opt on both positive and negative examples separately is at most OPT. Given this, we may assume that m⁺_bad ≤ m·OPT(1 + o(1)) and m⁻_bad ≤ m·OPT(1 + o(1)).

Our algorithm will examine a set of Õ(m·n²) hypotheses, of which we will prove that at least one has training error at most 1/2 − Ω(1/n²), under the assumption that OPT is O(1/n^{1/3+α}). In the following we assume that m is sufficiently large, m = Õ(n⁴), so that with high probability this implies error at most 1/2 − Ω(1/n²) over D. In particular, each hypothesis is defined by a training example, a threshold, and a variable, and so by compression bounds [6], Õ(n⁴) training examples are sufficient to produce a weak learner with high probability.

⁴ E.g., given two random vectors with n′ = n^{2/3} 1s, their intersection would have expected size (n′)^{1/2}. Of course, our dataset need not be uniform random examples of the given density, but the fact that all variables have the same density allows one to make a similar argument.
⁵ If OPT is unknown, we can efficiently enumerate over possible guesses for OPT such that one such guess will be within a 1/poly additive factor of the true value. For each guess, we can run our algorithm and test it on a fresh sample to see if its output is a weak learner.
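Footnote 5's guess-and-validate strategy for unknown OPT can be realized by a wrapper of the following form (a sketch only; the grid spacing, the validation threshold, and the `weak_learner` callback are assumptions for illustration):

```python
def guess_opt_and_learn(train, holdout, weak_learner, grid_size=1000, adv=0.005):
    """Try guesses of OPT on a 1/grid_size-spaced grid, run the weak learner
    with each guess, and keep a hypothesis only if it beats random guessing
    on a fresh holdout sample. All numeric choices are illustrative."""
    best = None
    for k in range(1, grid_size + 1):
        guess = k / grid_size
        h = weak_learner(train, guess)
        if h is None:
            continue
        acc = sum(1 for x, y in holdout if h(x) == y) / len(holdout)
        if acc >= 0.5 + adv and (best is None or acc > best[0]):
            best = (acc, h)
    return None if best is None else best[1]
```

For example, the `weak_learn_sketch` function from the earlier sketch could be passed as `weak_learner` here.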

Finally, it will be convenient to think of algorithms that make predictions on only a subset of the domain. If an algorithm predicts on a subset of probability mass p and has error rate 1/2 − γ′ on that subset, then by flipping a fair coin on the remainder, the overall error rate will be 1/2 − γ for γ = p·γ′.

3 Proof of Theorem 1

We first build a weak agnostic learner for the best-disjunction problem. Our weak learner has the guarantee given in Theorem 2, which we restate below.

Theorem 2. There exists an algorithm with the following property. For every distribution D over {0,1}^n and every target function c* such that OPT < n^{−1/3−α} for some constant α > 0, for every δ > 0, the algorithm runs in time t(δ, n), uses m(δ, n) random samples drawn from D, and outputs a hypothesis h such that, with probability > 1 − δ, err_D(h) ≤ 1/2 − γ, where t and m are polynomials in n and 1/δ, and γ = Ω(n^{−2}).

Our algorithm has two stages: a preprocessing step (which we present later in Section 3.2) that ensures that all variables are set to 1 roughly the same number of times and that the bad and good examples have roughly the same number of 1s, and a core algorithm (which we present first in Section 3.1) that operates on data of this form. One aspect of the preprocessing step is that in addition to partitioning examples into buckets, it may involve discarding some relevant variables, yielding a dataset in which only some m/polylog(n) positive examples satisfy c_opt over the variables remaining. Thus, our assumption in Section 3.1 is that while the dataset has the homogeneity properties desired and the fraction of bad negative examples is OPT(1 + o(1)), the fraction of bad positive examples may be as large as 1 − 1/polylog(n). Nonetheless, this will still allow for weak learning.

3.1 (B, α, m̂)-Sparse Instances

As mentioned above, in this section we give a weak-learning algorithm for a dataset that has certain nice homogeneity properties. We call such a dataset a (B, α, m̂)-sparse instance. We begin by describing what these properties are.

The first property is that there exists a positive integer B such that for each variable x_i, the number of positive examples in the instance with x_i = 1 is between B/2 and B, and the number of negative examples with x_i = 1 is between (B/2)(1 − o(1)) and B(1 + o(1)). This property implies that the overall number of 1s in all examples is at most 2nB(1 + o(1)), and therefore an average example has no more than nB(1 + o(1))/m variables set to 1. If the bad negatives were typical examples, we would expect them to contain at most (nB/m)·m⁻_bad·(1 + o(1)) ≤ nB·OPT(1 + o(1)) ones in total. While in general this may not necessarily be the case, we assume for this section that at least they are not too atypical on average. In particular, the second property we assume this instance satisfies is that the overall number of ones present in all the bad negatives is at most n^{1+α}·B·OPT.

Denote by m̂ the number of positive examples that c_opt classifies correctly. The third property is that m̂ ≥ m/n^{o(α)}. If this dataset were our given training set then this would be redundant, as we already assume the stronger condition that the fraction of good positive examples is 1 − O(OPT).⁶ However, m̂ will be of use in later sections, when we call this algorithm as a subroutine on instances defined by only a subset of all the variables. In other words, we show here that even if we allow c_opt to make more mistakes on the positive examples (and in particular, to label almost all positives incorrectly!) yet make at most m·OPT mistakes on the negatives, we are still able to weak-learn.
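Of the three properties of a (B, α, m̂)-sparse instance, only the first refers to directly observable quantities; a direct empirical check of it looks like the sketch below (Python; the explicit `slack` parameter replacing the o(1) terms is an assumption for illustration, and Properties 2 and 3 involve the unknown c_opt, so they remain analysis-only).

```python
def satisfies_first_property(examples, B, slack=0.1):
    """Check Property 1 of a (B, alpha, m_hat)-sparse instance:
    every variable is set to 1 on between B/2 and B positive examples,
    and on between (B/2)(1 - slack) and B(1 + slack) negative examples."""
    n = len(examples[0][0])
    for i in range(n):
        pos = sum(x[i] for x, y in examples if y == +1)
        neg = sum(x[i] for x, y in examples if y == -1)
        if not (B / 2 <= pos <= B):
            return False
        if not ((B / 2) * (1 - slack) <= neg <= B * (1 + slack)):
            return False
    return True
```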
As our analysis shows, the condition we require of m̂ is that the ratio m̂/m dominates the quantity n^{(1+α)/2}·OPT^{3/2} that arises in the analysis below; this ratio also plays a role in the definition of γ, our advantage over a random guess. An instance satisfying all of the above three properties is called a (B, α, m̂)-sparse instance.

Next, we show how to get a weak learner for such sparse instances. We first introduce the following definitions.

Definition 3. Given an example e and a positive integer threshold θ, we define the (e, θ)-restricted domain to be the set of all examples whose intersection with e is strictly smaller than θ, that is, the set of examples x such that x · e < θ.

⁶ Indeed, if the original instance was sparse, we would have m̂ = m⁺(1 − o(1)).

For any hypothesis h, we define the (e, θ)-restricted hypothesis to be h over any example that belongs to the (e, θ)-restricted domain, and "I don't know" (flipping a fair coin) over any other example. In particular, we consider the following:

- the (e, θ)-restricted (+1)-hypothesis: predict +1 if the given example intersects e on less than θ variables;
- the (e, θ)-restricted (−1)-hypothesis: predict −1 if the given example intersects e on less than θ variables;
- the (e, θ)-restricted x_i-hypothesis: predict +1 if the given example intersects e on less than θ variables and has x_i = 1.

We call these n + 2 restricted hypotheses the (e, θ)-restricted base hypotheses. Our weak-learning algorithm enumerates over all pairs (e, θ), where e is a negative example in our training set and θ is an integer between 1 and n. For every such pair, our algorithm checks whether any of the n + 2 restricted hypotheses is an Ω(m̂·OPT/(rm))-weak learner (see Algorithm 1 below). Our next lemma proves that for (B, α, m̂)-sparse instances, this algorithm indeed finds a weak learner. In fact, we show that for every negative example e, it suffices to consider a particular value of θ.

Algorithm 1: A weak learner for sparse instances.
Input: a (B, α, m̂)-sparse instance.
Step 1: For every negative example e in the set and every θ ∈ {1, 2, ..., n}:
  Step 1a: Check if any of the (e, θ)-restricted hypotheses from Definition 3 is a weak learner with error at most 1/2 − Ω(n^{−2}).
  Step 1b: If yes, then output the corresponding hypothesis and halt.
Step 2: If no restricted hypothesis is a weak learner, output "failure".

Lemma 4. Suppose we are given a (B, α, m̂)-sparse instance, and that c_opt makes no more than an n^{−(1/3+α)} fraction of errors on the negative examples. Then there exists a bad negative example e and a threshold θ such that one of the (e, θ)-restricted base hypotheses of Definition 3 has error at most 1/2 − γ for γ = Ω(m̂·OPT/(rm)). Since we may assume OPT > 1/√n, this implies γ = Ω(n^{−2}). Thus Algorithm 1 outputs a hypothesis of error at most 1/2 − Ω(n^{−2}).

Proof: Let m⁺ and m⁻ be the number of positive and negative examples in this sparse instance, where we reserve m to refer to the size of the original dataset of which this sparse instance is a subset. As before, call examples good if they are classified correctly by c_opt, and bad otherwise. We know B = O(m·OPT), because relevant variables have no more than O(m·OPT) occurrences of 1 over the negative examples. Since each good positive example has to have at least one relevant variable set to 1, it must also hold that B = Ω(m̂/r). It follows that r·m·OPT = Ω(m̂). We now show how to find a weak learner given a (B, α, m̂)-sparse instance, based on a bad negative example.

Consider any bad negative example e_i with t_i variables set to 1. If we sum the intersection (i.e., the dot-product) of e_i with each of the positive examples in the instance, we simply get the total number of ones in the positive examples over these t_i variables. As each variable is set to 1 between B/2 and B times, this sum is B′·t_i for some B′ ∈ [B/2, B]. Therefore, the expected intersection of e_i with a random positive example is t_i·B′/m⁺. Set θ_i = β·t_i·B′/m⁺, where β > 1 will be chosen suitably later. Throw out any example which has more than θ_i intersection with e_i. Using Markov's inequality, we deduce that we retain at least m⁺(1 − 1/β) positive examples. The key point of the above is that, focusing on the examples that remain, none of them can contribute more than θ_i hinge-loss (attribute errors) when restricting c_opt to the t_i variables set to 1 by e_i.
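For concreteness, the search performed by Algorithm 1 over pairs (e, θ) and the n + 2 restricted base hypotheses of Definition 3 can be sketched as follows. The advantage threshold `adv` is an illustrative stand-in for the Ω(n^{−2}) bound, and predicting −1 when x_i = 0 inside the restricted domain is one natural reading of the x_i-hypothesis.

```python
import random

def restricted_base_hypotheses(e, theta, n):
    """Yield the n + 2 (e, theta)-restricted base hypotheses of Definition 3.
    Each returns +1/-1 inside the restricted domain and None (meaning: flip
    a fair coin) outside it."""
    def in_domain(x):
        return sum(a & b for a, b in zip(x, e)) < theta   # x . e < theta
    yield lambda x: (+1 if in_domain(x) else None)
    yield lambda x: (-1 if in_domain(x) else None)
    for i in range(n):
        yield (lambda x, i=i:
               (+1 if x[i] == 1 else -1) if in_domain(x) else None)

def algorithm1(examples, adv):
    """Brute-force search over (e, theta) with e a negative training example
    and theta in {1, ..., n}; return the first restricted hypothesis whose
    empirical advantage over random guessing is at least adv."""
    m = len(examples)
    n = len(examples[0][0])
    negatives = [x for x, y in examples if y == -1]
    for e in negatives:
        for theta in range(1, n + 1):
            for h in restricted_base_hypotheses(e, theta, n):
                # Coin flips outside the domain count as 1/2 in expectation.
                correct = sum(1.0 if h(x) == y else (0.5 if h(x) is None else 0.0)
                              for x, y in examples)
                if correct / m - 0.5 >= adv:
                    return lambda x, h=h: h(x) if h(x) is not None else random.choice((+1, -1))
    return None
```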
On the other hand, it is possible that the number of actual errors over positives has increased substantially: perhaps too few of the remaining positive examples share relevant variables with e_i in order for any of the (e_i, θ_i)-restricted hypotheses to be a weak learner. We now argue that this cannot happen simultaneously for all e_i.

Specifically, assume for contradiction that none of the (e_i, θ_i)-restricted base hypotheses yields a weak learner. Consider the total number of 1s contributed by the remaining negative examples over the relevant variables of e_i (the relevant variables that are set to 1 by e_i). As each bad negative contributes at most θ_i such ones, the overall contribution on the negative side is at most θ_i·m·OPT(1 + o(1)) = β·(t_i·B′/m⁺)·m·OPT(1 + o(1)). Since none of the relevant variables set to 1 by e_i gives a weak learner, the number of 1s over the positive side of these relevant variables is no more than 2β·(t_i·B′/m⁺)·m·OPT (see below, at the specification of the value of γ). So even if each occurrence of a 1 comes from a unique positive example, we still have no more

than 2β·(t_i·B′/m⁺)·m·OPT positive examples from the (e_i, θ_i)-restricted domain intersecting e_i over the relevant variables. Therefore, adding back in the positive examples not from the restricted domain, we have no more than 2β·(t_i·B′/m⁺)·m·OPT + m⁺/β positive examples that intersect e_i over the relevant variables.

Consider now a bipartite graph with the good positive examples on one side and the at most m·OPT bad negative examples on the other side, with an edge between positive e_j and negative e_i if e_j intersects e_i over the relevant variables. Since each e_i has degree at most 2β·(t_i·B′/m⁺)·m·OPT + m⁺/β, the total number of edges is at most 2β·(B/m⁺)·m·OPT·Σ_i t_i + m⁺·m·OPT/β, and therefore some good positive example must have degree at most (m·OPT/m̂)·[2β·B·Σ_i t_i/m⁺ + m⁺/β]. On the other hand, since we are given a (B, α, m̂)-sparse instance, we know that every good positive example intersects at least (B/2)(1 − o(1)) negative examples, and moreover that Σ_i t_i ≤ n^{1+α}·B·OPT. Putting this together we have

(B/2)(1 − o(1)) ≤ (m·OPT/m̂)·(1 + o(1))·[2β·B²·n^{1+α}·OPT/m⁺ + m⁺/β].

Setting β = √((m⁺)²/(2B²·n^{1+α}·OPT)), which equalizes the two terms in the sum above, we derive

B/4 ≤ √2·(1 + o(1))·(m/m̂)·B·n^{(1+α)/2}·OPT^{3/2}.

Thus we have

n^{1+α}·OPT³ ≥ (m̂/m)² / (32·(1 + o(1))).

Recall that m̂/m ≥ n^{−o(α)}, so we derive a contradiction, as for sufficiently large n it must hold that

OPT ≥ ((m̂/m)² / (32·(1 + o(1))·n^{1+α}))^{1/3} ≥ n^{−(1+α)/3 − o(α)} > n^{−(1/3+α)}.

In order to complete the proof, we need to verify that indeed β > 1. Recall B = O(m·OPT) and m⁺ ≥ m̂, so m⁺/m ≥ n^{−o(α)}. Thus β² = Ω(1/(n^{1+α+o(α)}·OPT³)) = Ω(n^{2α−o(α)}) by our assumption on OPT.

The last detail is to check what advantage we get over a random guess. Our analysis shows that for some bad negative example e_i, the number of ones over the relevant variables on the positive side is at least 2β·(t_i·B′/m⁺)·m·OPT, whereas on the negative side there can be at most β·(t_i·B′/m⁺)·m·OPT(1 + o(1)) ones. We deduce that at least one of the at most min(r, t_i) relevant variables set to 1 by e_i must give a gap of at least β·(t_i·B′/m⁺)·m·OPT(1 − o(1))/min(r, t_i) ≥ (B/(2m⁺))·m·OPT(1 − o(1)), since β > 1. Finally, using the fact that B = Ω(m̂/r), we get a gap of Ω(m̂·m·OPT/(r·m⁺)), or equivalently an advantage of γ = Ω(m̂·OPT/(rm)). This advantage is trivially Ω(n^{−2(1+o(α))}), or, using the assumption OPT > 1/√n (for otherwise we can apply Peleg's algorithm [14]), we get γ = Ω(n^{−(3/2)(1+o(α))}).

3.2 General Instances

Section 3.1 dealt with nicely behaved (homogeneous) instances. In order to complete the proof of Theorem 2, we need to show how to reduce a general instance to such a (B, α, m̂)-sparse instance. What we show is a (simple) algorithm that partitions a given instance into sub-instances, based on the number of 1s of each example over certain variables (but without looking at the labels of the examples). It outputs a polylog(n)-long list of sub-instances, each containing a noticeable fraction of the domain, and has the following guarantee: either some sub-instance has a trivial weak learner (a noticeably different number of positive versus negative examples, or a variable with noticeable correlation), or some sub-instance is (B, α, m̂)-sparse. Formally, we prove the following lemma.

Lemma 5. There exists a poly((log n)^{O(1/α)}, n, m)-time algorithm that gets as input 2m labeled examples in {0,1}^n and outputs a list of subsets, each containing m/polylog(n) examples, such that either some subset has a trivial weak learner, or some subset is (B, α, m/polylog(n))-sparse.

Combining the algorithm from Lemma 5 with the algorithm presented in Section 3.1, we get our weak-learning algorithm (see Algorithm 2).
We first run the algorithm of Lemma 5, traverse all sub-instances, and check whether any has a trivial weak learner. If not, we run the algorithm for (B, α, m̂)-sparse instances over each sub-instance. Clearly, given the one sub-instance which is sparse, we find a restricted hypothesis with Ω(n^{−2}) advantage over a random guess.

Proof (of Lemma 5): We start by repeating the argument presented in the introduction (Section 1.1). For any relevant variable, no more than m⁻_bad ≤ m·OPT(1 + o(1)) bad negative examples set it to 1. Therefore, as an initial step, we throw out any variable with more than this many occurrences over the negative examples, as it cannot

possibly be a relevant variable. For convenience, redefine n to be the number of variables that remain. Next, we check each individual variable to determine whether it itself is a weak predictor. If not, then each variable is set to 1 on approximately the same number of positive and negative examples. Bucket all the variables according to the number of times they are set to 1, where the j-th bucket contains all the variables that are set to 1 a number of times in the range [2^j, 2^{j+1}). Since there are at most log n buckets, some bucket j must cover at least a 1/log n fraction of the positive examples, in the sense that the disjunction over the relevant variables in this bucket agrees with at least this many good positives. So now, let B′ = 2^{j+1}, and let n′ and r′ be the total number of variables and the number of relevant variables in this bucket, respectively. As we can ignore all examples that are identically 0 over the n′ variables in this bucket, let m′⁺ (resp. m′⁻) be the number of positive (resp. negative) examples covered by the variables in this bucket. Our algorithm adds the remaining examples (over these n′ variables) as one sub-instance to its list. Let the number of these examples be 2m′. As before, if the numbers of positive and negative examples covered by these n′ variables differ significantly, or if some variable is a weak learner (with respect to the set of examples left), then the algorithm halts. Observe that if this sub-instance is (B′, α, m/log n)-sparse, then we are done, no matter what other sub-instances the algorithm adds to its list.

Focusing on the remaining examples, every variable is set to 1 at most B′ many times over the positive examples, so the total number of 1s over the positive examples is at most n′B′. If the resulting instance is not (B′, α, m/log n)-sparse, then the total number of 1s over the bad negative examples is more than (n′)^{1+α}·B′·OPT. So now, our algorithm throws out any example with more than 2n′B′/m′ variables set to 1, and adds the remaining examples to the list of sub-instances. By Markov's inequality, we are guaranteed not to remove more than half of the positive examples, so the remaining sub-instance is sufficiently large. As before, if the remaining subset of examples (over these n′ variables) has a trivial weak learner, we are done. Otherwise, the algorithm continues recursively over this sub-instance: it re-buckets the variables and then removes all examples with too many variables set to 1. Note that each time the algorithm buckets the variables, it needs to recurse over each bucket that covers at least a 1/log n fraction of the positive examples. In the worst case, all of the log n buckets cover this many positive examples, and therefore the branching factor in each bucketing step is log n.

We now show that the depth of the bucket-and-remove recurrence is no more than O(1/α). It is easy to see inductively that at the i-th step of the recursion, we retain Ω(m/(log n)^i) positive examples. Suppose that in the first i steps, no sub-instance is sparse and no weak learner is found. Recall that if r·OPT ≤ 1 we have an immediate weak learner, so it must hold that at the i-th step we still retain at least n_i ≥ 1/OPT variables. Furthermore, as at the i-th step we did not have a sparse instance, it follows that the bad negative examples had more than (n_i)^{1+α}·B′·OPT ones before we threw out examples. Once we remove dense examples, each surviving example contains no more than 2·n_i·B′/m_i ones (where 2m_i is the number of examples in the current sub-instance), so the surviving bad negatives contain no more than 2·n_i·B′·m·OPT/m_i ones in total. Thus, the fraction of the ones over the bad negatives that survives each removal step is no more than 2m/(m_i·(n_i)^α).
As 1/OPT > n^{1/3}, this fraction is at most O(n^{−α/3}·(log n)^i) < n^{−α/6} (for the first O(1/α) iterations). Hence, after 6/α iterations, some relevant variable must be a weak learner.

To complete the proof, note that we take no more than (log n)^{6/α} bucket-and-remove steps. Each such step requires poly(n, m) time for the bucketing, the removal, and the check for a weak learner. We conclude that the running time of this algorithm is poly((log n)^{1/α}, n, m).

Algorithm 2: A weak learner for general instances.
Input: a set of 2m training examples.
Step 1: If any individual variable or either constant hypothesis is a weak learner, output it and halt.
Step 2: Remove any variable which has more than 2m·OPT 1s over the negative examples.
Step 3: Bucket the remaining variables so that bucket j contains the variables with density in [2^j, 2^{j+1}).
Step 4: For every bucket which covers at least a 1/log n fraction of the positive examples:
  Step 4a: Run the algorithm for sparse instances on this bucket. If a weak learner is obtained, output it and halt.
  Step 4b: Let B′ be the density bound (2^{j+1}) of this bucket, let n′ be the number of variables in the bucket, and let 2m′ be the total number of examples with respect to this bucket (ignoring the ones which are identically zero over the n′ variables). Remove all the examples which have more than 2n′B′/m′ 1s over this bucket. Repeat Steps 1-4 on this new instance.
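A compressed Python rendering of Algorithm 2's outer loop is given below. Here `sparse_learner` is a callback standing in for the Algorithm 1 search restricted to the bucket's variables, the advantage cutoff `adv` replaces the Ω(n^{−2}) bound, and the depth cap 6/α mirrors the recursion bound from the proof of Lemma 5; all constants are illustrative rather than those of the analysis.

```python
import math

def advantage(h, examples):
    """Empirical advantage of h over random guessing (h returns +1/-1)."""
    return sum(1 for x, y in examples if h(x) == y) / len(examples) - 0.5

def trivial_weak_learner(examples, adv):
    """Step 1: constant hypotheses and single variables (and their negations)."""
    n = len(examples[0][0])
    cands = [lambda x: +1, lambda x: -1]
    cands += [(lambda x, i=i: +1 if x[i] == 1 else -1) for i in range(n)]
    cands += [(lambda x, i=i: -1 if x[i] == 1 else +1) for i in range(n)]
    for h in cands:
        if advantage(h, examples) >= adv:
            return h
    return None

def algorithm2(examples, opt_guess, alpha, adv, sparse_learner, depth=0):
    """Sketch of Algorithm 2; examples is a list of (x, y) with x a 0/1 tuple."""
    if not examples or depth > int(6 / alpha):
        return None
    n = len(examples[0][0])
    m = len(examples) / 2                         # the input holds 2m points

    h = trivial_weak_learner(examples, adv)       # Step 1
    if h is not None:
        return h

    neg = [x for x, y in examples if y == -1]     # Step 2: drop dense variables
    alive = [i for i in range(n) if sum(x[i] for x in neg) <= 2 * opt_guess * m]

    buckets = {}                                  # Step 3: bucket by density
    for i in alive:
        d = sum(x[i] for x, _ in examples)
        if d > 0:
            buckets.setdefault(int(math.log2(d)), []).append(i)

    pos_total = sum(1 for _, y in examples if y == +1)
    for j, var_ids in buckets.items():            # Step 4
        covered = [(x, y) for x, y in examples if any(x[i] for i in var_ids)]
        if sum(1 for _, y in covered if y == +1) * math.log2(max(n, 2)) < pos_total:
            continue                              # bucket covers too few positives
        h = sparse_learner(covered, var_ids)      # Step 4a
        if h is not None:
            return h
        B = 2 ** (j + 1)                          # Step 4b: drop dense examples, recurse
        mprime = max(len(covered) / 2, 1)
        keepset = set(var_ids)
        mask = lambda x: tuple(x[i] if i in keepset else 0 for i in range(len(x)))
        light = [(mask(x), y) for x, y in covered
                 if sum(x[i] for i in var_ids) <= 2 * len(var_ids) * B / mprime]
        h = algorithm2(light, opt_guess, alpha, adv, sparse_learner, depth + 1)
        if h is not None:
            return h
    return None
```

In an actual run one would pass a wrapper around the Algorithm 1 sketch above as `sparse_learner`, and validate any returned hypothesis on held-out data, as in the OPT-guessing wrapper earlier.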

3.3 Strong Learning

Given Theorem 2, we now prove the main theorem (Theorem 1) by plugging the weak learner into an off-the-shelf boosting algorithm for the agnostic case. We use the recent ABoostDI booster of [4], which converts any algorithm satisfying Theorem 2 into one satisfying Theorem 1. The result in [4] gives a boosting technique for (η, γ)-weak learners. In our context, an (η, γ)-weak learner is an algorithm which, with respect to any distribution D, with high probability produces a hypothesis of error at most 1/2 − γ whenever OPT_disj ≤ 1/2 − η.

Theorem 6 (Feldman [4], Theorem 3.5). There exists an algorithm ABoostDI that, given an (η, γ)-weak learner, for every distribution D and ε > 0, produces, with high probability, a hypothesis h such that err_D(h) ≤ OPT_disj/(1 − 2η) + ε. Furthermore, the running time of the algorithm is T·poly(1/γ, 1/ε), where T is the running time of the weak learner.

As an immediate corollary, we set η = 1/2 − n^{−1/3−α} and obtain a hypothesis h such that err_D(h) ≤ 2n^{1/3+α}·OPT + ε. This concludes the proof of Theorem 1. We note that as an alternative to ABoostDI, one can also use the boosting algorithm of Kalai et al. [9], followed by another boosting algorithm of Gavinsky [7], to get the result in Theorem 1.
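With η set as above (a reconstruction consistent with Theorem 2, since the weak learner applies whenever OPT_disj ≤ n^{−1/3−α} = 1/2 − η), the corollary is the following one-line calculation:

```latex
\mathrm{err}_D(h)
  \;\le\; \frac{\mathrm{OPT}_{\mathrm{disj}}}{1-2\eta} + \epsilon
  \;=\; \frac{\mathrm{OPT}_{\mathrm{disj}}}{2\,n^{-1/3-\alpha}} + \epsilon
  \;=\; \tfrac12\, n^{1/3+\alpha}\,\mathrm{OPT}_{\mathrm{disj}} + \epsilon
  \;\le\; 2\, n^{1/3+\alpha}\,\mathrm{OPT}_{\mathrm{disj}} + \epsilon .
```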
4 Future Directions

In this paper we have presented an algorithm for learning the class of disjunctions in the case that OPT < n^{−(1/3+α)}, achieving an error rate of O(n^{1/3+α}·OPT) + ε. The natural open question is whether one can improve this bound. For example, can one achieve weak agnostic learning for OPT = n^{−1/4}? Or, can one improve the bounds as a function of the number of relevant variables, e.g., making only a factor O(r^{0.9}) times more mistakes than the best disjunction?

An intriguing open question is whether one can extend this technique to other concept classes. For example, consider the class of linear separators over {0,1}^n with weights in {0,1} (i.e., majority-vote or "k of r" functions). Here we do not know how to achieve weak learning even for comparably small values of OPT. The algorithm presented in this paper for disjunctions uses the fact that, in order for individual variables not to be weak hypotheses themselves, the bad negative examples must in some sense point in the direction of the target vector (they must have a high dot-product with the target function vector, if we view the disjunction as a linear threshold function) to a substantially greater extent than the positive examples do. E.g., if a typical positive example has t relevant variables set to 1, then the typical bad negative example must have t/OPT relevant variables set to 1. For the case of majority-vote functions, the difficulty with this approach is that all we can say is that if the positive examples have r/2 + t relevant variables set to 1, then the typical bad negative examples should have at least r/2 + t/OPT relevant variables set to 1, which might not be such a distinction in a multiplicative sense.

On a more general note, our work here uses somewhat non-traditional hypotheses, by using the examples themselves to define slices of the data (focusing on those examples with no more than a certain threshold θ dot-product with some given negative example). Perhaps this might be useful for other learning problems.

Acknowledgments

We would like to thank Aaron Roth and Maria-Florina Balcan for a number of helpful discussions. This work was supported in part by NSF grant CCF.

References

[1] Peter Auer and Manfred K. Warmuth. Tracking the best disjunction. In Proc. 36th Symposium on Foundations of Computer Science (FOCS), 1995.
[2] R. D. Carr, S. Doddi, G. Konjevod, and M. Marathe. On the red-blue set cover problem. In Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2000.
[3] Vitaly Feldman. Optimal hardness results for maximizing agreements with monomials. SICOMP, 39(2), 2009. Extended abstract in CCC 2006.
[4] Vitaly Feldman. Distribution-specific agnostic boosting. In 1st Symposium on Innovations in Computer Science (ICS), 2010.
[5] Vitaly Feldman, Venkatesan Guruswami, Prasad Raghavendra, and Yi Wu. Agnostic learning of monomials by halfspaces is hard. In FOCS, 2009.
[6] Sally Floyd and Manfred Warmuth. Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Machine Learning, 21(3), 1995.
[7] Dmitry Gavinsky. Optimally-smooth adaptive boosting and application to agnostic learning. Journal of Machine Learning Research, 4, 2003.
[8] A. Kalai, A. Klivans, Y. Mansour, and R. Servedio. Agnostically learning halfspaces. SIAM Journal on Computing, 37(6), 2008.
[9] Adam Tauman Kalai, Yishay Mansour, and Elad Verbin. On agnostic boosting and parity learning. In STOC '08: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, New York, NY, USA, 2008. ACM.
[10] A. Klivans, R. O'Donnell, and R. Servedio. Learning geometric concepts via Gaussian surface area. In FOCS, 2008.
[11] Adam R. Klivans, Philip M. Long, and Rocco A. Servedio. Learning halfspaces with malicious noise. In ICALP, 2009.
[12] N. Littlestone. Redundant noisy attributes, attribute errors, and linear-threshold learning using Winnow. In Proc. 4th Conference on Computational Learning Theory (COLT), Santa Cruz, California, 1991. Morgan Kaufmann.
[13] Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4), 1988.
[14] David Peleg. Approximation algorithms for the Label-Cover_MAX and Red-Blue Set Cover problems. Journal of Discrete Algorithms, 5(1):55-64, 2007.


More information

Ocean 420 Physical Processes in the Ocean Project 1: Hydrostatic Balance, Advection and Diffusion Answers

Ocean 420 Physical Processes in the Ocean Project 1: Hydrostatic Balance, Advection and Diffusion Answers Ocean 40 Physical Processes in the Ocean Project 1: Hydrostatic Balance, Advection and Diffusion Answers 1. Hydrostatic Balance a) Set all of the levels on one of the coluns to the lowest possible density.

More information

Probability Distributions

Probability Distributions Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples

More information

Exact tensor completion with sum-of-squares

Exact tensor completion with sum-of-squares Proceedings of Machine Learning Research vol 65:1 54, 2017 30th Annual Conference on Learning Theory Exact tensor copletion with su-of-squares Aaron Potechin Institute for Advanced Study, Princeton David

More information

I. Understand get a conceptual grasp of the problem

I. Understand get a conceptual grasp of the problem MASSACHUSETTS INSTITUTE OF TECHNOLOGY Departent o Physics Physics 81T Fall Ter 4 Class Proble 1: Solution Proble 1 A car is driving at a constant but unknown velocity,, on a straightaway A otorcycle is

More information

A note on the multiplication of sparse matrices

A note on the multiplication of sparse matrices Cent. Eur. J. Cop. Sci. 41) 2014 1-11 DOI: 10.2478/s13537-014-0201-x Central European Journal of Coputer Science A note on the ultiplication of sparse atrices Research Article Keivan Borna 12, Sohrab Aboozarkhani

More information

A := A i : {A i } S. is an algebra. The same object is obtained when the union in required to be disjoint.

A := A i : {A i } S. is an algebra. The same object is obtained when the union in required to be disjoint. 59 6. ABSTRACT MEASURE THEORY Having developed the Lebesgue integral with respect to the general easures, we now have a general concept with few specific exaples to actually test it on. Indeed, so far

More information

Lower Bounds for Quantized Matrix Completion

Lower Bounds for Quantized Matrix Completion Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &

More information

Solutions of some selected problems of Homework 4

Solutions of some selected problems of Homework 4 Solutions of soe selected probles of Hoework 4 Sangchul Lee May 7, 2018 Proble 1 Let there be light A professor has two light bulbs in his garage. When both are burned out, they are replaced, and the next

More information

arxiv: v1 [math.nt] 14 Sep 2014

arxiv: v1 [math.nt] 14 Sep 2014 ROTATION REMAINDERS P. JAMESON GRABER, WASHINGTON AND LEE UNIVERSITY 08 arxiv:1409.411v1 [ath.nt] 14 Sep 014 Abstract. We study properties of an array of nubers, called the triangle, in which each row

More information

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition

More information

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Proc. of the IEEE/OES Seventh Working Conference on Current Measureent Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Belinda Lipa Codar Ocean Sensors 15 La Sandra Way, Portola Valley, CA 98 blipa@pogo.co

More information

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography Tight Bounds for axial Identifiability of Failure Nodes in Boolean Network Toography Nicola Galesi Sapienza Università di Roa nicola.galesi@uniroa1.it Fariba Ranjbar Sapienza Università di Roa fariba.ranjbar@uniroa1.it

More information

Lecture October 23. Scribes: Ruixin Qiang and Alana Shine

Lecture October 23. Scribes: Ruixin Qiang and Alana Shine CSCI699: Topics in Learning and Gae Theory Lecture October 23 Lecturer: Ilias Scribes: Ruixin Qiang and Alana Shine Today s topic is auction with saples. 1 Introduction to auctions Definition 1. In a single

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

E0 370 Statistical Learning Theory Lecture 5 (Aug 25, 2011)

E0 370 Statistical Learning Theory Lecture 5 (Aug 25, 2011) E0 370 Statistical Learning Theory Lecture 5 Aug 5, 0 Covering Nubers, Pseudo-Diension, and Fat-Shattering Diension Lecturer: Shivani Agarwal Scribe: Shivani Agarwal Introduction So far we have seen how

More information

Probability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers Roy D. Yates and David J.

Probability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers Roy D. Yates and David J. Probability and Stochastic Processes: A Friendly Introduction for Electrical and oputer Engineers Roy D. Yates and David J. Goodan Proble Solutions : Yates and Goodan,1..3 1.3.1 1.4.6 1.4.7 1.4.8 1..6

More information

Constant-Space String-Matching. in Sublinear Average Time. (Extended Abstract) Wojciech Rytter z. Warsaw University. and. University of Liverpool

Constant-Space String-Matching. in Sublinear Average Time. (Extended Abstract) Wojciech Rytter z. Warsaw University. and. University of Liverpool Constant-Space String-Matching in Sublinear Average Tie (Extended Abstract) Maxie Crocheore Universite de Marne-la-Vallee Leszek Gasieniec y Max-Planck Institut fur Inforatik Wojciech Rytter z Warsaw University

More information

4 = (0.02) 3 13, = 0.25 because = 25. Simi-

4 = (0.02) 3 13, = 0.25 because = 25. Simi- Theore. Let b and be integers greater than. If = (. a a 2 a i ) b,then for any t N, in base (b + t), the fraction has the digital representation = (. a a 2 a i ) b+t, where a i = a i + tk i with k i =

More information

A Bernstein-Markov Theorem for Normed Spaces

A Bernstein-Markov Theorem for Normed Spaces A Bernstein-Markov Theore for Nored Spaces Lawrence A. Harris Departent of Matheatics, University of Kentucky Lexington, Kentucky 40506-0027 Abstract Let X and Y be real nored linear spaces and let φ :

More information

Math Reviews classifications (2000): Primary 54F05; Secondary 54D20, 54D65

Math Reviews classifications (2000): Primary 54F05; Secondary 54D20, 54D65 The Monotone Lindelöf Property and Separability in Ordered Spaces by H. Bennett, Texas Tech University, Lubbock, TX 79409 D. Lutzer, College of Willia and Mary, Williasburg, VA 23187-8795 M. Matveev, Irvine,

More information

arxiv: v2 [math.co] 3 Dec 2008

arxiv: v2 [math.co] 3 Dec 2008 arxiv:0805.2814v2 [ath.co] 3 Dec 2008 Connectivity of the Unifor Rando Intersection Graph Sion R. Blacburn and Stefanie Gere Departent of Matheatics Royal Holloway, University of London Egha, Surrey TW20

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

On Process Complexity

On Process Complexity On Process Coplexity Ada R. Day School of Matheatics, Statistics and Coputer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand, Eail: ada.day@cs.vuw.ac.nz Abstract Process

More information

Foundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research

Foundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research Foundations of Machine Learning Boosting Mehryar Mohri Courant Institute and Google Research ohri@cis.nyu.edu Weak Learning Definition: concept class C is weakly PAC-learnable if there exists a (weak)

More information

Analyzing Simulation Results

Analyzing Simulation Results Analyzing Siulation Results Dr. John Mellor-Cruey Departent of Coputer Science Rice University johnc@cs.rice.edu COMP 528 Lecture 20 31 March 2005 Topics for Today Model verification Model validation Transient

More information

time time δ jobs jobs

time time δ jobs jobs Approxiating Total Flow Tie on Parallel Machines Stefano Leonardi Danny Raz y Abstract We consider the proble of optiizing the total ow tie of a strea of jobs that are released over tie in a ultiprocessor

More information

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5,

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, 2015 31 11 Motif Finding Sources for this section: Rouchka, 1997, A Brief Overview of Gibbs Sapling. J. Buhler, M. Topa:

More information

Midterm 1 Sample Solution

Midterm 1 Sample Solution Midter 1 Saple Solution NOTE: Throughout the exa a siple graph is an undirected, unweighted graph with no ultiple edges (i.e., no exact repeats of the sae edge) and no self-loops (i.e., no edges fro a

More information

The degree of a typical vertex in generalized random intersection graph models

The degree of a typical vertex in generalized random intersection graph models Discrete Matheatics 306 006 15 165 www.elsevier.co/locate/disc The degree of a typical vertex in generalized rando intersection graph odels Jerzy Jaworski a, Michał Karoński a, Dudley Stark b a Departent

More information

The Frequent Paucity of Trivial Strings

The Frequent Paucity of Trivial Strings The Frequent Paucity of Trivial Strings Jack H. Lutz Departent of Coputer Science Iowa State University Aes, IA 50011, USA lutz@cs.iastate.edu Abstract A 1976 theore of Chaitin can be used to show that

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information