Learnability and Stability in the General Learning Setting
Shai Shalev-Shwartz (TTI-Chicago), Ohad Shamir (The Hebrew University), Nathan Srebro (TTI-Chicago), Karthik Sridharan (TTI-Chicago)

Abstract

We establish that stability is necessary and sufficient for learning, even in the General Learning Setting where uniform convergence conditions are not necessary for learning, and where learning might only be possible with a non-ERM learning rule. This goes beyond previous work on the relationship between stability and learnability, which focused on supervised classification and regression, where learnability is equivalent to uniform convergence and it is enough to consider the ERM.

1 Introduction

We consider the General Setting of Learning [10], where we would like to minimize a population risk functional (stochastic objective)

F(h) = E_{Z~D}[f(h; Z)]    (1)

where the distribution D of Z is unknown, based on an i.i.d. sample z_1, ..., z_m drawn from D (and full knowledge of the function f). This General Setting subsumes supervised classification and regression, certain unsupervised learning problems, density estimation and stochastic optimization. For example, in supervised learning z = (x, y) is an instance-label pair, h is a predictor, and f(h, (x, y)) = loss(h(x), y) is the loss functional.

For supervised classification and regression problems, it is well known that a problem is learnable (see precise definition in Section 2) if and only if the empirical risks

F_S(h) = (1/m) ∑_{i=1}^m f(h, z_i)    (2)

converge uniformly to their expectations. If uniform convergence holds, then the empirical risk minimizer (ERM) is consistent, i.e. the population risk of the ERM converges to the optimal population risk, and the problem is learnable using the ERM. That is, learnability is equivalent to learnability by ERM, and so we can focus our attention solely on empirical risk minimizers. Stability has also been suggested as an explicit alternate condition for learnability.
Intuitively, stability notions focus on particular algorithms, or learning rules, and measure their sensitivity to perturbations in the training set. In particular, it has been established that various forms of stability of the ERM are sufficient for learnability. Mukherjee et al. [7] argue that since uniform convergence also implies stability of the ERM, and is necessary for (distribution independent) learning in the supervised classification and regression setting, stability of the ERM is necessary and sufficient for learnability in that setting. It is important to emphasize that this characterization of stability as necessary for learnability goes through uniform convergence, i.e. the chain of implications is:

Learnable with ERM ⇒ Uniform Convergence ⇒ ERM Stable ⇒ Learnable with ERM

However, the equivalence between (distribution independent) consistency of empirical risk minimization and uniform convergence is specific to supervised classification and regression. The results of Alon et al. establishing this equivalence do not always hold in the General Learning Setting. In particular, we recently showed that in strongly convex stochastic optimization problems, the ERM is stable and thus consistent, even though the empirical risks do not converge to their expectations uniformly (Example 7.1, taken from [9]). Since the other implications in the chain above still hold in the general learning setting (e.g., uniform convergence implies stability, and stability implies learnability by ERM), this example demonstrates that stability is a strictly more general sufficient condition for learnability. A natural question is then whether, in the General Setting, stability is also necessary for learning. Here we establish that indeed, even in the general learning setting, (distribution independent) stability of the ERM is necessary and sufficient for (distribution independent) consistency of the ERM.
The situation is therefore as follows:

Uniform Convergence ⇒ ERM Stable ⇔ Learnable with ERM

We emphasize that, unlike the arguments of Mukherjee et al. [7], the proof of necessity does not go through uniform convergence, allowing us to deal also with settings beyond supervised classification and regression. The discussion above concerns only stability and learnability of the ERM. In supervised classification and regression there is no need to go beyond the ERM, since learnability is equivalent to learnability by empirical risk minimization. But as we recently showed, there are learning problems in the General Setting which are learnable using some alternate learning rule, but in which the ERM is neither stable nor consistent (Example 7.2, taken from [9]). Stability of the ERM is therefore a sufficient, but not necessary, condition for learnability:

Uniform Convergence ⇒ ERM Stable ⇒ Learnable, with neither implication reversible

This prompts us to study the stability properties of non-ERM learning rules. We establish that, even in the General Setting, any consistent and generalizing learning rule (i.e. where, in addition to consistency, the empirical risk is also a good estimate of the population risk) must be asymptotically empirical risk minimizing (AERM, see precise definition in Section 2). We thus focus on such rules and show that also for an AERM, stability is sufficient for consistency and generalization. The converse is a bit weaker for AERMs, though. We show that a strict notion of stability, which is necessary for ERM consistency, is not necessary for AERM consistency, and instead suggest a weaker notion (averaging out fluctuations across random training sets) that is necessary and sufficient for AERM consistency. Noting that any consistent learning rule can be transformed into a consistent and generalizing learning rule, we obtain a sharp characterization of learnability in terms of stability: learnability is equivalent to the existence of a stable AERM:

Exists Stable AERM ⇔ Learnable (with that AERM)

2 The General Learning Setting

A learning problem is specified by a hypothesis domain H, an instance domain Z and an objective function (e.g. loss functional or cost function) f : H × Z → R. Throughout this paper we assume the function is bounded by some constant B, i.e. |f(h, z)| ≤ B for all h ∈ H and z ∈ Z. A learning rule is a mapping A : Z^* → H from sequences of instances in Z to hypotheses. We refer to sequences S = {z_1, ..., z_m} as sample sets, but it is important to remember that the order and multiplicity of instances may be significant.
A learning rule that does not depend on the order is said to be symmetric. We will generally consider samples S ~ D^m of i.i.d. draws from D. A possible approach to learning is to minimize the empirical risk F_S(h). We say that a rule A is an ERM (Empirical Risk Minimizer) if it minimizes the empirical risk:

F_S(A(S)) = F_S(ĥ_S) = min_{h∈H} F_S(h)    (3)

where we use F_S(ĥ_S) = min_{h∈H} F_S(h) to refer to the minimum empirical risk. Since there might be several hypotheses minimizing the empirical risk, ĥ_S does not refer to a specific hypothesis, and there might be many rules which are all ERMs. We say that a rule A is an AERM (Asymptotic Empirical Risk Minimizer) with rate ε_erm(m) under distribution D if:

E_{S~D^m}[F_S(A(S)) − F_S(ĥ_S)] ≤ ε_erm(m)    (4)

Here and whenever talking about a rate ε(m), we require it to be monotone decreasing with ε(m) → 0. A learning rule is universally an AERM with rate ε_erm(m) if it is an AERM with rate ε_erm(m) under all distributions D over Z. Returning to our goal of minimizing the expected risk, we say a rule A is consistent with rate ε_cons(m) under distribution D if for all m:

E_{S~D^m}[F(A(S)) − F(h*)] ≤ ε_cons(m)    (5)

where we denote F(h*) = inf_{h∈H} F(h). A rule is universally consistent with rate ε_cons(m) if it is consistent with rate ε_cons(m) under all distributions D over Z. A problem is said to be learnable if there exists some universally consistent learning rule for the problem. This definition of learnability, requiring a uniform rate for all distributions, is the relevant notion for studying learnability of a hypothesis class. It is a direct generalization of agnostic PAC-learnability [4] to Vapnik's General Setting of Learning, as studied by Haussler [3] and others. We say a rule A generalizes with rate ε_gen(m) under distribution D if for all m:

E_{S~D^m}|F(A(S)) − F_S(A(S))| ≤ ε_gen(m)    (6)

A rule universally generalizes with rate ε_gen(m) if it generalizes with rate ε_gen(m) under all distributions D over Z.
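As a concrete toy illustration of the empirical risk F_S and the ERM rule — a minimal Python sketch of our own, not a construction from the paper — take Z = [0,1], a finite grid of hypotheses H, and the squared objective f(h, z) = (h − z)², so that the population minimizer is (close to) E[Z]:

```python
import random

def empirical_risk(f, h, S):
    """F_S(h) = (1/m) * sum_i f(h, z_i)."""
    return sum(f(h, z) for z in S) / len(S)

def erm_rule(f, H, S):
    """An exact ERM: a hypothesis minimizing the empirical risk over H."""
    return min(H, key=lambda h: empirical_risk(f, h, S))

# Toy instance of the General Setting: Z = [0,1], H = a finite grid,
# f(h, z) = (h - z)^2, so the population risk F(h) is minimized near E[Z].
f = lambda h, z: (h - z) ** 2
H = [i / 100 for i in range(101)]           # hypotheses 0.00 .. 1.00
random.seed(0)
S = [random.random() for _ in range(1000)]  # i.i.d. sample from D = U[0,1]

h_hat = erm_rule(f, H, S)                   # should land near h* = 0.5
```

Here the ERM is simply the grid point nearest the sample mean, and its population risk approaches the optimal F(h*) = 1/12 as m grows.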
We note that other authors sometimes define consistency, and thus also learnability, as a combination of our notions of consistency and generalization.

3 Stability

We define a sequence of progressively weaker notions of stability, all based on leave-one-out validation. For a sample S of size m, let S^{\i} = {z_1, ..., z_{i−1}, z_{i+1}, ..., z_m} be a sample of m−1 points obtained by deleting the i-th observation of S. All our measures of stability concern the effect deleting z_i has on f(h, z_i), where h is the hypothesis returned by the learning rule. That is, all measures consider the magnitude of f(A(S^{\i}); z_i) − f(A(S); z_i).

Definition 1. A rule A is uniform-LOO stable with rate ε_stable(m) if for all samples S of m points and for all i:

|f(A(S^{\i}); z_i) − f(A(S); z_i)| ≤ ε_stable(m)

Definition 2. A rule A is all-i-LOO stable with rate ε_stable(m) under distribution D if for all i:

E_{S~D^m}|f(A(S^{\i}); z_i) − f(A(S); z_i)| ≤ ε_stable(m)

Definition 3. A rule A is LOO stable with rate ε_stable(m) under distribution D if:

(1/m) ∑_{i=1}^m E_{S~D^m}|f(A(S^{\i}); z_i) − f(A(S); z_i)| ≤ ε_stable(m)

For symmetric learning rules, Definitions 2 and 3 are equivalent. Example 7.5 shows that the symmetry assumption is necessary, and the two definitions are not equivalent for non-symmetric learning rules. Our weakest notion of stability, which we show is still enough to ensure learnability, is:
Definition 4. A rule A is on-average-LOO stable with rate ε_stable(m) under distribution D if:

|(1/m) ∑_{i=1}^m E_{S~D^m}[f(A(S^{\i}); z_i) − f(A(S); z_i)]| ≤ ε_stable(m)

We say that a rule is universally stable with rate ε_stable(m) if the stability property holds with rate ε_stable(m) for all distributions.

Claim 3.1. Uniform-LOO stability with rate ε_stable(m) implies all-i-LOO stability with rate ε_stable(m), which implies LOO stability with rate ε_stable(m), which implies on-average-LOO stability with rate ε_stable(m).

Relationship to Other Notions of Stability. Many different notions of stability, some under multiple names, have been suggested in the literature. In particular, our notion of all-i-LOO stability has been studied by several authors under different names: pointwise-hypothesis stability [2], CVloo stability [7], and cross-validation-(deletion) stability [8]. All are equivalent, though the rate is sometimes defined differently. Other authors define stability with respect to replacing, rather than deleting, an observation. E.g., CV stability [5] and cross-validation-(replacement) stability [8] are analogous to all-i-LOO stability, and average stability [8] is analogous to on-average-LOO stability for symmetric learning rules. In general the deletion and replacement variants of stability are incomparable; in Appendix A we briefly discuss how the results in this paper change if replacement stability is used. A much stronger notion is uniform stability [2], which is strictly stronger than any of our notions, and is sufficient for tight generalization bounds. However, this notion is far from necessary for learnability ([5] and Example 7.3 below). In the context of symmetric learning rules, all-i-LOO stability and LOO stability are equivalent. In order to treat non-symmetric rules more easily, we prefer working with LOO stability. For an elaborate discussion of the relationships between different notions of stability, see [5].

4 Main Results

We first establish that existence of a stable AERM is sufficient for learning:

Theorem 4.1.
If a rule is an AERM with rate ε_erm(m) and stable (under any of our definitions) with rate ε_stable(m) under D, then it is consistent and generalizes under D with rates

ε_cons(m) ≤ 3ε_erm(m) + ε_stable(m+1) + 2B/√m
ε_gen(m) ≤ 4ε_erm(m) + ε_stable(m+1) + 6B/√m

Corollary 4.2. If a rule is universally an AERM and stable, then it is universally consistent and generalizing.

Seeking a converse to the above, we first note that it is not possible to obtain a converse to Theorem 4.1 for each distribution D separately. In Example 7.6, we show a specific learning problem and distribution D in which the ERM (in fact, any AERM) is consistent, but not stable, even under our weakest notion of stability. However, we are able to obtain a converse to Corollary 4.2. That is, we establish that a universally consistent ERM, or even AERM, must also be stable. For exact ERMs we have:

Theorem 4.3. For an ERM the following are equivalent: (i) universal LOO stability; (ii) universal consistency; (iii) universal generalization.

Recall that for a symmetric rule, LOO stability and all-i-LOO stability are equivalent, and so consistency or generalization of a symmetric ERM (the typical case) also implies all-i-LOO stability. Theorem 4.3 only guarantees LOO stability as a necessary condition for consistency. Example 7.3 (adapted from [5]) establishes that we cannot strengthen the condition to uniform-LOO stability, or any stronger definition: there exists a learning problem for which the ERM is universally consistent, but not uniform-LOO stable. For AERMs, we obtain a weaker converse, ensuring only on-average-LOO stability:

Theorem 4.4. For an AERM, the following are equivalent: (i) universal on-average-LOO stability; (ii) universal consistency; (iii) universal generalization.

On-average-LOO stability is strictly weaker than LOO stability, but this is the best that can be assured. In Example 7.4 we present a learning problem and an AERM that is universally consistent, but is not LOO stable.
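The leave-one-out quantities of Definitions 1, 3 and 4 are easy to estimate numerically for a concrete rule. A hedged sketch (a toy setup of our own; here `mean_rule` is the exact ERM for the squared objective, and the three single-sample estimates are only empirical proxies for the definitions): the estimates obey the ordering of Claim 3.1.

```python
import random

def loo_deltas(A, f, S):
    """d_i = f(A(S^{\\i}); z_i) - f(A(S); z_i) for each held-out point z_i."""
    h_full = A(S)
    out = []
    for i, z in enumerate(S):
        h_loo = A(S[:i] + S[i + 1:])   # rule applied to S with z_i deleted
        out.append(f(h_loo, z) - f(h_full, z))
    return out

mean_rule = lambda S: sum(S) / len(S)   # exact ERM for f(h, z) = (h - z)^2
f = lambda h, z: (h - z) ** 2

random.seed(1)
S = [random.random() for _ in range(200)]
d = loo_deltas(mean_rule, f, S)

uniform_loo    = max(abs(x) for x in d)           # proxy for Definition 1 (worst case)
loo_stability  = sum(abs(x) for x in d) / len(d)  # proxy for Definition 3 (mean of |.|)
on_average_loo = abs(sum(d) / len(d))             # proxy for Definition 4 (|mean|)
```

For this rule every d_i is nonnegative (deleting z_i moves the mean away from z_i), and all three quantities are O(1/m), consistent with the ERM of this strongly convex toy objective being stable.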
The exact rate conversions of Theorems 4.3 and 4.4 are specified in the corresponding proofs (Section 6), and are all polynomial. In particular, an ε_cons-universally consistent ε_erm-AERM is on-average-LOO stable with rate

ε_stable(m) ≤ 3ε_erm(m−1) + 3ε_cons((m−1)^{1/4}) + 6B/√(m−1)

The above results apply only to AERMs, for which we also see that universal consistency and generalization are equivalent. Next we show that if in fact we seek universal consistency and generalization, then we must consider only AERMs:

Theorem 4.5. If a rule A is universally consistent with rate ε_cons(m) and generalizing with rate ε_gen(m), then it is universally an AERM with rate

ε_erm(m) ≤ ε_gen(m) + 3ε_cons(m^{1/4}) + 4B/√m

Combining Theorems 4.4 and 4.5, we get that the existence of a universally on-average-LOO stable AERM is a necessary (and sufficient) condition for the existence of some universally consistent and generalizing rule. As we show in Example 7.7, there might still be a universally consistent learning rule (hence the problem is learnable by our definition) that is not stable even by our weakest definition (and is neither an AERM nor generalizing). Nevertheless, any universally consistent learning rule can be transformed into a universally consistent and generalizing learning rule (Lemma 6.11). Thus by Theorems 4.5 and 4.4 this rule must also be a stable AERM, establishing:
Theorem 4.6. A learning problem is learnable if and only if there exists a universally on-average-LOO stable AERM. In particular, if there exists an ε_cons-universally consistent rule, then there exists a rule that is ε_stable-on-average-LOO stable and an ε_erm-AERM, where:

ε_erm(m) = 3ε_cons(m^{1/4}) + 7B/√m,    (7)
ε_stable(m) = 6ε_cons((m−1)^{1/4}) + 9B/√(m−1)

5 Comparison with Prior Work

5.1 Theorem 4.1 and Corollary 4.2

The consistency implication in Theorem 4.1 (specifically, that all-i-LOO stability of an AERM implies consistency) was established by Mukherjee et al. [7], Theorem 3.5. As for the generalization guarantee, Rakhlin et al. [8] prove that for an ERM, on-average-LOO stability is equivalent to generalization. For more general learning rules, [2] attempted to show that all-i-LOO stability implies generalization. However, Mukherjee et al. [7] (in Remark 3, pg. 73) provide a simple counterexample and note that the proof of [2] is wrong, and that in fact all-i-LOO stability alone is not enough to ensure generalization. To correct this, Mukherjee et al. [7] introduced an additional condition, referred to as Eloo_err stability, which together with all-i-LOO stability ensures generalization. For AERMs, they use arguments specific to supervised learning, arguing that universal consistency implies uniform convergence, and establish generalization only via this route. And so, Mukherjee et al. obtain a version of Corollary 4.2 that is specific to supervised learning. In summary, comparing Theorem 4.1 to previous work, our results extend the generalization guarantee also to AERMs in the general learning setting. Rakhlin et al. [8] also show that the replacement (rather than deletion) version of stability implies generalization for any rule (even a non-AERM), and hence consistency for AERMs. Recall that the deletion and replacement versions are not equivalent. We are not aware of strong converses for the replacement variant.
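Both the deletion and the replacement perturbations can be estimated side by side by resampling. A toy sketch (our own setup, not drawn from any of the cited papers): for the sample-mean ERM under the squared objective, both variants come out O(1/m) here, even though the two notions are incomparable in general.

```python
import random

f = lambda h, z: (h - z) ** 2
A = lambda S: sum(S) / len(S)   # exact ERM for the squared objective

random.seed(2)
m = 200
S = [random.random() for _ in range(m)]
fresh = [random.random() for _ in range(m)]   # independent points z_i' for replacement

deletion, replacement = [], []
h_full = A(S)
for i in range(m):
    h_del = A(S[:i] + S[i + 1:])                # delete z_i
    h_rep = A(S[:i] + [fresh[i]] + S[i + 1:])   # replace z_i by z_i'
    deletion.append(abs(f(h_del, S[i]) - f(h_full, S[i])))
    replacement.append(abs(f(h_rep, S[i]) - f(h_full, S[i])))

del_rate = sum(deletion) / m      # deletion analogue of LOO stability
rep_rate = sum(replacement) / m   # replacement analogue
```

Deleting or replacing one of m points shifts the mean by at most O(1/m), so both averaged perturbations are small for this rule; the incomparability of the two notions shows up only for less well-behaved rules.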
5.2 Converse Results

Mukherjee et al. [7] argue that all-i-LOO stability of the ERM (in fact, of any AERM) is also necessary for ERM universal consistency and thus learnability. However, their arguments are specific to supervised learning, and establish stability only via uniform convergence of F_S(h) to F(h), as discussed in the introduction. As we now know, in the general learning setting, ERM consistency is not equivalent to this uniform convergence, and furthermore, there might be a non-ERM universally consistent rule even though the ERM is not universally consistent. Our results here therefore apply to the general learning setting and do not use uniform convergence arguments. For an ERM, Rakhlin et al. [8] show that generalization is equivalent to on-average-LOO stability, for any distribution and without resorting to uniform convergence arguments. This provides a partial converse to Theorem 4.1. However, our results extend to AERMs as well and, more importantly, provide a converse to AERM consistency rather than just generalization. This distinction between consistency and generalization is important, as there are situations with consistent but not stable AERMs (Example 7.6), and even universally consistent learning rules which are not stable, not generalizing, and not AERMs (Example 7.7). Another converse result that does not use uniform convergence arguments, but is specific to the realizable binary learning setting, was given by Kutin and Niyogi [5]. They show that in this setting, for any distribution D, all-i-LOO stability of the ERM under D is necessary for ERM consistency under D. This is a much stronger form of converse, as it applies to any specific distribution separately, rather than requiring universal consistency. However, not only is it specific to supervised learning, it further requires the distribution to be realizable (i.e. zero error is achievable). As we show in Example 7.6, a distribution-specific converse is not possible in the general setting.
All the papers cited above focus on symmetric learning rules, where all-i-LOO stability is equivalent to LOO stability. We prefer not to limit our attention to symmetric rules, and instead use LOO stability.

6 Detailed Results and Proofs

We first establish that for AERMs, on-average-LOO stability and generalization are equivalent, and that for ERMs the equivalence extends also to LOO stability. This extends the work of Rakhlin et al. [8] from ERMs to AERMs, and with somewhat better rate conversions.

6.1 Equivalence of Stability and Generalization

It will be convenient to work with a weaker version of generalization as an intermediate step. We say a rule A on-average generalizes with rate ε_oag(m) under distribution D if for all m:

|E_{S~D^m}[F(A(S)) − F_S(A(S))]| ≤ ε_oag(m)    (8)

It is straightforward to see that generalization implies on-average generalization with the same rate. We show that for AERMs the converse is also true, and also that on-average generalization is equivalent to on-average stability, establishing the equivalence between generalization and on-average stability (for AERMs).

Lemma 6.1 (For AERMs: on-average generalization ⇔ on-average stability). Let A be an AERM with rate ε_erm(m) under D. If A is on-average generalizing with rate ε_oag(m), then it is on-average-LOO stable with rate ε_oag(m−1) + 2ε_erm(m−1) + 2B/m. If A is on-average-LOO stable with rate ε_stable(m), then it is on-average generalizing with rate ε_stable(m+1) + 2ε_erm(m) + 2B/m.

Proof. For the ERMs of S and S^{\i} we have |F_{S^{\i}}(ĥ_{S^{\i}}) − F_S(ĥ_S)| ≤ 2B/m, and so, since A is an AERM:

E|F_S(A(S)) − F_{S^{\i}}(A(S^{\i}))| ≤ 2ε_erm(m−1) + 2B/m    (9)
(generalization ⇒ stability) Applying (8) to S^{\i} and combining with (9), we have E[F(A(S^{\i})) − F_S(A(S))] ≤ ε_oag(m−1) + 2ε_erm(m−1) + 2B/m, which does not actually depend on i, hence:

ε_oag(m−1) + 2ε_erm(m−1) + 2B/m
≥ E[F(A(S^{\i})) − F_S(A(S))]    (10)
= E_i E[F(A(S^{\i})) − F_S(A(S))]
= E_{S^{\i}, z_i} E_i[f(A(S^{\i}), z_i)] − E_S E_i[f(A(S), z_i)]
= E[(1/m) ∑_{i=1}^m (f(A(S^{\i}), z_i) − f(A(S), z_i))]    (11)

which establishes on-average stability.

(stability ⇒ generalization) Bounding (11) by ε_stable(m) and working back, we get that (10) is also bounded by ε_stable(m). Combined with (9) we get

|E[F(A(S^{\i})) − F_{S^{\i}}(A(S^{\i}))]| ≤ ε_stable(m) + 2ε_erm(m−1) + 2B/m

which establishes on-average generalization. ∎

Lemma 6.2 (AERM + on-average generalization ⇒ generalization). If A is an AERM with rate ε_erm(m) and on-average generalizes with rate ε_oag(m) under D, then A generalizes with rate ε_oag(m) + 2ε_erm(m) + 2B/√m under D.

Proof. Using the respective optimalities of ĥ_S and h*, we can bound:

F_S(A(S)) − F(A(S))
= [F_S(A(S)) − F_S(ĥ_S)] + [F_S(ĥ_S) − F_S(h*)] + [F_S(h*) − F(h*)] + [F(h*) − F(A(S))]
≤ [F_S(A(S)) − F_S(ĥ_S)] + [F_S(h*) − F(h*)] =: Y    (12)

where the final equality defines a new random variable Y. By Lemma 6.3 and the AERM guarantee we have E|Y| ≤ ε_erm(m) + B/√m. From Lemma 6.4 we can conclude that

E|F_S(A(S)) − F(A(S))| ≤ |E[F_S(A(S)) − F(A(S))]| + 2E|Y| ≤ ε_oag(m) + 2ε_erm(m) + 2B/√m. ∎

Utility Lemma 6.3. For i.i.d. X_1, ..., X_m with |X_i| ≤ B and X̄ = (1/m) ∑_i X_i, we have E|X̄ − E X̄| ≤ B/√m.

Proof. E|X̄ − E X̄| ≤ √(Var(X̄)) = √(Var(X_1)/m) ≤ B/√m. ∎

Utility Lemma 6.4. Let X, Y be random variables such that X ≤ Y almost surely. Then E|X| ≤ |E X| + 2E|Y|.

Proof. Denote a₊ = max(0, a) and observe that X ≤ Y implies X₊ ≤ Y₊ (this holds when both have the same sign, and when X ≤ 0 ≤ Y, while Y < 0 < X is not possible). We therefore have E[X₊] ≤ E[Y₊] ≤ E|Y|. Also note that |X| = 2X₊ − X. We can now calculate: E|X| = E[2X₊ − X] = 2E[X₊] − E[X] ≤ 2E|Y| + |E X|. ∎

For an exact ERM, we get a stronger equivalence:

Lemma 6.5 (ERM + on-average-LOO stable ⇒ LOO stable). If an exact ERM A is on-average-LOO stable with rate ε_stable(m) under D, then it is also LOO stable under D with the same rate.

Proof.
By the optimality of ĥ_S = A(S) and ĥ_{S^{\i}} = A(S^{\i}):

f(ĥ_{S^{\i}}, z_i) − f(ĥ_S, z_i) = m[F_S(ĥ_{S^{\i}}) − F_S(ĥ_S)] + (m−1)[F_{S^{\i}}(ĥ_S) − F_{S^{\i}}(ĥ_{S^{\i}})] ≥ 0    (13)

Then, using on-average-LOO stability:

E[(1/m) ∑_i |f(ĥ_{S^{\i}}, z_i) − f(ĥ_S, z_i)|] = E[(1/m) ∑_i (f(ĥ_{S^{\i}}, z_i) − f(ĥ_S, z_i))] ≤ ε_stable(m) ∎

Lemma 6.5 can be extended also to AERMs with rate o(1/m). However, for AERMs with a slower rate, i.e. with rate Ω(1/m), Example 7.4 establishes that this stronger converse is not possible. We have now established the stability ⇔ generalization parts of Theorems 4.1, 4.3 and 4.4 (in fact, an even slightly stronger converse than in Theorems 4.3 and 4.4, as it does not require universality).

6.2 A Sufficient Condition for Consistency

It is also fairly straightforward to see that generalization (or even on-average generalization) of an AERM implies its consistency:

Lemma 6.6 (AERM + generalization ⇒ consistency). If A is an AERM with rate ε_erm(m) and on-average generalizes with rate ε_oag(m) under D, then it is consistent with rate ε_oag(m) + ε_erm(m) under D.

Proof.

E[F(A(S))] − F(h*) = E[F(A(S)) − F_S(h*)]    (since E[F_S(h*)] = F(h*))
= E[F(A(S)) − F_S(A(S))] + E[F_S(A(S)) − F_S(h*)]
≤ E[F(A(S)) − F_S(A(S))] + E[F_S(A(S)) − F_S(ĥ_S)]
≤ ε_oag(m) + ε_erm(m) ∎

Combined with the results of Section 6.1, this completes the proof of Theorem 4.1 and the stability ⇒ consistency and generalization ⇒ consistency parts of Theorems 4.3 and 4.4.

6.3 Converse Direction

Lemma 6.1 already provides a converse result, establishing that stability is necessary for generalization. However, in order to establish that stability is also necessary for universal consistency, we must prove that universal consistency of an AERM implies universal generalization. Note that consistency under a specific distribution for an AERM does not imply generalization nor stability (Example 7.6). We must instead rely on universal consistency. The main tool we use is the following lemma:
Lemma 6.7 (Main Converse Lemma). If a problem is learnable, i.e. there exists a universally consistent rule A with rate ε_cons(m), then under any distribution,

E|F_S(ĥ_S) − F(h*)| ≤ ε_emp(m)

where ε_emp(m) = 2ε_cons(m′) + 2B/√(m−m′) + 2B m′²/m for any sequence m′ = m′(m) such that m′ → ∞ and m′ = o(√m).

Proof. Let I = {I_1, ..., I_{m′}} be a random sample of indices in the range 1..m, where each I_i is independently uniformly distributed, and I is independent of S. Let S′ = {z_{I_i}}, i.e. a sample of size m′ drawn from the uniform distribution over the instances in S (with replacement). We first bound the probability that I has repeated indices ("duplicates"):

Pr(I has duplicates) ≤ ∑_{i=1}^{m′} (i−1)/m ≤ m′²/(2m)    (14)

Conditioned on not having duplicates in I, the sample S′ is actually distributed according to D^{m′}, i.e. it can be viewed as a sample from the original distribution. We therefore have, by universal consistency:

E[F(A(S′)) − F(h*) | no duplicates] ≤ ε_cons(m′)    (15)

But viewing S′ as a sample drawn from the uniform distribution over the instances in S (whose population risk is F_S, minimized by ĥ_S), we also have:

E[F_S(A(S′)) − F_S(ĥ_S)] ≤ ε_cons(m′)    (16)

Conditioned on having no duplicates in I, S \ S′ (i.e. those instances in S not chosen by I) is independent of S′, and |S \ S′| = m − m′, so by Lemma 6.3:

E|F(A(S′)) − F_{S\S′}(A(S′))| ≤ B/√(m−m′)    (17)

Finally, if there are no duplicates, then for any hypothesis, and in particular for A(S′), we have:

|F_S(A(S′)) − F_{S\S′}(A(S′))| ≤ 2B m′/m    (18)

Combining (15), (16), (17) and (18), accounting for a maximal discrepancy of 2B on the event that we do have duplicates, and assuming m′² ≤ m/2, we get the desired bound. ∎

In the supervised learning setting, Lemma 6.7 is just an immediate consequence of learnability being equivalent to consistency and generalization of the ERM. However, the lemma applies also in the General Setting, where universal consistency might be achieved only by a non-ERM. The lemma states that if a problem is learnable, then even though the ERM might not be consistent (as in, e.g., Example 7.2), the empirical risk achieved by the ERM is in fact an asymptotically unbiased estimator of F(h*).
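The duplicate-probability bound in the proof of Lemma 6.7 is the one purely combinatorial ingredient, and it is easy to check by simulation. A quick Monte Carlo sketch (the parameter values m, m′ and the trial count are our own choices): draw m′ indices uniformly with replacement from {0, ..., m−1} and compare the duplicate frequency against m′²/(2m).

```python
import random

random.seed(3)
m, m_prime, trials = 10_000, 10, 20_000   # m' = o(sqrt(m)), as the lemma requires

dup_count = 0
for _ in range(trials):
    I = [random.randrange(m) for _ in range(m_prime)]  # indices of S', with replacement
    if len(set(I)) < m_prime:                          # I contains a repeated index
        dup_count += 1

dup_freq = dup_count / trials
bound = m_prime ** 2 / (2 * m)   # Pr(duplicates) <= m'^2 / (2m)
```

By a union bound over the pairwise collisions the true probability is at most ∑_{i<m′} i/m ≤ m′²/(2m), so for these parameters the empirical frequency should sit just below 0.005.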
Equipped with Lemma 6.7, we are now ready to show that universal consistency of an AERM implies generalization, and that any universally consistent and generalizing rule must be an AERM. What we show is actually a bit stronger: if a problem is learnable, so that Lemma 6.7 holds, then for any distribution D separately, consistency of an AERM under D implies generalization under D, and any consistent and generalizing rule under D must be an AERM.

Lemma 6.8 (learnable + AERM + consistent ⇒ generalizing). If Lemma 6.7 holds with rate ε_emp(m), and A is an ε_erm-AERM and ε_cons-consistent under D, then it is generalizing under D with rate ε_emp(m) + ε_erm(m) + ε_cons(m).

Proof.

E|F_S(A(S)) − F(A(S))| ≤ E|F_S(A(S)) − F_S(ĥ_S)| + E|F_S(ĥ_S) − F(h*)| + E|F(h*) − F(A(S))|
≤ ε_erm(m) + ε_emp(m) + ε_cons(m) ∎

Lemma 6.9 (learnable + consistent + generalizing ⇒ AERM). If Lemma 6.7 holds with rate ε_emp(m), and A is ε_cons-consistent and ε_gen-generalizing under D, then it is an AERM under D with rate ε_emp(m) + ε_gen(m) + ε_cons(m).

Proof.

E|F_S(A(S)) − F_S(ĥ_S)| ≤ E|F_S(A(S)) − F(A(S))| + E|F(A(S)) − F(h*)| + E|F(h*) − F_S(ĥ_S)|
≤ ε_gen(m) + ε_cons(m) + ε_emp(m) ∎

Lemma 6.8 establishes that universal consistency of an AERM implies universal generalization, and thus completes the proof of Theorems 4.3 and 4.4. Lemma 6.9 establishes Theorem 4.5. To get the rates in Section 4, we use m′ = m^{1/4} in Lemma 6.7. Lemmas 6.6, 6.8 and 6.9 together establish an interesting relationship:

Corollary 6.10. For a (universally) learnable problem, for any distribution D and learning rule A, any two of the following imply the third:
- A is an AERM under D.
- A is consistent under D.
- A generalizes under D.

Note, however, that any one property by itself is possible, even universally:
- The ERM in Example 7.2 is neither consistent nor generalizing, despite the problem being learnable.
- Example 7.7 demonstrates a universally consistent learning rule which is neither generalizing nor an AERM.
- A rule returning a fixed hypothesis always generalizes, but of course need not be consistent nor an AERM.

In contrast, for learnable supervised classification and regression problems, it is not possible for a learning rule to be just universally consistent, without being an AERM and without generalizing. Nor is it possible for a learning rule to be a universal AERM for a learnable problem without being generalizing and consistent. Corollary 6.10 can also provide a certificate of non-learnability. E.g., for the problem in Example 7.6 we show a specific distribution for which there is a consistent AERM that does not generalize. We can conclude that there is no universally consistent learning rule for the problem, since otherwise the corollary would be violated.
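The three properties in Corollary 6.10 can each be estimated by simulation. A sanity-check sketch with a toy problem of our own (squared objective, D = U[0,1], the sample-mean ERM over H = R), where in fact all three hold at once: the rule is an exact ERM (AERM gap zero), and the consistency and generalization gaps are small for moderate m:

```python
import random

f = lambda h, z: (h - z) ** 2
A = lambda S: sum(S) / len(S)            # exact ERM, so the AERM gap is exactly 0
F = lambda h: (h - 0.5) ** 2 + 1 / 12    # population risk under D = U[0,1]
F_star = 1 / 12                          # F(h*), attained at h* = 0.5

random.seed(4)
m, trials = 100, 2000
cons_gap, gen_gap = 0.0, 0.0
for _ in range(trials):
    S = [random.random() for _ in range(m)]
    h = A(S)
    F_S = sum(f(h, z) for z in S) / m
    cons_gap += F(h) - F_star            # consistency: E[F(A(S)) - F(h*)]
    gen_gap += abs(F(h) - F_S)           # generalization: E|F(A(S)) - F_S(A(S))|
cons_gap /= trials
gen_gap /= trials
```

Here E[F(A(S)) − F(h*)] = Var(mean) = 1/(12m), so both empirical gaps shrink at the expected O(1/√m) rate or faster, illustrating an AERM that is simultaneously consistent and generalizing.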
6.4 Existence of a Stable Rule

Theorems 4.5 and 4.4, which we have just completed proving, already establish that for AERMs, universal consistency is equivalent to universal on-average-LOO stability. Existence of a universally on-average-LOO stable AERM is thus sufficient for learnability. In order to prove that it is also necessary, it is enough to show that existence of a universally consistent learning rule implies existence of a universally consistent AERM. This AERM must then be on-average-LOO stable by Theorem 4.4. We actually show how to transform a consistent rule into a consistent and generalizing rule. If this rule is universally consistent, then by Lemma 6.9 we can conclude it must be an AERM, and by Lemma 6.1 that it must be on-average-LOO stable.

Lemma 6.11. For any rule A there exists a rule A′ such that: A′ universally generalizes with rate 3B/√m; and for any D, if A is ε_cons-consistent under D, then A′ is ε_cons(√m)-consistent under D.

Proof. For a sample S of size m, let S′ be the sub-sample consisting of the first √m observations in S, and define A′(S) = A(S′). That is, A′ applies A to only √m of the observations in S.

A′ generalizes: We can decompose:

F_S(A(S′)) − F(A(S′)) = (√m/m)[F_{S′}(A(S′)) − F(A(S′))] + ((m−√m)/m)[F_{S\S′}(A(S′)) − F(A(S′))]

The first term can be bounded by 2B/√m. As for the second term, S \ S′ is statistically independent of S′, and so we can use Lemma 6.3 to bound its expected magnitude, obtaining:

E|F_S(A(S′)) − F(A(S′))| ≤ 2B/√m + ((m−√m)/m) · B/√(m−√m) ≤ 3B/√m

A′ is consistent: If A is consistent, then:

E[F(A′(S))] − inf_{h∈H} F(h) = E[F(A(S′))] − inf_{h∈H} F(h) ≤ ε_cons(√m)    (19)

∎

Proof of the converse in Theorem 4.6. If there exists a universally consistent rule with rate ε_cons(m), then by Lemma 6.11 there exists A′ which is universally consistent and generalizing. Choosing m′ = m^{1/4} in Lemma 6.7 and applying Lemmas 6.9 and 6.1, we get the rates specified in (7). ∎

Remark. We can strengthen the above theorem to show the existence of an on-average-LOO stable, always-AERM rule (i.e. a rule which for every sample approximately minimizes F_S(h)).
The new learning rule for this purpose chooses the hypothesis returned by the original rule whenever its empirical risk is small, and chooses an ERM otherwise. The proof is completed via Markov's inequality, bounding the probability that we do not choose the hypothesis returned by the original learning rule.

7 Examples

Our first example (taken from [9]) shows that uniform convergence is not necessary for ERM consistency, i.e. universal ERM consistency can hold without uniform convergence. Of course, this can also happen in trivial settings where there is one hypothesis h_0 which dominates all other hypotheses (i.e. f(h_0, z) < f(h, z) for all z and all h ≠ h_0) [10]. However, the example below demonstrates a non-trivial situation with ERM universal consistency but no uniform convergence: there is no dominating hypothesis, and finding the optimal hypothesis does require learning. In particular, unlike trivial problems with a dominating hypothesis, in the example below there is not even local uniform convergence, i.e. there is no uniform convergence even among hypotheses that are close to being population optimal.

Example 7.1. There exists a learning problem for which any ERM is universally consistent, but the empirical risks do not converge uniformly to their expectations.

Proof. Consider the convex stochastic optimization problem given by:

f(w; (x, α)) = ‖α ∗ (w − x)‖² + ‖w‖² = ∑_i α_i (w_i − x_i)² + ‖w‖²

where ∗ denotes coordinate-wise product, w is the hypothesis, w and x are elements of the unit ball around the origin of a Hilbert space with a countably infinite orthonormal basis e_1, e_2, ..., and α is an infinite binary sequence; α_i is the i-th coordinate of α, w_i := ⟨w, e_i⟩, and x_i is defined similarly. In our other submission [9], we show that the ERM is stable, hence consistent. However, when x = 0 a.s. and the coordinates of α are i.i.d. uniform on {0, 1}, there is no uniform convergence, not even locally. To see why, note that for a random sample S of any finite size, with probability one there exists an excluded basis vector e_j, i.e. one with α_j = 0 for every (x, α) ∈ S.
For any t > 0, we then have F(te_j) − F_S(te_j) ≥ t/2, regardless of the sample size. Setting t = 1 establishes sup_w (F(w) − F_S(w)) ≥ 1/2 even as m → ∞, and so there is no uniform convergence. Choosing t arbitrarily small, we see that even when F(te_j) is close to optimal, the deviations F(w) − F_S(w) still do not converge to zero as m → ∞.

Perhaps more surprisingly, the next example (also taken from [9]) shows that in the general setting, learnability might require using a non-ERM.

Example 7.2. There exists a learning problem with a universally consistent learning rule, but for which no ERM is universally consistent.

Proof. Consider the same hypothesis space and sample space as before, with:

f(w; (x, α)) = ‖α ∗ (w − x)‖² + ε·Σ_i 2^{−i}(w_i − 1)²,

where ε = 0.01. When x = 0 a.s. and α is i.i.d. uniform, any ERM must have ‖ŵ‖ = 1. To see why, note that
for an excluded basis vector e_j (which exists almost surely), increasing w_j towards one decreases the objective. But since ‖ŵ‖ = 1, we have F(ŵ) ≥ 1/2, while inf_w F(w) ≤ F(0) = ε, and so F(ŵ) − inf_w F(w) ≥ 1/2 − ε. On the other hand, A(S) = argmin_w (F_S(w) + ‖w‖²/(20√m)) is a uniform-LOO stable AERM, and hence by Theorem 4.1 universally consistent.

In the next three examples, we show that in a certain sense, Theorem 4.3 and Theorem 4.4 cannot be improved to stronger stability notions. Viewed differently, these examples also constitute separation results between our various stability notions, showing which notions are strictly stronger than which. Example 7.4 also demonstrates the gap between supervised learning and the General Learning Setting, by presenting a learning problem and an AERM that is universally consistent but not LOO stable.

Example 7.3. There exists a learning problem with a universally consistent and all-i-LOO stable learning rule, but with no universally consistent and uniform LOO stable learning rule.

Proof. This example is taken from [5]. Consider the hypothesis space {0, 1}, the instance space {0, 1}, and the objective function f(h, z) = |h − z|. It is straightforward to verify that an ERM is a universally consistent learning rule. It is also universally all-i-LOO stable, because removing an instance can change the hypothesis only if the original sample had an equal number of 0's and 1's (plus or minus one), which happens with probability at most O(1/√m), where m is the sample size. However, it is not hard to see that the only uniform LOO stable learning rule, at least for large enough sample sizes, is a constant rule which always returns the same hypothesis h regardless of the sample. Such a learning rule is obviously not universally consistent.

Example 7.4. There exists a learning problem with a universally consistent (and average-LOO stable) AERM which is not LOO stable.

Proof. Let the instance space, hypothesis space and objective function be as in Example 7.3.
Consider the following learning rule, based on a sample S = (z_1, ..., z_m): if (1/m)Σ_i 1[z_i = 1] > 1/2 + √(log(4)/(2m)), return 1. If (1/m)Σ_i 1[z_i = 1] < 1/2 − √(log(4)/(2m)), return 0. Otherwise, return Parity(S) = (z_1 + ... + z_m) mod 2. This learning rule is an AERM with rate ε_erm(m) = 2√(log(4)/(2m)). Since we have only two hypotheses, we have uniform convergence of F_S(h) to F(h). Therefore, our learning rule universally generalizes (with rate ε_gen(m) = √(log(4/δ)/(2m)), holding with probability at least 1 − δ), and by Theorem 4.4 this implies that the learning rule is also universally consistent and average-LOO stable.

However, the learning rule is not LOO stable. Consider the uniform distribution on the instance space. By Hoeffding's inequality, |(1/m)Σ_i 1[z_i = 1] − 1/2| ≤ √(log(4)/(2m)) with probability at least 1/2 for any sample size. In that case, the returned hypothesis is the parity function (even when we remove an instance from the sample, assuming m ≥ 3). When this happens, it is not hard to see that for any i, f(A(S); z_i) − f(A(S^{\i}); z_i) = 1[z_i = 1]·(−1)^{Parity(S)}. This implies that

(1/m) Σ_{i=1}^m E|f(A(S^{\i}); z_i) − f(A(S); z_i)| ≥ (1/2)·(1/2 − √(log(4)/(2m))),   (20)

which does not converge to zero with the sample size. Therefore, the learning rule is not LOO stable.

Note that the proof implies that average-LOO stability cannot be replaced even by stability notions weaker than LOO stability. For instance, a natural stability notion intermediate between average-LOO stability and LOO stability is

E_{S∼D^m} |(1/m) Σ_{i=1}^m (f(A(S^{\i}); z_i) − f(A(S); z_i))|,   (21)

where the absolute value is now over the entire sum, but inside the expectation. In the example used in the proof, (21) is still lower bounded by the same quantity as in (20), which does not converge to zero with the sample size.

Example 7.5. There exists a learning problem with a universally consistent and LOO stable AERM which is not symmetric and is not all-i-LOO stable.

Proof. Let the instance space be [0, 1], the hypothesis space be [0, 2], and the objective function be f(h, z) = 1[h = z].
Consider the following learning rule A: given a sample, check whether the value z_1 appears more than once in the sample. If not, return z_1; otherwise, return 2. Since F_S(2) = 0, and z_1 is returned only if this value constitutes a 1/m fraction of the sample, the rule above is an AERM with rate ε_erm(m) = 1/m. To see universal consistency, let p = Pr(z = z_1) denote the probability mass of z_1. With probability (1 − p)^{m−1}, z_1 ∉ {z_2, ..., z_m}, and the returned hypothesis is z_1, with F(z_1) = p. Otherwise, the returned hypothesis is 2, with F(2) = 0. Hence E_S[F(A(S))] ≤ p(1 − p)^{m−1}, which can easily be verified to be at most 1/(m − 1), so the learning rule is consistent with rate ε_cons(m) ≤ 1/(m − 1). To see LOO stability, notice that the returned hypothesis can change by deleting z_i, for i > 1, only if z_i is the only instance among z_2, ..., z_m equal to z_1. So ε_stable(m) ≤ 2/m (in fact, LOO stability here holds even without the expectation). However, this learning rule is not all-i-LOO stable: for any continuous distribution, |f(A(S^{\1}); z_1) − f(A(S); z_1)| = 1 with probability 1, so the rule obviously cannot be all-i-LOO stable with respect to i = 1.

Next we show that for specific distributions, even ERM consistency does not imply even our weakest notion of stability.
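The instability in Example 7.4 can also be observed numerically. The following is a small illustrative simulation (not part of the paper; the rule and the thresholds follow Example 7.4 as stated): it implements the majority/parity rule with the 0-1 loss f(h, z) = |h − z| and estimates the LOO instability term of (20), which stays bounded away from zero as the sample size grows.

```python
import math
import random

def A(sample):
    # The rule of Example 7.4: return the majority label if the empirical
    # frequency of 1's is far from 1/2, and the parity of the sample otherwise.
    m = len(sample)
    p = sum(sample) / m
    t = math.sqrt(math.log(4) / (2 * m))
    if p > 0.5 + t:
        return 1
    if p < 0.5 - t:
        return 0
    return sum(sample) % 2  # Parity(S)

def loo_instability(sample):
    # (1/m) * sum_i |f(A(S^{\i}); z_i) - f(A(S); z_i)| with f(h, z) = |h - z|
    m = len(sample)
    h = A(sample)
    total = 0.0
    for i in range(m):
        h_i = A(sample[:i] + sample[i + 1:])
        total += abs(abs(h_i - sample[i]) - abs(h - sample[i]))
    return total / m

random.seed(0)
for m in (100, 400, 1600):
    est = sum(loo_instability([random.randint(0, 1) for _ in range(m)])
              for _ in range(10)) / 10
    print(m, round(est, 3))  # the estimate does not decay towards 0
```

Under the uniform distribution on {0, 1}, the sample falls in the parity regime with constant probability, and there each removed 1 flips the returned parity; this is why the estimated instability stays of constant order instead of vanishing with m.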
[Figure 1 appears here: a diagram of the implications among Uniform Convergence, ERM Strict Consistency, ERM Stability, ERM Consistency, All AERMs Stable and Consistent, Exists Stable AERM, Exists Consistent AERM, and Learnability.]

Figure 1: Implications of various properties of learning problems. Consistency refers to universal consistency and stability refers to universal on-average-LOO stability.

Example 7.6. There exists a learning problem and a distribution on the instance space such that the ERM (or any AERM) is consistent but is not average-LOO stable.

Proof. Let the instance space be [0, 1], let the hypothesis space consist of all finite subsets of [0, 1], and define the objective function as f(h, z) = 1[z ∉ h]. Consider any continuous distribution on the instance space. Since the underlying distribution D is continuous, we have F(h) = 1 for any hypothesis h. Therefore, any learning rule (including any AERM) is consistent, with F(A(S)) = 1. On the other hand, the ERM here always achieves F_S(ĥ_S) = 0, so any AERM cannot generalize, or even on-average-generalize (by Lemma 6.2), and hence cannot be average-LOO stable (by Lemma 6.1).

Finally, the following example shows that while learnability is equivalent to the existence of stable and consistent AERMs (Theorem 4.4 and Theorem 4.6), there might still exist other universally consistent learning rules which are neither stable nor AERMs.

Example 7.7. There exists a learning problem with a universally consistent learning rule which is not average-LOO stable, not generalizing, and not an AERM.

Proof. Let the instance space be [0, 1], let the hypothesis space consist of all finite subsets of [0, 1], and let the objective function be the indicator function f(h, z) = 1[z ∈ h]. Consider the following learning rule: given a sample S ⊆ [0, 1], the learning rule checks whether any two instances in the sample are identical. If so, the learning rule returns the empty set; otherwise, it returns the sample itself. This learning rule is not an AERM, nor does it necessarily generalize or satisfy average-LOO stability. To see this, consider any continuous distribution on [0, 1].
The learning rule then always returns a countable set A(S) with F_S(A(S)) = 1, while F_S(∅) = 0 (so it is not an AERM) and F(A(S)) = 0 (so it does not generalize). Also, f(A(S); z_i) = 1 while f(A(S^{\i}); z_i) = 0 with probability 1, so it is not average-LOO stable either.

However, the learning rule is universally consistent. If the underlying distribution is continuous on [0, 1], then the returned hypothesis is S itself, which is countable, and hence F(S) = 0 = inf_h F(h). For discrete distributions, let M denote the proportion of instances in the sample which appear exactly once, and let M_0 be the probability mass of instances which did not appear in the sample. Using [6, Theorem 3], for any δ it holds with probability at least 1 − δ over a sample of size m that

|M_0 − M| ≤ O(√(log(1/δ)/m)),

uniformly for any discrete distribution. If this event occurs, then either M < 1, or M_0 ≥ 1 − O(√(log(1/δ)/m)). In the first case, there are duplicate instances in the sample, so the returned hypothesis is the optimal empty set; in the second case, the returned hypothesis is the sample, which has a total probability mass of at most O(√(log(1/δ)/m)), and therefore F(A(S)) ≤ O(√(log(1/δ)/m)). As a result, regardless of the underlying distribution, with probability at least 1 − δ over the sample,

F(A(S)) ≤ O(√(log(1/δ)/m)).

Since the right-hand side converges to 0 as m → ∞ for any δ, the learning rule is universally consistent.

8 Discussion

In the familiar setting of supervised classification or regression, the question of learnability reduces to that of uniform convergence of empirical risks to their expectations, and in turn to finiteness of the fat-shattering dimension. Furthermore, due to the equivalence of learnability and uniform convergence, there is no need to look beyond the ERM. We recently showed [9] that the situation in the General Learning Setting is substantially more complex: universal ERM consistency might not be equivalent to uniform convergence, and furthermore, learnability might be possible only with a non-ERM.
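Example 7.7's rule is simple enough to simulate. The following illustrative snippet (not part of the paper) implements the rule and checks, under a continuous distribution, that its empirical risk equals 1 (far from the ERM value of 0, so it is neither an AERM nor generalizing) while its population risk is essentially 0 (it is consistent).

```python
import random

def A(sample):
    # Example 7.7's rule: return the sample itself (as a finite set) if all
    # instances are distinct, and the empty set otherwise.
    return frozenset(sample) if len(set(sample)) == len(sample) else frozenset()

def f(h, z):
    # Objective: indicator that z lies in the finite set h.
    return 1.0 if z in h else 0.0

def emp_risk(h, sample):
    return sum(f(h, z) for z in sample) / len(sample)

def pop_risk_estimate(h, draw, n=10000):
    # Monte Carlo estimate of F(h) = E[f(h, Z)].
    return sum(f(h, draw()) for _ in range(n)) / n

random.seed(0)
S = [random.random() for _ in range(100)]   # continuous distribution on [0, 1]
h = A(S)
print(emp_risk(h, S))                       # 1.0, while the ERM (empty set) achieves 0
print(pop_risk_estimate(h, random.random))  # essentially 0: fresh draws never hit S
```

The contrast between the two printed risks is exactly the failure of generalization used in the proof: the rule memorizes the sample, which costs everything empirically and nothing in population.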
We are therefore in need of a new understanding of the question of learnability that applies more broadly than just to supervised classification and regression. In studying learnability in the General Setting, Vapnik [10] focuses solely on empirical risk minimization, which we now know is not sufficient for understanding learnability (e.g. Example 7.2). Furthermore, for empirical risk minimization, Vapnik establishes uniform convergence as a necessary and sufficient condition not for ERM consistency, but rather for strict consistency of the ERM. We now know that even in rather non-trivial problems (e.g. Example 7.1, taken from [9]), where the ERM is consistent and generalizes, strict consistency does not hold. Furthermore, Example 7.1 also demonstrates that ERM stability guarantees ERM consistency, but not strict consistency, perhaps giving another indication that strict consistency might be too strict (this and other relationships are depicted in Figure 1).

In Examples 7.1 and 7.2 we see that stability is a strictly more general sufficient condition for learnability than uniform convergence. This makes stability an appealing candidate for understanding learnability in the more general setting. Indeed, we show that stability is not only sufficient, but also necessary, for learning, even in the General Learning Setting. A previous such characterization was based on uniform convergence, and thus applied only to supervised classification and regression [7]. Extending the characterization beyond these settings is particularly interesting, since for supervised classification and regression the question of learnability is already essentially solved. Extending the characterization without relying on uniform convergence also allows us to frame stability as the core condition guaranteeing learnability, with uniform convergence being merely a sufficient, but not necessary, condition for stability (see Figure 1).

In studying the question of learnability and its relation to stability, we encounter several differences between this more general setting and settings such as supervised classification and regression, where learnability is equivalent to uniform convergence. We summarize some of these distinctions:

Perhaps the most important distinction is that in the General Setting, learnability might be possible only with a non-ERM. In this paper we establish that if a problem is learnable, then although it might not be learnable with an ERM, it must be learnable with some AERM. And so, in the General Setting we must look beyond empirical risk minimization, but not beyond asymptotic empirical risk minimization.

In supervised classification and regression, if one AERM is universally consistent then all AERMs are universally consistent. In the General Setting we must choose the AERM carefully.

In supervised classification and regression, a universally consistent rule must also generalize and be an AERM. In the General Setting, a universally consistent rule need not generalize nor be an AERM, as Example 7.7 demonstrates. However, Theorem 4.5 establishes that, even in the General Setting, if a rule is universally consistent and generalizing then it must be an AERM. This gives us another reason not to look beyond asymptotic empirical risk minimization, even in the General Setting.

The above distinctions can also be seen through Corollary 6.10, which concerns the relationship between being an AERM, consistency, and generalization in learnable problems.
In the General Setting, any two of these conditions imply the third, but any one condition can hold without the others. In supervised classification and regression, if a problem is learnable then generalization always holds (for any rule), and so universal consistency and being an AERM imply each other.

In supervised classification and regression, ERM inconsistency for some distribution is enough to establish non-learnability. Establishing non-learnability in the General Setting is trickier, since one must consider all AERMs. We show how Corollary 6.10 can provide a certificate of non-learnability, in the form of a rule that is consistent and an AERM for some specific distribution but does not generalize (Example 7.6).

In the General Setting, universal consistency of an AERM only guarantees on-average-LOO stability, and not LOO stability as in the supervised classification setting [7]. As we show in Example 7.4, this is a real difference and not merely a deficiency of our proofs.

We have begun exploring the issue of learnability in the General Setting and have uncovered important relationships between learnability and stability, but many problems remain open. Throughout the paper we ignored the issue of obtaining high-confidence concentration guarantees: we chose to work with convergence in expectation, and defined rates as rates on the expectation. Since the objective f is bounded, convergence in expectation is equivalent to convergence in probability, and using Markov's inequality we can translate a rate of the form E[X] ≤ ε(m) into a low-confidence guarantee Pr(X > ε(m)/δ) ≤ δ. Can we also obtain exponential concentration results of the form Pr(X > ε(m)·polylog(1/δ)) ≤ δ? It is possible to construct examples in the General Setting in which convergence in expectation of the stability does not imply exponential concentration of consistency and generalization. Is it possible to show that exponential concentration of stability is equivalent to exponential concentration of consistency and generalization?
We showed that the existence of an average-LOO stable AERM is necessary and sufficient for learnability (Theorem 4.6). Although specific AERMs might be universally consistent and generalizing without being LOO stable (Example 7.4), it might still be possible to show that for a learnable problem there always exists some LOO stable AERM. This would tighten our converse result and establish the existence of a LOO stable AERM as equivalent to learnability. Even the existence of a LOO stable AERM is not as elegant and simple a condition as having finite VC dimension or fat-shattering dimension. It would be very interesting to derive equivalent but more combinatorial conditions for learnability.

References

[1] N. Alon, S. Ben-David, N. Cesa-Bianchi, and D. Haussler. Scale-sensitive dimensions, uniform convergence, and learnability. J. ACM, 44(4):615-631, 1997.
[2] O. Bousquet and A. Elisseeff. Stability and generalization. J. Mach. Learn. Res., 2:499-526, 2002.
[3] David Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78-150, 1992.
[4] Michael J. Kearns, Robert E. Schapire, and Linda M. Sellie. Toward efficient agnostic learning. In Proc. of COLT 5, 1992.
[5] S. Kutin and P. Niyogi. Almost-everywhere algorithmic stability and generalization error. In Proc. of UAI 18, 2002.
[6] D.A. McAllester and R.E. Schapire. On the convergence rate of Good-Turing estimators. In Proc. of COLT 13, 2000.
[7] S. Mukherjee, P. Niyogi, T. Poggio, and R. M. Rifkin. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv. Comput. Math., 25(1-3):161-193, 2006.
[8] S. Rakhlin, S. Mukherjee, and T. Poggio. Stability results in learning theory. Analysis and Applications, 3(4):397-419, 2005.
[9] S. Shalev-Shwartz, O. Shamir, N. Srebro, and K. Sridharan. Stochastic convex optimization. In Proceedings of COLT 22, 2009.
[10] V.N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
A Replacement Stability

We used the leave-one-out (LOO) versions of stability throughout the paper; however, many of the results hold when we use replacement versions instead. Here we briefly survey the differences in the main results as they apply to replacement-based stabilities. Let S^{(i)} denote the sample S with z_i replaced by some other z′_i drawn from the same unknown distribution D.

Definition 5. A rule A is uniform-RO stable with rate ε_stable(m) if for all samples S of m points and all z′, z′_1, ..., z′_m ∈ Z:

(1/m) Σ_{i=1}^m |f(A(S^{(i)}); z′) − f(A(S); z′)| ≤ ε_stable(m).

Definition 6. A rule A is on-average-RO stable with rate ε_stable(m) under a distribution D if:

|(1/m) Σ_{i=1}^m E_{S∼D^m; z′_1,...,z′_m∼D}[f(A(S^{(i)}); z_i) − f(A(S); z_i)]| ≤ ε_stable(m).

With the above definitions replacing uniform-LOO stability and on-average-LOO stability respectively, all the theorems in Section 4 other than Theorem 4.3 hold (i.e. Theorem 4.1, Corollary 4.2, Theorem 4.4 and Theorem 4.6). We do not know how to obtain a replacement variant of Theorem 4.3: even for a consistent ERM, we can only guarantee on-average-RO stability (as in Theorem 4.4), and we do not know whether this is enough to ensure RO stability. However, although for ERMs we can only obtain a weaker converse, we can guarantee the existence of an AERM that is not only on-average-RO stable but actually uniform-RO stable. That is, we get a much stronger variant of Theorem 4.6:

Theorem A.1. A learning problem is learnable if and only if there exists a uniform-RO stable AERM.

Proof. Clearly, if there exists any rule A that is uniform-RO stable and an AERM, then the problem is learnable, since such a rule is in fact universally consistent by Theorem 4.1. On the other hand, if there exists a universally consistent rule A, consider the rule A′ as in the construction of Lemma 6.11. As shown in the lemma, this rule is consistent and generalizing. Now note that A′ only uses the first ⌊√m⌋ samples of S.
Hence, for i > ⌊√m⌋ we have A′(S^{(i)}) = A′(S), and so for any z′:

(1/m) Σ_{i=1}^m |f(A′(S^{(i)}); z′) − f(A′(S); z′)| = (1/m) Σ_{i=1}^{⌊√m⌋} |f(A′(S^{(i)}); z′) − f(A′(S); z′)| ≤ 2B⌊√m⌋/m ≤ 2B/√m.

We have thus shown that this rule is consistent, generalizing, and 2B/√m-uniform-RO stable.
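The prefix construction used in this proof is easy to sketch in code. The following illustrative snippet (not part of the paper; the base rule and objective are toy stand-ins) wraps an arbitrary rule A into A′(S) = A(S′), with S′ the first ⌊√m⌋ observations, and checks that replacements beyond the prefix leave A′ unchanged and that the average replacement perturbation respects the 2B/√m bound.

```python
import math
import random

B = 1.0  # assumed uniform bound on |f|

def make_A_prime(A):
    # The construction of the lemma: A'(S) = A(S'), where S' is the prefix
    # of S of length floor(sqrt(m)).
    def A_prime(sample):
        return A(sample[:math.isqrt(len(sample))])
    return A_prime

def A(sample):
    # Toy base rule on [0, 1]: return the empirical mean as the hypothesis.
    return sum(sample) / len(sample)

def f(h, z):
    # Toy objective, bounded by B = 1 on [0, 1].
    return (h - z) ** 2

random.seed(0)
m = 100
S = [random.random() for _ in range(m)]
A_prime = make_A_prime(A)

# Replacing any z_i with i > sqrt(m) cannot change A'(S):
S_repl = S[:50] + [random.random()] + S[51:]
print(A_prime(S_repl) == A_prime(S))  # True

# Average replacement perturbation is at most 2B*sqrt(m)/m = 2B/sqrt(m):
z_new = random.random()
pert = 0.0
for i in range(m):
    S_i = S[:i] + [z_new] + S[i + 1:]
    pert += abs(f(A_prime(S_i), S[i]) - f(A_prime(S), S[i]))
pert /= m
print(pert <= 2 * B / math.sqrt(m))  # True
```

Only the first ⌊√m⌋ of the m replacement terms can be nonzero, and each is bounded by 2B, which is exactly the counting argument behind the 2B/√m rate.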
More informatione-companion ONLY AVAILABLE IN ELECTRONIC FORM
OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer
More informationStochastic Subgradient Methods
Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods
More information1 Identical Parallel Machines
FB3: Matheatik/Inforatik Dr. Syaantak Das Winter 2017/18 Optiizing under Uncertainty Lecture Notes 3: Scheduling to Miniize Makespan In any standard scheduling proble, we are given a set of jobs J = {j
More informationLearnability, Stability, Regularization and Strong Convexity
Learnability, Stability, Regularization and Strong Convexity Nati Srebro Shai Shalev-Shwartz HUJI Ohad Shamir Weizmann Karthik Sridharan Cornell Ambuj Tewari Michigan Toyota Technological Institute Chicago
More informationTesting Properties of Collections of Distributions
Testing Properties of Collections of Distributions Reut Levi Dana Ron Ronitt Rubinfeld April 9, 0 Abstract We propose a fraework for studying property testing of collections of distributions, where the
More informationStability Bounds for Stationary ϕ-mixing and β-mixing Processes
Journal of Machine Learning Research (200 66-686 Subitted /08; Revised /0; Published 2/0 Stability Bounds for Stationary ϕ-ixing and β-ixing Processes Mehryar Mohri Courant Institute of Matheatical Sciences
More informationA Bernstein-Markov Theorem for Normed Spaces
A Bernstein-Markov Theore for Nored Spaces Lawrence A. Harris Departent of Matheatics, University of Kentucky Lexington, Kentucky 40506-0027 Abstract Let X and Y be real nored linear spaces and let φ :
More informationThis article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and
This article appeared in a ournal published by Elsevier. The attached copy is furnished to the author for internal non-coercial research and education use, including for instruction at the authors institution
More informationSupplement to: Subsampling Methods for Persistent Homology
Suppleent to: Subsapling Methods for Persistent Hoology A. Technical results In this section, we present soe technical results that will be used to prove the ain theores. First, we expand the notation
More informationOn Conditions for Linearity of Optimal Estimation
On Conditions for Linearity of Optial Estiation Erah Akyol, Kuar Viswanatha and Kenneth Rose {eakyol, kuar, rose}@ece.ucsb.edu Departent of Electrical and Coputer Engineering University of California at
More informationLecture 20 November 7, 2013
CS 229r: Algoriths for Big Data Fall 2013 Prof. Jelani Nelson Lecture 20 Noveber 7, 2013 Scribe: Yun Willia Yu 1 Introduction Today we re going to go through the analysis of atrix copletion. First though,
More informationStability Bounds for Stationary ϕ-mixing and β-mixing Processes
Journal of Machine Learning Research (200) 789-84 Subitted /08; Revised /0; Published 2/0 Stability Bounds for Stationary ϕ-ixing and β-ixing Processes Mehryar Mohri Courant Institute of Matheatical Sciences
More informationarxiv: v1 [cs.ds] 17 Mar 2016
Tight Bounds for Single-Pass Streaing Coplexity of the Set Cover Proble Sepehr Assadi Sanjeev Khanna Yang Li Abstract arxiv:1603.05715v1 [cs.ds] 17 Mar 2016 We resolve the space coplexity of single-pass
More informationNon-Parametric Non-Line-of-Sight Identification 1
Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,
More informationThe Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate
The Siplex Method is Strongly Polynoial for the Markov Decision Proble with a Fixed Discount Rate Yinyu Ye April 20, 2010 Abstract In this note we prove that the classic siplex ethod with the ost-negativereduced-cost
More informationSome Classical Ergodic Theorems
Soe Classical Ergodic Theores Matt Rosenzweig Contents Classical Ergodic Theores. Mean Ergodic Theores........................................2 Maxial Ergodic Theore.....................................
More informationFairness via priority scheduling
Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation
More informationA Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay
A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer
More informationA theory of learning from different domains
DOI 10.1007/s10994-009-5152-4 A theory of learning fro different doains Shai Ben-David John Blitzer Koby Craer Alex Kulesza Fernando Pereira Jennifer Wortan Vaughan Received: 28 February 2009 / Revised:
More informationA := A i : {A i } S. is an algebra. The same object is obtained when the union in required to be disjoint.
59 6. ABSTRACT MEASURE THEORY Having developed the Lebesgue integral with respect to the general easures, we now have a general concept with few specific exaples to actually test it on. Indeed, so far
More informationOn the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs
On the Inapproxiability of Vertex Cover on k-partite k-unifor Hypergraphs Venkatesan Guruswai and Rishi Saket Coputer Science Departent Carnegie Mellon University Pittsburgh, PA 1513. Abstract. Coputing
More informationarxiv: v2 [stat.ml] 23 Feb 2016
Perutational Radeacher Coplexity A New Coplexity Measure for Transductive Learning Ilya Tolstikhin 1, Nikita Zhivotovskiy 3, and Gilles Blanchard 4 arxiv:1505.0910v stat.ml 3 Feb 016 1 Max-Planck-Institute
More informationOn Process Complexity
On Process Coplexity Ada R. Day School of Matheatics, Statistics and Coputer Science, Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand, Eail: ada.day@cs.vuw.ac.nz Abstract Process
More informationEfficient Learning with Partially Observed Attributes
Nicolò Cesa-Bianchi DSI, Università degli Studi di Milano, Italy Shai Shalev-Shwartz The Hebrew University, Jerusale, Israel Ohad Shair The Hebrew University, Jerusale, Israel Abstract We describe and
More informationCOS 424: Interacting with Data. Written Exercises
COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well
More informationStability Bounds for Non-i.i.d. Processes
tability Bounds for Non-i.i.d. Processes Mehryar Mohri Courant Institute of Matheatical ciences and Google Research 25 Mercer treet New York, NY 002 ohri@cis.nyu.edu Afshin Rostaiadeh Departent of Coputer
More informationMath Real Analysis The Henstock-Kurzweil Integral
Math 402 - Real Analysis The Henstock-Kurzweil Integral Steven Kao & Jocelyn Gonzales April 28, 2015 1 Introduction to the Henstock-Kurzweil Integral Although the Rieann integral is the priary integration
More informationFixed-to-Variable Length Distribution Matching
Fixed-to-Variable Length Distribution Matching Rana Ali Ajad and Georg Böcherer Institute for Counications Engineering Technische Universität München, Gerany Eail: raa2463@gail.co,georg.boecherer@tu.de
More informationarxiv: v2 [math.co] 3 Dec 2008
arxiv:0805.2814v2 [ath.co] 3 Dec 2008 Connectivity of the Unifor Rando Intersection Graph Sion R. Blacburn and Stefanie Gere Departent of Matheatics Royal Holloway, University of London Egha, Surrey TW20
More informationInformation Theoretic Guarantees for Empirical Risk Minimization with Applications to Model Selection and Large-Scale Optimization
with Applications to Model Selection and Large-Scale Optiization Ibrahi Alabdulohsin 1 Abstract In this paper, we derive bounds on the utual inforation of the epirical risk iniization (ERM) procedure for
More informationRobustness and Regularization of Support Vector Machines
Robustness and Regularization of Support Vector Machines Huan Xu ECE, McGill University Montreal, QC, Canada xuhuan@ci.cgill.ca Constantine Caraanis ECE, The University of Texas at Austin Austin, TX, USA
More informationModel Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon
Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential
More informationPrerequisites. We recall: Theorem 2 A subset of a countably innite set is countable.
Prerequisites 1 Set Theory We recall the basic facts about countable and uncountable sets, union and intersection of sets and iages and preiages of functions. 1.1 Countable and uncountable sets We can
More informationBootstrapping Dependent Data
Bootstrapping Dependent Data One of the key issues confronting bootstrap resapling approxiations is how to deal with dependent data. Consider a sequence fx t g n t= of dependent rando variables. Clearly
More informationThe degree of a typical vertex in generalized random intersection graph models
Discrete Matheatics 306 006 15 165 www.elsevier.co/locate/disc The degree of a typical vertex in generalized rando intersection graph odels Jerzy Jaworski a, Michał Karoński a, Dudley Stark b a Departent
More informationLower Bounds for Quantized Matrix Completion
Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &
More informationNew upper bound for the B-spline basis condition number II. K. Scherer. Institut fur Angewandte Mathematik, Universitat Bonn, Bonn, Germany.
New upper bound for the B-spline basis condition nuber II. A proof of de Boor's 2 -conjecture K. Scherer Institut fur Angewandte Matheati, Universitat Bonn, 535 Bonn, Gerany and A. Yu. Shadrin Coputing
More informationEnsemble Based on Data Envelopment Analysis
Enseble Based on Data Envelopent Analysis So Young Sohn & Hong Choi Departent of Coputer Science & Industrial Systes Engineering, Yonsei University, Seoul, Korea Tel) 82-2-223-404, Fax) 82-2- 364-7807
More informationLORENTZ SPACES AND REAL INTERPOLATION THE KEEL-TAO APPROACH
LORENTZ SPACES AND REAL INTERPOLATION THE KEEL-TAO APPROACH GUILLERMO REY. Introduction If an operator T is bounded on two Lebesgue spaces, the theory of coplex interpolation allows us to deduce the boundedness
More informationFoundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research
Foundations of Machine Learning Boosting Mehryar Mohri Courant Institute and Google Research ohri@cis.nyu.edu Weak Learning Definition: concept class C is weakly PAC-learnable if there exists a (weak)
More informationEstimating Entropy and Entropy Norm on Data Streams
Estiating Entropy and Entropy Nor on Data Streas Ait Chakrabarti 1, Khanh Do Ba 1, and S. Muthukrishnan 2 1 Departent of Coputer Science, Dartouth College, Hanover, NH 03755, USA 2 Departent of Coputer
More informationBounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks
Bounds on the Miniax Rate for Estiating a Prior over a VC Class fro Independent Learning Tasks Liu Yang Steve Hanneke Jaie Carbonell Deceber 01 CMU-ML-1-11 School of Coputer Science Carnegie Mellon University
More informationIntroduction to Machine Learning. Recitation 11
Introduction to Machine Learning Lecturer: Regev Schweiger Recitation Fall Seester Scribe: Regev Schweiger. Kernel Ridge Regression We now take on the task of kernel-izing ridge regression. Let x,...,
More informationThis model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.
CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when
More informationResearch Article On the Isolated Vertices and Connectivity in Random Intersection Graphs
International Cobinatorics Volue 2011, Article ID 872703, 9 pages doi:10.1155/2011/872703 Research Article On the Isolated Vertices and Connectivity in Rando Intersection Graphs Yilun Shang Institute for
More informationCurious Bounds for Floor Function Sums
1 47 6 11 Journal of Integer Sequences, Vol. 1 (018), Article 18.1.8 Curious Bounds for Floor Function Sus Thotsaporn Thanatipanonda and Elaine Wong 1 Science Division Mahidol University International
More information