Computable Shell Decomposition Bounds


Journal of Machine Learning Research 5 (2004). Submitted 1/03; Revised 8/03; Published 5/04.

Computable Shell Decomposition Bounds

John Langford (JL@TTI-C.ORG)
David McAllester (MCALLESTER@TTI-C.ORG)
Toyota Technology Institute at Chicago, 1427 East 60th Street, Chicago, IL 60637, USA

Editor: Manfred Warmuth

Abstract

Haussler, Kearns, Seung and Tishby introduced the notion of a shell decomposition of the union bound as a means of understanding certain empirical phenomena in learning curves such as phase transitions. Here we use a variant of their ideas to derive an upper bound on the generalization error of a hypothesis computable from its training error and the histogram of training errors for the hypotheses in the class. In most cases this new bound is significantly tighter than traditional bounds computed from the training error and the cardinality of the class. Our results can also be viewed as providing a rigorous foundation for a model selection algorithm proposed by Scheffer and Joachims.

Keywords: sample complexity, classification, true error bounds, shell bounds

1 Introduction

For an arbitrary finite hypothesis class we consider the hypothesis of minimal training error. We give a new upper bound on the generalization error of this hypothesis computable from the training error of the hypothesis and the histogram of the training errors of the other hypotheses in the class. This new bound is typically much tighter than more traditional upper bounds computed from the training error and cardinality of the class.

As a simple example, suppose that we observe that all but one empirical error in a hypothesis space is 1/2 and one empirical error is 0. Furthermore, suppose that the sample size is large enough (relative to the size of the hypothesis class) that with high confidence we have that, for all hypotheses in the class, the true (generalization) error of a hypothesis is within 1/5 of its training error. This implies, with high confidence, that hypotheses with training error near 1/2 have true error in [3/10, 7/10]. Intuitively, we would expect the true error of the hypothesis with minimum empirical error to be very near to 0 rather than simply less than 1/5, because none of the hypotheses which produced an empirical error of 1/2 could have a true error close enough to 0 that there exists a significant probability of producing 0 empirical error. The bound presented here validates this intuition. We show that you can ignore hypotheses with training error near 1/2 in calculating an effective size of the class for hypotheses with training error near 0. This new effective class size allows us to calculate a tighter bound on the difference between training error and true error for hypotheses with training error near 0. The new bound is proved using a distribution-dependent application of the union bound similar in spirit to the shell decomposition introduced by Haussler et al. (1996).

We actually give two upper bounds on generalization error: an uncomputable bound and a computable bound. The uncomputable bound is a function of the unknown distribution of true error rates of the hypotheses in the class. The computable bound is, essentially, the uncomputable bound with the unknown distribution of true errors replaced by the known histogram of training errors. Our main contribution is that this replacement is sound, i.e., the computable version remains, with high confidence, an upper bound on generalization error.

When considering asymptotic properties of learning theory bounds it is important to take limits in which the cardinality (or VC dimension) of the hypothesis class is allowed to grow with the size of the sample. In practice, more data typically justifies a larger hypothesis class. For example, the size of a decision tree is generally proportional to the amount of training data available. Here we analyze the asymptotic properties of our bounds by considering an infinite sequence of hypothesis classes H_m, one for each sample size m, such that (ln |H_m|)/m approaches a limit larger than zero. This kind of asymptotic analysis provides a clear account of the improvement achieved by bounds that are functions of error rate distributions rather than simply the size (or VC dimension) of the class.

We give a lower bound on generalization error showing that the uncomputable upper bound is asymptotically as tight as possible: any upper bound on generalization error given as a function of the unknown distribution of true error rates must asymptotically be greater than or equal to our uncomputable upper bound. Our lower bound on generalization error also shows that there is essentially no loss in working with an upper bound computed from the true error distribution rather than expectations computed from this distribution as used by Scheffer and Joachims (1999).

Asymptotically, the computable bound is simply the uncomputable bound with the unknown distribution of true errors replaced with the observed histogram of training errors. Unfortunately, we can show that in limits where (ln |H_m|)/m converges to a value greater than zero, the histogram of training errors need not converge to the distribution of true errors: the histogram of training errors is a smeared out version of the distribution of true errors. This smearing loosens the bound even in the large-sample asymptotic limit. We give a precise asymptotic characterization of this smearing effect for the case where distinct hypotheses have independent training errors. In spite of the divergence between these bounds, the computable bound is still significantly tighter than classical bounds not involving error distributions.

The computable bound can be used for model selection. In the case of model selection we can assume an infinite sequence of finite model classes H_0, H_1, ..., where each H_j is a finite class with ln |H_j| growing linearly in j. To perform model selection we find the hypothesis of minimal training error in each class and use the computable bound to bound its generalization error. We can then select, among these, the model with the smallest upper bound on generalization error. Scheffer and Joachims propose (without formal justification) replacing the distribution of true errors with the histogram of training errors. Under this replacement, the model selection algorithm based on our computable upper bound is asymptotically identical to the algorithm proposed by Scheffer and Joachims.

The shell decomposition is a distribution-dependent use of the union bound. Distribution-dependent uses of the union bound have been
previously exploited in so-called self-bounding algorithms. Freund (1998) defines, for a given learning algorithm and data distribution, a set S of hypotheses such that, with high probability over the sample, the algorithm always returns a hypothesis in that set. Although S is defined in terms of the unknown data distribution, Freund gives a way of computing a set S' from the given algorithm and the sample such that, with high confidence, S' contains S, and hence the effective size of the hypothesis class is bounded by |S'|. Langford and

Blum (1999) give a more practical version of this algorithm. Given an algorithm and data distribution they conceptually define a weighting over the possible executions of the algorithm. Although the data distribution is unknown, they give a way of computing a lower bound on the weight of the particular execution of the algorithm generated by the sample at hand. In this paper we consider distribution-dependent union bounds defined independently of any particular learning algorithm.

The bounds given in this paper apply to finite concept classes. Of course more sophisticated measures of the complexity of a concept class, such as VC dimension or Rademacher complexity, are possible and can sometimes result in tighter bounds. However, insight into finite classes remains useful in at least two ways. Finite class analysis is useful as a pedagogical tool, teaching about directions in which to look for the removal of slack from these more sophisticated bounds. Indeed, various localized Rademacher complexity results (Bartlett et al., 2002) and the peeling technique (van de Geer, 1999) appear to (roughly) correspond to the orthogonal combination of shell bounds and earlier Rademacher complexity results. One advantage of the shell bounds is the KL-divergence form of the bounds, which smoothly interpolates between the linear bounds of the realizable case and the quadratic bounds of the unrealizable case. This realizable-unrealizable interpolation is orthogonal to the shell principle that concepts with large empirical error are unlikely to be confused with concepts with low error rate. The shell bound also supports intuitions that are difficult to achieve in more complex settings. For example, the simple shell bounds clearly exhibit phase transitions in the learning bound, something which does not appear to be well-elucidated for localized Rademacher bounds. In summary, the simplicity of finite classes (and a shell bound analysis on a finite class) provides a clarity that is difficult to achieve with more complex structure-exploiting bounds.

Finite class analysis is also useful in a more practical sense. In practice a finite VC dimension class usually has a finite parameterization. Given that these real parameters are typically represented as 32 bit floating point numbers, the class becomes finite and the log of the class size is linear in the number of parameters. Since many of the more sophisticated infinite-class techniques are loose by large multiplicative constants, a finite class analysis applied to a VC class discretized to a small number of bits can actually yield tighter bounds, as shown in Figure 1.

2 Mathematical Preliminaries

For an arbitrary measure on an arbitrary sample space we use the notation ∀^δ S Φ[S, δ] to mean that with probability at least 1 − δ over the choice of the sample S we have that Φ[S, δ] holds.¹ In practice S is the training sample of a learning algorithm. Note that ∀x ∀^δ S Φ[x, S, δ] does not imply ∀^δ S ∀x Φ[x, S, δ]. If X is a finite set, and for all x ∈ X we have the assertion ∀δ > 0 ∀^δ S Φ[S, x, δ], then by a standard application of the union bound we have the assertion ∀δ > 0 ∀^δ S ∀x ∈ X Φ[S, x, δ/|X|]. We call this the quantification rule. If ∀δ > 0 ∀^δ S Φ[S, δ] and ∀δ > 0 ∀^δ S Ψ[S, δ], then by a standard application of the union bound we have ∀δ > 0 ∀^δ S Φ[S, δ/2] ∧ Ψ[S, δ/2]. We call this the conjunction rule. The KL-divergence of p from q, denoted D(q || p), is q ln(q/p) + (1 − q) ln((1 − q)/(1 − p)), with 0 ln(0/p) = 0 and q ln(q/0) = ∞. Let p̂ be the fraction of heads in a sequence S of m tosses of a biased coin where the probability of heads is p.

¹ This can be read as "for all but a δ fraction of sets S, the predicate Φ[S, δ] holds" or "with probability 1 − δ over the draw of S, the predicate Φ[S, δ] holds."

[Figure 1: plot of the true error bound against training error, with curves for the VC bound and the Occam's Razor Bound (ORB) at 32, 16, and 8 bits.]

Figure 1: A graph comparing the (infinite hypothesis) VC bound to the finite hypothesis Occam's razor bound. For all curves we use VC dimension d = 10, bound failure probability δ = 0.1, and m = 1000 examples. For the VC bound calculation (see Moore, 2004, for details) the formula is

true error ≤ train error + sqrt((d ln(2m/d) + ln(4/δ))/m).

For the Occam's Razor Bound calculation (see Langford, 2003, for details), we use a uniform distribution over the 2^{kd} discrete classifiers which might be representable when we discretize d parameters to k = 8, 16, 32 bits per dimension. The basic formula is

KL(train error || true error) ≤ (kd ln 2 + ln(1/δ))/m.

This graph is approximately the same for any similar ratio of d/m, with smaller values favoring the Occam's Razor Bound.
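As an illustration, the two curves compared in Figure 1 can be computed directly from the formulas quoted in the caption. The following minimal Python sketch does this under the stated settings (d = 10, δ = 0.1, m = 1000); the kl and kl_upper_inverse helpers are added here to invert the Occam's Razor Bound and are not from the paper.

```python
import math

def kl(q, p):
    """Binary KL divergence D(q || p) with the 0 ln 0 = 0 convention."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    out = q * math.log(q / p) if q > 0 else 0.0
    if q < 1:
        out += (1 - q) * math.log((1 - q) / (1 - p))
    return out

def kl_upper_inverse(q_hat, budget):
    """Largest p in [q_hat, 1] with D(q_hat || p) <= budget, found by bisection."""
    lo, hi = q_hat, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if kl(q_hat, mid) <= budget else (lo, mid)
    return lo

def vc_bound(train_err, m, d, delta):
    """VC-style bound from the caption: training error plus a square-root slack."""
    return train_err + math.sqrt((d * math.log(2 * m / d) + math.log(4 / delta)) / m)

def occam_bound(train_err, m, d, k, delta):
    """Occam's Razor Bound: invert KL(train error || true error) <= (k d ln 2 + ln(1/delta))/m."""
    return kl_upper_inverse(train_err, (k * d * math.log(2) + math.log(1 / delta)) / m)

m, d, delta = 1000, 10, 0.1
for train_err in (0.0, 0.1, 0.2, 0.3, 0.5):
    print(train_err,
          round(vc_bound(train_err, m, d, delta), 3),
          [round(occam_bound(train_err, m, d, k, delta), 3) for k in (8, 16, 32)])
```

At small training error the discretized Occam's Razor Bound comes out well below the VC bound, which is the qualitative behavior the figure is meant to show.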

We have the following inequality, given by Chernoff (1952):

∀q ∈ [p, 1]: Pr(p̂ ≥ q) ≤ e^{−m D(q || p)}    (1)

This bound can be rewritten as follows:

∀δ > 0 ∀^δ S: D(max(p̂, p) || p) ≤ ln(1/δ)/m    (2)

To derive (2) from (1) note that Pr(D(max(p̂, p) || p) ≥ ln(1/δ)/m) equals Pr(p̂ ≥ q) where q ≥ p and D(q || p) = ln(1/δ)/m. By (1) we then have that this probability is no larger than e^{−m D(q || p)} = δ. It is just as easy to derive (1) from (2), so the two statements are equivalent. By duality, i.e., by considering the problem defined by replacing p by 1 − p, we get

∀δ > 0 ∀^δ S: D(min(p̂, p) || p) ≤ ln(1/δ)/m    (3)

Conjoining (2) and (3) yields the following corollary of (1):

∀δ > 0 ∀^δ S: D(p̂ || p) ≤ ln(2/δ)/m    (4)

Using the inequality D(q || p) ≥ 2(q − p)², one can show that (4) implies the better known form of the Chernoff bound

∀δ > 0 ∀^δ S: |p − p̂| ≤ sqrt(ln(2/δ)/(2m))    (5)

Using the inequality D(q || p) ≥ (p − q)²/(2p), which holds for q ≤ p, we can show that (3) implies the following:²

∀δ > 0 ∀^δ S: p ≤ p̂ + sqrt(2 p̂ ln(1/δ)/m) + 2 ln(1/δ)/m    (6)

Note that for small values of p̂ formula (6) gives a tighter upper bound on p than does (5). The upper bound on p implicit in (4) is somewhat tighter than the minimum of the bounds given by (5) and (6).

We now consider a formal setting for hypothesis learning. We assume a finite set H of hypotheses and a space X of instances. We assume that each hypothesis represents a function from X to {0, 1}, where we write h(x) for the value of the function represented by hypothesis h when applied to instance x. We also assume a distribution D on pairs ⟨x, y⟩ with x ∈ X and y ∈ {0, 1}. For any hypothesis h we define the error rate of h, denoted e(h), to be P_{⟨x,y⟩∼D}(h(x) ≠ y). For a given sample S of m pairs drawn from D we write ê(h) to denote the fraction of the pairs ⟨x, y⟩ in S such that h(x) ≠ y. Quantifying over h ∈ H in (4) yields the following second corollary of (1):

∀^δ S ∀h ∈ H: D(ê(h) || e(h)) ≤ (ln |H| + ln(2/δ))/m    (7)

² A derivation of this formula can be found in Mansour and McAllester (2000) or McAllester and Schapire (2000). To see the need for the last term consider the case where p̂ = 0.
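The remark that (6) beats (5) for small p̂ is easy to check numerically. A minimal sketch (the sample size and confidence level below are arbitrary choices):

```python
import math

def bound_5(p_hat, m, delta):
    """Additive form (5): p <= p_hat + sqrt(ln(2/delta)/(2m))."""
    return p_hat + math.sqrt(math.log(2 / delta) / (2 * m))

def bound_6(p_hat, m, delta):
    """Form (6): p <= p_hat + sqrt(2 p_hat ln(1/delta)/m) + 2 ln(1/delta)/m."""
    return p_hat + math.sqrt(2 * p_hat * math.log(1 / delta) / m) + 2 * math.log(1 / delta) / m

m, delta = 1000, 0.05
for p_hat in (0.0, 0.01, 0.05, 0.3):
    print(p_hat, round(bound_5(p_hat, m, delta), 4), round(bound_6(p_hat, m, delta), 4))
```

For p̂ = 0 the slack in (6) scales as ln(1/δ)/m rather than sqrt(ln(2/δ)/m), while for larger p̂ the additive form (5) becomes the tighter of the two.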

By considering bounds on D(q || p) we can derive the more well known corollary of (7),

∀^δ S ∀h ∈ H: |e(h) − ê(h)| ≤ sqrt((ln |H| + ln(2/δ))/(2m))    (8)

These two formulas both limit the distance between ê(h) and e(h). In this paper we work with (7) rather than (8) because it yields an (implicit) upper bound on generalization error that is optimal up to asymptotic equality.

3 The Upper Bound

Our goal now is to improve on (7). Our first step is to divide the hypotheses in H into disjoint sets based on their true error rates. More specifically, for p ∈ [0, 1] define ⌈p⌉ to be max(1, ⌈mp⌉)/m. Note that ⌈p⌉ is of the form k/m where either p = 0 and k = 1 or p > 0 and p ∈ ((k − 1)/m, k/m]. In either case we have ⌈p⌉ ∈ {1/m, ..., m/m}, and if ⌈p⌉ = k/m then p ∈ [(k − 1)/m, k/m]. Now we define H(k/m) to be the set of h ∈ H such that ⌈e(h)⌉ = k/m. We define s(k/m) to be ln(max(1, |H(k/m)|)). We now have the following lemma.

Lemma 3.1 With high probability over the draw of S, the deviation between the empirical error ê(h) and the true error e(h) of every hypothesis is bounded in terms of s(⌈e(h)⌉). More precisely,

∀δ > 0 ∀^δ S ∀h ∈ H: D(ê(h) || e(h)) ≤ (s(⌈e(h)⌉) + ln(2m/δ))/m

Proof Quantifying over p ∈ {1/m, ..., m/m} and h ∈ H(p) in (4) gives

∀δ > 0 ∀^δ S ∀p ∈ {1/m, ..., m/m} ∀h ∈ H(p): D(ê(h) || e(h)) ≤ (s(p) + ln(2m/δ))/m

But this implies the lemma.

Lemma 3.1 imposes a constraint, and hence a bound, on e(h). More specifically, we have the following, where lub {x : Φ[x]} denotes the least upper bound (the maximum) of the set {x : Φ[x]}:

e(h) ≤ lub { q : D(ê(h) || q) ≤ (s(⌈q⌉) + ln(2m/δ))/m }    (9)

This is our uncomputable bound. It is uncomputable because the numbers s(1/m), ..., s(m/m) are unknown. Ignoring this problem, however, we can see that this bound is typically significantly tighter than (7). More specifically, we can rewrite (7) as

e(h) ≤ lub { q : D(ê(h) || q) ≤ (ln |H| + ln(2/δ))/m }    (10)

Since s(k/m) ≤ ln |H|, and since (ln m)/m is small for large m, we have that (9) is never significantly looser than (10).

Now consider a hypothesis h such that the bound on e(h) given by (7), or equivalently (10), is significantly less than 1/2. Assuming m is large, the bound given by (9) must also be significantly less than 1/2. But for q significantly less than 1/2 we typically have that s(⌈q⌉) is significantly smaller than ln |H|. For example, suppose H is the set of all decision trees of size m/10. For large m, a random decision tree of this size has error rate near 1/2. The set of decision trees with error rate significantly smaller than 1/2 is an exponentially small fraction of the set of all possible trees. So for q small compared to 1/2 we get that s(⌈q⌉) is significantly smaller than ln |H|. This makes the bound given by (9) significantly tighter than the bound given by (10).

We now show that the distribution of true errors can be replaced, essentially, by the histogram of training errors. We first introduce the following definitions:

Ĥ(k/m, δ) ≡ { h ∈ H : |ê(h) − k/m| ≤ 1/m + sqrt(ln(16m²/δ)/(2m − 1)) }

ŝ(k/m, δ) ≡ ln(max(1, 2|Ĥ(k/m, δ)|))

The definition of ŝ(k/m, δ) is motivated by the following lemma.

Lemma 3.2 With high probability over the draw of S we have, for all q, that s(q) ≤ ŝ(q, 2δ). More precisely,

∀δ > 0 ∀^δ S ∀q ∈ {1/m, ..., m/m}: s(q) ≤ ŝ(q, 2δ)

Before proving Lemma 3.2 we note that by conjoining (9) and Lemma 3.2 we get the following. This is our main result.

Theorem 3.3 With high probability over the draw of S, the true error e(h) of every hypothesis h is bounded in terms of its empirical error ê(h) and the quantities ŝ(⌈q⌉, δ). More precisely,

∀δ > 0 ∀^δ S ∀h ∈ H: e(h) ≤ lub { q : D(ê(h) || q) ≤ (ŝ(⌈q⌉, δ) + ln(4m/δ))/m }

As for Lemma 3.1, the bound implicit in Theorem 3.3 is typically significantly tighter than the bound in (7) or its equivalent form (10). The argument for the improved tightness of Theorem 3.3 over (10) is similar to the argument for (9). More specifically, consider a hypothesis h for which the bound in (10) is significantly less than 1/2. Since ŝ(⌈q⌉, δ) is never significantly larger than ln |H|, the set of values of q satisfying the condition in Theorem 3.3 must all be significantly less than 1/2. But for large m we have that 1/m + sqrt(ln(16m²/δ)/(2m − 1)) is small. So if q is significantly less than 1/2, then all hypotheses in Ĥ(⌈q⌉, δ) have empirical error rates significantly less than 1/2. But for most hypothesis classes, e.g., decision trees, the set of hypotheses with empirical error rates far from 1/2 should be an exponentially small fraction of the class. Hence we get that ŝ(⌈q⌉, δ) is significantly less than ln |H| and Theorem 3.3 is tighter than (10).

The remainder of this section is a proof of Lemma 3.2.
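Before turning to that proof, here is a minimal sketch of how the two bounds could be evaluated in practice. The kl helpers repeat those of the earlier sketch, the form of Ĥ(k/m, δ) follows the definition as given above, and the toy class at the end is made up for illustration; none of this is code from the paper.

```python
import math

def kl(q, p):
    """Binary KL divergence D(q || p), as in the earlier sketch."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    out = q * math.log(q / p) if q > 0 else 0.0
    if q < 1:
        out += (1 - q) * math.log((1 - q) / (1 - p))
    return out

def kl_upper_inverse(q_hat, budget):
    """Largest p >= q_hat with D(q_hat || p) <= budget (bisection)."""
    lo, hi = q_hat, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if kl(q_hat, mid) <= budget else (lo, mid)
    return lo

def shell_lub(e_hat, log_count, m, log_extra):
    """lub{ q : D(e_hat || q) <= (log_count(k) + log_extra)/m }, where
       shell k covers q in ((k-1)/m, k/m]."""
    best = e_hat
    for k in range(1, m + 1):
        q_k = kl_upper_inverse(e_hat, (log_count(k) + log_extra) / m)
        if q_k > (k - 1) / m:                  # this shell contributes some feasible q
            best = max(best, min(q_k, k / m))
    return best

def uncomputable_bound(e_hat, s, m, delta):
    """Bound (9); s(k) = ln max(1, |H(k/m)|) must be supplied (unknown in practice)."""
    return shell_lub(e_hat, s, m, math.log(2 * m / delta))

def computable_bound(e_hat, emp_errors, m, delta):
    """Theorem 3.3, with s-hat(k/m, delta) read off the training-error histogram."""
    width = 1.0 / m + math.sqrt(math.log(16 * m * m / delta) / (2 * m - 1))
    buckets = [0] * (m + 1)                    # empirical errors are multiples of 1/m
    for e in emp_errors:
        buckets[round(e * m)] += 1
    def s_hat(k):
        lo = max(0, math.ceil((k / m - width) * m))
        hi = min(m, math.floor((k / m + width) * m))
        return math.log(max(1, 2 * sum(buckets[lo:hi + 1])))
    return shell_lub(e_hat, s_hat, m, math.log(4 * m / delta))

# Toy class: one hypothesis with zero training error, many near one half.
m, delta = 1000, 0.05
emp_errors = [0.0] + [0.45 + (i % 101) / 1000 for i in range(100000)]
print(computable_bound(0.0, emp_errors, m, delta))
```

The uncomputable bound (9) would be called in the same way, with the true shell log-counts s(k) supplied in place of the histogram-based estimate.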

Our departure point for the proof is the following lemma from McAllester (1999).

Lemma 3.4 (McAllester 99) For any measure on any hypothesis class we have the following, where E_h f(h) denotes the expectation of f(h) under the given measure on h:

∀δ > 0 ∀^δ S: E_h e^{(2m−1)(ê(h) − e(h))²} ≤ 4m/δ

Intuitively, this lemma states that with high confidence over the choice of the sample most hypotheses have empirical error near their true error. This allows us to prove that ŝ(⌈q⌉, δ) bounds s(⌈q⌉). More specifically, by considering the uniform distribution on H(k/m), Lemma 3.4 implies

E_{h ∈ H(k/m)} e^{(2m−1)(ê(h) − e(h))²} ≤ 4m/δ
⇒ Pr_{h ∈ H(k/m)}( e^{(2m−1)(ê(h) − e(h))²} ≥ 8m/δ ) ≤ 1/2
⇒ Pr_{h ∈ H(k/m)}( e^{(2m−1)(ê(h) − e(h))²} ≤ 8m/δ ) ≥ 1/2
⇒ |{ h ∈ H(k/m) : |ê(h) − e(h)| ≤ sqrt(ln(8m/δ)/(2m − 1)) }| ≥ |H(k/m)|/2
⇒ |{ h ∈ H(k/m) : |ê(h) − k/m| ≤ 1/m + sqrt(ln(8m/δ)/(2m − 1)) }| ≥ |H(k/m)|/2
⇒ |Ĥ(k/m, 2δ)| ≥ |H(k/m)|/2

Lemma 3.2 now follows by quantification over q ∈ {1/m, ..., m/m}.

4 Asymptotic Analysis and Phase Transitions

This section and the two that follow give an asymptotic analysis of the bounds presented earlier. The asymptotic analysis is stated in Theorem 4.1 and Statement 6.1. To develop the asymptotic analysis, however, a preliminary discussion is needed regarding the phenomenon of phase transitions.

The bounds given in (9) and Theorem 3.3 exhibit phase transitions. More specifically, the bounding expression can be discontinuous in δ and m, e.g., arbitrarily small changes in δ can cause large changes in the bound. To see how this happens consider the constraint on the quantity q:

D(ê(h) || q) ≤ (s(⌈q⌉) + ln(2m/δ))/m    (11)

The bound given by (9) is the least upper bound of the values of q satisfying (11). Assume that m is sufficiently large that we can think of s(⌈q⌉)/m as a continuous function of q, which we write as s(q). We can then rewrite (11) as follows, where λ is a quantity not depending on q and s(q) does not depend on δ:

D(ê(h) || q) ≤ s(q) + λ    (12)

For q ≥ ê(h) we know that D(ê(h) || q) is a monotonically increasing function of q. It is reasonable to assume that for q ≤ 1/2 we also have that s(q) is a monotonically increasing function of q. But even under these conditions it is possible that the feasible values of q, i.e., those satisfying (12), can be divided into separated regions. Furthermore, increasing λ can cause a new feasible region to come into existence. When this happens the bound, which is the least upper bound of the feasible values, can increase discontinuously. At a more intuitive level, consider a large number of high error concepts and a smaller number of lower error concepts. At a certain confidence level the high error concepts can be ruled out. But as the confidence requirement becomes more stringent, suddenly (and discontinuously) the high error concepts must be considered. A similar discontinuity can occur in sample size. Phase transitions in shell decomposition bounds are discussed in more detail by Haussler et al. (1996).

Phase transitions complicate asymptotic analysis. But asymptotic analysis illuminates the nature of phase transitions. As mentioned in the introduction, in the asymptotic analysis of learning theory bounds it is important that one does not hold H fixed as the sample size m increases. If we hold H fixed then lim_{m→∞} (ln |H|)/m = 0. But this is not what one expects for large samples in practice. As the sample size increases one typically uses larger hypothesis classes. Intuitively, we expect that even for very large m we have that (ln |H|)/m is far from zero.

For the asymptotic analysis of the bound in (9) we assume an infinite sequence of hypothesis classes H_1, H_2, H_3, ... and an infinite sequence of data distributions D_1, D_2, D_3, .... Let s_m(k/m) be s(k/m) defined relative to H_m and D_m. In the asymptotic analysis we assume that the sequence of functions s_m(⌈q⌉)/m, viewed as functions of q ∈ [0, 1], converge uniformly to a continuous function s(q). This means that for any ε > 0 there exists a k such that for all m > k we have ∀q ∈ [0, 1] |s_m(⌈q⌉)/m − s(q)| ≤ ε. Given the functions s_m(⌈p⌉) and their limit function s(p), we define the following functions of an empirical error rate ê:

B_m(ê) ≡ lub { q : D(ê || q) ≤ (s_m(⌈q⌉) + ln(2m/δ))/m },
B(ê) ≡ lub { q : D(ê || q) ≤ s(q) }

The function B_m(ê) corresponds directly to the upper bound in (9). The function B(ê) is intended to be the large m asymptotic limit of B_m(ê). However, phase transitions complicate asymptotic analysis: the bound B(ê) need not be a continuous function of ê. A value of ê where the bound B(ê) is discontinuous corresponds to a phase transition in the bound. At a phase transition the sequence B_m(ê) need not converge. Away from phase transitions, however, we have the following theorem.

Theorem 4.1 If the bound B(ê) is continuous at the point ê (so we are not at a phase transition), and the functions (parameterized by m) s_m(⌈q⌉)/m, viewed as functions of q ∈ [0, 1], converge uniformly to a continuous function s(q), then we have lim_{m→∞} B_m(ê) = B(ê).

Proof Define the set F_m(ê) as

F_m(ê) ≡ { q : D(ê || q) ≤ (s_m(⌈q⌉) + ln(2m/δ))/m }

This gives B_m(ê) = lub F_m(ê). Similarly, define F(ê, ε) and B(ê, ε) as

F(ê, ε) ≡ { q ∈ [0, 1] : D(ê || q) ≤ s(q) + ε }
B(ê, ε) ≡ lub F(ê, ε)

We first show that the continuity of B(ê) at the point ê implies the continuity of B(ê, ε) at the point ⟨ê, 0⟩. We note that there exists a continuous function f(ê, ε) with f(ê, 0) = ê and such that for any ε sufficiently near 0 we have

D(f(ê, ε) || q) = D(ê || q) − ε

We then have B(ê, ε) = B(f(ê, ε)). Since f is continuous, and B(ê) is continuous at the point ê, we get that B(ê, ε) is continuous at the point ⟨ê, 0⟩.

We now prove the theorem. The functions of the form (s_m(⌈q⌉) + ln(2m/δ))/m converge uniformly to the function s(q). This implies that for any ε > 0 there exists a k such that for all m > k we have

F(ê, −ε) ⊆ F_m(ê) ⊆ F(ê, ε)

But this in turn implies that

B(ê, −ε) ≤ B_m(ê) ≤ B(ê, ε)    (13)

The theorem now follows from the continuity of the function B(ê, ε) at the point ⟨ê, 0⟩.

Theorem 4.1 can be interpreted as saying that for large sample sizes, and for values of ê other than the special phase transition values, the bound has a well defined value independent of the confidence parameter δ and determined only by a smooth function s(q). A similar statement can be made for the bound in Theorem 3.3: for large m, and at points other than phase transitions, the bound is independent of δ and is determined by a smooth limit curve.

For the asymptotic analysis of Theorem 3.3 we assume an infinite sequence H_1, H_2, H_3, ... of hypothesis classes and an infinite sequence S_1, S_2, S_3, ... of samples such that sample S_m has size m. Let Ĥ_m(k/m, δ) and ŝ_m(k/m, δ) be Ĥ(k/m, δ) and ŝ(k/m, δ) respectively defined relative to hypothesis class H_m and sample S_m. Let U_m(k/m) be the set of hypotheses in H_m having an empirical error of exactly k/m in the sample S_m. Let u_m(k/m) be ln(max(1, |U_m(k/m)|)). In the analysis of Theorem 3.3 we allow that the functions u_m(⌈q⌉)/m are only locally uniformly convergent to a continuous function ū(q), i.e., for any q ∈ [0, 1] and any ε > 0 there exists an integer k and a real number γ > 0 satisfying

∀m > k, ∀p ∈ (q − γ, q + γ): |u_m(⌈p⌉)/m − ū(p)| ≤ ε

Locally uniform convergence plays a role in the analysis in Section 6.
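As a numeric aside, the discontinuity discussed above is easy to exhibit. In the sketch below the two-level limit curve s(q) is a made-up illustration (a few low-error concepts plus exponentially many concepts near error 1/2); it is not an example taken from the paper.

```python
import math

def kl(q, p):
    """Binary KL divergence D(q || p), as in the earlier sketches."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    out = q * math.log(q / p) if q > 0 else 0.0
    if q < 1:
        out += (1 - q) * math.log((1 - q) / (1 - p))
    return out

def B(e_hat, s, grid=20000):
    """Asymptotic bound B(e_hat) = lub{ q in [0,1] : D(e_hat || q) <= s(q) }."""
    return max(i / grid for i in range(grid + 1) if kl(e_hat, i / grid) <= s(i / grid))

# Made-up limit curve: log-density 0.01 for error rates below 0.4, and 0.2 above.
s = lambda q: 0.01 if q < 0.4 else 0.2

for e_hat in (0.05, 0.10, 0.15, 0.20):
    print(e_hat, round(B(e_hat, s), 3))
```

Between ê = 0.10 and ê = 0.15 the high-error shell becomes feasible, the feasible set splits into two separated regions, and the least upper bound jumps discontinuously, which is exactly the picture described above.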

Theorem 4.2 If the functions u_m(⌈q⌉)/m converge locally uniformly to a continuous function ū(q), then, for any fixed value of δ, the functions ŝ_m(⌈q⌉, δ)/m also converge locally uniformly to ū(q). If the convergence of u_m(⌈q⌉)/m is uniform, then so is the convergence of ŝ_m(⌈q⌉, δ)/m.

Proof Consider an arbitrary value q ∈ [0, 1] and ε > 0. We construct the desired k and γ. More specifically, select k sufficiently large and γ sufficiently small that we have the properties

∀m > k, ∀p ∈ (q − 2γ, q + 2γ): |u_m(⌈p⌉)/m − ū(p)| < ε/3,
∀p ∈ (q − 2γ, q + 2γ): |ū(p) − ū(q)| ≤ ε/3,
1/k + sqrt(ln(16k²/δ)/(2k − 1)) < γ,
ln(2(k + 1))/k ≤ ε/3.

Consider an m > k and p ∈ (q − γ, q + γ). It now suffices to show that |ŝ_m(⌈p⌉, δ)/m − ū(p)| ≤ ε. Because U_m(⌈p⌉) is a subset of Ĥ_m(⌈p⌉, δ) we have

ŝ_m(⌈p⌉, δ)/m ≥ u_m(⌈p⌉)/m ≥ ū(p) − ε/3 ≥ ū(p) − ε.

We can also upper bound ŝ_m(⌈p⌉, δ). Every hypothesis in Ĥ_m(⌈p⌉, δ) has an empirical error k'/m with |k'/m − p| ≤ γ, and there are at most m + 1 such values of k', so

ŝ_m(⌈p⌉, δ) ≤ ln( 2 Σ_{k' : |k'/m − p| ≤ γ} e^{u_m(k'/m)} )
≤ ln( 2 Σ_{k' : |k'/m − p| ≤ γ} e^{m(ū(k'/m) + ε/3)} )
≤ ln( 2 Σ_{k' : |k'/m − p| ≤ γ} e^{m(ū(p) + 2ε/3)} )
≤ ln( 2(m + 1) e^{m(ū(p) + 2ε/3)} )
= m ū(p) + 2mε/3 + ln(2(m + 1))
≤ m (ū(p) + ε).

A similar argument shows that if u_m(⌈q⌉)/m converges uniformly to ū(q) then so does ŝ_m(⌈q⌉, δ)/m.

Given quantities ŝ_m(⌈q⌉, δ)/m that converge uniformly to ū(q), the remainder of the analysis is identical to that for the asymptotic analysis of (9). We define the upper bounds

B̂_m(ê) ≡ lub { q : D(ê || q) ≤ (ŝ_m(⌈q⌉, δ) + ln(4m/δ))/m }
B̂(ê) ≡ lub { q : D(ê || q) ≤ ū(q) }

Again we say that ê is at a phase transition if the function B̂(ê) is discontinuous at the value ê. We then get the following, whose proof is identical to that of Theorem 4.1.

Theorem 4.3 If the bound B̂(ê) is continuous at the point ê (so we are not at a phase transition), and the functions u_m(⌈q⌉)/m converge uniformly to ū(q), then we have that lim_{m→∞} B̂_m(ê) = B̂(ê).

5 Asymptotic Optimality of (9)

Formula (9) can be viewed as providing an upper bound on e(h) as a function of ê(h) and the function s. In this section we show that for any curve s and value ê there exists a hypothesis class and data distribution such that the upper bound in (9) is realized up to asymptotic equality. Up to asymptotic equality, (9) is the tightest possible bound computable from ê(h) and the numbers s(1/m), ..., s(m/m).

The classical VC dimension bounds are nearly optimal over bounds computable from the chosen hypothesis error rate ê(h) and the class H. The numbers s(1/m), ..., s(m/m) depend on both H and the data distribution. Hence the bound in (9) uses information about the distribution and hence can be tighter than classical VC bounds. A similar statement applies to the bound in Theorem 3.3 computed from the empirically observable numbers ŝ(1/m), ..., ŝ(m/m). In this case, the bound uses more information from the sample than just ê(h). The optimality theorem given here also differs from the traditional lower bound results for VC dimension in that here the lower bounds match the upper bounds up to asymptotic equality.

The departure point for our optimality analysis is the following lemma from Cover and Thomas (1991).

Lemma 5.1 (Cover and Thomas) If p̂ is the fraction of heads out of m tosses of a coin where the true probability of heads is p, then for q ≤ p we have

Pr(p̂ ≤ q) ≥ e^{−m D(q || p)}/(m + 1)

This lower bound on Pr(p̂ ≤ q) is very close to Chernoff's 1952 upper bound (1). The tightness of (9) is a direct reflection of the tightness of (1). To exploit Lemma 5.1 we need to construct hypothesis classes and data distributions where distinct hypotheses have independent training errors. More specifically, we say that a set of hypotheses {h_1, ..., h_n} has independent training errors if the random variables ê(h_1), ..., ê(h_n) are independent.
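The sandwich between the Chernoff upper bound (1) and this lower bound can be checked numerically with exact binomial tail probabilities. A minimal sketch follows; the 1/(m + 1) polynomial factor is taken from Lemma 5.1 as stated above, and the exact constant is not essential to the argument.

```python
import math

def kl(q, p):
    """Binary KL divergence D(q || p)."""
    out = q * math.log(q / p) if q > 0 else 0.0
    if q < 1:
        out += (1 - q) * math.log((1 - q) / (1 - p))
    return out

def prob_phat_at_most(m, p, q):
    """Exact Pr(p_hat <= q) for p_hat = (# heads)/m with heads probability p."""
    k_max = math.floor(q * m + 1e-9)
    return sum(math.comb(m, k) * p**k * (1 - p)**(m - k) for k in range(k_max + 1))

m, p = 100, 0.5
for q in (0.2, 0.3, 0.4):
    exact = prob_phat_at_most(m, p, q)
    upper = math.exp(-m * kl(q, p))          # Chernoff upper bound (1)
    lower = upper / (m + 1)                  # lower bound in the form of Lemma 5.1
    print(f"q={q}: {lower:.2e} <= {exact:.2e} <= {upper:.2e}")
```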

By an argument similar to the derivation of (3) from (1) we can prove from Lemma 5.1 that

Pr( D(min(p̂, p) || p) ≥ (ln(1/δ) − ln(m + 1))/m ) ≥ δ    (14)

Lemma 5.2 Let X be any finite set, S a random variable, and Θ[S, x, δ] a formula such that for every x ∈ X and δ > 0 we have Pr(Θ[S, x, δ]) ≥ δ, and such that the events Θ[S, x, δ] for the various x ∈ X are independent, i.e., Pr(∀x ∈ X ¬Θ[S, x, δ]) = Π_{x ∈ X} Pr(¬Θ[S, x, δ]). We then have

∀δ > 0 ∀^δ S ∃x ∈ X: Θ[S, x, ln(1/δ)/|X|]

Proof

Pr(Θ[S, x, ln(1/δ)/|X|]) ≥ ln(1/δ)/|X|
Pr(¬Θ[S, x, ln(1/δ)/|X|]) ≤ 1 − ln(1/δ)/|X| ≤ e^{−ln(1/δ)/|X|}
Pr(∀x ∈ X ¬Θ[S, x, ln(1/δ)/|X|]) ≤ e^{−ln(1/δ)} = δ

Now define ĥ(k/m) to be the hypothesis of minimal training error in the set H(k/m). Let glb {x : Φ[x]} denote the greatest lower bound (the minimum) of the set {x : Φ[x]}. We now have the following lemma.

Lemma 5.3 If the hypotheses in the class H(⌈q⌉) are independent then

∀δ > 0 ∀^δ S ∀q ∈ {1/m, ..., m/m}: ê(ĥ(q)) ≤ glb { ê : D(min(ê, q − 1/m) || q) ≤ (s(q) − ln(m + 1) − ln(ln(1/δ)))/m }

Proof To prove Lemma 5.3 let q be a fixed rational number of the form k/m. Assuming independent hypotheses we can apply Lemma 5.2 to (14) to get

∀δ > 0 ∀^δ S ∃h ∈ H(k/m): D(min(ê(h), e(h)) || e(h)) ≥ (s(q) − ln(m + 1) − ln(ln(1/δ)))/m

Let w be the hypothesis in H(q) satisfying this formula. We now have ê(ĥ(q)) ≤ ê(w) and q − 1/m ≤ e(w) ≤ q. These two conditions imply

∀δ > 0 ∀^δ S: D(min(ê(ĥ(q)), q − 1/m) || q) ≥ (s(q) − ln(m + 1) − ln(ln(1/δ)))/m

This implies that

ê(ĥ(q)) ≤ glb { ê : D(min(ê, q − 1/m) || q) ≤ (s(q) − ln(m + 1) − ln(ln(1/δ)))/m }

Lemma 5.3 now follows by quantification over q ∈ {1/m, ..., m/m}.

For q ∈ [0, 1] we have that Lemma 3.1 implies that

ê(ĥ(⌈q⌉)) ≥ glb { ê : D(ê || ⌈q⌉ − 1/m) ≤ (s(⌈q⌉) + ln(2m/δ))/m }

We now have upper and lower bounds on the quantity ê(ĥ(⌈q⌉)) which agree up to asymptotic equality: in a large m limit where s_m(⌈q⌉)/m converges (pointwise) to a continuous function s(q), we have that the upper and lower bounds on ê(ĥ(⌈q⌉)) both converge (pointwise) to

ê(ĥ(q)) = glb { ê : D(ê || q) ≤ s(q) }

This asymptotic value of ê(ĥ(q)) is a continuous function of q. Since q is held fixed in calculating the bounds on ê(ĥ(⌈q⌉)), phase transitions are not an issue and uniform convergence of the functions s_m(⌈q⌉)/m is not required. Note that for large m and independent hypotheses we get that ê(ĥ(q)) is determined as a function of the true error rate q and s(⌈q⌉).

The following theorem states that any limit function s(p) is consistent with the possibility that hypotheses are independent. This, together with Lemma 5.3, implies that no uniform bound on e(h) as a function of ê(h) and |H(1/m)|, ..., |H(m/m)| can be asymptotically tighter than (9).

Theorem 5.4 Let s(p) be any continuous function of p ∈ [0, 1]. There exists an infinite sequence of hypothesis spaces H_1, H_2, H_3, ... and a sequence of data distributions D_1, D_2, D_3, ... such that each class H_m has independent hypotheses for data distribution D_m and such that s_m(⌈p⌉)/m converges (pointwise) to s(p).

Proof First we show that if |H_m(i/m)| = e^{m s(i/m)} then the functions s_m(⌈p⌉)/m converge (pointwise) to s(p). Assume |H_m(i/m)| = e^{m s(i/m)}. In this case we have s_m(⌈p⌉)/m = s(⌈p⌉). Since s(p) is continuous, for any fixed value of p we get that s_m(⌈p⌉)/m converges to s(p).

Recall that D_m is a probability distribution on pairs ⟨x, y⟩ with y ∈ {0, 1} and x ∈ X for some set X. We take H_m to be a disjoint union of sets H_m(k/m) where |H_m(k/m)| is selected as above. Let f_1, ..., f_N be the elements of H_m with N = |H_m|. Let X be the set of all N-bit strings and define f_i(x) to be the value of the ith bit of the bit vector x. Now define the distribution D_m on pairs ⟨x, y⟩ by selecting y to be 1 with probability 1/2 and then selecting each bit of x independently, where the ith bit is selected to disagree with y with probability k/m where k is such that f_i ∈ H_m(k/m).

6 Relating ŝ and s

In this section we show that in large m limits of the type discussed in Section 4 the histogram of empirical errors need not converge to the histogram of true errors. So even in the large m asymptotic limit, the bound given by Theorem 3.3 is significantly weaker than the bound given by (9). To show that ŝ(⌈q⌉, δ) can be asymptotically different from s(⌈q⌉) we consider the case of independent hypotheses. More specifically, given a continuous function s(p) we construct an infinite

sequence of hypothesis spaces H_1, H_2, H_3, ... and an infinite sequence of data distributions D_1, D_2, D_3, ... using the construction in the proof of Theorem 5.4. We note that if s(p) is differentiable with bounded derivative then the functions s_m(⌈p⌉)/m converge uniformly to s(p). For a given infinite sequence of data distributions we generate an infinite sample sequence S_1, S_2, S_3, ... by selecting S_m to consist of m pairs ⟨x, y⟩ drawn IID from distribution D_m. For a given sample sequence and h ∈ H_m we define ê_m(h) and ŝ_m(k/m, δ) in a manner similar to ê(h) and ŝ(k/m, δ) but for sample S_m. The main result of this section is the following.

Conjecture 6.1 If each H_m has independent hypotheses under data distribution D_m, and the functions s_m(⌈p⌉)/m converge uniformly to a continuous function s(p), then for any δ > 0 and p ∈ [0, 1] we have, with probability 1 over the generation of the sample sequence, that

lim_{m→∞} ŝ_m(⌈p⌉, δ)/m = sup_{q ∈ [0,1]} [ s(q) − D(p || q) ]

We call this a conjecture rather than a theorem because the proof has not been worked out to a high level of rigor. Nonetheless, we believe the proof sketch given below can be expanded to a fully rigorous argument. Before giving the proof sketch we note that the limiting value of ŝ_m(⌈p⌉, δ)/m is independent of δ. This is consistent with Theorem 4.2. Define ŝ(p) ≡ sup_{q ∈ [0,1]} [s(q) − D(p || q)]. Note that ŝ(p) ≥ s(p). This gives an asymptotic version of Lemma 3.2. But since D(p || q) can be locally approximated as c(p − q)² (up to its second order Taylor expansion), if s(p) is increasing at the point p then we also get that ŝ(p) is strictly larger than s(p).

Proof Outline: To prove Statement 6.1 we first define H_m(p, q) for p, q ∈ {1/m, ..., m/m} to be the set of all h ∈ H_m(q) such that ê_m(h) = p. Intuitively, H_m(p, q) is the set of concepts with true error rate near q that have empirical error rate p. Ignoring factors that are only polynomial in m, the probability of a hypothesis with true error rate q having empirical error rate p can be written as (approximately) e^{−m D(p || q)}. So the expected size of H_m(p, q) can be written as |H_m(q)| e^{−m D(p || q)}, or alternatively, (approximately) as e^{m s(q)} e^{−m D(p || q)}, or e^{m(s(q) − D(p || q))}. More formally, we have, for any fixed value of p and q,

lim_{m→∞} ln(max(1, E(|H_m(⌈p⌉, ⌈q⌉)|)))/m = max(0, s(q) − D(p || q))

We now show that the expectation can be eliminated from the above limit. First, consider distinct values of p and q such that s(q) − D(p || q) > 0. Since p and q are distinct, the probability that a fixed hypothesis in H_m(⌈q⌉) is in H_m(⌈p⌉, ⌈q⌉) declines exponentially in m. Since s(q) − D(p || q) > 0, the expected size of H_m(⌈p⌉, ⌈q⌉) grows exponentially in m. Since the hypotheses are independent, the distribution of possible values of |H_m(⌈p⌉, ⌈q⌉)| becomes essentially a Poisson mass distribution with an expected number of arrivals growing exponentially in m. The probability that |H_m(⌈p⌉, ⌈q⌉)| deviates from its expectation by as much as a factor of 2 declines exponentially in m. We say that a sample sequence is safe after k if for all m > k we

have that |H_m(⌈p⌉, ⌈q⌉)| is within a factor of 2 of its expectation. Since the probability of being unsafe at m declines exponentially in m, for any δ there exists a k such that with probability at least 1 − δ the sample sequence is safe after k. So for any δ > 0 we have that with probability at least 1 − δ the sequence is safe after some k. But since this holds for all δ > 0, with probability 1 such a k must exist:

lim_{m→∞} ln(max(1, |H_m(⌈p⌉, ⌈q⌉)|))/m = s(q) − D(p || q)

We now define

s_m(⌈p⌉, ⌈q⌉) ≡ ln(max(1, |H_m(⌈p⌉, ⌈q⌉)|))

It is also possible to show that for p = q we have, with probability 1, that s_m(⌈p⌉, ⌈q⌉)/m approaches s(p), and that for distinct p and q with s(q) − D(p || q) ≤ 0 we have that s_m(⌈p⌉, ⌈q⌉)/m approaches 0. Putting these together yields that, with probability 1, we have

lim_{m→∞} s_m(⌈p⌉, ⌈q⌉)/m = max(0, s(q) − D(p || q))    (15)

Define U_m(k/m) and u_m(k/m) as in Section 4. We now have the following equality:

U_m(p) = ∪_{q ∈ {1/m, ..., m/m}} H_m(p, q)

We now show that, with probability 1, u_m(⌈p⌉)/m approaches ŝ(p). First, consider a p ∈ [0, 1] such that ŝ(p) > 0. Since s(q) − D(p || q) is a continuous function of q, and [0, 1] is a compact set, sup_{q ∈ [0,1]} [s(q) − D(p || q)] must be realized at some value in [0, 1]. Let q* be such that s(q*) − D(p || q*) equals ŝ(p). We have that u_m(⌈p⌉) ≥ s_m(⌈p⌉, ⌈q*⌉). This, together with (15), implies that

lim inf_{m→∞} u_m(⌈p⌉)/m ≥ ŝ(p)

The sample sequence is safe at m and k if |H_m(⌈p⌉, k/m)| does not exceed twice the expectation of |H_m(⌈p⌉, ⌈q*⌉)|. Assuming uniform convergence of s_m(⌈p⌉)/m, the probability of not being safe at m and k declines exponentially in m at a rate at least as fast as the rate of decline of the probability of not being safe at m and ⌈q*⌉. By the union bound this implies that for a given m the probability that there exists an unsafe k also declines exponentially. We say that the sequence is safe after N if it is safe for all m and k with m > N. The probability of not being safe after N also declines exponentially with N. By an argument similar to that given above, this implies that with probability 1 over the choice of the sequence there exists an N such that the sequence is safe after N. But if we are safe at m then |U_m(⌈p⌉)| ≤ 2m E|H_m(⌈p⌉, ⌈q*⌉)|. This implies that

lim sup_{m→∞} u_m(⌈p⌉)/m ≤ ŝ(p)

Putting the two bounds together we get

lim_{m→∞} u_m(⌈p⌉)/m = ŝ(p)
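A small sketch of the smearing relation ŝ(p) = sup_q [s(q) − D(p || q)] is given below; the particular curve s(q) is a made-up example, chosen only to show that ŝ(p) exceeds s(p) wherever s is increasing.

```python
import math

def kl(p, q):
    """Binary KL divergence D(p || q)."""
    q = min(max(q, 1e-12), 1 - 1e-12)
    out = p * math.log(p / q) if p > 0 else 0.0
    if p < 1:
        out += (1 - p) * math.log((1 - p) / (1 - q))
    return out

def smear(s, p, grid=2000):
    """ŝ(p) = sup over q in [0,1] of s(q) - D(p || q), as in Conjecture 6.1."""
    return max(s(i / grid) - kl(p, i / grid) for i in range(grid + 1))

# Made-up limit curve: the log-density of hypotheses rises toward error 1/2.
s = lambda q: 0.5 * min(q, 1 - q)

for p in (0.1, 0.2, 0.3, 0.5):
    print(p, round(s(p), 4), round(smear(s, p), 4))
```

For this curve, ŝ(p) coincides with s(p) at p = 1/2 (where s is maximal) and strictly exceeds it at the other points, which is the smearing effect that loosens the computable bound even asymptotically.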

The above argument establishes (to some level of rigor) pointwise convergence of u_m(⌈p⌉)/m to ŝ(p). It is also possible to establish a convergence rate that is a continuous function of p. This implies that the convergence of u_m(⌈p⌉)/m can be made locally uniform. Theorem 4.2 then implies the desired result.

7 Improvements

Theorem 3.3 has been improved in various ways (Langford, 2002): removing the discretization of true errors, using one-sided bounds, using nonuniform union bounds over the discrete values of the form k/m, tightening the Chernoff bound using direct calculation of binomial coefficients, and improving Lemma 3.4. These improvements allow the removal of all but one of the ln(m) terms from the statement of the bound. However, they do not improve the asymptotic equations given by Theorem 4.1 and Statement 6.1.

A practical difficulty with the bound in Theorem 3.3 is that it is usually impossible to enumerate the elements of an exponentially large hypothesis class, and hence impractical to compute the histogram of training errors for the hypotheses in the class. In practice the histogram values might be estimated using some form of Markov chain Monte Carlo sampling over the hypotheses. For certain hypothesis spaces it might also be possible to directly calculate the empirical error distribution without evaluating every hypothesis. For example, this can be done with partition rules which, given a fixed partition of the input space, make predictions which are constant on each element of the partition. If there are n elements in the partition then there are 2^n partition rules. For a fixed partition, the histogram of empirical errors for the 2^n partition rules can be computed in polynomial time. Note that the class of decision trees is a union of partition rules, where the structure of a tree defines a partition and the labels at the leaves of the tree define a particular partition rule relative to that partition. Taking advantage of this, it is surprisingly easy to compute a shell bound for small decision trees (Langford, 2002).

8 Discussion and Conclusion

Traditional PAC bounds are stated in terms of the training error and class size or VC dimension. The computable bound given here is sometimes much tighter because it exploits the additional information in the histogram of training errors. The uncomputable bound uses the additional (unavailable) information in the distribution of true errors. Any distribution of true errors can be realized in a case with independent hypotheses. We have shown that in such cases this uncomputable bound is asymptotically equal to the actual generalization error. Hence this is the tightest possible bound, up to asymptotic equality, over all bounds expressed as functions of ê(h) and the distribution of true errors. We have also shown that the use of the histogram of empirical errors results in a bound that, while still tighter than traditional bounds, is looser than the uncomputable bound even in the large sample asymptotic limit.

One of the goals of learning theory is to give generalization guarantees that are predictive of actual generalization error. It is well known that the actual generalization error can exhibit phase transitions as the sample size increases: the expected generalization error can jump essentially discontinuously in sample size. So accurate true error bounds should also exhibit phase transitions. Shell bounds exhibit these phase transitions while other bounds, such as VC dimension results, do not. The phase transitions can also be interpreted as a statement about the bound as a function of the confidence parameter δ. As the value of δ is varied the bound may shift essentially discontinuously. To put this another way, let h be the hypothesis of minimal training error on a large sample. Near a phase transition in true generalization error (as opposed to a phase transition in the bound) we may have that with probability 1 − δ the true error of h is near its training error, but with probability δ/2, say, the true error of h can be far from its training error. More traditional bounds do not exhibit this kind of sensitivity to δ. Bounds that exhibit phase transitions seem to bring the theoretical analysis of generalization closer to the actual phenomenon.

Acknowledgments

Yoav Freund, Avrim Blum, and Tobias Scheffer all provided useful discussion in forming this paper.

References

P. Bartlett, O. Bousquet, and S. Mendelson. Localized Rademacher complexities. In Proceedings of the 15th Annual Conference on Computational Learning Theory, pages 44-58, 2002.

H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23, 1952.

T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 1991.

Y. Freund. Self bounding learning algorithms. In Computational Learning Theory (COLT), 1998.

D. Haussler, M. Kearns, H. S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. Machine Learning, 25, 1996.

J. Langford. Practical prediction theory for classification. ICML 2003 tutorial, available at jl/projects/prediction bounds/tutorial/tutorial.ps.

J. Langford. Quantitatively Tight Sample Complexity Bounds. PhD thesis, Carnegie Mellon University, 2002.

J. Langford and A. Blum. Microchoice and self-bounding algorithms. In Computational Learning Theory (COLT), 1999.

Y. Mansour and D. McAllester. Generalization bounds for decision trees. In Computational Learning Theory (COLT), 2000.

A. Moore. VC dimension for characterizing classifiers. Tutorial, available at www-2.cs.cmu.edu/~awm/tutorials/vcdim08.pdf.

D. McAllester. PAC-Bayesian model averaging. In Computational Learning Theory (COLT), 1999.

D. McAllester and R. Schapire. On the convergence rate of Good-Turing estimators. In Computational Learning Theory (COLT), 2000.

T. Scheffer and T. Joachims. Expected error analysis for model selection. In International Conference on Machine Learning (ICML), 1999.

S. van de Geer. Empirical Processes in M-Estimation. Cambridge University Press.


Graphical Models in Local, Asymmetric Multi-Agent Markov Decision Processes Graphical Models in Local, Asyetric Multi-Agent Markov Decision Processes Ditri Dolgov and Edund Durfee Departent of Electrical Engineering and Coputer Science University of Michigan Ann Arbor, MI 48109

More information

Handout 7. and Pr [M(x) = χ L (x) M(x) =? ] = 1.

Handout 7. and Pr [M(x) = χ L (x) M(x) =? ] = 1. Notes on Coplexity Theory Last updated: October, 2005 Jonathan Katz Handout 7 1 More on Randoized Coplexity Classes Reinder: so far we have seen RP,coRP, and BPP. We introduce two ore tie-bounded randoized

More information

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul

More information

Stability Bounds for Non-i.i.d. Processes

Stability Bounds for Non-i.i.d. Processes tability Bounds for Non-i.i.d. Processes Mehryar Mohri Courant Institute of Matheatical ciences and Google Research 25 Mercer treet New York, NY 002 ohri@cis.nyu.edu Afshin Rostaiadeh Departent of Coputer

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

A Bernstein-Markov Theorem for Normed Spaces

A Bernstein-Markov Theorem for Normed Spaces A Bernstein-Markov Theore for Nored Spaces Lawrence A. Harris Departent of Matheatics, University of Kentucky Lexington, Kentucky 40506-0027 Abstract Let X and Y be real nored linear spaces and let φ :

More information

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate The Siplex Method is Strongly Polynoial for the Markov Decision Proble with a Fixed Discount Rate Yinyu Ye April 20, 2010 Abstract In this note we prove that the classic siplex ethod with the ost-negativereduced-cost

More information

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence Best Ar Identification: A Unified Approach to Fixed Budget and Fixed Confidence Victor Gabillon Mohaad Ghavazadeh Alessandro Lazaric INRIA Lille - Nord Europe, Tea SequeL {victor.gabillon,ohaad.ghavazadeh,alessandro.lazaric}@inria.fr

More information

The degree of a typical vertex in generalized random intersection graph models

The degree of a typical vertex in generalized random intersection graph models Discrete Matheatics 306 006 15 165 www.elsevier.co/locate/disc The degree of a typical vertex in generalized rando intersection graph odels Jerzy Jaworski a, Michał Karoński a, Dudley Stark b a Departent

More information

Homework 3 Solutions CSE 101 Summer 2017

Homework 3 Solutions CSE 101 Summer 2017 Hoework 3 Solutions CSE 0 Suer 207. Scheduling algoriths The following n = 2 jobs with given processing ties have to be scheduled on = 3 parallel and identical processors with the objective of iniizing

More information

The Frequent Paucity of Trivial Strings

The Frequent Paucity of Trivial Strings The Frequent Paucity of Trivial Strings Jack H. Lutz Departent of Coputer Science Iowa State University Aes, IA 50011, USA lutz@cs.iastate.edu Abstract A 1976 theore of Chaitin can be used to show that

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 018: Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee7c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee7c@berkeley.edu October 15,

More information

Estimating Entropy and Entropy Norm on Data Streams

Estimating Entropy and Entropy Norm on Data Streams Estiating Entropy and Entropy Nor on Data Streas Ait Chakrabarti 1, Khanh Do Ba 1, and S. Muthukrishnan 2 1 Departent of Coputer Science, Dartouth College, Hanover, NH 03755, USA 2 Departent of Coputer

More information

Biostatistics Department Technical Report

Biostatistics Department Technical Report Biostatistics Departent Technical Report BST006-00 Estiation of Prevalence by Pool Screening With Equal Sized Pools and a egative Binoial Sapling Model Charles R. Katholi, Ph.D. Eeritus Professor Departent

More information

List Scheduling and LPT Oliver Braun (09/05/2017)

List Scheduling and LPT Oliver Braun (09/05/2017) List Scheduling and LPT Oliver Braun (09/05/207) We investigate the classical scheduling proble P ax where a set of n independent jobs has to be processed on 2 parallel and identical processors (achines)

More information

Tight Information-Theoretic Lower Bounds for Welfare Maximization in Combinatorial Auctions

Tight Information-Theoretic Lower Bounds for Welfare Maximization in Combinatorial Auctions Tight Inforation-Theoretic Lower Bounds for Welfare Maxiization in Cobinatorial Auctions Vahab Mirrokni Jan Vondrák Theory Group, Microsoft Dept of Matheatics Research Princeton University Redond, WA 9805

More information

16 Independence Definitions Potential Pitfall Alternative Formulation. mcs-ftl 2010/9/8 0:40 page 431 #437

16 Independence Definitions Potential Pitfall Alternative Formulation. mcs-ftl 2010/9/8 0:40 page 431 #437 cs-ftl 010/9/8 0:40 page 431 #437 16 Independence 16.1 efinitions Suppose that we flip two fair coins siultaneously on opposite sides of a roo. Intuitively, the way one coin lands does not affect the way

More information

A Theoretical Analysis of a Warm Start Technique

A Theoretical Analysis of a Warm Start Technique A Theoretical Analysis of a War Start Technique Martin A. Zinkevich Yahoo! Labs 701 First Avenue Sunnyvale, CA Abstract Batch gradient descent looks at every data point for every step, which is wasteful

More information

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation journal of coplexity 6, 459473 (2000) doi:0.006jco.2000.0544, available online at http:www.idealibrary.co on On the Counication Coplexity of Lipschitzian Optiization for the Coordinated Model of Coputation

More information

Lower Bounds for Quantized Matrix Completion

Lower Bounds for Quantized Matrix Completion Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

Bipartite subgraphs and the smallest eigenvalue

Bipartite subgraphs and the smallest eigenvalue Bipartite subgraphs and the sallest eigenvalue Noga Alon Benny Sudaov Abstract Two results dealing with the relation between the sallest eigenvalue of a graph and its bipartite subgraphs are obtained.

More information

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES Vol. 57, No. 3, 2009 Algoriths for parallel processor scheduling with distinct due windows and unit-tie obs A. JANIAK 1, W.A. JANIAK 2, and

More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

Robustness and Regularization of Support Vector Machines

Robustness and Regularization of Support Vector Machines Robustness and Regularization of Support Vector Machines Huan Xu ECE, McGill University Montreal, QC, Canada xuhuan@ci.cgill.ca Constantine Caraanis ECE, The University of Texas at Austin Austin, TX, USA

More information

lecture 36: Linear Multistep Mehods: Zero Stability

lecture 36: Linear Multistep Mehods: Zero Stability 95 lecture 36: Linear Multistep Mehods: Zero Stability 5.6 Linear ultistep ethods: zero stability Does consistency iply convergence for linear ultistep ethods? This is always the case for one-step ethods,

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 11 10/15/2008 ABSTRACT INTEGRATION I

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 11 10/15/2008 ABSTRACT INTEGRATION I MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 11 10/15/2008 ABSTRACT INTEGRATION I Contents 1. Preliinaries 2. The ain result 3. The Rieann integral 4. The integral of a nonnegative

More information

CS Lecture 13. More Maximum Likelihood

CS Lecture 13. More Maximum Likelihood CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood

More information

A note on the multiplication of sparse matrices

A note on the multiplication of sparse matrices Cent. Eur. J. Cop. Sci. 41) 2014 1-11 DOI: 10.2478/s13537-014-0201-x Central European Journal of Coputer Science A note on the ultiplication of sparse atrices Research Article Keivan Borna 12, Sohrab Aboozarkhani

More information

Characterization of the Line Complexity of Cellular Automata Generated by Polynomial Transition Rules. Bertrand Stone

Characterization of the Line Complexity of Cellular Automata Generated by Polynomial Transition Rules. Bertrand Stone Characterization of the Line Coplexity of Cellular Autoata Generated by Polynoial Transition Rules Bertrand Stone Abstract Cellular autoata are discrete dynaical systes which consist of changing patterns

More information

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13 CSE55: Randoied Algoriths and obabilistic Analysis May 6, Lecture Lecturer: Anna Karlin Scribe: Noah Siegel, Jonathan Shi Rando walks and Markov chains This lecture discusses Markov chains, which capture

More information

Support Vector Machines. Maximizing the Margin

Support Vector Machines. Maximizing the Margin Support Vector Machines Support vector achines (SVMs) learn a hypothesis: h(x) = b + Σ i= y i α i k(x, x i ) (x, y ),..., (x, y ) are the training exs., y i {, } b is the bias weight. α,..., α are the

More information

Note on generating all subsets of a finite set with disjoint unions

Note on generating all subsets of a finite set with disjoint unions Note on generating all subsets of a finite set with disjoint unions David Ellis e-ail: dce27@ca.ac.uk Subitted: Dec 2, 2008; Accepted: May 12, 2009; Published: May 20, 2009 Matheatics Subject Classification:

More information

Curious Bounds for Floor Function Sums

Curious Bounds for Floor Function Sums 1 47 6 11 Journal of Integer Sequences, Vol. 1 (018), Article 18.1.8 Curious Bounds for Floor Function Sus Thotsaporn Thanatipanonda and Elaine Wong 1 Science Division Mahidol University International

More information

a a a a a a a m a b a b

a a a a a a a m a b a b Algebra / Trig Final Exa Study Guide (Fall Seester) Moncada/Dunphy Inforation About the Final Exa The final exa is cuulative, covering Appendix A (A.1-A.5) and Chapter 1. All probles will be ultiple choice

More information

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering

More information

Chaotic Coupled Map Lattices

Chaotic Coupled Map Lattices Chaotic Coupled Map Lattices Author: Dustin Keys Advisors: Dr. Robert Indik, Dr. Kevin Lin 1 Introduction When a syste of chaotic aps is coupled in a way that allows the to share inforation about each

More information

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and

More information

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography Tight Bounds for axial Identifiability of Failure Nodes in Boolean Network Toography Nicola Galesi Sapienza Università di Roa nicola.galesi@uniroa1.it Fariba Ranjbar Sapienza Università di Roa fariba.ranjbar@uniroa1.it

More information

A Theoretical Framework for Deep Transfer Learning

A Theoretical Framework for Deep Transfer Learning A Theoretical Fraewor for Deep Transfer Learning Toer Galanti The School of Coputer Science Tel Aviv University toer22g@gail.co Lior Wolf The School of Coputer Science Tel Aviv University wolf@cs.tau.ac.il

More information

VC Dimension and Sauer s Lemma

VC Dimension and Sauer s Lemma CMSC 35900 (Spring 2008) Learning Theory Lecture: VC Diension and Sauer s Lea Instructors: Sha Kakade and Abuj Tewari Radeacher Averages and Growth Function Theore Let F be a class of ±-valued functions

More information

Symbolic Analysis as Universal Tool for Deriving Properties of Non-linear Algorithms Case study of EM Algorithm

Symbolic Analysis as Universal Tool for Deriving Properties of Non-linear Algorithms Case study of EM Algorithm Acta Polytechnica Hungarica Vol., No., 04 Sybolic Analysis as Universal Tool for Deriving Properties of Non-linear Algoriths Case study of EM Algorith Vladiir Mladenović, Miroslav Lutovac, Dana Porrat

More information

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES S. E. Ahed, R. J. Tokins and A. I. Volodin Departent of Matheatics and Statistics University of Regina Regina,

More information

Multi-Dimensional Hegselmann-Krause Dynamics

Multi-Dimensional Hegselmann-Krause Dynamics Multi-Diensional Hegselann-Krause Dynaics A. Nedić Industrial and Enterprise Systes Engineering Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu B. Touri Coordinated Science Laboratory

More information

Iterative Decoding of LDPC Codes over the q-ary Partial Erasure Channel

Iterative Decoding of LDPC Codes over the q-ary Partial Erasure Channel 1 Iterative Decoding of LDPC Codes over the q-ary Partial Erasure Channel Rai Cohen, Graduate Student eber, IEEE, and Yuval Cassuto, Senior eber, IEEE arxiv:1510.05311v2 [cs.it] 24 ay 2016 Abstract In

More information

Testing equality of variances for multiple univariate normal populations

Testing equality of variances for multiple univariate normal populations University of Wollongong Research Online Centre for Statistical & Survey Methodology Working Paper Series Faculty of Engineering and Inforation Sciences 0 esting equality of variances for ultiple univariate

More information

Bayesian Learning. Chapter 6: Bayesian Learning. Bayes Theorem. Roles for Bayesian Methods. CS 536: Machine Learning Littman (Wu, TA)

Bayesian Learning. Chapter 6: Bayesian Learning. Bayes Theorem. Roles for Bayesian Methods. CS 536: Machine Learning Littman (Wu, TA) Bayesian Learning Chapter 6: Bayesian Learning CS 536: Machine Learning Littan (Wu, TA) [Read Ch. 6, except 6.3] [Suggested exercises: 6.1, 6.2, 6.6] Bayes Theore MAP, ML hypotheses MAP learners Miniu

More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths

More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information

Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation

Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation Algorithic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation Michael Kearns AT&T Labs Research Murray Hill, New Jersey kearns@research.att.co Dana Ron MIT Cabridge, MA danar@theory.lcs.it.edu

More information