arxiv: v4 [cs.lg] 4 Apr 2016

Size: px
Start display at page:

Download "arxiv: v4 [cs.lg] 4 Apr 2016"

Transcription

1 e-publication Relative Deviation Learning Bounds and Generalization with Unbounded Loss Functions arxiv:35796v4 cslg 4 Apr 6 Corinna Cortes Google Research, 76 Ninth Avenue, New York, NY Spencer Greenberg Courant Institute, 5 Mercer Street, New York, NY Mehryar Mohri Courant Institute and Google Research 5 Mercer Street, New York, NY Editor: TBD Abstract CORINNA@GOOGLECOM SPENCERG@CIMSNYUEDU MOHRI@CIMSNYUEDU We present an extensive analysis of relative deviation bounds, including detailed proofs of twosided inequalities and their iplications We also give detailed proofs of two-sided generalization bounds that hold in the general case of unbounded loss functions, under the assuption that a oent of the loss is bounded These bounds are useful in the analysis of iportance weighting and other learning tasks such as unbounded regression Keywords: Generalization bounds, learning theory, unbounded loss functions Introduction Most generalization bounds in learning theory hold only for bounded loss functions This includes standard VC-diension bounds Vapnik, 998, Radeacher coplexity Koltchinskii and Panchenko, ; Bartlett et al, a; Koltchinskii and Panchenko, ; Bartlett and Mendelson, or local Radeacher coplexity bounds Koltchinskii, 6; Bartlett et al, b, as well as ost other bounds based on other coplexity ters This assuption is typically unrelated to the statistical nature of the proble considered but it is convenient since when the loss functions are uniforly bounded, standard tools such as Hoeffding s inequality Hoeffding, 963; Azua, 967, McDiarid s inequality McDiarid, 989, or Talagrand s concentration inequality Talagrand, 994 apply There are however natural learning probles where the boundedness assuption does not hold This includes unbounded regression tasks where the target labels are not uniforly bounded, and a variety of applications such as saple bias correction Dudík et al, 6; Huang et al, 6; Cortes et al, 8; Sugiyaa et al, 8; Bickel et al, 7, doain adaptation Ben-David et al, 7; Blitzer et al, 8; Daué III and Marcu, 6; Jiang and Zhai, 7; Mansour et al, 9; Cortes and Mohri, 3, or the analysis of boosting Dasgupta and Long, 3, where the iportance weighting technique is used Cortes et al, It is therefore critical to derive learning guarantees that hold for these scenarios and the general case of unbounded loss functions When the class of functions is unbounded, a single function ay take arbitrarily large values with arbitrarily sall probabilities This is probably the ain challenge in deriving unifor conc 3 Corinna Cortes and Spencer Greenberg and Mehryar Mohri

2 CORTES, GREENBERG, AND MOHRI vergence bounds for unbounded losses This proble can be avoided by assuing the existence of an envelope, that is a single non-negative function with a finite expectation lying above the absolute value of the loss of every function in the hypothesis set Dudley, 984; Pollard, 984; Dudley, 987; Pollard, 989; Haussler, 99, an alternative assuption siilar to Hoeffding s inequality based on the expectation of a hyperbolic function, a quantity siilar to the oent-generating function, is used by Meir and Zhang 3 However, in any probles, eg, in the analysis of iportance weighting even for coon distributions, there exists no suitable envelope function Cortes et al, Instead, the second or soe other th-oent of the loss sees to play a critical role in the analysis Thus, instead, we will consider here the assuption that soe th-oent of the loss functions is bounded as in Vapnik 998, 6b This paper presents in detail two-sided generalization bounds for unbounded loss functions under the assuption that soe th-oent of the loss functions, >, is bounded The proof of these bounds akes use of relative deviation generalization bounds in binary classification, which we also prove and discuss in detail Much of the results and aterial we present is not novel and the paper has therefore a survey nature However, our presentation is otivated by the fact that the proofs given in the past for these generalization bounds were either incorrect or incoplete We now discuss in ore detail prior results and proofs One-side relative deviation bounds were first given by Vapnik 998, later iproved by a constant factor by Anthony and Shawe-Taylor 993 These publications and several others have all relied on a lower bound on the probability that a binoial rando variable of trials exceeds its expected value when the bias verifies p > This also later appears in Vapnik 6a and iplicitly in other publications referring to the relative deviations bounds of Vapnik 998 To the best of our knowledge, no actual proof of this inequality was ever given in the past in the achine learning literature before our recent work Greenberg and Mohri, 3 One attept was ade to prove this lea in the context of the analysis of soe generalization bounds Jaeger, 5, but unfortunately that proof is not sufficient to port the general case needed for the proof of the relative deviation bound of Vapnik 998 We present the proof of two-sided relative deviation bounds in detail using the recent results of Greenberg and Mohri 3 The two-sided versions we present, as well as several consequences of these bounds, appear in Anthony and Bartlett 999 However, we could not find a full proof of the two-sided bounds in any prior publication Our presentation shows that the proof of the other side of the inequality is not syetric and cannot be iediately obtained fro that of the first side inequality Additionally, this requires another proof related to the binoial distributions given by Greenberg and Mohri 3 Relative deviation bounds are very inforative guarantees in achine learning of independent interest, regardless of the key role they play in the proof of unbounded loss learning bounds They lead to sharper generalization bounds whose right-hand side is expressed as the interpolation of a O/ ter and a O/ ter that adits as a ultiplier the epirical error or the generalization error In particular, when the epirical error is zero, this leads to faster rate bounds We present in detail the proof of this type of results as well as that of several others of interest Anthony and Bartlett, 999 Let us ention that, in the for presented by Vapnik 998, relative deviation bounds suffer fro a discontinuity at zero zero denoinator, a proble that also affects inequalities for the other side and which sees not to have been rigorously treated by previous work Our proofs and results explicitly deal with this issue We use relative deviations bounds to give the full proofs of two-sided generalization bounds for unbounded losses with finite oents of order, both in the case < and the case >

3 RELATIVE DEVIATION AND GENERALIZATION WITH UNBOUNDED LOSS FUNCTIONS One-sided generalization bounds for unbounded loss functions were first given by Vapnik 998, 6b under the sae assuptions and also using relative deviations The one-sided version of our bounds for the case < coincides with that of Vapnik, 998, 6b odulo a constant factor, but the proofs given by Vapnik in both books see to be incorrect The core coponent of our proof is based on a different technique using Hölder s inequality We also present soe ore explicit bounds for the case < by approxiating a coplex ter appearing in these bounds The one-sided version of the bounds for the case > are also due to Vapnik 998, 6b with siilar questions about the proofs In that case as well, we give detailed proofs using the Cauchy-Schwarz inequality in the ost general case where a positive constant is used in the denoinator to avoid the discontinuity at zero These learning bounds can be used directly in the analysis of unbounded loss functions as in the case of iportance weighting Cortes et al, The reainder of this paper is organized as follows In Section, we briefly introduce soe definitions and notation used in the next sections Section 3 presents in detail relative deviation bounds as well as several of their consequences Next, in Section 4 we present generalization bounds for unbounded loss functions under the assuption that the oent of order is bounded first in the case < Section 4, then in the case > Section 4 eliinaries We consider an input space X and an output space Y, which in the particular case of binary classification is Y = {, +} or Y = {, }, or a easurable subset of R in regression We denote by D a distribution over Z = X Y For a saple S of size drawn fro D, we will denote by D the corresponding epirical distribution, that is the distribution corresponding to drawing a point fro S uniforly at rando Throughout this paper, H denotes a hypothesis of functions apping fro X to Y The loss incurred by hypothesis h H at z Z is denoted by Lh, z L is assued to be non-negative, but not necessarily bounded We denote by Lh the expected loss or generalization error of a hypothesis h H and by L S h its epirical loss for a saple S: Lh = E Lh, z LS h = E Lh, z z D z D For any >, we also use the notation L h = E z D L h, z and L h = E z DL h, z for the th oents of the loss When the loss L coincides with the standard zero-one loss used in binary classification, we equivalently use the following notation Rh = E hx y RS h = E hx y z=x,y D z=x,y D In Vapnik, 998p4-6, stateent 537 cannot be derived fro assuption 535, contrary to what is claied by the author, and in general it does not hold: the first integral in 537 is restricted to a sub-doain and is thus saller than the integral of 535 Furtherore, the ain stateent claied in Section 56 is not valid In Vapnik, 6bp-, the author invokes the Lagrange ethod to show the ain inequality, but the proof steps are not atheatically justified Even with our best efforts, we could not justify soe of the steps and strongly believe the proof not to be correct In particular, the way function z is concluded to be equal to one over the first interval is suspicious and not rigorously justified Several of the coents we ade for the case < hold here as well In particular, the author s proof is not based on clear atheatical justifications Soe steps see suspicious and are not convincing, even with our best efforts to justify the 3

4 CORTES, GREENBERG, AND MOHRI We will soeties use the shorthand x to denote a saple of > points x,, x X For any hypothesis set H of functions apping X to Y = {, +} or Y = {, } and saple x, we denote by S H x the nuber of distinct dichotoies generated by H over that saple and by Π H the growth function: S H x = Card {hx,, hx : h H } 3 Π H = 3 Relative deviation bounds ax S Hx x X 4 In this section we prove a series of relative deviation learning bounds which we use in the next section for deriving generalization bounds for unbounded loss functions We will assue throughout the paper, as is coon in uch of learning theory, that each expression of the for is a easurable function, which is not guaranteed when H is not a countable set This assuption holds nevertheless in ost coon applications of achine learning We start with the proof of a syetrization lea Lea originally presented by Vapnik 998, which is used by Anthony and Shawe-Taylor 993 These publications and several others have all relied on a lower bound on the probability that a binoial rando variable of trials exceeds its expected value when the bias verifies p > To our knowledge, no rigorous proof of this fact was ever provided in the literature in the full generality needed The proof of this result was recently given by Greenberg and Mohri 3 Lea Greenberg and Mohri 3 Let X be a rando variable distributed according to the binoial distribution B, p with a positive integer the nuber of trials and p > the probability of success of each trial Then, the following inequality holds: X EX > 4, 5 where EX = p The lower bound is never reached but is approached asyptotically when = as p fro the right Our proof of Lea is ore concise than that of Vapnik 998 Furtherore, our stateent and proof handle the technical proble of discontinuity at zero ignored by previous authors The denoinator ay in general becoe zero, which would lead to an undefined result We resolve this issue by including an arbitrary positive constant τ in the denoinator in ost of our expressions For the proof of the following result, we will use the function F defined over, +, + by F : x, y x y x+y+ By Lea 9, F x, y is increasing in x and decreasing in y Lea Let < Assue that ɛ > Then, for any hypothesis set H and any τ >, the following holds: S D Rh R S h Rh + τ 4 S,S D R S h R S h R S h + R S h + 4

5 RELATIVE DEVIATION AND GENERALIZATION WITH UNBOUNDED LOSS FUNCTIONS oof We give a concise version of the proof given by Vapnik, 998 We first show that the following iplication holds for any h H: Rh R S h RS h > Rh F R S h, R S h 6 Rh + τ The first condition can be equivalently rewritten as R S h < Rh ɛrh + τ, which iplies R S h < Rh ɛrh and ɛ < Rh, 7 since R S h Assue that the antecedent of the iplication 6 holds for h H Then, in view of the onotonicity properties of function F Lea 9, we can write: F R S h, R S h F Rh, Rh ɛrh RS h > Rh and st ineq of 7 = Rh Rh ɛrh Rh ɛrh + ɛrh nd ineq of 7 Rh ɛ + Rh = ɛ, ɛ Rh > which proves 6 Now, by definition of the reu, for any η >, there exists h H such that Rh R S h Rh R S h Rh + τ Rh + τ η 8 Using the definition of h and iplication 6, we can write R S h R S h S,S D R S h + R S h + R S h R S h S,S D S,S D = S D R S h + R S h + Rh R S h Rh + τ Rh R S h Rh + τ by def of R S h > Rh iplication 6 R S D S h > Rh independence We now show that this iplies the following inequality R S h R S h S,S D R S h + R S h + 4 S D Rh R S h Rh + τ + η, 9 5

6 CORTES, GREENBERG, AND MOHRI by distinguishing two cases If Rh, since ɛ >, by Theore the inequality S D R S h > Rh > 4 holds, which yields iediately 9 Otherwise we have Rh ɛ Then, by 7, the condition Rh R S h cannot hold for any saple S D Rh +τ Rh which by 8 iplies that the condition R S h + η cannot hold for any saple Rh+τ S D, in which case 9 trivially holds Now, since 9 holds for all η >, we can take the liit η and use the right-continuity of the cuulative distribution to obtain S,S D R S h R S h R S h + R S h + which copletes the proof of Lea 4 S D Rh R S h, Rh + τ Note that the factor of 4 in the stateent of lea can be odestly iproved by changing the condition assued fro ɛ > to ɛ > k for constant values of k > This leads to a slightly better lower bound on S D R S h > Rh, eg 3375 rather than 4 for k =, k at the expense of not covering cases where the nuber of saples is less than For soe values of k, eg k =, covering these cases is not needed for the proof of our ain theore Theore 5 though However, this does not see to siplify the critical task of proving a lower bound on S D R S h > Rh, that is the probability that a binoial rando variable B, p exceeds its expected value when p > k One ight hope that restricting the range of p in this way would help siplify the proof of a lower bound on the probability of a binoial exceeding its expected value Unfortunately, our analysis of this proble and proof Greenberg and Mohri, 3 suggest that this is not the case since the regie where p is sall sees to be the easiest one to analyze for this proble The result of Lea is a one-sided inequality The proof of a siilar result Lea 4 with the roles of Rh and R S h interchanged akes use of the following theore Lea 3 Greenberg and Mohri 3 Let X be a rando variable distributed according to the binoial distribution B, p with a positive integer and p < Then, the following inequality holds: X EX > 4, where EX = p The proof of the following lea Lea 4 is novel 3 While the general strategy of the proof is siilar to that of Lea, there are soe non-trivial differences due to the requireent p < of Theore 3 The proof is not syetric as shown by the details given below Lea 4 Let < Assue that ɛ > Then, for any hypothesis set H and any τ > the following holds: S D R S h Rh R S h + τ 4 S,S D 3 A version of this lea is stated in Boucheron et al, 5, but no proof is given ɛ R S h R S h R S h + R S h + 6

7 RELATIVE DEVIATION AND GENERALIZATION WITH UNBOUNDED LOSS FUNCTIONS X p X p p p Figure : These plots depict X EX, the probability that a binoially distributed rando variable X exceeds its expectation, as a function of the trial success probability p The left plot shows only regions satisfying p > whereas the right plot shows only regions satisfying p > Each colored line corresponds to a different nuber of trials, =, 3,, 4 The dashed horizontal line at 4 represents the value of the lower bound used in the proof of lea oof oceeding in a way siilar to the proof of Lea, we first show that the following iplication holds for any h H: R S h Rh Rh R S h F R S h, R S h R S h + τ The first condition can be equivalently rewritten as Rh < R S h ɛ R S h+τ, which iplies Rh < R S h ɛ R S h and ɛ < RS h, since R S h Assue that the antecedent of the iplication holds for h H Then, in view of the onotonicity properties of function F Lea 9, we can write: F R S h, R S h F R S h, Rh Rh R S h F R S h, R S h ɛ R S h st ineq of = R S h R S h ɛ R S h R S h ɛ R S h + ɛrh nd ineq of Rh ɛ + Rh = ɛ, ɛ Rh > 7

8 CORTES, GREENBERG, AND MOHRI which proves For the application of Theore 3 to a hypothesis h, the condition Rh < is required Observe that this is iplied by the assuptions R S h ɛ and ɛ > : Rh < R S h ɛ R S h ɛ ɛ The rest of the proof proceeds nearly identically to that of Lea = ɛ < In the stateents of all the following results, the ter E x D S Hx can be replaced by the upper bound Π H to derive sipler expressions By Sauer s lea Sauer, 97; Vapnik and Chervonenkis, 97, the VC-diension d of the faily H can be further used to bound these quantities since Π H e d d for d The first inequality of the following theore was originally stated and proven by Vapnik 998, 6b, later by Anthony and Shawe-Taylor 993 in the special case = with a soewhat ore favorable constant, in both cases odulo the incoplete proof of the syetrization and the technical issue related to the denoinator taking the value zero, as already pointed out The second inequality of the theore and its proof are novel Our proofs benefit fro the iproved analysis of Anthony and Shawe-Taylor 993 Theore 5 For any hypothesis set H of functions apping a set X to {, }, and any fixed < and τ >, the following two inequalities hold: Rh R S h S D Rh + τ R S h Rh S D R S h + τ 4 ES H x exp 4 ES H x exp + ɛ + ɛ oof We first consider the case where ɛ then write 4 ES H x exp + ɛ, which is not covered by Lea We can 4 ES H x exp >, + for < Thus, the bounds of the theore hold trivially in that case On the other hand, when, we can apply Lea and Lea 4 Therefore, to prove theore 5, it is sufficient to ɛ work with the syetrized expression with our original expressions Rh R S h Rh+τ RS h R S h, rather than working directly R S h+ R S h+ and Rh RS h Rh+τ To upper bound the probability that the syetrized expression is larger than ɛ, we begin by introducing a vector of Radeacher rando variables σ = σ, σ,, σ, where the σ i are independent, identically distributed rando variables each equally likely to take the value + or Using the shorthand 8

9 RELATIVE DEVIATION AND GENERALIZATION WITH UNBOUNDED LOSS FUNCTIONS x for x,, x, we can then write S,S D R S h R S h R S h + R S h + = x D = x D,σ = E x D Now, for a fixed x, we have E σ =, thus, by Hoeffding s inequality, we can write i= hx +i hx i i= hx +i + hx i + σ i= σ ihx +i hx i σ i= hx +i + hx i i= σ ihx +i hx i i= hx +i + hx i + i= σ ihx +i hx i i= hx +i + hx i + i= σ ihx +i hx i i= hx +i+hx i + x x i= exp hx +i + hx i + ɛ i= hx +i hx i exp + i= hx +i + hx i ɛ i= hx +i hx i + Since the variables hx i, i,, take values in {, }, we can write hx +i hx i = i= hx +i + hx i hx +i hx i i= hx +i + hx i hx +i + hx i i= i=, where the last inequality holds since and the su is either zero or greater than or equal to one In view of this identity, we can write i= σ ihx +i hx i σ x ɛ exp i= hx +i + hx i We note now that the reu over h H in the left-hand side expression in the stateent of our theore need not be over all hypothesis in H: without changing its value, we can replace H with a saller hypothesis set where only one hypothesis reains for each unique binary vector + 9

10 CORTES, GREENBERG, AND MOHRI hx, hx,, hx The nuber of such hypotheses is S H x, thus, by the union bound, the following holds: σ i= σ ihx +i hx i i= hx +i + hx i x S H x exp + ɛ The result follows by taking expectations with respect to x and applying Lea and Lea 4 respectively Corollary 6 Let < and let H be a hypothesis set of functions apping X to {, } Then, for any δ >, each of the following two inequalities holds with probability at least δ: Rh R S h + Rh Rh R S h + Rh log ES H x + log 4 δ log ES H x + log 4 δ oof The result follows directly fro Theore 5 by setting δ to atch the upper bounds and taking the liit τ For =, the inequalities becoe Rh R S h Rh log ES Hx + log 4 δ R S h Rh 3 Rh log ES Hx + log 4 δ, 4 with the failiar dependency O log/d /d The advantage of these relative deviations is clear For sall values of Rh or Rh these inequalities provide tighter guarantees than standard generalization bounds Solving the corresponding second-degree inequalities in Rh or Rh leads to the following results Corollary 7 Let < and let H be a hypothesis set of functions apping X to {, } Then, for any δ >, each of the following two inequalities holds with probability at least δ: Rh R S h + R S h Rh + R S h log ES Hx + log 4 δ Rh log ES Hx + log 4 δ + 4 log ES Hx + log 4 δ + 4 log ES Hx + log 4 δ

11 RELATIVE DEVIATION AND GENERALIZATION WITH UNBOUNDED LOSS FUNCTIONS oof The second-degree inequality corresponding to 3 can be written as Rh Rhu RS h, with u = gives: log ES H x +log 4 δ, and iplies Rh u + u + R S h Squaring both sides Rh u + u + R S h = u + u u + R S h + u + R S h u + u u + R S h + u + R S h = 4u + u R S h + R S h The second inequality can be proven in the sae way fro 4 The learning bounds of the corollary ake clear the presence of two ters: a ter in O/ and a ter in O/ which adits as a factor R S h or Rh and which for sall values of these ters can be ore favorable than standard bounds Theore 5 can also be used to prove the following relative deviation bounds The following theore and its proof assuing the result of Theore 5 were given by Anthony and Bartlett 999 Theore 8 For all < ɛ <, ν >, the following inequalities hold: S D S D Rh R S h Rh + Rh + ν R S h Rh Rh + Rh + ν νɛ 4 ES H x exp ɛ νɛ 4 ES H x exp ɛ oof We prove the first stateent, the proof of the second stateent is identical odulo the perutation of the roles of Rh and R S h To do so, it suffices to deterine ɛ > such that S D Rh R S h Rh + Rh + ν S D Rh R S h, Rh + τ since we can then apply theore 5 with = to bound the right-hand side and take the liit as τ to eliinate the τ-dependence To find such a choice of ɛ, we begin by observing that for any h H, Rh R S h Rh + Rh + ν ɛ Rh + ɛ ɛ R S h + ɛ ν 5 ɛ Assue now that Rh R S h ɛ for soe ɛ >, which is equivalent to Rh R S h + Rh+τ ɛ Rh + τ We will prove that this iplies 5 To show that, we distinguish two cases, Rh + τ µ ɛ and Rh + τ > µ ɛ, with µ > The first case iplies the following: Rh + τ µ ɛ Rh R S h + ɛ µ ɛ Rh R S h + µɛ

12 CORTES, GREENBERG, AND MOHRI The second case Rh + τ > µ ɛ is equivalent to ɛ < Rh+τ µ and iplies ɛ < Rh + τ µ Rh R S h + Rh + τ µ Rh µ µ R S h + τ µ Observe now that since µ µ >, both cases iply Rh µ µ R S h + τ µ + µɛ 6 We now choose ɛ and µ to ake 6 atch 5 by setting µ which gives: µ = + ɛ ɛ = ɛ ν τ ɛ ɛ With these choices, the following inequality holds for all h H: which concludes the proof µ = +ɛ ɛ and Rh R S h Rh + τ ɛ Rh R S h Rh + Rh + ν ɛ, The following corollary was given by Anthony and Bartlett 999 Corollary 9 For all ɛ >, v >, the following inequality holds: S D oof Observe that Rh + v R S h 4 ES H x exp τ µ + µɛ = ɛ ɛ ν, vɛ 4 + v Rh R S h Rh + Rh + ν Rh R S h > Rh + Rh + νɛ Rh > + ɛ ɛ Rh + ɛν ɛ To derive the stateent of the corollary fro that of Theore 8, we identify +ɛ ɛ with + v, which gives ɛ + v = v, that is we choose ɛ = v ɛν +v, and siilarly identify ɛ with ɛ, that is ɛ = v +v ν = v ν, thus we choose ν = v ɛ With these choices of ɛ and ν, the coefficient +v vɛ in the exponential appearing in the bounds of Theore 8 can be rewritten as follows: ɛ v v +v 4v+4 = ɛ +v v v 4v+ = ɛ v 4v+, which concludes the proof ɛ = The result of Corollary 9 is rearkable since it shows that a fast convergence rate of O/ can be achieved provided that we settle for a slightly larger value than the epirical error, one differing by a fixed factor + v The following is an iediate corollary when R S h =, where we take v Corollary For all ɛ >, v >, the following inequality holds: h H : Rh R S h = S D 4 ES H x exp This is the failiar fast rate convergence result for separable cases ɛ 4

13 RELATIVE DEVIATION AND GENERALIZATION WITH UNBOUNDED LOSS FUNCTIONS 4 Generalization bounds for unbounded losses In this section we will ake use of the relative deviation bounds given in the previous section to prove generalization bounds for unbounded loss functions under the assuption that the oent of order of the loss is bounded We will start with the case < and then ove on to considering the case when > As already indicated earlier, the one-sided version of the results presented in this section were given by Vapnik 998 with slightly different constants, but the proofs do not see to be correct or coplete The second stateents in all these results other side of the inequality are new Our proofs for both sets of results are new 4 Bounded oent with < Our first theore reduces the proble of deriving a relative deviation bound for an unbounded loss function with L h = E z D Lh, z < + for all h H, to that of relative deviation bound for binary classification To siplify the presentation of the results, in what follows we will use the shorthand Lh, z > t instead of z D Lh, z > t, and siilarly Lh, z > t instead of z DLh, z > t Theore Let <, < ɛ, and < τ < ɛ For any loss function L not necessarily bounded and hypothesis set H such that L h < + for all h H, the following two inequalities hold: Lh L S h L h + τ > Γ, ɛ ɛ Lh L S h L h + τ > Γ, ɛ ɛ,t R,t R with Γ, ɛ = + τ + + τ Lh, z > t Lh, z > t Lh, z > t + τ Lh, z > t Lh, z > t Lh, z > t + τ + log/ɛ, oof We prove the first stateent The second stateent can be shown in a very siilar way Fix < and ɛ > and assue that for any h H and t, the following holds: Lh, z > t Lh, z > t ɛ 7 Lh, z > t + τ We show that this iplies that for any h H, Lh L S h L h+τ Lebesgue integral, we can write and, siilarly, L h = L h = Lh = Lh = E Lh, z = z D E Lh, z = z D L h, z > t dt = 3 Γ, ɛɛ By the properties of the Lh, z > t dt Lh, z > t dt, t Lh, z > t dt

14 CORTES, GREENBERG, AND MOHRI In what follows, we use the notation I = L h + τ Let t = si and t = t ɛ for s > To bound Lh Lh, we siply bound Lh, z > t Lh, z > t by Lh, z > t for large values of t, that is t > t, and use inequality 7 for saller values of t: Lh Lh = t Lh, z > t Lh, z > t dt ɛ Lh, z > t + τ dt + t Lh, z > t dt For relatively sall values of t, Lh, z > t is close to one Thus, we can write Lh Lh = t ɛ t + τ dt + ftgt dt, t ɛ Lh, z > t + τ dt + t Lh, z > tdt with γ I ɛ + τ if t t ft = γ t Lh, z > t + τ ɛ if t < t t γ t Lh, z > t ɛ if t < t gt = γ I γ t Lh,z>t γ t if t t if t < t t ɛ if t < t, where γ, γ are positive paraeters that we shall select later Now, since >, by Hölder s inequality, Lh Lh ft + dt gt dt The first integral on the right-hand side can be bounded as follows: ft dt = t + τγ I ɛ dt + γ ɛ τ t t t dt + γ + τγ I t ɛ + γ ɛ τt t + γ ɛ I γ + τs + γ + s /ɛ τɛ I γ + τs + γ + s τ ɛ I t t Lh, z > tɛ dt 4

15 RELATIVE DEVIATION AND GENERALIZATION WITH UNBOUNDED LOSS FUNCTIONS Since t /t = /ɛ, the second one can be coputed and bounded following gt dt = t t t γ dt + γ I = s + γ γ s + γ γ s + γ γ = s + γ γ dt + t + Cobining the bounds obtained for these integrals yields directly Lh Lh γ + τs + γ + s τ ɛ I = γ + τs + γ + s τ s γ s γ + γ Lh, z > t dt t γ ɛ t log ɛ + t Lh, z > t dt t γ ɛ t log ɛ + t Lh, z > t dt t γ ɛ I t log ɛ + γ ɛ s I ɛ log ɛ + s + γ log ɛ + s log ɛ + ɛi s Observe that the expression on the right-hand side can be rewritten as u v ɛi where the vectors u and v are defined by u = γ + τ s, γ + s τ and v = v, v = s γ, γ The inner product u v does not depend on γ log ɛ + s and γ and by the properties of Hölder s inequality can be reached when u and the vector v = v, v are collinear γ and γ can be chosen so that detu, v =, since this condition can be rewritten as s γ + τ log γ ɛ + s + s τ γ =, 8 s γ or equivalently, γ γ Thus, for such values of γ and γ, the following inequality holds: log ɛ + + s τ = 9 s Lh Lh u v ɛi = fs ɛi, 5

16 CORTES, GREENBERG, AND MOHRI with fs = + τ s + + s τ = + τ s + + s τ log ɛ + s log ɛ + s Setting s = yields the stateent of the theore The next corollary follows iediately by upper bounding the right-hand side of the learning bounds of theore using theore 5 It provides learning bounds for unbounded loss functions in ters of the growth functions in the case < Corollary Let ɛ <, <, and < τ < ɛ For any loss function L not necessarily bounded and hypothesis set H such that L h < + for all h H, the following inequalities hold: Lh Lh L h + τ > Γ, ɛɛ 4 ES Q z exp Lh Lh L h + τ > Γ, ɛɛ 4 ES Q z exp + ɛ + ɛ, where Q is the set of functions Q = {z Lh,z>t h H, t R}, and Γ, ɛ = + τ + + τ + log/ɛ The following corollary gives the explicit result for = Corollary 3 Let ɛ < and < τ < ɛ 4 For any loss function L not necessarily bounded and hypothesis set H such that L h < + for all h H, the following inequalities hold: with Γ, ɛ = +τ h H, t R} + Lh Lh L h + τ > Γ, ɛɛ 4 ES Q z exp Lh Lh L h + τ > Γ, ɛɛ 4 ES Q z exp ɛ 4 ɛ + 4 τ + log ɛ and Q the set of functions Q = {z Lh,z>t 4, Corollary 4 Let L be a loss function not necessarily bounded and H a hypothesis set such that L h < + for all h H Then, for any δ >, with probability at least δ, each of the 6

17 RELATIVE DEVIATION AND GENERALIZATION WITH UNBOUNDED LOSS FUNCTIONS following inequalities holds for all h H: Lh L S h + L h log ES Q z + log δ Γ, log ES Q z L S h Lh + L + log δ h Γ, log ES Q z + log δ log ES Q z + log δ, where Q is the set of functions Q = {z Lh,z>t h H, t R} and Γ, ɛ = + + log ɛ oof For any ɛ >, let fɛ = Γ, ɛɛ Then, by Corollary 3, Lh Lh f L h + τ 4 ES Q z ɛ exp 4 Setting the right-hand side to ɛ and using inversion yields iediately the first inequality The second inequality is proven in the sae way Observe that, odulo the factors in Γ, the bounds of the corollary adit the standard / dependency and that the factors in Γ are only logarithic in 4 Bounded oent with > This section gives two-sided generalization bounds for unbounded losses with finite oents of order, with > As for the case < <, the one-sided version of our bounds coincides with that of Vapnik 998, 6b odulo a constant factor, but, here again, the proofs given by Vapnik in both books see to be incorrect oposition 5 Let > For any loss function L not necessarily bounded and hypothesis set H such that < L h < + for all h H, the following two inequalities hold: Lh, z > tdt Ψ L h and Lh, z > tdt Ψ L h, where Ψ = oof We prove the first inequality The second can be proven in a very siilar way Fix > and h H As in the proof of Theore, we bound Lh, z > t by for t close to, say t t for soe t > that we shall later deterine We can write t Lh, z > tdt dt + with functions f and g defined as follows: ft = {γi if t t t Lh, z > t if t < t t Lh, z > tdt = 7 gt = γi t ftgtdt, if t t if t < t,

18 CORTES, GREENBERG, AND MOHRI where I = L h and where γ is a positive paraeter that we shall select later By the Cauchy- Schwarz inequality, Thus, we can write Lh, z > tdt Lh, z > tdt ft + dt gt dt γ I t + t γ I t + I t t Lh, z > tdt γ I t γ I + t + dt t t Introducing t with t = I / t leads to Lh, z > tdt γ I t + I γ t + We now seek to iniize the expression γ t + t γ I t γ + t + t I I t +, first as a function of γ t γ This expression can be viewed as the product of the nors of the vectors u = γt, and v = t γ,, with a constant inner product not depending on γ Thus, by the properties of t the Cauchy-Schwarz inequality, it is iniized for collinear vectors and in that case equals their inner product: u v = t + t Differentiating this last expression with respect to t and setting the result to zero gives the iniizing value of t : = For that value of t, u v = which concludes the proof + t = =, 8

19 RELATIVE DEVIATION AND GENERALIZATION WITH UNBOUNDED LOSS FUNCTIONS Theore 6 Let >, < ɛ, and < τ ɛ Then, for any loss function L not necessarily bounded and hypothesis set H such that L h < + and L h < + for all h H, the following two inequalities hold: Lh Lh L h + τ > Λɛ,t R Lh, z > t Lh, z > t Lh, z > t + τ Lh Lh L h + τ > Λɛ,t R Lh, z > t Lh, z > t Lh, z > t + τ, where Λ = + τ oof We prove the first stateent since the second one can be proven in a very siilar way Lh,z>t Lh,z>t Assue that h,t ɛ Fix h H, let J = Lh, z > t dt Lh,z>t+τ and ν = L h By Markov s inequality, for any t >, Lh, z > t = L h, z > t L h t = ν t Using this inequality, for any t >, we can write Lh Lh = = ɛ ɛ t t t Lh, z > t Lh, z > t dt Lh, z > t Lh, z > t dt + Lh, z > t + τ dt + Lh, z > t dt Lh, z > t + τ dt + ɛj + ɛ ν τt + t t t t ν t dt Choosing t to iniize the right-hand side yields t = ν ɛ and gives τ Lh Lh ɛj + ν ɛ τ Lh, z > t Lh, z > t dt Since τ ɛ, ɛ τ = ɛτ oposition 5, the following holds: τ ɛɛ τ = ɛτ Thus, by Lh Lh L h + τ ɛψ ν ν + τ + ɛτ ν ν + τ ɛψ + ɛτ, which concludes the proof Cobining Theore 6 with Theore 5 leads iediately to the following two results 9

20 CORTES, GREENBERG, AND MOHRI Corollary 7 Let >, < ɛ, and < τ ɛ Then, for any loss function L not necessarily bounded and hypothesis set H such that L h < + and L h < + for all h H, the following two inequalities hold: Lh Lh ɛ L h + τ > Λɛ 4 ES Q z exp 4 Lh Lh ɛ > Λɛ 4 ES Q z exp, 4 L h + τ where Λ = Lh,z>t h H, t R} + τ and where Q is the set of functions Q = {z In the following result, PdiG denotes the pseudo-diension of a faily of real-valued functions G Pollard, 984, 989; Vapnik, 998, which coincides with the VC-diension of the corresponding thresholded functions: PdiG = VCdi {x, t gx t> : g G } Corollary 8 Let >, < ɛ Let L be a loss function not necessarily bounded and H a hypothesis set such that L h < + for all h H, and d = Pdi{z Lh, z h H} < + Then, for any δ >, with probability at least δ, each of the following inequalities holds for all h H: Lh Lh + Λ d log e d + log 4 δ L h d log Lh Lh + Λ L e d + log 4 δ h where Λ = 5 Conclusion We presented a series of results for relative deviation bounds used to prove generalization bounds for unbounded loss functions These learning bounds can be used in a variety of applications to deal with the ore general unbounded case The relative deviation bounds are of independent interest and can be further used for a sharper analysis of guarantees in binary classification and other tasks References M Anthony and J Shawe-Taylor A result of Vapnik with applications Discrete Applied Matheatics, 47:7 7, 993 Martin Anthony and Peter L Bartlett Neural Network Learning: Theoretical Foundations Cabridge University ess, 999

21 RELATIVE DEVIATION AND GENERALIZATION WITH UNBOUNDED LOSS FUNCTIONS Kazuoki Azua Weighted sus of certain dependent rando variables Tohoku Matheatical Journal, 93: , 967 Peter L Bartlett and Shahar Mendelson Radeacher and Gaussian coplexities: Risk bounds and structural results Journal of Machine Learning Research, 3, Peter L Bartlett, Stéphane Boucheron, and Gábor Lugosi Model selection and error estiation Machine Learning, 48:85 3, Septeber a Peter L Bartlett, Olivier Bousquet, and Shahar Mendelson Localized Radeacher coplexities In COLT, volue 375, pages Springer-Verlag, b S Ben-David, J Blitzer, K Craer, and F Pereira Analysis of representations for doain adaptation NIPS, 7 Steffen Bickel, Michael Brückner, and Tobias Scheffer Discriinative learning for differing training and test distributions In ICML, pages 8 88, 7 J Blitzer, K Craer, A Kulesza, F Pereira, and J Wortan Learning bounds for doain adaptation NIPS 7, 8 Stéphane Boucheron, Olivier Bousquet, and Gábor Lugosi Theory of classification: a survey of recent advances ESAIM: obability and Statistics, 9:33 375, 5 Corinna Cortes and Mehryar Mohri Doain adaptation and saple bias correction theory and algorith for regression Theoretical Coputer Science, 9474, 3 Corinna Cortes, Mehryar Mohri, Michael Riley, and Afshin Rostaizadeh Saple selection bias correction theory In ALT, 8 Corinna Cortes, Yishay Mansour, and Mehryar Mohri Learning bounds for iportance weighting In Advances in Neural Inforation ocessing Systes NIPS, Vancouver, Canada, MIT ess Sanjoy Dasgupta and Philip M Long Boosting with diverse base classifiers In COLT, 3 Hal Daué III and Daniel Marcu Doain adaptation for statistical classifiers Journal of Artificial Intelligence Research, 6: 6, 6 Miroslav Dudík, Robert E Schapire, and Steven J Phillips Correcting saple selection bias in axiu entropy density estiation In NIPS, 6 R M Dudley A course on epirical processes Lecture Notes in Matheatics, 97: 4, 984 R M Dudley Universal Donsker classes and etric entropy Annals of obability, 44:36 36, 987 S Greenberg and M Mohri Tight lower bound on the probability of a binoial exceeding its expectation Technical Report 3-957, Courant Institute, New York, New York, 3 David Haussler Decision theoretic generalizations of the PAC odel for neural net and other learning applications Inf Coput, :78 5, 99

22 CORTES, GREENBERG, AND MOHRI Wassily Hoeffding obability inequalities for sus of bounded rando variables Journal of the Aerican Statistical Association, 583:3 3, 963 Jiayuan Huang, Alexander J Sola, Arthur Gretton, Karsten M Borgwardt, and Bernhard Schölkopf Correcting saple selection bias by unlabeled data In NIPS, volue 9, pages 6 68, 6 Savina Andonova Jaeger Generalization bounds and coplexities based on sparsity and clustering for convex cobinations of functions fro rando classes Journal of Machine Learning Research, 6:37 34, 5 Jing Jiang and ChengXiang Zhai Instance Weighting for Doain Adaptation in NLP In ACL, 7 V Koltchinskii Local radeacher coplexities and oracle inequalities in risk iniization Annals of Statistics, 346, 6 Vladiir Koltchinskii and Ditry Panchenko Radeacher processes and bounding the risk of function learning In High Diensional obability II, pages Birkhäuser, Vladir Koltchinskii and Ditry Panchenko Epirical argin distributions and bounding the generalization error of cobined classifiers Annals of Statistics, 3, Yishay Mansour, Mehryar Mohri, and Afshin Rostaizadeh Doain adaptation: Learning bounds and algoriths In COLT, 9 Colin McDiarid On the ethod of bounded differences Surveys in Cobinatorics, 4: 48 88, 989 Ron Meir and Tong Zhang Generalization Error Bounds for Bayesian Mixture Algoriths Journal of Machine Learning Research, 4:839 86, 3 David Pollard Convergence of Stochastic ocessess Springer, New York, 984 David Pollard Asyptotics via epirical processes Statistical Science, 44:34 366, 989 Norbert Sauer On the density of failies of sets Journal of Cobinatorial Theory, Series A, 3 :45 47, 97 M Sugiyaa, S Nakajia, H Kashia, P von Bünau, and M Kawanabe Direct iportance estiation with odel selection and its application to covariate shift adaptation In NIPS, 8 Michael Talagrand Sharper bounds for gaussian and epirical processes Annals of obability, :8 76, 994 Vladiir N Vapnik Statistical Learning Theory John Wiley & Sons, 998 Vladiir N Vapnik Estiation of Dependences Based on Epirical Data Springer-Verlag, 6a Vladiir N Vapnik Estiation of Dependences Based on Epirical Data, second edition Springer, Berlin, 6b

23 RELATIVE DEVIATION AND GENERALIZATION WITH UNBOUNDED LOSS FUNCTIONS Vladiir N Vapnik and Alexey Chervonenkis On the unifor convergence of relative frequencies of events to their probabilities Theory of obability and Its Applications, 6:64, 97 Appendix A Leas in port of Section 3 Lea 9 Let < and for any η >, let f :, +, + R be the function defined by f : x, y x y Then, f is a strictly increasing function of x and a strictly x+y+η decreasing function of y oof f is differentiable over its doain of definition and for all x, y, +, +, f x f y x + y + η x, y = x + y + η x, y = x y x + y + η x y x + y + η x + y + η x + y + η = x + + y + η > x + y + η + = + x + y + η x + y + η + < 3

Relative Deviation Learning Bounds and Generalization with Unbounded Loss Functions

Relative Deviation Learning Bounds and Generalization with Unbounded Loss Functions Relative Deviation Learning Bounds and Generalization with Unbounded Loss Functions Corinna Cortes Spencer Greenberg Mehryar Mohri January, 9 Abstract We present an extensive analysis of relative deviation

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a ournal published by Elsevier. The attached copy is furnished to the author for internal non-coercial research and education use, including for instruction at the authors institution

More information

Rademacher Complexity Margin Bounds for Learning with a Large Number of Classes

Rademacher Complexity Margin Bounds for Learning with a Large Number of Classes Radeacher Coplexity Margin Bounds for Learning with a Large Nuber of Classes Vitaly Kuznetsov Courant Institute of Matheatical Sciences, 25 Mercer street, New York, NY, 002 Mehryar Mohri Courant Institute

More information

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis E0 370 tatistical Learning Theory Lecture 6 (Aug 30, 20) Margin Analysis Lecturer: hivani Agarwal cribe: Narasihan R Introduction In the last few lectures we have seen how to obtain high confidence bounds

More information

1 Generalization bounds based on Rademacher complexity

1 Generalization bounds based on Rademacher complexity COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #0 Scribe: Suqi Liu March 07, 08 Last tie we started proving this very general result about how quickly the epirical average converges

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory Proble sets 5 and 6 Due: Noveber th Please send your solutions to learning-subissions@ttic.edu Notations/Definitions Recall the definition of saple based Radeacher

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 2: PAC Learning and VC Theory I Fro Adversarial Online to Statistical Three reasons to ove fro worst-case deterinistic

More information

E0 370 Statistical Learning Theory Lecture 5 (Aug 25, 2011)

E0 370 Statistical Learning Theory Lecture 5 (Aug 25, 2011) E0 370 Statistical Learning Theory Lecture 5 Aug 5, 0 Covering Nubers, Pseudo-Diension, and Fat-Shattering Diension Lecturer: Shivani Agarwal Scribe: Shivani Agarwal Introduction So far we have seen how

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Coputable Shell Decoposition Bounds John Langford TTI-Chicago jcl@cs.cu.edu David McAllester TTI-Chicago dac@autoreason.co Editor: Leslie Pack Kaelbling and David Cohn Abstract Haussler, Kearns, Seung

More information

1 Rademacher Complexity Bounds

1 Rademacher Complexity Bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #10 Scribe: Max Goer March 07, 2013 1 Radeacher Coplexity Bounds Recall the following theore fro last lecture: Theore 1. With probability

More information

1 Bounding the Margin

1 Bounding the Margin COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost

More information

Stability Bounds for Non-i.i.d. Processes

Stability Bounds for Non-i.i.d. Processes tability Bounds for Non-i.i.d. Processes Mehryar Mohri Courant Institute of Matheatical ciences and Google Research 25 Mercer treet New York, NY 002 ohri@cis.nyu.edu Afshin Rostaiadeh Departent of Coputer

More information

Foundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research

Foundations of Machine Learning Boosting. Mehryar Mohri Courant Institute and Google Research Foundations of Machine Learning Boosting Mehryar Mohri Courant Institute and Google Research ohri@cis.nyu.edu Weak Learning Definition: concept class C is weakly PAC-learnable if there exists a (weak)

More information

Learning Bounds for Importance Weighting

Learning Bounds for Importance Weighting Learning Bounds for Iportance Weighting Corinna Cortes Google Research New York, NY 00 corinna@googleco Yishay Mansour Tel-Aviv University Tel-Aviv 69978, Israel ansour@tauacil Mehryar Mohri Courant Institute

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Journal of Machine Learning Research 5 (2004) 529-547 Subitted 1/03; Revised 8/03; Published 5/04 Coputable Shell Decoposition Bounds John Langford David McAllester Toyota Technology Institute at Chicago

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee227c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee227c@berkeley.edu October

More information

Learnability and Stability in the General Learning Setting

Learnability and Stability in the General Learning Setting Learnability and Stability in the General Learning Setting Shai Shalev-Shwartz TTI-Chicago shai@tti-c.org Ohad Shair The Hebrew University ohadsh@cs.huji.ac.il Nathan Srebro TTI-Chicago nati@uchicago.edu

More information

Stability Bounds for Stationary ϕ-mixing and β-mixing Processes

Stability Bounds for Stationary ϕ-mixing and β-mixing Processes Journal of Machine Learning Research (200) 789-84 Subitted /08; Revised /0; Published 2/0 Stability Bounds for Stationary ϕ-ixing and β-ixing Processes Mehryar Mohri Courant Institute of Matheatical Sciences

More information

Stability Bounds for Stationary ϕ-mixing and β-mixing Processes

Stability Bounds for Stationary ϕ-mixing and β-mixing Processes Journal of Machine Learning Research (200 66-686 Subitted /08; Revised /0; Published 2/0 Stability Bounds for Stationary ϕ-ixing and β-ixing Processes Mehryar Mohri Courant Institute of Matheatical Sciences

More information

Support Vector Machines. Goals for the lecture

Support Vector Machines. Goals for the lecture Support Vector Machines Mark Craven and David Page Coputer Sciences 760 Spring 2018 www.biostat.wisc.edu/~craven/cs760/ Soe of the slides in these lectures have been adapted/borrowed fro aterials developed

More information

arxiv: v2 [stat.ml] 23 Feb 2016

arxiv: v2 [stat.ml] 23 Feb 2016 Perutational Radeacher Coplexity A New Coplexity Measure for Transductive Learning Ilya Tolstikhin 1, Nikita Zhivotovskiy 3, and Gilles Blanchard 4 arxiv:1505.0910v stat.ml 3 Feb 016 1 Max-Planck-Institute

More information

A theory of learning from different domains

A theory of learning from different domains DOI 10.1007/s10994-009-5152-4 A theory of learning fro different doains Shai Ben-David John Blitzer Koby Craer Alex Kulesza Fernando Pereira Jennifer Wortan Vaughan Received: 28 February 2009 / Revised:

More information

New Bounds for Learning Intervals with Implications for Semi-Supervised Learning

New Bounds for Learning Intervals with Implications for Semi-Supervised Learning JMLR: Workshop and Conference Proceedings vol (1) 1 15 New Bounds for Learning Intervals with Iplications for Sei-Supervised Learning David P. Helbold dph@soe.ucsc.edu Departent of Coputer Science, University

More information

Domain-Adversarial Neural Networks

Domain-Adversarial Neural Networks Doain-Adversarial Neural Networks Hana Ajakan, Pascal Gerain 2, Hugo Larochelle 3, François Laviolette 2, Mario Marchand 2,2 Départeent d inforatique et de génie logiciel, Université Laval, Québec, Canada

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley ENSIAG 2 / osig 1 Second Seester 2012/2013 Lesson 20 2 ay 2013 Kernel ethods and Support Vector achines Contents Kernel Functions...2 Quadratic

More information

Combining Classifiers

Combining Classifiers Cobining Classifiers Generic ethods of generating and cobining ultiple classifiers Bagging Boosting References: Duda, Hart & Stork, pg 475-480. Hastie, Tibsharini, Friedan, pg 246-256 and Chapter 10. http://www.boosting.org/

More information

Robustness and Regularization of Support Vector Machines

Robustness and Regularization of Support Vector Machines Robustness and Regularization of Support Vector Machines Huan Xu ECE, McGill University Montreal, QC, Canada xuhuan@ci.cgill.ca Constantine Caraanis ECE, The University of Texas at Austin Austin, TX, USA

More information

Structured Prediction Theory Based on Factor Graph Complexity

Structured Prediction Theory Based on Factor Graph Complexity Structured Prediction Theory Based on Factor Graph Coplexity Corinna Cortes Google Research New York, NY 00 corinna@googleco Mehryar Mohri Courant Institute and Google New York, NY 00 ohri@cisnyuedu Vitaly

More information

3.8 Three Types of Convergence

3.8 Three Types of Convergence 3.8 Three Types of Convergence 3.8 Three Types of Convergence 93 Suppose that we are given a sequence functions {f k } k N on a set X and another function f on X. What does it ean for f k to converge to

More information

Symmetrization and Rademacher Averages

Symmetrization and Rademacher Averages Stat 928: Statistical Learning Theory Lecture: Syetrization and Radeacher Averages Instructor: Sha Kakade Radeacher Averages Recall that we are interested in bounding the difference between epirical and

More information

arxiv: v1 [cs.ds] 3 Feb 2014

arxiv: v1 [cs.ds] 3 Feb 2014 arxiv:40.043v [cs.ds] 3 Feb 04 A Bound on the Expected Optiality of Rando Feasible Solutions to Cobinatorial Optiization Probles Evan A. Sultani The Johns Hopins University APL evan@sultani.co http://www.sultani.co/

More information

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices CS71 Randoness & Coputation Spring 018 Instructor: Alistair Sinclair Lecture 13: February 7 Disclaier: These notes have not been subjected to the usual scrutiny accorded to foral publications. They ay

More information

Deep Boosting. Abstract. 1. Introduction

Deep Boosting. Abstract. 1. Introduction Corinna Cortes Google Research, 8th Avenue, New York, NY Mehryar Mohri Courant Institute and Google Research, 25 Mercer Street, New York, NY 2 Uar Syed Google Research, 8th Avenue, New York, NY Abstract

More information

A Theoretical Analysis of a Warm Start Technique

A Theoretical Analysis of a Warm Start Technique A Theoretical Analysis of a War Start Technique Martin A. Zinkevich Yahoo! Labs 701 First Avenue Sunnyvale, CA Abstract Batch gradient descent looks at every data point for every step, which is wasteful

More information

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering

More information

The Weierstrass Approximation Theorem

The Weierstrass Approximation Theorem 36 The Weierstrass Approxiation Theore Recall that the fundaental idea underlying the construction of the real nubers is approxiation by the sipler rational nubers. Firstly, nubers are often deterined

More information

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes

More information

CS Lecture 13. More Maximum Likelihood

CS Lecture 13. More Maximum Likelihood CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood

More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information

VC Dimension and Sauer s Lemma

VC Dimension and Sauer s Lemma CMSC 35900 (Spring 2008) Learning Theory Lecture: VC Diension and Sauer s Lea Instructors: Sha Kakade and Abuj Tewari Radeacher Averages and Growth Function Theore Let F be a class of ±-valued functions

More information

PAC-Bayes Analysis Of Maximum Entropy Learning

PAC-Bayes Analysis Of Maximum Entropy Learning PAC-Bayes Analysis Of Maxiu Entropy Learning John Shawe-Taylor and David R. Hardoon Centre for Coputational Statistics and Machine Learning Departent of Coputer Science University College London, UK, WC1E

More information

1 Proof of learning bounds

1 Proof of learning bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a

More information

Understanding Machine Learning Solution Manual

Understanding Machine Learning Solution Manual Understanding Machine Learning Solution Manual Written by Alon Gonen Edited by Dana Rubinstein Noveber 17, 2014 2 Gentle Start 1. Given S = ((x i, y i )), define the ultivariate polynoial p S (x) = i []:y

More information

Learnability of Gaussians with flexible variances

Learnability of Gaussians with flexible variances Learnability of Gaussians with flexible variances Ding-Xuan Zhou City University of Hong Kong E-ail: azhou@cityu.edu.hk Supported in part by Research Grants Council of Hong Kong Start October 20, 2007

More information

A Simple Regression Problem

A Simple Regression Problem A Siple Regression Proble R. M. Castro March 23, 2 In this brief note a siple regression proble will be introduced, illustrating clearly the bias-variance tradeoff. Let Y i f(x i ) + W i, i,..., n, where

More information

On the Impact of Kernel Approximation on Learning Accuracy

On the Impact of Kernel Approximation on Learning Accuracy On the Ipact of Kernel Approxiation on Learning Accuracy Corinna Cortes Mehryar Mohri Aeet Talwalkar Google Research New York, NY corinna@google.co Courant Institute and Google Research New York, NY ohri@cs.nyu.edu

More information

Support Vector Machines. Maximizing the Margin

Support Vector Machines. Maximizing the Margin Support Vector Machines Support vector achines (SVMs) learn a hypothesis: h(x) = b + Σ i= y i α i k(x, x i ) (x, y ),..., (x, y ) are the training exs., y i {, } b is the bias weight. α,..., α are the

More information

Solutions of some selected problems of Homework 4

Solutions of some selected problems of Homework 4 Solutions of soe selected probles of Hoework 4 Sangchul Lee May 7, 2018 Proble 1 Let there be light A professor has two light bulbs in his garage. When both are burned out, they are replaced, and the next

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks

Bounds on the Minimax Rate for Estimating a Prior over a VC Class from Independent Learning Tasks Bounds on the Miniax Rate for Estiating a Prior over a VC Class fro Independent Learning Tasks Liu Yang Steve Hanneke Jaie Carbonell Deceber 01 CMU-ML-1-11 School of Coputer Science Carnegie Mellon University

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

A Note on the Applied Use of MDL Approximations

A Note on the Applied Use of MDL Approximations A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention

More information

Testing Properties of Collections of Distributions

Testing Properties of Collections of Distributions Testing Properties of Collections of Distributions Reut Levi Dana Ron Ronitt Rubinfeld April 9, 0 Abstract We propose a fraework for studying property testing of collections of distributions, where the

More information

Accuracy at the Top. Abstract

Accuracy at the Top. Abstract Accuracy at the Top Stephen Boyd Stanford University Packard 264 Stanford, CA 94305 boyd@stanford.edu Mehryar Mohri Courant Institute and Google 25 Mercer Street New York, NY 002 ohri@cis.nyu.edu Corinna

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 018: Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee7c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee7c@berkeley.edu October 15,

More information

Probability Distributions

Probability Distributions Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples

More information

Non-Parametric Non-Line-of-Sight Identification 1

Non-Parametric Non-Line-of-Sight Identification 1 Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,

More information

Learning Kernels -Tutorial Part III: Theoretical Guarantees.

Learning Kernels -Tutorial Part III: Theoretical Guarantees. Learning Kernels -Tutorial Part III: Theoretical Guarantees. Corinna Cortes Google Research corinna@google.com Mehryar Mohri Courant Institute & Google Research mohri@cims.nyu.edu Afshin Rostami UC Berkeley

More information

Feature Extraction Techniques

Feature Extraction Techniques Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that

More information

Prediction by random-walk perturbation

Prediction by random-walk perturbation Prediction by rando-walk perturbation Luc Devroye School of Coputer Science McGill University Gábor Lugosi ICREA and Departent of Econoics Universitat Popeu Fabra lucdevroye@gail.co gabor.lugosi@gail.co

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

Research Article On the Isolated Vertices and Connectivity in Random Intersection Graphs

Research Article On the Isolated Vertices and Connectivity in Random Intersection Graphs International Cobinatorics Volue 2011, Article ID 872703, 9 pages doi:10.1155/2011/872703 Research Article On the Isolated Vertices and Connectivity in Rando Intersection Graphs Yilun Shang Institute for

More information

Supplement to: Subsampling Methods for Persistent Homology

Supplement to: Subsampling Methods for Persistent Homology Suppleent to: Subsapling Methods for Persistent Hoology A. Technical results In this section, we present soe technical results that will be used to prove the ain theores. First, we expand the notation

More information

A Bernstein-Markov Theorem for Normed Spaces

A Bernstein-Markov Theorem for Normed Spaces A Bernstein-Markov Theore for Nored Spaces Lawrence A. Harris Departent of Matheatics, University of Kentucky Lexington, Kentucky 40506-0027 Abstract Let X and Y be real nored linear spaces and let φ :

More information

List Scheduling and LPT Oliver Braun (09/05/2017)

List Scheduling and LPT Oliver Braun (09/05/2017) List Scheduling and LPT Oliver Braun (09/05/207) We investigate the classical scheduling proble P ax where a set of n independent jobs has to be processed on 2 parallel and identical processors (achines)

More information

Shannon Sampling II. Connections to Learning Theory

Shannon Sampling II. Connections to Learning Theory Shannon Sapling II Connections to Learning heory Steve Sale oyota echnological Institute at Chicago 147 East 60th Street, Chicago, IL 60637, USA E-ail: sale@athberkeleyedu Ding-Xuan Zhou Departent of Matheatics,

More information

Foundations of Machine Learning Kernel Methods. Mehryar Mohri Courant Institute and Google Research

Foundations of Machine Learning Kernel Methods. Mehryar Mohri Courant Institute and Google Research Foundations of Machine Learning Kernel Methods Mehryar Mohri Courant Institute and Google Research ohri@cis.nyu.edu Motivation Efficient coputation of inner products in high diension. Non-linear decision

More information

Estimating Entropy and Entropy Norm on Data Streams

Estimating Entropy and Entropy Norm on Data Streams Estiating Entropy and Entropy Nor on Data Streas Ait Chakrabarti 1, Khanh Do Ba 1, and S. Muthukrishnan 2 1 Departent of Coputer Science, Dartouth College, Hanover, NH 03755, USA 2 Departent of Coputer

More information

Lower Bounds for Quantized Matrix Completion

Lower Bounds for Quantized Matrix Completion Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &

More information

Rademacher Complexity Bounds for Non-I.I.D. Processes

Rademacher Complexity Bounds for Non-I.I.D. Processes Rademacher Complexity Bounds for Non-I.I.D. Processes Mehryar Mohri Courant Institute of Mathematical ciences and Google Research 5 Mercer treet New York, NY 00 mohri@cims.nyu.edu Afshin Rostamizadeh Department

More information

a a a a a a a m a b a b

a a a a a a a m a b a b Algebra / Trig Final Exa Study Guide (Fall Seester) Moncada/Dunphy Inforation About the Final Exa The final exa is cuulative, covering Appendix A (A.1-A.5) and Chapter 1. All probles will be ultiple choice

More information

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness A Note on Scheduling Tall/Sall Multiprocessor Tasks with Unit Processing Tie to Miniize Maxiu Tardiness Philippe Baptiste and Baruch Schieber IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights,

More information

arxiv: v2 [math.co] 3 Dec 2008

arxiv: v2 [math.co] 3 Dec 2008 arxiv:0805.2814v2 [ath.co] 3 Dec 2008 Connectivity of the Unifor Rando Intersection Graph Sion R. Blacburn and Stefanie Gere Departent of Matheatics Royal Holloway, University of London Egha, Surrey TW20

More information

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and

More information

Intelligent Systems: Reasoning and Recognition. Artificial Neural Networks

Intelligent Systems: Reasoning and Recognition. Artificial Neural Networks Intelligent Systes: Reasoning and Recognition Jaes L. Crowley MOSIG M1 Winter Seester 2018 Lesson 7 1 March 2018 Outline Artificial Neural Networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

lecture 36: Linear Multistep Mehods: Zero Stability

lecture 36: Linear Multistep Mehods: Zero Stability 95 lecture 36: Linear Multistep Mehods: Zero Stability 5.6 Linear ultistep ethods: zero stability Does consistency iply convergence for linear ultistep ethods? This is always the case for one-step ethods,

More information

Kernel-Based Nonparametric Anomaly Detection

Kernel-Based Nonparametric Anomaly Detection Kernel-Based Nonparaetric Anoaly Detection Shaofeng Zou Dept of EECS Syracuse University Eail: szou@syr.edu Yingbin Liang Dept of EECS Syracuse University Eail: yliang6@syr.edu H. Vincent Poor Dept of

More information

Block designs and statistics

Block designs and statistics Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent

More information

Foundations of Machine Learning Lecture 5. Mehryar Mohri Courant Institute and Google Research

Foundations of Machine Learning Lecture 5. Mehryar Mohri Courant Institute and Google Research Foundations of Machine Learning Lecture 5 Mehryar Mohri Courant Institute and Google Research ohri@cis.nyu.edu Kernel Methods Motivation Non-linear decision boundary. Efficient coputation of inner products

More information

Estimating Parameters for a Gaussian pdf

Estimating Parameters for a Gaussian pdf Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016 Lessons 7 14 Dec 2016 Outline Artificial Neural networks Notation...2 1. Introduction...3... 3 The Artificial

More information

On Conditions for Linearity of Optimal Estimation

On Conditions for Linearity of Optimal Estimation On Conditions for Linearity of Optial Estiation Erah Akyol, Kuar Viswanatha and Kenneth Rose {eakyol, kuar, rose}@ece.ucsb.edu Departent of Electrical and Coputer Engineering University of California at

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

The degree of a typical vertex in generalized random intersection graph models

The degree of a typical vertex in generalized random intersection graph models Discrete Matheatics 306 006 15 165 www.elsevier.co/locate/disc The degree of a typical vertex in generalized rando intersection graph odels Jerzy Jaworski a, Michał Karoński a, Dudley Stark b a Departent

More information

A Smoothed Boosting Algorithm Using Probabilistic Output Codes

A Smoothed Boosting Algorithm Using Probabilistic Output Codes A Soothed Boosting Algorith Using Probabilistic Output Codes Rong Jin rongjin@cse.su.edu Dept. of Coputer Science and Engineering, Michigan State University, MI 48824, USA Jian Zhang jian.zhang@cs.cu.edu

More information

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition

More information

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence Best Ar Identification: A Unified Approach to Fixed Budget and Fixed Confidence Victor Gabillon Mohaad Ghavazadeh Alessandro Lazaric INRIA Lille - Nord Europe, Tea SequeL {victor.gabillon,ohaad.ghavazadeh,alessandro.lazaric}@inria.fr

More information

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES Vol. 57, No. 3, 2009 Algoriths for parallel processor scheduling with distinct due windows and unit-tie obs A. JANIAK 1, W.A. JANIAK 2, and

More information

Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions

Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions Kernel Choice and Classifiability for RKHS Ebeddings of Probability Distributions Bharath K. Sriperubudur Departent of ECE UC San Diego, La Jolla, USA bharathsv@ucsd.edu Kenji Fukuizu The Institute of

More information

IN modern society that various systems have become more

IN modern society that various systems have become more Developent of Reliability Function in -Coponent Standby Redundant Syste with Priority Based on Maxiu Entropy Principle Ryosuke Hirata, Ikuo Arizono, Ryosuke Toohiro, Satoshi Oigawa, and Yasuhiko Takeoto

More information

Multi-Scale/Multi-Resolution: Wavelet Transform

Multi-Scale/Multi-Resolution: Wavelet Transform Multi-Scale/Multi-Resolution: Wavelet Transfor Proble with Fourier Fourier analysis -- breaks down a signal into constituent sinusoids of different frequencies. A serious drawback in transforing to the

More information

12 Towards hydrodynamic equations J Nonlinear Dynamics II: Continuum Systems Lecture 12 Spring 2015

12 Towards hydrodynamic equations J Nonlinear Dynamics II: Continuum Systems Lecture 12 Spring 2015 18.354J Nonlinear Dynaics II: Continuu Systes Lecture 12 Spring 2015 12 Towards hydrodynaic equations The previous classes focussed on the continuu description of static (tie-independent) elastic systes.

More information

A Theoretical Framework for Deep Transfer Learning

A Theoretical Framework for Deep Transfer Learning A Theoretical Fraewor for Deep Transfer Learning Toer Galanti The School of Coputer Science Tel Aviv University toer22g@gail.co Lior Wolf The School of Coputer Science Tel Aviv University wolf@cs.tau.ac.il

More information

Tight Bounds for the Expected Risk of Linear Classifiers and PAC-Bayes Finite-Sample Guarantees

Tight Bounds for the Expected Risk of Linear Classifiers and PAC-Bayes Finite-Sample Guarantees Tight Bounds for the Expected Risk of Linear Classifiers and PAC-Bayes Finite-Saple Guarantees Jean Honorio CSAIL, MIT Cabridge, MA 0239, USA jhonorio@csail.it.edu Toi Jaakkola CSAIL, MIT Cabridge, MA

More information

ORIE 6340: Mathematics of Data Science

ORIE 6340: Mathematics of Data Science ORIE 6340: Matheatics of Data Science Daek Davis Contents 1 Estiation in High Diensions 1 1.1 Tools for understanding high-diensional sets................. 3 1.1.1 Concentration of volue in high-diensions...............

More information

Tail estimates for norms of sums of log-concave random vectors

Tail estimates for norms of sums of log-concave random vectors Tail estiates for nors of sus of log-concave rando vectors Rados law Adaczak Rafa l Lata la Alexander E. Litvak Alain Pajor Nicole Toczak-Jaegerann Abstract We establish new tail estiates for order statistics

More information

Sharp Time Data Tradeoffs for Linear Inverse Problems

Sharp Time Data Tradeoffs for Linear Inverse Problems Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used

More information

Graphical Models in Local, Asymmetric Multi-Agent Markov Decision Processes

Graphical Models in Local, Asymmetric Multi-Agent Markov Decision Processes Graphical Models in Local, Asyetric Multi-Agent Markov Decision Processes Ditri Dolgov and Edund Durfee Departent of Electrical Engineering and Coputer Science University of Michigan Ann Arbor, MI 48109

More information

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Proc. of the IEEE/OES Seventh Working Conference on Current Measureent Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Belinda Lipa Codar Ocean Sensors 15 La Sandra Way, Portola Valley, CA 98 blipa@pogo.co

More information