Learning Bounds for Importance Weighting


Learning Bounds for Importance Weighting

Corinna Cortes, Google Research, New York, NY 10011
Yishay Mansour, Tel-Aviv University, Tel-Aviv 69978, Israel
Mehryar Mohri, Courant Institute and Google, New York, NY 10012

Abstract

This paper presents an analysis of importance weighting for learning from finite samples and gives a series of theoretical and algorithmic results. We point out simple cases where importance weighting can fail, which suggests the need for an analysis of the properties of this technique. We then give both upper and lower bounds for generalization with bounded importance weights and, more significantly, give learning guarantees for the more common case of unbounded importance weights under the weak assumption that the second moment is bounded, a condition related to the Rényi divergence of the training and test distributions. These results are based on a series of novel and general bounds we derive for unbounded loss functions, which are of independent interest. We use these bounds to guide the definition of an alternative reweighting algorithm and report the results of experiments demonstrating its benefits. Finally, we analyze the properties of normalized importance weights, which are also commonly used.

1 Introduction

In real-world applications of machine learning, the sampling of the training and test instances often differs, which results in a mismatch between the two distributions. For example, in web search applications, there may be data regarding users who clicked on some advertisement link but little or no information about other users. Similarly, in credit default analyses, there is typically some information available about the credit defaults of customers who were granted credit, but no such information is at hand about rejected customers. In other problems such as adaptation, the training data available is drawn from a source domain different from the target domain. These issues of biased sampling or adaptation have long been recognized and studied in the statistics literature. There is also a large body of literature dealing with different techniques for sample bias correction [29, 16, 8, 25, 6] or domain adaptation [3, 7, 19, 10, 17] in the recent machine learning and natural language processing literature.

A common technique used in several of these publications for correcting the bias or discrepancy is based on the so-called importance weighting technique. This consists of weighting the cost of errors on training instances to emphasize the error on some or de-emphasize it on others, with the objective of correcting the mismatch between the distributions of training and test points, as in sample bias correction, adaptation, and other related contexts such as active learning [24, 14, 8, 19, 5]. Different definitions have been adopted for these weights. A common definition of the weight for a point x is w(x) = P(x)/Q(x), where P is the target or test distribution and Q is the distribution according to which training points are drawn. A favorable property of this definition, which is not hard to verify, is that it leads to unbiased estimates of the generalization error [8].

This paper presents an analysis of importance weighting for learning from finite samples. Our study was originally motivated by the observation that, while this corrective technique seems natural, in some cases in practice it does not succeed. An example in dimension two is illustrated by Figure 1. The target distribution P is the even mixture of two Gaussians centered at (0, 0) and (0, 2), both with standard deviation σ_P, while the source distribution Q is the even mixture of two Gaussians centered at (0, 0) and (2, 0), but with standard deviation σ_Q. The hypothesis class is that of hyperplanes tangent to the unit sphere, and the best classifier is selected by empirical risk minimization.

[Figure 1: Example of importance weighting. Left figure: P (in blue) and Q (in red) are even mixtures of Gaussians. The labels are positive within the unit sphere centered at the origin (in grey), negative elsewhere. The hypothesis class is that of hyperplanes tangent to the unit sphere. Right figures: plots of test error vs. training sample size using importance weighting for two different values of the ratio σ_Q/σ_P (0.3 and 0.75). The results indicate mean values of the error over 40 runs ± one standard deviation.]

As shown in Figure 1, for σ_Q/σ_P = 0.3, the error of the hypothesis learned using importance weighting is close to 50% even for a training sample of 5,000 points, and the standard deviation of the error is quite high. In contrast, for σ_Q/σ_P = 0.75, convergence occurs relatively rapidly and learning is successful. In Section 4, we discuss other examples where importance weighting does not succeed.

The problem just described is not limited to isolated examples. Similar observations have been made in the past in both the statistics and learning literature, more recently in the context of the analysis of boosting by [9], who suggest that importance weighting must be used with care and highlight the need for convergence bounds and learning guarantees for this technique.

We study the theoretical properties of importance weighting. We show using standard generalization bounds that importance weighting can succeed when the weights are bounded. However, this condition often does not hold in practice. We also show that, remarkably, convergence guarantees can be given even for unbounded weights under the weak assumption that the second moment of the weights is bounded, a condition that relates to the Rényi divergence of P and Q. We further extend these bounds to guarantees for other possible reweightings. These results suggest minimizing a bias-variance tradeoff that we discuss and that leads to several algorithmic ideas. We explore in detail an algorithm based on these ideas and report the results of experiments demonstrating its benefits.
Throughout this paper, we consider the case where the weight function w is known. When it is not, it is typically estimated from finite samples; the effect of this estimation error is specifically analyzed by [8]. This setting is closely related to the problem of importance sampling in statistics, which is that of estimating the expectation of a random variable according to P while using a sample drawn according to Q, with w given [18]. Here, we are concerned with the effect of the weights on learning from finite samples. A different setting is when full access to Q is further assumed: von Neumann's rejection sampling technique [28] can then be used. We note however that it requires w to be bounded by some constant M, which is often not guaranteed and is the simplest case of our bounds. Even then, the method is wasteful, as it requires on average M samples to obtain one point.
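For concreteness, here is a minimal sketch of the rejection sampling baseline just mentioned (this is not code from the paper; the specific Gaussian choices for P and Q are illustrative assumptions): a proposal x drawn from Q is accepted with probability w(x)/M, which requires an explicit bound M ≥ sup_x w(x) and consumes on average M proposals per accepted point, whereas importance weighting keeps every sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choice (not from the paper): P = N(0, 1), Q = N(0, 2^2),
# so w(x) = P(x)/Q(x) <= 2 is bounded and rejection sampling applies.
def pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

M = 2.0  # M = sup_x P(x)/Q(x) = sigma_Q / sigma_P here, attained at x = 0

def rejection_sample(m):
    """Draw m points distributed according to P from proposals drawn according to Q."""
    accepted, proposals = [], 0
    while len(accepted) < m:
        x = rng.normal(0.0, 2.0)  # proposal drawn from Q
        proposals += 1
        # Accept with probability w(x) / M <= 1.
        if rng.uniform() < pdf(x, 0.0, 1.0) / (M * pdf(x, 0.0, 2.0)):
            accepted.append(x)
    return np.array(accepted), proposals

_, n_prop = rejection_sample(1000)
print("proposals per accepted point:", n_prop / 1000)  # about M = 2, as noted above
```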

The remainder of this paper is structured as follows. Section 2 introduces the definition of the Rényi divergences and gives some basic properties of the importance weights. In Section 3, we give generalization bounds for importance weighting in the bounded case. We also present a general lower bound indicating the key role played by the Rényi divergence of P and Q in this context. Section 4 deals with the more frequent case of unbounded w. Standard generalization bounds do not apply here since the loss function is unbounded. We give novel generalization bounds for unbounded loss functions under the assumption that the second moment is bounded (see Appendix) and use them to derive learning guarantees for importance weighting in this more general setting. In Section 5, we discuss an algorithm inspired by these guarantees for which we report preliminary experimental results. We also discuss why the commonly used remedy of truncating or capping importance weights may not always provide the desired effect of improved performance. Finally, in Section 6, we study the properties of an alternative reweighting, also commonly used, which is based on normalized importance weights, and discuss its relationship with the (unnormalized) weights w.

2 Preliminaries

Let X denote the input space, Y the label set, and let L: Y × Y → [0, 1] be a loss function. We denote by P the target distribution and by Q the source distribution according to which training points are drawn. We also denote by H the hypothesis set used by the learning algorithm and by f: X → Y the target labeling function.

2.1 Rényi divergences

Our analysis makes use of the notion of Rényi divergence, an information-theoretic measure of the difference between two distributions directly relevant to the study of importance weighting. For α ≥ 0, the Rényi divergence D_α(P‖Q) between distributions P and Q is defined by [23]

    D_α(P‖Q) = 1/(α−1) log_2 Σ_x P(x) [P(x)/Q(x)]^(α−1).    (1)

The Rényi divergence is a non-negative quantity and, for any α > 0, D_α(P‖Q) = 0 iff P = Q. For α = 1, it coincides with the relative entropy. We denote by d_α(P‖Q) the exponential in base 2 of the Rényi divergence D_α(P‖Q):

    d_α(P‖Q) = 2^{D_α(P‖Q)} = [Σ_x P^α(x)/Q^(α−1)(x)]^(1/(α−1)).    (2)

2.2 Importance weights

The importance weight for distributions P and Q is defined by w(x) = P(x)/Q(x). In the following, the expectations are taken with respect to Q.

Lemma 1. The following identities hold for the expectation, second moment, and variance of w:

    E[w] = 1,    E[w²] = d_2(P‖Q),    σ²(w) = d_2(P‖Q) − 1.    (3)

Proof. The first equality is immediate. The second moment of w can be expressed as follows in terms of the Rényi divergence:

    E[w²] = Σ_{x∈X} w²(x) Q(x) = Σ_{x∈X} [P(x)/Q(x)]² Q(x) = Σ_{x∈X} P(x) [P(x)/Q(x)] = d_2(P‖Q).

Thus, the variance of w is given by σ²(w) = E[w²] − E[w]² = d_2(P‖Q) − 1.

For any hypothesis h ∈ H, we denote by R(h) its loss and by R_w(h) its weighted empirical loss:

    R(h) = E_{x∼P}[L(h(x), f(x))],    R_w(h) = (1/m) Σ_{i=1}^{m} w(x_i) L(h(x_i), f(x_i)).

We shall use the abbreviated notation L_h(x) for L(h(x), f(x)), in the absence of any ambiguity about the target function f. Note that the unnormalized importance weighting of the loss is unbiased:

    E_{x∼Q}[w(x) L_h(x)] = Σ_x Q(x) [P(x)/Q(x)] L_h(x) = Σ_x P(x) L_h(x) = R(h).
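As a quick numerical sanity check of definition (2) and Lemma 1 (this is an illustration, not code from the paper), the sketch below computes d_α(P‖Q) for two small discrete distributions and verifies that E[w] = 1, E[w²] = d_2(P‖Q), and that the importance-weighted empirical loss is an unbiased estimate of R(h) for an arbitrary fixed loss vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two distributions on a small finite domain (illustrative values).
P = np.array([0.50, 0.30, 0.15, 0.05])
Q = np.array([0.25, 0.25, 0.25, 0.25])
w = P / Q                                   # importance weights w(x) = P(x)/Q(x)

def d_alpha(P, Q, alpha):
    """Exponential of the Renyi divergence: [sum_x P^alpha / Q^(alpha-1)]^(1/(alpha-1))."""
    return (np.sum(P ** alpha / Q ** (alpha - 1))) ** (1.0 / (alpha - 1))

# Lemma 1: E[w] = 1, E[w^2] = d_2(P||Q), Var(w) = d_2(P||Q) - 1.
print(np.dot(Q, w))                         # 1.0
print(np.dot(Q, w ** 2), d_alpha(P, Q, 2))  # equal
print(np.dot(Q, w ** 2) - 1.0)              # variance of w

# Unbiasedness: E_{x~Q}[w(x) L_h(x)] = E_{x~P}[L_h(x)] = R(h).
L = np.array([0.9, 0.1, 0.4, 0.7])          # an arbitrary loss vector in [0, 1]
R_true = np.dot(P, L)
xs = rng.choice(len(Q), size=200_000, p=Q)  # training sample drawn from Q
R_w = np.mean(w[xs] * L[xs])                # importance-weighted empirical loss
print(R_true, R_w)                          # close for large sample size
```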

The following lemma gives a bound on the second moment of the importance-weighted loss.

Lemma 2. For all α ≥ 1 and all x ∈ X, the second moment of the importance-weighted loss can be bounded as follows:

    E_{x∼Q}[w²(x) L_h²(x)] ≤ d_{α+1}(P‖Q) R(h)^(1−1/α).    (4)

For α = 1, this becomes R(h)² ≤ E_{x∼Q}[w²(x) L_h²(x)] ≤ d_2(P‖Q).

Proof. The second moment can be bounded as follows:

    E_{x∼Q}[w²(x) L_h²(x)] = Σ_x Q(x) [P(x)/Q(x)]² L_h²(x) = Σ_x P(x) [P(x)/Q(x)] L_h²(x)
        ≤ [Σ_x P(x) (P(x)/Q(x))^α]^(1/α) [Σ_x P(x) L_h^(2α/(α−1))(x)]^(1−1/α)    (Hölder's inequality)
        ≤ d_{α+1}(P‖Q) [Σ_x P(x) L_h(x)]^(1−1/α) = d_{α+1}(P‖Q) R(h)^(1−1/α),

where the last inequality uses L_h^(2α/(α−1))(x) ≤ L_h(x), which holds since L_h(x) ∈ [0, 1] and 2α/(α−1) ≥ 1.

3 Learning Guarantees - Bounded Case

Note that sup_x w(x) = sup_x P(x)/Q(x) = d_∞(P‖Q). We first examine the case d_∞(P‖Q) < +∞ and use the notation M = d_∞(P‖Q). The following proposition then follows directly from Hoeffding's inequality.

Proposition 1 (single hypothesis). Fix h ∈ H. For any δ > 0, with probability at least 1 − δ,

    R(h) ≤ R_w(h) + M sqrt( log(2/δ) / (2m) ).

The upper bound M, though finite, can be quite large. The following theorem provides a more favorable bound as a function of the ratio M/m when any of the moments of w, d_α(P‖Q), is finite, which is the case when d_∞(P‖Q) < +∞ since the Rényi divergence is a non-decreasing function of α [23, 2], in particular:

    ∀α > 0,  d_α(P‖Q) ≤ d_∞(P‖Q).    (5)

Theorem 1 (single hypothesis). Fix h ∈ H. Then, for any α ≥ 1 and any δ > 0, with probability at least 1 − δ, the following bound holds for the importance weighting method:

    R(h) ≤ R_w(h) + 2M log(1/δ)/(3m) + sqrt( 2[d_{α+1}(P‖Q) R(h)^(1−1/α) − R(h)²] log(1/δ) / m ).    (6)

For α = 1, after further simplification, this gives

    R(h) ≤ R_w(h) + 2M log(1/δ)/(3m) + sqrt( 2 d_2(P‖Q) log(1/δ) / m ).

Proof. Let Z denote the random variable w(x) L_h(x) − R(h). Then |Z| ≤ M. By Lemma 2, the variance of the random variable Z can be bounded in terms of the Rényi divergence:

    σ²(Z) = E[w²(x) L_h²(x)] − R(h)² ≤ d_{α+1}(P‖Q) R(h)^(1−1/α) − R(h)².

Thus, by Bernstein's inequality [4], it follows that:

    Pr[ R(h) − R_w(h) > ε ] ≤ exp( − (m ε²/2) / (σ²(Z) + εM/3) ).

Setting δ to match this upper bound shows that with probability at least 1 − δ, the following bound holds for the importance weighting method:

    R(h) − R_w(h) ≤ M log(1/δ)/(3m) + sqrt( M² log²(1/δ)/(9m²) + 2σ²(Z) log(1/δ)/m ).

Using the sub-additivity of the square root leads to the simpler expression

    R(h) − R_w(h) ≤ 2M log(1/δ)/(3m) + sqrt( 2σ²(Z) log(1/δ)/m ).
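The contrast between Proposition 1 and Theorem 1 is easy to see numerically. The sketch below is my own illustration, not code from the paper; it uses the simplified α = 1 form of Theorem 1 as reconstructed above, and the values of M, d_2 and δ are arbitrary assumptions. It compares the Hoeffding-style deviation term, which scales with M, to the Bernstein-style term, which scales with sqrt(d_2(P‖Q)) up to a lower-order M/m term.

```python
import numpy as np

def hoeffding_term(M, m, delta):
    # Deviation term of Proposition 1: M * sqrt(log(2/delta) / (2m)).
    return M * np.sqrt(np.log(2.0 / delta) / (2.0 * m))

def bernstein_term(M, d2, m, delta):
    # Simplified deviation term of Theorem 1 (alpha = 1), as reconstructed above:
    # 2 M log(1/delta) / (3 m) + sqrt(2 d2 log(1/delta) / m).
    return (2.0 * M * np.log(1.0 / delta) / (3.0 * m)
            + np.sqrt(2.0 * d2 * np.log(1.0 / delta) / m))

# Illustrative regime: weights bounded by a large M but with a moderate second moment.
M, d2, delta = 1000.0, 5.0, 0.05
for m in (100, 1000, 10_000, 100_000):
    print(m, round(hoeffding_term(M, m, delta), 3), round(bernstein_term(M, d2, m, delta), 3))
```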

These results can be straightforwardly extended to general hypothesis sets. In particular, for a finite hypothesis set and for α = 1, the application of the union bound yields the following result.

Theorem 2 (finite hypothesis set). Let H be a finite hypothesis set. Then, for any δ > 0, with probability at least 1 − δ, the following bound holds for the importance weighting method:

    R(h) ≤ R_w(h) + 2M (log|H| + log(1/δ)) / (3m) + sqrt( 2 d_2(P‖Q) (log|H| + log(1/δ)) / m ).    (7)

For infinite hypothesis sets, a similar result can be shown straightforwardly using covering numbers instead of |H|, or a related measure based on samples of size m [20].

In the following proposition, we give a lower bound that further emphasizes the role of the Rényi divergence of the second order in the convergence of importance weighting in the bounded case.

Proposition 2 (lower bound). Assume that M < +∞ and σ²(w)/M² ≥ 1/m. Assume that H contains a hypothesis h_0 such that L_{h_0}(x) = 1 for all x. Then, there exists an absolute constant c > 0 such that

    Pr[ sup_{h∈H} |R(h) − R_w(h)| ≥ sqrt( (d_2(P‖Q) − 1) / (4m) ) ] ≥ c > 0.    (8)

Proof. Let σ_H = sup_{h∈H} σ(w L_h). If L_{h_0}(x) = 1 for all x ∈ X, then σ²(w L_{h_0}) = d_2(P‖Q) − 1 = σ²(w) = σ_H². The result then follows from a general theorem, Theorem 9, proven in the Appendix.

4 Learning Guarantees - Unbounded Case

The condition d_∞(P‖Q) < +∞ assumed in the previous section does not always hold, even in some natural cases, as illustrated by the following examples.

4.1 Examples

Assume that P and Q both follow a Gaussian distribution with standard deviations σ_P and σ_Q and with means µ_P and µ_Q:

    P(x) = 1/(sqrt(2π) σ_P) exp[ −(x−µ_P)²/(2σ_P²) ],    Q(x) = 1/(sqrt(2π) σ_Q) exp[ −(x−µ_Q)²/(2σ_Q²) ].

In that case,

    P(x)/Q(x) = (σ_Q/σ_P) exp[ (σ_P²(x−µ_Q)² − σ_Q²(x−µ_P)²) / (2σ_P²σ_Q²) ],

thus, even for σ_P = σ_Q and µ_P ≠ µ_Q, the importance weights are unbounded, d_∞(P‖Q) = sup_x P(x)/Q(x) = +∞, and the bound of Theorem 1 is not informative. The Rényi divergence of the second order is given by:

    d_2(P‖Q) = ∫ (σ_Q/σ_P) exp[ (σ_P²(x−µ_Q)² − σ_Q²(x−µ_P)²) / (2σ_P²σ_Q²) ] P(x) dx
             = σ_Q/(sqrt(2π) σ_P²) ∫ exp[ (σ_P²(x−µ_Q)² − 2σ_Q²(x−µ_P)²) / (2σ_P²σ_Q²) ] dx.

That is, for σ_Q > (sqrt(2)/2) σ_P, the variance of the importance weights is bounded. By the additivity property of the Rényi divergence, a similar situation holds for products and sums of such Gaussian distributions. Hence, in the rightmost example of Figure 1, the importance weights are unbounded, but their second moment is bounded. In the next section we provide learning guarantees even for this setting, in agreement with the results observed. For σ_Q = 0.3 σ_P, the same favorable guarantees do not hold, and, as illustrated in Figure 1, learning is significantly more difficult.
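For one-dimensional Gaussians the integral above can be evaluated in closed form by completing the square, which gives d_2(P‖Q) = σ_Q²/(σ_P sqrt(2σ_Q² − σ_P²)) · exp( (µ_P − µ_Q)²/(2σ_Q² − σ_P²) ) whenever 2σ_Q² > σ_P², and +∞ otherwise; this reproduces the condition σ_Q > (sqrt(2)/2) σ_P stated above. The sketch below is my own illustration (not code from the paper): it implements this formula and cross-checks it against a Monte Carlo estimate of E_{x∼Q}[w²(x)].

```python
import numpy as np

def d2_gaussians(mu_p, sigma_p, mu_q, sigma_q):
    """d_2(P||Q) = int P^2/Q dx for P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2).
    Finite iff 2*sigma_q^2 > sigma_p^2, i.e. sigma_q > sigma_p / sqrt(2)."""
    s = 2.0 * sigma_q ** 2 - sigma_p ** 2
    if s <= 0:
        return np.inf
    return (sigma_q ** 2 / (sigma_p * np.sqrt(s))) * np.exp((mu_p - mu_q) ** 2 / s)

rng = np.random.default_rng(0)

def d2_monte_carlo(mu_p, sigma_p, mu_q, sigma_q, m=1_000_000):
    """Estimate d_2 = E_{x~Q}[w(x)^2] from a sample drawn according to Q."""
    x = rng.normal(mu_q, sigma_q, size=m)
    log_w = (np.log(sigma_q / sigma_p)
             - (x - mu_p) ** 2 / (2 * sigma_p ** 2)
             + (x - mu_q) ** 2 / (2 * sigma_q ** 2))
    return np.mean(np.exp(2 * log_w))

# Cross-check at sigma_Q = 0.9 sigma_P, where the Monte Carlo estimator behaves well.
print(d2_gaussians(0, 1.0, 0, 0.9), d2_monte_carlo(0, 1.0, 0, 0.9))
print(d2_gaussians(0, 1.0, 0, 0.75))  # finite: the favorable regime of Figure 1
print(d2_gaussians(0, 1.0, 0, 0.3))   # +inf: the unfavorable regime
```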

This example of Gaussians can further illustrate what can go wrong in importance weighting. Assume that µ_P = µ_Q = 0, σ_Q = 1 and σ_P = 10. One could have expected this to be an easy case for importance weighting, since sampling from Q provides useful information about P. The problem is, however, that a sample from Q will contain a very small number of points far from the mean (of either negative or positive label) and that these points will be assigned very large weights. For a sample of size m and σ_Q = 1, the expected value of an extreme point is sqrt(2 log m) (1 + o(1)) and its weight will be of the order of m^(1 − σ_Q²/σ_P²) = m^0.99. Therefore, a few extreme points will dominate all other weights and necessarily have a huge influence on the selection of a hypothesis by the learning algorithm.

Another related example is when σ_Q = σ_P = 1 and µ_P = 0. Let µ_Q ≠ 0 depend on the sample size m. If µ_Q is large enough compared to log(m), then, with high probability, all the weights will be negligible. This is especially problematic, since the estimate of the probability of any event would be negligible (in fact both an event and its complement). If we normalize the weights, this issue is overcome, but then, with high probability, the maximum weight dominates the sum of all other weights, reverting the situation back to that of the previous example.
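The first example above (µ_P = µ_Q = 0, σ_Q = 1, σ_P = 10) is easy to reproduce numerically. The following sketch is an illustration, not an experiment from the paper: it draws samples from Q, computes the importance weights, and reports the fraction of the total weight mass carried by the single largest weight, showing how a few extreme points dominate the weighted sample.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_p, sigma_q = 10.0, 1.0   # the example above: P = N(0, 10^2), Q = N(0, 1)

for m in (100, 1000, 10_000, 100_000):
    x = rng.normal(0.0, sigma_q, size=m)            # training sample drawn from Q
    log_w = (np.log(sigma_q / sigma_p)
             + x ** 2 / (2 * sigma_q ** 2) - x ** 2 / (2 * sigma_p ** 2))
    w = np.exp(log_w)                                # importance weights w(x) = P(x)/Q(x)
    frac_top = w.max() / w.sum()                     # share of the single largest weight
    print(f"m={m:6d}  max weight fraction = {frac_top:.2f}  "
          f"(extreme point near sqrt(2 log m) = {np.sqrt(2 * np.log(m)):.2f})")
```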

4.2 Importance weighting learning bounds - unbounded case

As in these examples, in practice, the importance weights are typically not bounded. However, we shall show that, remarkably, under the weak assumption that the second moment of the weights w, d_2(P‖Q), is bounded, generalization bounds can be given for this case as well. The following result relies on a general learning bound for unbounded loss functions proven in the Appendix (Corollary 1). We denote by Pdim(U) the pseudo-dimension of a real-valued function class U [21].

Theorem 3. Let H be a hypothesis set such that Pdim({L_h(x): h ∈ H}) = p < +∞. Assume that d_2(P‖Q) < +∞ and w(x) ≠ 0 for all x. Then, for any δ > 0, with probability at least 1 − δ, the following holds:

    R(h) ≤ R_w(h) + 2^(5/4) sqrt(d_2(P‖Q)) [ (p log(2em/p) + log(4/δ)) / m ]^(3/8).

Proof. Since d_2(P‖Q) < +∞, the second moment of w(x) L_h(x) is finite and upper bounded by d_2(P‖Q) (Lemma 2). Thus, by Corollary 1, we can write

    Pr[ sup_{h∈H} (R(h) − R_w(h)) / sqrt(d_2(P‖Q)) > ε ] ≤ 4 exp( m [ (p/m) log(2em/p) − ε^(8/3)/4^(5/3) ] ),

where p is the pseudo-dimension of the function class {w(x) L_h(x): h ∈ H}. We now show that p = Pdim({L_h(x): h ∈ H}). Let A = {x_1, ..., x_k} be a set shattered by {w(x) L_h(x): h ∈ H}. Then, there exist real numbers r_1, ..., r_k such that for any subset B ⊆ A there exists h ∈ H such that

    ∀i ∈ B, w(x_i) L_h(x_i) ≥ r_i    and    ∀i ∈ A − B, w(x_i) L_h(x_i) < r_i.    (9)

Since by assumption w(x_i) > 0 for all i ∈ [1, k], this implies that

    ∀i ∈ B, L_h(x_i) ≥ r_i/w(x_i)    and    ∀i ∈ A − B, L_h(x_i) < r_i/w(x_i).    (10)

Thus, {L_h(x): h ∈ H} shatters A with the witnesses s_i = r_i/w(x_i), i ∈ [1, k]. Using the same observations, it is straightforward to see that, conversely, any set shattered by {L_h(x): h ∈ H} is shattered by {w(x) L_h(x): h ∈ H}.

The convergence rate of the bound is slightly weaker (O(m^(−3/8))) than in the bounded case (O(m^(−1/2))). A faster convergence can be obtained, however, using the more precise bound of Theorem 8 at the expense of readability. The Rényi divergence d_2(P‖Q) seems to play a critical role in the bound and thus in the convergence of importance weighting in the unbounded case.

5 Alternative reweighting algorithms

The previous analysis can be generalized to the case of an arbitrary positive function u: X → R, u > 0. Let R_u(h) = (1/m) Σ_{i=1}^{m} u(x_i) L_h(x_i) and let Q̂ denote the empirical distribution.

Theorem 4. Let H be a hypothesis set such that Pdim({L_h(x): h ∈ H}) = p < +∞. Assume that 0 < E_{x∼Q}[u²(x)] < +∞ and u(x) ≠ 0 for all x. Then, for any δ > 0, with probability at least 1 − δ, the following holds:

    R(h) − R_u(h) ≤ E_{x∼Q}[(w(x) − u(x)) L_h(x)]
        + 2^(5/4) max( sqrt(E_{x∼Q}[u²(x) L_h²(x)]), sqrt(E_{x∼Q̂}[u²(x) L_h²(x)]) ) [ (p log(2em/p) + log(4/δ)) / m ]^(3/8).

Proof. Since R(h) = E_{x∼Q}[w(x) L_h(x)], we can write

    R(h) − R_u(h) = E_{x∼Q}[(w(x) − u(x)) L_h(x)] + E_{x∼Q}[u(x) L_h(x)] − R_u(h),

and thus R(h) − R_u(h) ≤ E_{x∼Q}[(w(x) − u(x)) L_h(x)] + |E_{x∼Q}[u(x) L_h(x)] − R_u(h)|. By Corollary 2 applied to the function u L_h, the term |E_{x∼Q}[u(x) L_h(x)] − R_u(h)| can be bounded by

    2^(5/4) max( sqrt(E_{x∼Q}[u²(x) L_h²(x)]), sqrt(E_{x∼Q̂}[u²(x) L_h²(x)]) ) [ (p log(2em/p) + log(4/δ)) / m ]^(3/8)

with probability 1 − δ, with p = Pdim({L_h(x): h ∈ H}), by a proof similar to that of Theorem 3.

The theorem suggests that functions u other than w can be used to reweight the cost of an error on each training point by minimizing the upper bound, which is a trade-off between the bias term E_{x∼Q}[(w(x) − u(x)) L_h(x)] and the second-moment term max(E_{x∼Q}[u²(x) L_h²(x)], E_{x∼Q̂}[u²(x) L_h²(x)]), where the coefficients are explicitly given. The function u can be selected from different families. Using an upper bound on these quantities that is independent of h, and a multiplicative bound of the form max(sqrt(E_{x∼Q}[u²]), sqrt(E_{x∼Q̂}[u²])) ≤ sqrt(E_{x∼Q̂}[u²]) (1 + O(1/sqrt(m))), leads to the following optimization problem:

    min_{u∈U}  E_{x∼Q̂}[|w(x) − u(x)|] + γ sqrt(E_{x∼Q̂}[u²(x)]),    (11)

where γ > 0 is a parameter controlling the trade-off between bias and variance minimization and where U is a family of possible weight functions out of which u is selected.

Here, we consider a family of functions U parameterized by the quantiles q of the weight function w. A function u_q ∈ U is then defined as follows: within each quantile, the value taken by u_q is the average of w over that quantile. For small values of γ, the bias term dominates, and very fine-grained quantiles minimize the bound of equation (11). For large values of γ, the variance term dominates and the bound is minimized by using just one quantile, corresponding to an even weighting of the training examples. Hence, by varying γ from small to large values, the algorithm interpolates between standard importance weighting, with just one example per quantile, and unweighted learning, where all examples are given the same weight.

[Figure 2: Comparison of the convergence of four different algorithms for the learning task of Figure 1 (ratio σ_Q/σ_P = 0.75): learning with equal weights for all examples (Unweighted), Importance weighting, using Quantiles to parameterize the function u, and Capping the largest weights.]

Figure 2 also shows the results of experiments for the learning task of Figure 1 using the algorithm defined by (11) with this family of functions. The optimal q is determined by 10-fold cross-validation. We see that a more rapid convergence can be obtained by using these weights compared to the standard importance weights w.
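The quantile-parameterized family U described above admits a very short implementation. The sketch below is my reading of the construction, not the authors' code: the weights are grouped into q quantile bins and each weight is replaced by the average of w over its bin, so that large q approaches standard importance weighting and q = 1 recovers unweighted learning; the empirical objective mirrors equation (11) as reconstructed above, and the capping family u_θ(x) = min(w(x), θ) discussed in the next paragraph is included for comparison. Selecting q by cross-validation, as in the experiments, is omitted.

```python
import numpy as np

def quantile_weights(w, q):
    """Replace each importance weight by the mean of w over its quantile bin.

    Large q essentially recovers the raw weights; q = 1 gives a single uniform weight."""
    w = np.asarray(w, dtype=float)
    edges = np.quantile(w, np.linspace(0.0, 1.0, q + 1))
    bins = np.clip(np.searchsorted(edges, w, side="right") - 1, 0, q - 1)
    u = np.empty_like(w)
    for b in range(q):
        mask = bins == b
        if mask.any():
            u[mask] = w[mask].mean()
    return u

def capped_weights(w, theta):
    """Thresholded weights u_theta(x) = min(w(x), theta)."""
    return np.minimum(np.asarray(w, dtype=float), theta)

def objective(w, u, gamma):
    """Empirical bias-variance criterion in the spirit of equation (11) above."""
    return np.mean(np.abs(w - u)) + gamma * np.sqrt(np.mean(u ** 2))

rng = np.random.default_rng(0)
w = np.exp(rng.normal(0.0, 2.0, size=5000))   # illustrative heavy-tailed weights
for q in (1, 4, 16, 5000):
    u = quantile_weights(w, q)
    print(q, round(objective(w, u, gamma=0.1), 3))
```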

Another natural family of functions is that of thresholded versions of the importance weights, {u_θ: θ > 0, ∀x ∈ X, u_θ(x) = min(w(x), θ)}. In fact, in practice, users often cap importance weights by choosing an arbitrary value θ. The advantage of this family is that, by definition, the weights are bounded. However, in some cases, larger weights could be critical to achieve a better performance. Figure 2 illustrates the performance of this approach. Compared to importance weighting, no change in performance is observed until the largest 1% of the weights are capped, in which case we only observe a performance degradation. We expect the thresholding to be less beneficial when the large weights reflect the true w and are not an artifact of estimation uncertainties.

6 Relationship between normalized and unnormalized weights

An alternative approach based on the weight function w(x) = P(x)/Q(x) consists of normalizing the weights. Thus, while in the unnormalized case the unweighted empirical error is replaced by

    (1/m) Σ_{i=1}^{m} w(x_i) L_h(x_i) = Σ_{i=1}^{m} (w(x_i)/m) L_h(x_i),

in the normalized case it is replaced by

    Σ_{i=1}^{m} (w(x_i)/W) L_h(x_i),    with W = Σ_{i=1}^{m} w(x_i).

We refer to ŵ(x) = w(x)/W as the normalized importance weight. An advantage of the normalized weights is that they are by definition bounded by one. However, the price to pay for this benefit is the fact that the weights are no longer unbiased. In fact, several issues similar to those we pointed out in Section 4 affect the normalized weights as well.

Here, we maintain the assumption that the second moment of the importance weights is bounded and analyze the relationship between normalized and unnormalized weights. We show that, under this assumption, normalized and unnormalized weights are in fact very close, with high probability. Observe that for any i ∈ [1, m],

    ŵ(x_i) − w(x_i)/m = w(x_i) [1/W − 1/m] = (w(x_i)/W) [1 − W/m].

Thus, since w(x_i) ≤ W, we can write |ŵ(x_i) − w(x_i)/m| ≤ |1 − W/m|. Since E_{x∼Q}[w(x)] = 1, we also have E_S[W/m] = (1/m) Σ_{k=1}^{m} E[w(x_k)] = 1. Thus, by Corollary 2, for any δ > 0, with probability at least 1 − δ, the following inequality holds:

    |W/m − 1| ≤ 2^(5/4) max{ sqrt(d_2(P‖Q)), sqrt(d̂_2(P‖Q)) } [ (log(2em) + log(4/δ)) / m ]^(3/8),

where d̂_2(P‖Q) denotes the empirical counterpart of d_2(P‖Q). This implies the same upper bound on |ŵ(x_i) − w(x_i)/m|, simultaneously for all i ∈ [1, m].
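The closeness of normalized and unnormalized weights is easy to observe numerically. The sketch below is an illustration (with Gaussian P and Q chosen, as an assumption, so that d_2(P‖Q) is finite): it reports |W/m − 1| and max_i |ŵ(x_i) − w(x_i)/m| for increasing sample sizes; as shown above, the second quantity is always bounded by the first.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_p, sigma_p, mu_q, sigma_q = 0.0, 1.0, 0.5, 0.9   # illustrative; d_2(P||Q) is finite here

def weights(x):
    log_w = (np.log(sigma_q / sigma_p)
             - (x - mu_p) ** 2 / (2 * sigma_p ** 2)
             + (x - mu_q) ** 2 / (2 * sigma_q ** 2))
    return np.exp(log_w)

for m in (100, 1000, 10_000, 100_000):
    x = rng.normal(mu_q, sigma_q, size=m)        # sample drawn from Q
    w = weights(x)
    W = w.sum()
    gap = np.max(np.abs(w / W - w / m))          # max_i |w_hat(x_i) - w(x_i)/m|
    print(f"m={m:6d}  |W/m - 1| = {abs(W / m - 1):.4f}  max_i gap = {gap:.4f}")
```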

7 Conclusion

We presented a series of theoretical results for importance weighting both in the bounded-weights case and in the more general unbounded case, under the assumption that the second moment of the weights is bounded. We also initiated a preliminary exploration of alternative weights and showed its benefits. A more systematic study of new algorithms based on these learning guarantees could lead to even more beneficial and practically useful results. Several of the learning guarantees we gave depend on the Rényi divergence of the distributions P and Q. Accurately estimating that quantity is thus critical and should motivate further studies of the convergence of its estimates from finite samples. Finally, our novel unbounded-loss learning bounds are of independent interest and could be useful in a variety of other contexts.

References

[1] M. Anthony and J. Shawe-Taylor. A result of Vapnik with applications. Discrete Applied Mathematics, 47:207-217, 1993.
[2] C. Arndt. Information Measures: Information and its Description in Science and Engineering. Signals and Communication Technology. Springer Verlag.
[3] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. In NIPS, 2007.
[4] S. N. Bernstein. Sur l'extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes. Mathematische Annalen, 97:1-59, 1927.
[5] A. Beygelzimer, S. Dasgupta, and J. Langford. Importance weighted active learning. In ICML, pages 49-56, New York, NY, USA, 2009.
[6] S. Bickel, M. Brückner, and T. Scheffer. Discriminative learning for differing training and test distributions. In ICML, pages 81-88, 2007.
[7] J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. Wortman. Learning bounds for domain adaptation. In NIPS 2007, 2008.
[8] C. Cortes, M. Mohri, M. Riley, and A. Rostamizadeh. Sample selection bias correction theory. In ALT, 2008.
[9] S. Dasgupta and P. M. Long. Boosting with diverse base classifiers. In COLT, 2003.
[10] H. Daumé III and D. Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26:101-126, 2006.
[11] M. Dudík, R. Schapire, and S. J. Phillips. Correcting sample selection bias in maximum entropy density estimation. In NIPS, 2006.
[12] R. M. Dudley. A course on empirical processes. Lecture Notes in Math., 1097:2-142, 1984.
[13] R. M. Dudley. Universal Donsker classes and metric entropy. Annals of Probability, 14(4), 1987.
[14] C. Elkan. The foundations of cost-sensitive learning. In IJCAI, 2001.
[15] D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Inf. Comput., 100(1):78-150, 1992.
[16] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf. Correcting sample selection bias by unlabeled data. In NIPS, volume 19, pages 601-608, 2006.
[17] J. Jiang and C. Zhai. Instance weighting for domain adaptation in NLP. In ACL, 2007.
[18] J. S. Liu. Monte Carlo strategies in scientific computing. Springer, 2001.
[19] Y. Mansour, M. Mohri, and A. Rostamizadeh. Domain adaptation: Learning bounds and algorithms. In COLT, 2009.
[20] A. Maurer and M. Pontil. Empirical Bernstein bounds and sample-variance penalization. In COLT, Montréal, Canada, June 2009. Omnipress.
[21] D. Pollard. Convergence of Stochastic Processes. Springer, New York, 1984.
[22] D. Pollard. Asymptotics via empirical processes. Statistical Science, 4(4):341-366, 1989.
[23] A. Rényi. On measures of information and entropy. In Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, pages 547-561, 1960.
[24] H. Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2), 2000.
[25] M. Sugiyama, S. Nakajima, H. Kashima, P. von Bünau, and M. Kawanabe. Direct importance estimation with model selection and its application to covariate shift adaptation. In NIPS, 2008.
[26] V. N. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.
[27] V. N. Vapnik. Estimation of Dependences Based on Empirical Data, 2nd ed. Springer, 2006.
[28] J. von Neumann. Various techniques used in connection with random digits. Monte Carlo methods. Nat. Bureau Standards, 12:36-38, 1951.
[29] B. Zadrozny, J. Langford, and N. Abe. Cost-sensitive learning by cost-proportionate example weighting. In ICDM, 2003.
