arXiv: v2 [cs.LG] 20 May 2010


Online Learning of Noisy Data with Kernels

Nicolò Cesa-Bianchi, Università degli Studi di Milano
Shai Shalev-Shwartz, The Hebrew University
Ohad Shamir, The Hebrew University

Abstract

We study online learning when individual instances are corrupted by adversarially chosen random noise. We assume the noise distribution is unknown, and may change over time with no restriction other than having zero mean and bounded variance. Our technique relies on a family of unbiased estimators for non-linear functions, which may be of independent interest. We show that a variant of online gradient descent can learn functions in any dot-product (e.g., polynomial) or Gaussian kernel space with any analytic convex loss function. Our variant uses randomized estimates that need to query a random number of noisy copies of each instance, where with high probability this number is upper bounded by a constant. Allowing such multiple queries cannot be avoided: indeed, we show that online learning is in general impossible when only one noisy copy of each instance can be accessed.

1 Introduction

In many machine learning applications training data are typically collected by measuring certain physical quantities. Examples include bioinformatics, medical tests, robotics, and remote sensing. These measurements have errors that may be due to several reasons: sensor costs, communication constraints, or intrinsic physical limitations. In all such cases, the learner trains on a distorted version of the actual "target" data, which is where the learner's predictive ability is eventually evaluated. In this work we investigate the extent to which a learning algorithm can achieve a good predictive performance when training data are corrupted by noise with unknown distribution.

We prove upper and lower bounds on the learner's cumulative loss in the framework of online learning, where examples are generated by an arbitrary and possibly adversarial source. We model the measurement error via a random perturbation which affects each instance observed by the learner. We do not assume any specific property of the noise distribution other than zero mean and bounded variance. Moreover, we allow the noise
distribution to change at every step in an adversarial way and fully hidden from the learner. Our positive results are quite general: by using a randomized unbiased estimate for the loss gradient and a randomized feature mapping to estimate kernel values, we show that a variant of online gradient descent can learn functions in any dot-product (e.g., polynomial) or Gaussian RKHS under any given analytic convex loss function. Our techniques are readily extendable to other kernel types as well.

In order to obtain unbiased estimates of loss gradients and kernel values, we allow the learner to query a random number of independently perturbed copies of the current unseen instance. We show how low-variance estimates can be computed using a number of queries that is constant with high probability. This is in sharp contrast with standard averaging techniques, which attempt to directly estimate the noisy instance, as these require a sample whose size depends on the scale of the problem. Finally, we formally show that learning is impossible, even without kernels, when only one perturbed copy of each instance can be accessed. This is true for essentially any reasonable loss function.

Our paper is organized as follows. In the next subsection we discuss related work. In Sec. 2 we introduce our setting and justify some of our choices. In Sec. 4 we present our main results, but before that, in Sec. 3, we discuss the techniques used to obtain them. In the same section, we also explain why existing techniques are insufficient to deal with our problem. The detailed proofs and subroutine implementations appear in Sec. 5, with some of the more technical lemmas and proofs
relegated to the appendix. We wrap up with a discussion on possible avenues for future work in Sec. 6.

1.1 Related Work

In the machine learning literature, the problem of learning from noisy examples, and, in particular, from noisy training instances, has traditionally received a lot of attention (see, for example, the recent survey [ ]). On the other hand, there are comparably few theoretically-principled studies on this topic. Two of them focus on models quite different from the one studied here: random attribute noise in PAC boolean learning [3, 8], and malicious noise [9, 5]. In the first case, learning is restricted to classes of boolean functions and the noise must be independent across each boolean coordinate. In the second case, an adversary is allowed to perturb a small fraction of the training examples in an arbitrary way, making learning impossible in a strong informational sense unless this perturbed fraction is very small (of the order of the desired accuracy for the predictor).

The previous work perhaps closest to the one presented here is [10], where binary classification mistake bounds are proven for the online Winnow algorithm in the presence of attribute errors. Similarly to our setting, the sequence of instances observed by the learner is chosen by an adversary. However, in [10] the noise is generated by an adversary, who may change the value of each attribute in an arbitrary way. The final mistake bound, which only applies when the noiseless data sequence is linearly separable without kernels, depends on the sum of all adversarial perturbations.

2 Setting

We consider a setting where the goal is to predict values $y \in \mathbb{R}$ based on instances $x \in \mathbb{R}^d$. In this paper we focus on kernel-based linear predictors of the form $x \mapsto \langle w, \Psi(x) \rangle$, where $\Psi$ is a feature mapping into some reproducing kernel Hilbert space (RKHS). We assume there exists a kernel function that efficiently implements dot products in that space, i.e., $k(x, x') = \langle \Psi(x), \Psi(x') \rangle$. Note that a special case of this setting is linear kernels, where $\Psi$ is the identity mapping and $k(x, x') = \langle x, x' \rangle$.

The standard online learning protocol for linear prediction with kernels is defined as follows: at each round $t$, the learner picks a linear hypothesis $w_t$
from the RKHS. The adversary then picks an example $(x_t, y_t)$ and reveals it to the learner. The loss suffered by the learner is $\ell(\langle w_t, \Psi(x_t) \rangle, y_t)$, where $\ell$ is a known and fixed loss function. The goal of the learner is to minimize regret with respect to a fixed convex set of hypotheses $W$, namely

$$\sum_{t=1}^{T} \ell(\langle w_t, \Psi(x_t) \rangle, y_t) - \min_{w \in W} \sum_{t=1}^{T} \ell(\langle w, \Psi(x_t) \rangle, y_t).$$

Typically, we wish to find a strategy for the learner such that, no matter what the adversary's strategy of choosing the sequence of examples is, the expression above is sub-linear in $T$.

We now make the following twist, which limits the information available to the learner: instead of receiving $(x_t, y_t)$, the learner observes $y_t$ and is given access to an oracle $A_t$. On each call, $A_t$ returns an independent copy of $x_t + Z_t$, where $Z_t$ is a zero-mean random vector with some known finite bound on its variance, in the sense that $\mathbb{E}[\|Z_t\|^2] \le a$ for some uniform constant $a$. In general, the distribution of $Z_t$ is unknown to the learner. It might be chosen by the adversary, and change from round to round or even between consecutive calls to $A_t$. Note that here we assume that $y_t$ remains unperturbed, but we emphasize that this is just for simplicity: our techniques can be readily extended to deal with noisy values as well.

The learner may call $A_t$ more than once. In fact, as we discuss later on, being able to call $A_t$ more than once is necessary for the learner to have any hope to succeed. On the other hand, if the learner calls $A_t$ an unlimited number of times, it can reconstruct $x_t$ arbitrarily well by averaging, and we are back to the standard learning setting. In this paper we focus on learning algorithms that call $A_t$ only a small, essentially constant number of times, which depends only on our choice of loss function and kernel (rather than on $T$, the norm of $x_t$, or the variance of $Z_t$, which will happen with naïve averaging techniques). Moreover, since the number of queries is bounded with very high probability, one can even produce an algorithm with an absolute bound on the number of queries, which will fail or introduce some bias with an arbitrarily small probability. For simplicity, we ignore these issues in this paper.

In this
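setting, we wish to minimize the regret in hindsight with respect to the unperturbed data and averaged over the noise introduced by the oracle, namely

$$\mathbb{E}\left[\sum_{t=1}^{T} \ell(\langle w_t, \Psi(x_t) \rangle, y_t) - \min_{w \in W} \sum_{t=1}^{T} \ell(\langle w, \Psi(x_t) \rangle, y_t)\right] \qquad (1)$$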

where the random quantities are the predictors $w_1, w_2, \ldots$ generated by the learner, which depend on the observed noisy instances (in the appendix, we briefly discuss alternative regret measures, and why they are unsatisfactory). This kind of regret is relevant where we actually wish to learn from data, without the noise causing a hindrance. In particular, consider the batch setting, where the examples $\{(x_t, y_t)\}_{t=1}^{T}$ are actually sampled i.i.d. from some unknown distribution, and we wish to find a predictor which minimizes the expected loss $\mathbb{E}[\ell(\langle w, x \rangle, y)]$ with respect to new examples $(x, y)$. Using standard online-to-batch conversion techniques, if we can find an online algorithm with a sublinear bound on Eq. (1), then it is possible to construct learning algorithms for the batch setting which are robust to noise. That is, algorithms generating a predictor $w$ with close to minimal expected loss $\mathbb{E}[\ell(\langle w, x \rangle, y)]$ among all $w \in W$.

While our techniques are quite general, the exact algorithmic and theoretical results depend a lot on which loss function and kernel is used. Discussing the loss function first, we will assume that $\ell(\langle w, \Psi(x) \rangle, y)$ is a convex function of $w$ for each example $(x, y)$. Somewhat abusing notation, we assume the loss can be written either as $\ell(\langle w, \Psi(x) \rangle, y) = f(y \langle w, \Psi(x) \rangle)$ or as $\ell(\langle w, \Psi(x) \rangle, y) = f(\langle w, \Psi(x) \rangle - y)$ for some function $f$. We refer to the first type as classification losses, as it encompasses most reasonable losses for classification, where $y \in \{-1, +1\}$ and the goal is to predict the label. We refer to the second type as regression losses, as it encompasses most reasonable regression losses, where $y$ takes arbitrary real values. For simplicity, we present some of our results in terms of classification losses, but they all hold for regression losses as well with slight modifications.

We present our results under the assumption that the loss function is "smooth", in the sense that $\ell'(a)$ can be written as $\sum_{n=0}^{\infty} \gamma_n a^n$, for any $a$ in its domain. This assumption holds for instance for the squared loss $\ell(a) = a^2$, the exponential loss $\ell(a) = \exp(-a)$, and smoothed versions of loss functions such as the hinge loss and the absolute loss (we discuss examples in more detail in Subsection 4.2). This
assumption can be relaxed under certain conditions, and this is further discussed in Subsection 3.2.

Turning to the issue of kernels, we note that the general presentation of our approach is somewhat hampered by the fact that it needs to be tailored to the kernel we use. In this paper, we focus on two families of kernels:

Dot Product Kernels: the kernel $k(x, x')$ can be written as a function of $\langle x, x' \rangle$. Examples of such kernels $k(x, x')$ are linear kernels $\langle x, x' \rangle$; homogeneous polynomial kernels $(\langle x, x' \rangle)^n$; inhomogeneous polynomial kernels $(1 + \langle x, x' \rangle)^n$; exponential kernels $e^{\langle x, x' \rangle}$; binomial kernels $(1 + \langle x, x' \rangle)^{-\alpha}$, and more (see for instance [4, 6]).

Gaussian Kernels: $k(x, x') = e^{-\|x - x'\|^2 / \sigma^2}$ for some $\sigma^2 > 0$.

Again, we emphasize that our techniques are extendable to other kernel types as well.

3 Techniques

Our results are based on two key ideas: the use of online gradient descent algorithms, and construction of unbiased gradient estimators in the kernel setting. The latter is based on a general method to build unbiased estimators for non-linear functions, which may be of independent interest.

3.1 Online Gradient Descent

There exist well developed theory and algorithms for dealing with the standard online learning setting, where the example $(x_t, y_t)$ is revealed after each round, and for general convex loss functions. One of the simplest and most well known ones is the online gradient descent algorithm due to Zinkevich [7]. Since this algorithm forms a basis for our algorithm in the new setting, we briefly review it below (as adapted to our setting).

The algorithm initializes the classifier $w_1 = 0$. At round $t$, the algorithm predicts according to $w_t$, and updates the learning rule according to $w_{t+1} = P(w_t - \eta_t \nabla_t)$, where $\eta_t$ is a suitably chosen constant which might depend on $t$; $\nabla_t = \ell'(y_t \langle w_t, \Psi(x_t) \rangle)\, y_t \Psi(x_t)$ is the gradient of $\ell(y_t \langle w, \Psi(x_t) \rangle)$ with respect to $w$ at $w_t$; and $P$ is a projection operator on the convex set $W$, on whose elements we wish to achieve low regret. In particular, if we wish to compete with hypotheses of bounded squared norm $B_w$, $P$ simply involves rescaling the norm of the predictor so as to have squared norm at most $B_w$. With this algorithm, one can prove regret bounds with respect to any $w \in W$.

A
folklore result about this algorithm is that in fact, we do not need to update the predictor by the gradient at each step. Instead, it is enough to update by some random vector of bounded variance, which merely equals the gradient in expectation. This is a useful property in settings where $(x_t, y_t)$ is not revealed to the learner, and has been used before, such as in the online bandit setting (see for instance [6, 7, ]). Here, we will use this property in a new way, in order to devise
algorithms which are robust to noise.

When the kernel and loss function are linear (e.g., $\Psi(x) = x$ and $\ell(a) = ca + b$ for some constants $b, c$), this property already ensures that the algorithm is robust to noise without any further changes. This is because the noise injected to each $x_t$ merely causes the exact gradient estimate to change to a random vector which is correct in expectation: if we assume $\ell$ is a classification loss, then

$$\mathbb{E}\left[\ell'(y_t \langle w_t, \Psi(\tilde{x}_t) \rangle)\, \Psi(\tilde{x}_t)\right] = \mathbb{E}[c\, \tilde{x}_t] = c\, x_t.$$

On the other hand, when we use nonlinear kernels and nonlinear loss functions, using standard online gradient descent leads to systematic and unknown biases (since the noise distribution is unknown), which prevents the method from working properly. To deal with this problem, we now turn to describe a technique for estimating expressions such as $\ell'(y_t \langle w_t, \Psi(x_t) \rangle)$ in an unbiased manner. In Subsection 3.3, we discuss how $\Psi(x_t)$ can be estimated in an unbiased manner.

3.2 Unbiased Estimators for Non-Linear Functions

Suppose that we are given access to independent copies of a real random variable $X$, with expectation $\mathbb{E}[X]$, and some real function $f$, and we wish to construct an unbiased estimate of $f(\mathbb{E}[X])$. If $f$ is a linear function, then this is easy: just sample $x$ from $X$, and return $f(x)$. By linearity, $\mathbb{E}[f(X)] = f(\mathbb{E}[X])$ and we are done. The problem becomes less trivial when $f$ is a general, nonlinear function, since usually $\mathbb{E}[f(X)] \ne f(\mathbb{E}[X])$. In fact, when $X$ takes finitely many values and $f$ is not a polynomial function, one can prove that no unbiased estimator can exist (see [3], Proposition 8 and its proof). Nevertheless, we show how in many cases one can construct an unbiased estimator of $f(\mathbb{E}[X])$, including cases covered by the impossibility result. There is no contradiction, because we do not construct a "standard" estimator. Usually, an estimator is a function from a given sample to the range of the parameter we wish to estimate. An implicit assumption is that the size of the sample given to it is fixed, and this is also a crucial ingredient in the impossibility result. We circumvent this by constructing an estimator based on a random number of samples.

Here is the key idea: suppose $f : \mathbb{R} \to \mathbb{R}$ is any function continuous on a bounded interval. It is well known
that one can construct a sequence of polynomials $(Q_n(\cdot))_{n=1}^{\infty}$, where $Q_n(\cdot)$ is a polynomial of degree $n$, which converges uniformly to $f$ on the interval. If $Q_n(x) = \sum_{i=0}^{n} \gamma_{n,i} x^i$, let $Q'_n(x_1, \ldots, x_n) = \sum_{i=0}^{n} \gamma_{n,i} \prod_{j=1}^{i} x_j$. Now, consider the estimator which draws a positive integer $N$ according to some distribution $P(N = n) = p_n$, samples $X$ for $N$ times to get $x_1, x_2, \ldots, x_N$, and returns $\frac{1}{p_N}\left(Q'_N(x_1, \ldots, x_N) - Q'_{N-1}(x_1, \ldots, x_{N-1})\right)$, where we assume $Q'_0 = 0$. The expected value of this estimator is equal to:

$$\mathbb{E}_{N, x_1, \ldots, x_N}\left[\frac{1}{p_N}\left(Q'_N(x_1, \ldots, x_N) - Q'_{N-1}(x_1, \ldots, x_{N-1})\right)\right] = \sum_{n=1}^{\infty} \frac{p_n}{p_n}\, \mathbb{E}_{x_1, \ldots, x_n}\left[Q'_n(x_1, \ldots, x_n) - Q'_{n-1}(x_1, \ldots, x_{n-1})\right] = \sum_{n=1}^{\infty} \left(Q_n(\mathbb{E}[X]) - Q_{n-1}(\mathbb{E}[X])\right) = f(\mathbb{E}[X]).$$

Thus, we have an unbiased estimator of $f(\mathbb{E}[X])$.

This technique appeared in a rather obscure early 1960's paper [5] from sequential estimation theory, and appears to be little known, particularly outside the sequential estimation community. However, we believe this technique is interesting, and expect it to have useful applications for other problems as well.

While this may seem at first like a very general result, the variance of this estimator must be bounded for it to be useful. Unfortunately, this is not true for general continuous functions. More precisely, let $N$ be distributed according to $p_n$, and let $\theta$ be the value returned by the estimator. In [2], it is shown that if $X$ is a Bernoulli random variable, and if $\mathbb{E}[\theta N^k] < \infty$ for some integer $k$, then $f$ must be $k$ times continuously differentiable. Since $\mathbb{E}[\theta N^k] \le (\mathbb{E}[\theta^2] + \mathbb{E}[N^{2k}])/2$, this means that functions $f$ which yield an estimator with finite variance, while using a number of queries with bounded variance, must be continuously differentiable. Moreover, in case we desire the number of queries to be essentially constant (i.e., choose a distribution for $N$ with exponentially decaying tails), we must have $\mathbb{E}[N^k] < \infty$ for all $k$, which means that $f$ should be infinitely differentiable (in fact, in [2] it is conjectured that $f$ must be analytic in such cases).

Thus, we focus in this paper on functions $f$ which are analytic, i.e., they can be written as $f(x) = \sum_{i=0}^{\infty} \gamma_i x^i$ for appropriate constants $\gamma_0, \gamma_1, \ldots$. In that case, $Q_n$ can simply be the truncated
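Taylor expansion of $f$ to order $n$, i.e., $Q_n(x) = \sum_{i=0}^{n} \gamma_i x^i$. Moreover, we can pick $p_n \propto 1/p^n$ for any $p > 1$. So the estimator becomes the following: we sample a nonnegative integer $N$ according to $P(N = n) = (p-1)/p^{n+1}$, sample $X$ independently $N$ times to get $x_1, x_2, \ldots, x_N$, and return $\theta = \gamma_N \frac{p^{N+1}}{p-1} x_1 x_2 \cdots x_N$, where we set $\theta = \frac{p}{p-1} \gamma_0$ if $N = 0$. We have the following:

Lemma 1. For the above estimator, it holds that $\mathbb{E}[\theta] = f(\mathbb{E}[X])$. The expected number of samples used by the estimator is $1/(p-1)$, and the probability of it being at least $z$ is $p^{-z}$. Moreover, if we assume that $f_+(x) = \sum_{n=0}^{\infty} |\gamma_n| x^n$ exists for any $x$ in the domain of interest, then

$$\mathbb{E}[\theta^2] \le \frac{p}{p-1} f_+^2\left(\sqrt{p\, \mathbb{E}[X^2]}\right).$$

Proof. The fact that $\mathbb{E}[\theta] = f(\mathbb{E}[X])$ follows from the discussion above. The results about the number of samples follow directly from properties of the geometric distribution. As for the second moment, $\mathbb{E}[\theta^2]$ equals

$$\mathbb{E}_{N, x_1, \ldots, x_N}\left[\gamma_N^2 \frac{p^{2(N+1)}}{(p-1)^2} x_1^2 x_2^2 \cdots x_N^2\right] = \sum_{n=0}^{\infty} \frac{(p-1)\, p^{2(n+1)}}{(p-1)^2\, p^{n+1}} \gamma_n^2\, \mathbb{E}_{x_1, \ldots, x_n}\left[x_1^2 x_2^2 \cdots x_n^2\right] = \frac{p}{p-1} \sum_{n=0}^{\infty} \gamma_n^2\, p^n \left(\mathbb{E}[X^2]\right)^n$$

$$= \frac{p}{p-1} \sum_{n=0}^{\infty} \left(|\gamma_n| \left(\sqrt{p\, \mathbb{E}[X^2]}\right)^n\right)^2 \le \frac{p}{p-1} \left(\sum_{n=0}^{\infty} |\gamma_n| \left(\sqrt{p\, \mathbb{E}[X^2]}\right)^n\right)^2 = \frac{p}{p-1} f_+^2\left(\sqrt{p\, \mathbb{E}[X^2]}\right).$$

The parameter $p$ provides a tradeoff between the variance of the estimator and the number of samples needed: the larger $p$ is, the fewer samples we need, but the estimator has more variance. In any case, the sample size distribution decays exponentially fast, so the sample size is essentially bounded.

It should be emphasized that the estimator associated with Lemma 1 is tailored for generality, and is suboptimal in some cases. For example, if $f$ is a polynomial function, then $\gamma_n = 0$ for sufficiently large $n$, and there is no reason to sample $N$ from a distribution supported on all nonnegative integers: it just increases the variance. Nevertheless, in order to keep the presentation unified and general, we will always use this type of estimator. If needed, the estimator can always be optimized for specific cases.

We also note that this technique can be improved in various directions, if more is known about the distribution of $X$. For instance, if we have some estimate of the expectation and variance of $X$, then we can perform a Taylor expansion around the estimated $\mathbb{E}[X]$ rather than 0, and tune the probability distribution of $N$ to be different than the one we used above. These modifications can allow us to make the variance of the estimator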
arbitrarily small, if the variance of $X$ is small enough. Moreover, one can take polynomial approximations to $f$ which are perhaps better than truncated Taylor expansions. In this paper, for simplicity, we will ignore these potential improvements.

Finally, we note that a related result in [2] implies that it is impossible to estimate $f(\mathbb{E}[X])$ in an unbiased manner when $f$ is discontinuous, even if we allow a number of queries and estimator values which are infinite in expectation. Therefore, since the derivative of the hinge loss is not continuous, estimating in an unbiased manner the gradient of the hinge loss with arbitrary noise appears to be impossible. Thus, if online learning with noise and hinge loss is at all feasible, a rather different approach than ours will need to be taken.

3.3 Unbiasing Noise in the RKHS

The third component of our approach involves the unbiased estimation of $\Psi(x_t)$, when we only have unbiased noisy copies of $x_t$. Here again, we have a non-trivial problem, because the feature mapping $\Psi$ is usually highly non-linear, so $\mathbb{E}[\Psi(\tilde{x}_t)] \ne \Psi(\mathbb{E}[\tilde{x}_t])$ in general. Moreover, $\Psi$ is not a scalar function, so the technique of Subsection 3.2 will not work as-is.

Footnote 1 (on the event $N = 0$ in the estimator of Subsection 3.2): Admittedly, the event $N = 0$ should receive zero probability, as it amounts to "skipping" the sampling altogether. However, setting $P(N = 0) = 0$ appears to improve the bound in this paper only in the smaller order terms, while making the analysis in the paper more complicated.
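As an illustration of the randomized estimator of Subsection 3.2 (the one analyzed in Lemma 1), the following is a minimal Python sketch, not the authors' implementation. It assumes the caller supplies the Taylor coefficients through a hypothetical callable `gamma(n)` and the noisy samples through a hypothetical callable `draw_x()`; the function names and the inverse-transform sampling of $N$ are choices made here for illustration only.

```python
import math
import random


def sample_query_count(p, rng):
    """Draw N with P(N = n) = (p - 1) / p**(n + 1) for n = 0, 1, 2, ..."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
    # P(N >= z) = P(u <= p**(-z)) = p**(-z), the tail stated in Lemma 1.
    return int(math.floor(-math.log(u) / math.log(p)))


def unbiased_estimate(gamma, draw_x, p, rng):
    """One draw of theta, whose mean is f(E[X]) for f(x) = sum_n gamma(n) * x**n.

    gamma(n) -> n-th Taylor coefficient of f (hypothetical callable).
    draw_x() -> one independent noisy sample of X (hypothetical callable).
    """
    n = sample_query_count(p, rng)
    product = 1.0
    for _ in range(n):  # the empty product (n = 0) stays equal to 1
        product *= draw_x()
    return gamma(n) * p ** (n + 1) / (p - 1.0) * product
```

For instance, with `gamma = lambda n: 1.0 / math.factorial(n)` the target is $e^{\mathbb{E}[X]}$, and averaging many independent draws of `unbiased_estimate` should approach that value, while each draw uses $1/(p-1)$ samples in expectation.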

To tackle this problem, we construct an explicit feature mapping, which needs to be tailored to the kernel we want to use. To give a very simple example, suppose we use the homogeneous 2nd-degree polynomial kernel, $k(r, s) = (\langle r, s \rangle)^2$. It is not hard to verify that the function $\Psi : \mathbb{R}^d \to \mathbb{R}^{d^2}$, defined via $\Psi(x) = (x_1 x_1, x_1 x_2, \ldots, x_d x_d)$, is an explicit feature mapping for this kernel. Now, if we query two independent noisy copies $\tilde{x}, \tilde{x}'$ of $x$, we have that the expectation of the random vector $(\tilde{x}_1 \tilde{x}'_1, \tilde{x}_1 \tilde{x}'_2, \ldots, \tilde{x}_d \tilde{x}'_d)$ is nothing more than $\Psi(x)$. Thus, we can construct unbiased estimates of $\Psi(x)$ in the RKHS. Of course, this example pertains to a very simple RKHS with a finite dimensional representation. By a randomization trick somewhat similar to the one in Subsection 3.2, we can adapt this approach to infinite dimensional RKHS as well. In a nutshell, we represent $\Psi(x)$ as an infinite-dimensional vector, and its noisy unbiased estimate is a vector which is non-zero on only finitely many entries, using finitely many noisy queries. Moreover, inner products between these estimates can be done efficiently, allowing us to implement the learning algorithms, and use the resulting predictor on test instances.

4 Main Results

4.1 Algorithm

We present our algorithmic approach in a modular form. We start by introducing the main algorithm, which contains several subroutines. Then we prove our two main results, which bound the regret of the algorithm, the number of queries to the oracle, and the running time for two types of kernels: dot product and Gaussian (our results can be extended to other kernel types as well). In itself, the algorithm is nothing more than a standard online gradient descent algorithm with a standard $O(\sqrt{T})$ regret bound. Thus, most of the proofs are devoted to a detailed discussion of how the subroutines are implemented (including explicit pseudo-code). In this section, we just describe one subroutine, based on the techniques discussed in Sec. 3. The other subroutines require a more detailed and technical discussion, and thus their implementation is described as part of the proofs in Sec. 5. In any case, the intuition behind the implementations and the techniques used are described in Sec. 3.

For simplicity,
we will focus on a finite-horizon setting, where the number of online rounds $T$ is fixed and known to the learner. The algorithm can easily be modified to deal with the infinite horizon setting, where the learner needs to achieve sub-linear regret for all $T$ simultaneously. Also, for the remainder of this subsection, we assume for simplicity that $\ell$ is a classification loss, namely that it can be written as $\ell(y \langle w, \Psi(x) \rangle)$. It is not hard to adapt the results below to the case where $\ell$ is a regression loss (where $\ell$ is a function of $\langle w, \Psi(x) \rangle - y$).

We note that at each round, the algorithm below constructs an object which we denote as $\tilde{\Psi}(x_t)$. This object has two interpretations here: formally, it is an element of a reproducing kernel Hilbert space (RKHS) corresponding to the kernel we use, and is equal in expectation to $\Psi(x_t)$. However, in terms of implementation, it is simply a data structure consisting of a finite set of vectors from $\mathbb{R}^d$. Thus, it can be efficiently stored in memory and handled even for infinite-dimensional RKHS.

Algorithm 1: Kernel Learning Algorithm with Noisy Input
Parameters: Learning rate $\eta > 0$, number of rounds $T$, sample parameter $p > 1$
Initialize:
  $\alpha_i = 0$ for all $i = 1, \ldots, T$
  $\tilde{\Psi}(x_i)$ for all $i = 1, \ldots, T$   // $\tilde{\Psi}(x_i)$ is a data structure which can store a variable number of vectors in $\mathbb{R}^d$
For $t = 1, \ldots, T$:
  Define $w_t = \sum_{i=1}^{t-1} \alpha_{t-1,i}\, \tilde{\Psi}(x_i)$
  Receive $A_t, y_t$   // The oracle $A_t$ provides noisy estimates of $x_t$
  Let $\tilde{\Psi}(x_t)$ := Map_Estimate($A_t$, $p$)   // Get unbiased estimate of $\Psi(x_t)$ in the RKHS
  Let $\tilde{g}_t$ := Grad_Length_Estimate($A_t$, $y_t$, $p$)   // Get unbiased estimate of $y_t \ell'(y_t \langle w_t, \Psi(x_t) \rangle)$
  Let $\alpha_t := -\tilde{g}_t\, \eta / \sqrt{T}$   // Perform gradient step
  Let $\tilde{n}_t := \sum_{i=1}^{t} \sum_{j=1}^{t} \alpha_{t,i}\, \alpha_{t,j}$ Prod($\tilde{\Psi}(x_i)$, $\tilde{\Psi}(x_j)$)   // Compute squared norm, where Prod($\tilde{\Psi}(x_i)$, $\tilde{\Psi}(x_j)$) returns $\langle \tilde{\Psi}(x_i), \tilde{\Psi}(x_j) \rangle$
  If $\tilde{n}_t > B_w$   // If norm squared is larger than $B_w$, then project
    Let $\alpha_i := \alpha_i \sqrt{B_w / \tilde{n}_t}$ for all $i = 1, \ldots, t$

Like $\tilde{\Psi}(x_t)$, $w_{t+1}$ has also two interpretations: formally, it is an element in the RKHS, as defined
in the pseudocode. In terms of implementation, it is defined via the data structures $\tilde{\Psi}(x_1), \ldots, \tilde{\Psi}(x_t)$ and the values of $\alpha_1, \ldots, \alpha_t$ at round $t$. To apply this hypothesis on a given instance $x$, we compute $\sum_{i=1}^{t} \alpha_{t,i}$ Prod($\tilde{\Psi}(x_i)$, $x$), where Prod($\tilde{\Psi}(x_i)$, $x$) is a subroutine which returns $\langle \tilde{\Psi}(x_i), \Psi(x) \rangle$ (a pseudocode is provided as part of the proofs later on).

We now turn to the main results pertaining to the algorithm. The first result shows what regret bound is achievable by the algorithm for any dot-product kernel, and characterizes the number of oracle queries per instance and the overall running time of the algorithm.

Theorem 1. Assume that the loss function $\ell$ has an analytic derivative $\ell'(a) = \sum_{n=0}^{\infty} \gamma_n a^n$ for all $a$ in its domain, and let $\ell'_+(a) = \sum_{n=0}^{\infty} |\gamma_n| a^n$ (assuming it exists). Assume also that the kernel $k(x, x')$ can be written as $Q(\langle x, x' \rangle)$ for all $x, x' \in \mathbb{R}^d$. Finally, assume that $\mathbb{E}[\|\tilde{x}_t\|^2] \le B_{\tilde{x}}$ for any $\tilde{x}_t$ returned by the oracle at round $t$, for all $t = 1, \ldots, T$. Then, for all $B_w > 0$ and $p > 1$, it is possible to implement the subroutines of Algorithm 1 such that:
- The expected number of queries to each oracle $A_t$ is $p/(p-1)^2$.
- The expected running time of the algorithm is $O\left(T^3 \left(1 + dp/(p-1)^2\right)\right)$.
- If we run Algorithm 1 with $\eta = B_w / \left(\sqrt{u}\, \ell'_+\left(\sqrt{(p-1)u}\right)\right)$, where $u = B_w \left(\frac{p}{p-1}\right)^2 Q(p B_{\tilde{x}})$, then

$$\mathbb{E}\left[\sum_{t=1}^{T} \ell(y_t \langle w_t, \Psi(x_t) \rangle) - \min_{w : \|w\|^2 \le B_w} \sum_{t=1}^{T} \ell(y_t \langle w, \Psi(x_t) \rangle)\right] \le \ell'_+\left(\sqrt{(p-1)u}\right) \sqrt{uT}.$$

The expectations are with respect to the randomness of the oracles and the algorithm throughout its run.

We note that the distribution of the number of oracle queries can be specified explicitly, and it decays very rapidly (see the proof for details). Also, for simplicity, we only bound the expected regret in the theorem above. If the noise is bounded almost surely or with sub-Gaussian tails (rather than just bounded variance), then it is possible to obtain similar guarantees with high probability, by relying on Azuma's inequality or variants thereof (see for example [4]).

We now turn to the case of Gaussian kernels.

Theorem 2. Assume that the loss function $\ell$ has an analytic derivative $\ell'(a) = \sum_{n=0}^{\infty} \gamma_n a^n$ for all $a$ in its domain, and let $\ell'_+(a) = \sum_{n=0}^{\infty} |\gamma_n| a^n$ (assuming it exists). Assume that the kernel $k(x, x')$ is defined as $\exp(-\|x - x'\|^2 / \sigma^2)$. Finally, assume that $\mathbb{E}[\|\tilde{x}_t\|^2] \le B_{\tilde{x}}$ for any $\tilde{x}_t$ returned by the oracle at round $t$, for
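all $t = 1, \ldots, T$. Then for all $B_w > 0$ and $p > 1$ it is possible to implement the subroutines of Algorithm 1 such that:
- The expected number of queries to each oracle $A_t$ is $3p/(p-1)^2$.
- The expected running time of the algorithm is $O\left(T^3 \left(1 + dp/(p-1)^2\right)\right)$.
- If we run Algorithm 1 with $\eta = B_w / \left(\sqrt{u}\, \ell'_+\left(\sqrt{(p-1)u}\right)\right)$, where

$$u = B_w \left(\frac{p}{p-1}\right)^3 \exp\left(\frac{\sqrt{p}\, B_{\tilde{x}} + 2p \sqrt{B_{\tilde{x}}}}{\sigma^2}\right),$$

then

$$\mathbb{E}\left[\sum_{t=1}^{T} \ell(y_t \langle w_t, \Psi(x_t) \rangle) - \min_{w : \|w\|^2 \le B_w} \sum_{t=1}^{T} \ell(y_t \langle w, \Psi(x_t) \rangle)\right] \le \ell'_+\left(\sqrt{(p-1)u}\right) \sqrt{uT}.$$

The expectations are with respect to the randomness of the oracles and the algorithm throughout its run.

As in Thm. 1, note that the number of oracle queries has a fast decaying distribution. Also, note that with Gaussian kernels, $\sigma^2$ is usually chosen to be on the order of the example's squared norms. Thus, if the noise added to the examples is proportional to their original norm, we can assume that $B_{\tilde{x}} / \sigma^2 = O(1)$, and thus $u$ which appears in the bound is also bounded by a constant.

As previously mentioned, most of the subroutines are described in the proofs section, as part of the proof of Thm. 1. Here, we only show how to implement the Grad_Length_Estimate subroutine,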
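which returns the gradient length estimate $\tilde{g}_t$. The idea is based on the technique described in Subsection 3.2. We prove that $\tilde{g}_t$ is an unbiased estimate of $y_t \ell'(y_t \langle w_t, \Psi(x_t) \rangle)$, and bound $\mathbb{E}[\tilde{g}_t^2]$. As discussed earlier, we assume that $\ell'(\cdot)$ is analytic and can be written as $\ell'(a) = \sum_{n=0}^{\infty} \gamma_n a^n$.

Subroutine 1: Grad_Length_Estimate($A_t$, $y_t$, $p$)
  Sample nonnegative integer $n$ according to $P(n) = (p-1)/p^{n+1}$
  For $j = 1, \ldots, n$:
    Let $\tilde{\Psi}(x_t)_j$ := Map_Estimate($A_t$)   // Get unbiased estimate of $\Psi(x_t)$ in the RKHS
  Return $\tilde{g}_t := y_t\, \gamma_n \frac{p^{n+1}}{p-1} \prod_{j=1}^{n} \left(\sum_{i=1}^{t-1} \alpha_{t-1,i}\, \mathrm{Prod}(\tilde{\Psi}(x_i), \tilde{\Psi}(x_t)_j)\right)$

Lemma 2. Assume that $\mathbb{E}[\tilde{\Psi}(x_t)] = \Psi(x_t)$, and that Prod($\tilde{\Psi}(x)$, $\tilde{\Psi}(x')$) returns $\langle \tilde{\Psi}(x), \tilde{\Psi}(x') \rangle$ for all $x, x'$. Then for any given $w_t = \alpha_{t-1,1} \tilde{\Psi}(x_1) + \cdots + \alpha_{t-1,t-1} \tilde{\Psi}(x_{t-1})$ it holds that

$$\mathbb{E}_t[\tilde{g}_t] = y_t \ell'(y_t \langle w_t, \Psi(x_t) \rangle) \quad \text{and} \quad \mathbb{E}_t[\tilde{g}_t^2] \le \frac{p}{p-1}\, \ell'_+\left(\sqrt{p B_w B_{\tilde{\Psi}(x)}}\right)^2,$$

where the expectation is with respect to the randomness of Subroutine 1, and $\ell'_+(a) = \sum_{n=0}^{\infty} |\gamma_n| a^n$.

Proof. The result follows from Lemma 1, where $\tilde{g}_t$ corresponds to the estimator $\theta$, the function $f$ corresponds to $\ell'$, and the random variable $X$ corresponds to $\langle w_t, \tilde{\Psi}(x_t) \rangle$ (where $\tilde{\Psi}(x_t)$ is random and $w_t$ is held fixed). The term $\mathbb{E}[X^2]$ in Lemma 1 can be upper bounded as

$$\mathbb{E}_t\left[\left(\langle w_t, \tilde{\Psi}(x_t) \rangle\right)^2\right] \le \|w_t\|^2\, \mathbb{E}_t\left[\|\tilde{\Psi}(x_t)\|^2\right] \le B_w B_{\tilde{\Psi}(x)}.$$

4.2 Loss Function Examples

Theorems 1 and 2 both deal with generic loss functions $\ell$ whose derivative can be written as $\sum_{n=0}^{\infty} \gamma_n a^n$, and the regret bounds involve the functions $\ell'_+(a) = \sum_{n=0}^{\infty} |\gamma_n| a^n$. Below, we present a few examples of loss functions and their corresponding $\ell'_+$. As mentioned earlier, while the theorems in the previous subsection are in terms of classification losses (i.e., $\ell$ is a function of $y \langle w, \Psi(x) \rangle$), virtually identical results can be proven for regression losses (i.e., $\ell$ is a function of $\langle w, \Psi(x) \rangle - y$), so we will give examples from both families. Working out the first two examples is straightforward. The proofs of the other two appear in Sec. 5. The loss functions are illustrated graphically in Fig. 1.

Example 1. For the squared loss function, $\ell(\langle w, x \rangle, y) = (\langle w, x \rangle - y)^2$, we have $\ell'_+\left(\sqrt{(p-1)u}\right) = 2\sqrt{(p-1)u}$.

Example 2. For the exponential loss function, $\ell(\langle w, x \rangle, y) = e^{-y \langle w, x \rangle}$, we have $\ell'_+\left(\sqrt{(p-1)u}\right) = e^{\sqrt{(p-1)u}}$.

Example 3. Consider a "smoothed" absolute loss function $\ell_\sigma(\langle w, \Psi(x) \rangle - y)$, defined as an antiderivative of $\mathrm{Erf}(sa)$ for some $s > 0$ (see proof for exact analytic form). Then we have that

$$\ell'_+\left(\sqrt{(p-1)u}\right) \le \frac{1}{2} + \frac{1}{s\sqrt{\pi (p-1) u}}\left(e^{s^2 (p-1) u} - 1\right).$$

Example 4. Consider a "smoothed"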
hinge loss $\ell(y \langle w, \Psi(x) \rangle)$, defined as an antiderivative of $(\mathrm{Erf}(s(a-1)) - 1)/2$ for some $s > 0$ (see proof for exact analytic form). Then we have that

$$\ell'_+\left(\sqrt{(p-1)u}\right) \le \frac{2}{s\sqrt{\pi (p-1) u}}\left(e^{s^2 (p-1) u} - 1\right).$$

For any $s$, the loss functions in the last two examples are convex, and respectively approximate the absolute loss $|\langle w, \Psi(x) \rangle - y|$ and the hinge loss $\max\{0, 1 - y \langle w, \Psi(x) \rangle\}$ arbitrarily well for large enough $s$. Fig. 1 shows these loss functions graphically for $s = 1$. Note that $s$ need not be large in order to get a good approximation. Also, we note that both the loss itself and its gradient are computationally easy to evaluate.

Finally, we remind the reader that as discussed in Subsection 3.2, performing an unbiased estimate of the gradient for non-differentiable losses directly (such as the hinge loss or absolute loss) appears to be impossible in general. On the flip side, if one is willing to use a random number of queries with polynomial rather than exponential tails, then one can achieve much better sample complexity results, by focusing on loss functions (or approximations thereof) which are only differentiable to a bounded order, rather than fully analytic. This again demonstrates the tradeoff between the sample size and the amount of information that needs to be gathered on each training example.
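The quantity $\ell'_+$ used throughout this subsection is obtained from the Taylor coefficients $\gamma_n$ of $\ell'$ by replacing each coefficient with its absolute value. The following minimal Python sketch, given only as an illustration, evaluates a truncated version of that series for the squared and exponential losses and can be compared with the closed forms $2v$ and $e^{v}$ at any argument $v \ge 0$. The function names and the truncation length `terms` are assumptions made here, not part of the paper.

```python
import math


def l_plus(gamma, v, terms=60):
    """Truncated series for l'_+(v) = sum_n |gamma(n)| * v**n (terms is an assumed cutoff)."""
    return sum(abs(gamma(n)) * v ** n for n in range(terms))


def gamma_squared(n):
    # Squared loss l(a) = a**2, so l'(a) = 2a: only the degree-1 coefficient is non-zero.
    return 2.0 if n == 1 else 0.0


def gamma_exponential(n):
    # Exponential loss l(a) = exp(-a), so l'(a) = -exp(-a) = sum_n -(-1)**n * a**n / n!.
    return -((-1.0) ** n) / math.factorial(n)
```

Taking absolute values removes the alternating signs of the exponential loss coefficients, which is why its $\ell'_+$ grows like $e^{v}$ even though $|\ell'(a)| \le 1$ for $a \ge 0$.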

[Figure 1: Absolute loss, hinge loss, and smooth approximations. Curves shown: absolute loss, smoothed absolute loss, hinge loss, smoothed hinge loss.]

4.3 One Noisy Copy is Not Enough

The previous results might lead one to wonder whether it is really necessary to query the same instance more than once. In some applications this is inconvenient, and one would prefer a method which works when just a single noisy copy of each instance is made available. In this subsection we show that, unfortunately, such a method cannot be found. Specifically, we prove that under very mild assumptions, no method can achieve sub-linear regret when it has access to just a single noisy copy of each instance. On the other hand, for the case of squared loss and linear kernels, our techniques can be adapted to work with exactly two noisy copies of each instance (see footnote 2), so without further assumptions, the lower bound that we prove here is indeed tight. For simplicity, we prove the result for linear kernels (i.e., where $k(x, x') = \langle x, x' \rangle$). It is an interesting open problem to show improved lower bounds when nonlinear kernels are used. We also note that the result crucially relies on the learner not knowing the noise distribution, and we leave to future work the investigation of what happens when this assumption is relaxed.

Theorem 3. Let $W$ be a compact convex subset of $\mathbb{R}^d$, and let $\ell(\cdot, 1) : \mathbb{R} \to \mathbb{R}$ satisfy the following: (1) it is bounded from below; (2) it is differentiable at 0 with $\ell'(0, 1) < 0$. For any learning algorithm which selects hypotheses from $W$ and is allowed access to a single noisy copy of the instance at each round $t$, there exists a strategy for the adversary such that the sequence $w_1, w_2, \ldots$ of predictors output by the algorithm satisfies

$$\limsup_{T \to \infty} \max_{w \in W} \frac{1}{T} \sum_{t=1}^{T} \left(\ell(\langle w_t, x_t \rangle, y_t) - \ell(\langle w, x_t \rangle, y_t)\right) > 0$$

with probability 1 with respect to the randomness of the oracles.

Note that condition (1) is satisfied by virtually any loss function other than the linear loss, while condition (2) is satisfied by most regression losses, and by all classification calibrated losses, which include all reasonable losses for classification (see [2]). The most obvious example where the conditions are not satisfied is when $\ell(\cdot, 1)$ is a linear
function. This is not surprising, because when $\ell(\cdot, 1)$ is linear, the learner is always robust to noise (see the discussion at Sec. 3).

The intuition of the proof is very simple: the adversary chooses beforehand whether the examples are drawn i.i.d. from a distribution $D$, and then perturbed by noise, or drawn i.i.d. from some other distribution $D'$ without adding noise. The distributions $D, D'$ and the noise are designed so that the examples observed by the learner are distributed in the same way irrespective of which of the two sampling strategies the adversary chooses. Therefore, it is impossible for the learner accessing a single copy of each instance to be statistically consistent with respect to both distributions simultaneously. As a result, the adversary can always choose a distribution on which the algorithm will be inconsistent, leading to constant regret. The full proof is presented in Section 5.3.

Footnote 2: In a nutshell, for squared loss and linear kernels, we just need to estimate $2(\langle w_t, x_t \rangle - y_t)\, x_t$ in an unbiased manner at each round $t$. This can be done by computing $2(\langle w_t, \tilde{x}_t \rangle - y_t)\, \tilde{x}'_t$, where $\tilde{x}_t, \tilde{x}'_t$ are two noisy copies of $x_t$.
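The two-copy estimate in footnote 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `noisy_copy` is a hypothetical stand-in for the oracle that adds independent zero-mean Gaussian noise (the paper allows any zero-mean noise with bounded variance), and both function names are invented here.

```python
import random


def noisy_copy(x, noise_std, rng):
    """Hypothetical oracle: returns x plus independent zero-mean Gaussian noise."""
    return [xi + rng.gauss(0.0, noise_std) for xi in x]


def two_copy_gradient(w, y, copy_a, copy_b):
    """Estimate of 2 * (<w, x> - y) * x built from two independent noisy copies of x.

    copy_a is used only inside the inner product and copy_b only as the outer
    vector, so the two noise terms are independent and the product is unbiased.
    """
    residual = sum(wi * ai for wi, ai in zip(w, copy_a)) - y
    return [2.0 * residual * bi for bi in copy_b]
```

Reusing a single noisy copy in both places would instead add a bias term proportional to the noise covariance times $w$, which the learner cannot correct for when the noise distribution is unknown.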

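5 Proofs

Due to lack of space, some of the proofs are given in the appendix.

5.1 Preliminary Result

To prove Thm. 1 and Thm. 2, we need a theorem which basically states that if all subroutines in Algorithm 1 behave as they should, then one can achieve an $O(\sqrt{T})$ regret bound. This is provided in the following theorem, which is an adaptation of a standard result of online convex optimization (see, e.g., [17]). The proof is given in Appendix D.

Theorem 4. Assume the following conditions hold with respect to Algorithm 1:

1. For all $t$, $\tilde{\Psi}(x_t)$ and $\tilde{g}_t$ are independent of each other (as random variables induced by the randomness of Algorithm 1) as well as independent of any $\tilde{\Psi}(x_i)$ and $\tilde{g}_i$ for $i < t$.
2. For all $t$, $E[\tilde{\Psi}(x_t)] = \Psi(x_t)$, and there exists a constant $B_{\tilde{\Psi}} > 0$ such that $E[\|\tilde{\Psi}(x_t)\|^2] \le B_{\tilde{\Psi}}$.
3. For all $t$, $E[\tilde{g}_t] = y_t\, \ell'(y_t \langle w_t, \Psi(x_t) \rangle)$, and there exists a constant $B_{\tilde{g}} > 0$ such that $E[\tilde{g}_t^2] \le B_{\tilde{g}}$.
4. For any pair of instances $x, x'$, $\mathrm{Prod}(\tilde{\Psi}(x), \tilde{\Psi}(x')) = \langle \tilde{\Psi}(x), \tilde{\Psi}(x') \rangle$.

Then if Algorithm 1 is run with $\eta = \sqrt{B_w / (B_{\tilde{g}} B_{\tilde{\Psi}})}$, the following inequality holds:

$$E\left[ \sum_{t=1}^{T} \ell(y_t \langle w_t, \Psi(x_t) \rangle) - \min_{w : \|w\|^2 \le B_w} \sum_{t=1}^{T} \ell(y_t \langle w, \Psi(x_t) \rangle) \right] \le \sqrt{B_w B_{\tilde{g}} B_{\tilde{\Psi}} T},$$

where the expectation is with respect to the randomness of the oracles and the algorithm throughout its run.

5.2 Proof of Thm. 1

In this subsection, we present the proof of Thm. 1. We first show how to implement the subroutines of Algorithm 1, and prove the relevant results on their behavior. Then, we prove the theorem itself.

It is known that for $k(\cdot, \cdot) = Q(\langle \cdot, \cdot \rangle)$ to be a valid kernel, it is necessary that $Q(\langle x, x' \rangle)$ can be written as a Taylor expansion $\sum_{n=0}^{\infty} \beta_n (\langle x, x' \rangle)^n$, where $\beta_n \ge 0$ (see Theorem 4.19 in [14]). This makes these types of kernels amenable to our techniques.

We start by constructing an explicit feature mapping $\Psi(\cdot)$ corresponding to the RKHS induced by our kernel. For any $x, x'$, we have that

$$k(x, x') = \sum_{n=0}^{\infty} \beta_n (\langle x, x' \rangle)^n = \sum_{n=0}^{\infty} \beta_n \Big( \sum_{i=1}^{d} x_i x'_i \Big)^n = \sum_{n=0}^{\infty} \beta_n \sum_{k_1=1}^{d} \cdots \sum_{k_n=1}^{d} x_{k_1} x_{k_2} \cdots x_{k_n}\, x'_{k_1} x'_{k_2} \cdots x'_{k_n}$$

$$= \sum_{n=0}^{\infty} \sum_{k_1=1}^{d} \cdots \sum_{k_n=1}^{d} \big( \sqrt{\beta_n}\, x_{k_1} x_{k_2} \cdots x_{k_n} \big) \big( \sqrt{\beta_n}\, x'_{k_1} x'_{k_2} \cdots x'_{k_n} \big).$$

This suggests the following feature representation: for any $x$, $\Psi(x)$ returns an infinite-dimensional vector, indexed by $n$ and $k_1, \ldots, k_n \in \{1, \ldots, d\}$, with the entry corresponding to $n, k_1, \ldots, k_n$ being $\sqrt{\beta_n}\, x_{k_1} \cdots x_{k_n}$. The dot product between $\Psi(x)$ and $\Psi(x')$ is similar to a standard dot product between two vectors, and by the derivation above equals $k(x, x')$ as required.

We now use a slightly more elaborate variant of our unbiased estimate technique, to derive an unbiased estimate of $\Psi(x)$. First, we sample $N$ according to $P(N = n) = (p-1)/p^{n+1}$. Then, we query the oracle for $x$ for $N$ times to get $\tilde{x}^{(1)}, \ldots, \tilde{x}^{(N)}$, and formally define $\tilde{\Psi}(x)$ as

$$\tilde{\Psi}(x) = \sqrt{\beta_N}\, \frac{p^{N+1}}{p-1} \sum_{k_1=1}^{d} \cdots \sum_{k_N=1}^{d} \tilde{x}^{(1)}_{k_1} \cdots \tilde{x}^{(N)}_{k_N}\, e_{N, k_1, \ldots, k_N} \qquad (2)$$

where $e_{N, k_1, \ldots, k_N}$ represents the unit vector in the direction indexed by $N, k_1, \ldots, k_N$ as explained above. Since the oracle queries are i.i.d., the expectation of this expression is

$$\sum_{n=0}^{\infty} \frac{p-1}{p^{n+1}} \sqrt{\beta_n}\, \frac{p^{n+1}}{p-1} \sum_{k_1=1}^{d} \cdots \sum_{k_n=1}^{d} E\big[ \tilde{x}^{(1)}_{k_1} \cdots \tilde{x}^{(n)}_{k_n} \big]\, e_{n, k_1, \ldots, k_n} = \sum_{n=0}^{\infty} \sum_{k_1=1}^{d} \cdots \sum_{k_n=1}^{d} \sqrt{\beta_n}\, x_{k_1} \cdots x_{k_n}\, e_{n, k_1, \ldots, k_n},$$

which is exactly $\Psi(x)$. We formalize the needed properties of $\tilde{\Psi}(x)$ in the following lemma.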

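Lemma 3. Assuming $\tilde{\Psi}(x)$ is constructed as in the discussion above, it holds that $E[\tilde{\Psi}(x)] = \Psi(x)$ for any $x$. Moreover, if the noisy samples $\tilde{x}_t$ returned by the oracle $A_t$ satisfy $E[\|\tilde{x}_t\|^2] \le B_{\tilde{x}}$, then

$$E\big[ \|\tilde{\Psi}(x_t)\|^2 \big] \le \frac{p}{p-1}\, Q(p B_{\tilde{x}}),$$

where we recall that $Q$ defines the kernel by $k(x, x') = Q(\langle x, x' \rangle)$.

Proof. The first part of the lemma follows from the discussion above. As to the second part, note that by (2),

$$E\big[ \|\tilde{\Psi}(x_t)\|^2 \big] = E\left[ \beta_N \frac{p^{2N+2}}{(p-1)^2} \sum_{k_1, \ldots, k_N = 1}^{d} \big( \tilde{x}^{(1)}_{t,k_1} \cdots \tilde{x}^{(N)}_{t,k_N} \big)^2 \right] = E\left[ \beta_N \frac{p^{2N+2}}{(p-1)^2} \prod_{j=1}^{N} \|\tilde{x}^{(j)}_t\|^2 \right]$$

$$= \sum_{n=0}^{\infty} \frac{p-1}{p^{n+1}}\, \beta_n \frac{p^{2n+2}}{(p-1)^2} \big( E[\|\tilde{x}_t\|^2] \big)^n = \frac{p}{p-1} \sum_{n=0}^{\infty} \beta_n \big( p\, E[\|\tilde{x}_t\|^2] \big)^n \le \frac{p}{p-1} \sum_{n=0}^{\infty} \beta_n (p B_{\tilde{x}})^n = \frac{p}{p-1}\, Q(p B_{\tilde{x}}),$$

where the second-to-last step used the fact that $\beta_n \ge 0$ for all $n$.

Of course, explicitly storing $\tilde{\Psi}(x)$ as defined above is infeasible, since the number of entries is huge. Fortunately, this is not needed: we just need to store $\tilde{x}^{(1)}_t, \ldots, \tilde{x}^{(N)}_t$. The representation above is used implicitly when we calculate dot products between $\tilde{\Psi}(x)$ and other elements in the RKHS, via the subroutine Prod. We note that while $N$ is a random quantity which might be unbounded, its distribution decays exponentially fast, so the number of vectors to store is essentially bounded.

After the discussion above, the pseudocode for Map_Estimate below should be self-explanatory.

Subroutine 2: Map_Estimate($A_t$, $p$)
  Sample nonnegative integer $N$ according to $P(N = n) = (p-1)/p^{n+1}$
  Query $A_t$ for $N$ times to get $\tilde{x}^{(1)}_t, \ldots, \tilde{x}^{(N)}_t$
  Return $\tilde{x}^{(1)}_t, \ldots, \tilde{x}^{(N)}_t$ as $\tilde{\Psi}(x_t)$

We now turn to the subroutine Prod, which given two elements in the RKHS, returns their dot product. This subroutine comes in two flavors: either as a procedure defined over $\tilde{\Psi}(x), \tilde{\Psi}(x')$ and returning $\langle \tilde{\Psi}(x), \tilde{\Psi}(x') \rangle$ (Subroutine 3); or as a procedure defined over $\tilde{\Psi}(x), x'$ (Subroutine 4), where the second element is an explicitly given vector, and returning $\langle \tilde{\Psi}(x), \Psi(x') \rangle$. This second variant of Prod is needed when we wish to apply the learned predictor on a new given instance $x'$.

Subroutine 3: Prod($\tilde{\Psi}(x)$, $\tilde{\Psi}(x')$)
  Let $n, x^{(1)}, \ldots, x^{(n)}$ be the index and vectors comprising $\tilde{\Psi}(x)$
  Let $n', x'^{(1)}, \ldots, x'^{(n')}$ be the index and vectors comprising $\tilde{\Psi}(x')$
  If $n \ne n'$ return 0, else return $\beta_n \frac{p^{2n+2}}{(p-1)^2} \prod_{j=1}^{n} \langle x^{(j)}, x'^{(j)} \rangle$

Lemma 4. Prod($\tilde{\Psi}(x)$, $\tilde{\Psi}(x')$) returns $\langle \tilde{\Psi}(x), \tilde{\Psi}(x') \rangle$.

Proof. Using the formal representation of $\tilde{\Psi}(x), \tilde{\Psi}(x')$ in (2), we have that $\langle \tilde{\Psi}(x), \tilde{\Psi}(x') \rangle$ is 0 whenever $n \ne n'$ (because then these two elements are composed of different unit vectors with respect to an orthogonal basis). Otherwise, writing $N = n = n'$, we have that

$$\langle \tilde{\Psi}(x), \tilde{\Psi}(x') \rangle = \beta_N \frac{p^{2N+2}}{(p-1)^2} \sum_{k_1, \ldots, k_N = 1}^{d} \tilde{x}^{(1)}_{k_1} \cdots \tilde{x}^{(N)}_{k_N}\, \tilde{x}'^{(1)}_{k_1} \cdots \tilde{x}'^{(N)}_{k_N}$$

$$= \beta_N \frac{p^{2N+2}}{(p-1)^2} \Big( \sum_{k_1=1}^{d} \tilde{x}^{(1)}_{k_1} \tilde{x}'^{(1)}_{k_1} \Big) \cdots \Big( \sum_{k_N=1}^{d} \tilde{x}^{(N)}_{k_N} \tilde{x}'^{(N)}_{k_N} \Big) = \beta_N \frac{p^{2N+2}}{(p-1)^2} \prod_{j=1}^{N} \langle \tilde{x}^{(j)}, \tilde{x}'^{(j)} \rangle,$$

which is exactly what the algorithm returns, hence the lemma follows.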
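The implicit representation used by Map_Estimate and Prod can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the helper names, the Gaussian noise oracle, and the choice $\beta_n = 1/n!$ (so that $Q(a) = e^a$) are all hypothetical. An estimate is stored as the list of its $N$ noisy copies, and the list length plays the role of the index $N$.

```python
import math
import random


def dot(u, v):
    """Standard inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sample_index(p, rng):
    """Sample N with P(N = n) = (p - 1) / p**(n + 1) for n = 0, 1, 2, ...

    Each loop iteration continues with probability 1/p, which gives
    P(N = n) = (1/p)**n * (1 - 1/p).
    """
    n = 0
    while rng.random() < 1.0 / p:
        n += 1
    return n


def map_estimate(oracle, p, rng):
    """Subroutine 2: keep only the N noisy copies that define the estimate."""
    return [oracle() for _ in range(sample_index(p, rng))]


def prod(psi_a, psi_b, beta, p):
    """Subroutine 3: inner product of two implicitly stored estimates."""
    if len(psi_a) != len(psi_b):
        return 0.0  # different indices n, n' give orthogonal elements
    n = len(psi_a)
    value = beta(n) * p ** (2 * n + 2) / (p - 1) ** 2
    for a, b in zip(psi_a, psi_b):
        value *= dot(a, b)
    return value


def exp_kernel_beta(n):
    """Assumed Taylor coefficients: beta_n = 1/n!, i.e. Q(a) = exp(a)."""
    return 1.0 / math.factorial(n)


def estimate_kernel(x, x_prime, beta, p, noise_std, num_samples, seed=0):
    """Monte Carlo average of Prod over independent estimates of x and x'."""
    rng = random.Random(seed)

    def oracle_for(v):
        return lambda: [vi + rng.gauss(0.0, noise_std) for vi in v]

    total = 0.0
    for _ in range(num_samples):
        psi_a = map_estimate(oracle_for(x), p, rng)
        psi_b = map_estimate(oracle_for(x_prime), p, rng)
        total += prod(psi_a, psi_b, beta, p)
    return total / num_samples
```

Because the two estimates are independent and unbiased, the average of `prod` over many draws should approach $k(x, x') = Q(\langle x, x' \rangle)$, although individual draws are often 0 (whenever the two sampled indices differ).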

The pseudocode for calculating the dot product $\langle \tilde{\Psi}(x), \Psi(x') \rangle$ (where $x'$ is known) is very similar, and the proof is essentially the same.

Subroutine 4: Prod($\tilde{\Psi}(x)$, $x'$)
  Let $n, x^{(1)}, \ldots, x^{(n)}$ be the index and vectors comprising $\tilde{\Psi}(x)$
  Return $\beta_n \frac{p^{n+1}}{p-1} \prod_{j=1}^{n} \langle x^{(j)}, x' \rangle$

We are now ready to prove Thm. 1. First, regarding the expected number of queries, notice that to run Algorithm 1, we invoke Map_Estimate and Grad_Length_Estimate once at round $t$. Map_Estimate uses a random number $B$ of queries distributed as $P(B = n) = (p-1)/p^{n+1}$, and Grad_Length_Estimate invokes Map_Estimate a random number $C$ of times, distributed as $P(C = n) = (p-1)/p^{n+1}$. The total number of queries is therefore $\sum_{j=1}^{C+1} B_j$, where $B_j$ for all $j$ are i.i.d. copies of $B$. The expected value of this expression, using a standard result on the expected value of a sum of a random number of independent random variables, is equal to $(1 + E[C])\, E[B_j]$, or

$$\Big( 1 + \frac{1}{p-1} \Big) \frac{1}{p-1} = \frac{p}{(p-1)^2}.$$

In terms of running time, we note that the expected running time of Prod is $O(1 + d/(p-1))$, because it performs $N$ multiplications of inner products, each one with running time $O(d)$, and $E[N] = 1/(p-1)$. The expected running times of Map_Estimate and Grad_Length_Estimate are bounded analogously. Since Algorithm 1 at each of $T$ rounds calls Map_Estimate once, Grad_Length_Estimate once, Prod for $O(T^2)$ times, and performs $O(1)$ other operations, the overall expected runtime can be upper bounded by

$$O\Big( T^3 \Big( 1 + \frac{dp}{(p-1)^2} \Big) \Big).$$

The regret bound in the theorem follows from Thm. 4, with the expressions for constants following from Lemma 2, Lemma 3, and Lemma 4.

5.3 Proof Sketch of Thm. 3

To prove the theorem, we use a more general result which leads to non-vanishing regret, and then show that under the assumptions of Thm. 3, the result holds. The proof of the result is given in Appendix F.

Theorem 5. Let $W$ be a compact convex subset of $\mathbb{R}^d$ and pick any learning algorithm which selects hypotheses from $W$ and is allowed access to a single noisy copy of the instance at each round $t$. If there exists a distribution over a compact subset of $\mathbb{R}^d$ such that

$$\arg\min_{w \in W} E\big[ \ell(\langle w, x \rangle, 1) \big] \quad \text{and} \quad \arg\min_{w \in W} \ell(\langle w, E[x] \rangle, 1) \qquad (3)$$

are disjoint, then there exists a strategy for the adversary such that the sequence $w_1, w_2, \ldots \in W$ of predictors output by the algorithm satisfies

$$\limsup_{T \to \infty} \max_{w \in W} \frac{1}{T} \sum_{t=1}^{T} \Big( \ell(\langle w_t, x_t \rangle, y_t) - \ell(\langle w, x_t \rangle, y_t) \Big) > 0$$

with probability 1 with respect to the randomness of the oracles.

Another way to phrase this theorem is that the regret cannot vanish if, given examples sampled i.i.d. from a distribution, the learning problem is more complicated than just finding the mean of the data. Indeed, the adversary's strategy we choose later on is simply drawing and presenting examples from such a distribution. Below, we sketch how we use Thm. 5 in order to prove Thm. 3. A full proof is provided in Appendix E.
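The disjointness condition (3) of Thm. 5 is easy to check numerically in one dimension. The sketch below is illustrative only: the squared loss $\ell(a, 1) = (a - 1)^2$, the scalar support $\{3, -1\}$ (whose mean is 1), the search interval, and all function names are assumptions chosen for the example. It compares the minimizer of the expected loss with the minimizer of the loss at the mean.

```python
def argmin_1d(f, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def squared_loss(prediction, label):
    """Assumed loss for the example; bounded below, with negative slope at 0."""
    return (prediction - label) ** 2


def expected_loss(w, support, loss):
    """E[loss(w * x, 1)] for a scalar x drawn uniformly from `support`."""
    return sum(loss(w * x, 1.0) for x in support) / len(support)


def loss_at_mean(w, support, loss):
    """loss(w * E[x], 1) for a scalar x drawn uniformly from `support`."""
    mean = sum(support) / len(support)
    return loss(w * mean, 1.0)


support = [3.0, -1.0]  # uniform on {3, -1}; the mean is 1
w_expected = argmin_1d(lambda w: expected_loss(w, support, squared_loss), -5.0, 5.0)
w_mean = argmin_1d(lambda w: loss_at_mean(w, support, squared_loss), -5.0, 5.0)
print(w_expected, w_mean)
```

For this choice the two minimizers are $w = 0.2$ and $w = 1$ (setting the derivatives $20w - 4$ and $2(w - 1)$ to zero), so the two sets in (3) are disjoint.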

We construct a very simple one-dimensional distribution, which satisfies the conditions of Thm. 5: it is simply the uniform distribution on $\{3x, -x\}$, where $x$ is the vector $(1, 0, \ldots, 0)$. Thus, it is enough to show that

$$\arg\min_{w : \|w\|^2 \le B_w} \ell(3w_1, 1) + \ell(-w_1, 1) \quad \text{and} \quad \arg\min_{w : \|w\|^2 \le B_w} \ell(w_1, 1) \qquad (4)$$

are disjoint, for some appropriately chosen $B_w$. Assuming the contrary, then under the assumptions on $\ell$, we show that the first set in Eq. (4) is inside a bounded ball around the origin, in a way which is independent of $B_w$, no matter how large it is. Thus, if we pick $B_w$ to be large enough, and assume that the two sets in Eq. (4) are not disjoint, then there must be some $w$ such that both $\ell(3w_1, 1) + \ell(-w_1, 1)$ and $\ell(w_1, 1)$ have a subgradient of zero at $w$. However, this can be shown to contradict the assumptions on $\ell$, leading to the desired result.

6 Future Work

There are several interesting research directions worth pursuing in the noisy learning framework introduced here. For instance, doing away with unbiasedness, which could lead to the design of estimators that are applicable to more types of loss functions, for which unbiased estimators may not even exist. Also, it would be interesting to show how additional information one has about the noise distribution can be used to design improved estimates, possibly in association with specific losses or kernels. Another open question is whether our lower bound (Thm. 3) can be improved when nonlinear kernels are used.

References

[1] J. Abernethy, E. Hazan, and A. Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In COLT, pages 263-274, 2008.
[2] S. Bhandari and A. Bose. Existence of unbiased estimators in sequential binomial experiments. Sankhyā: The Indian Journal of Statistics, 52(1):127-130, 1990.
[3] N. Bshouty, J. Jackson, and C. Tamon. Uniform-distribution attribute noise learnability. Information and Computation, 187(2):277-290, 2003.
[4] N. Cesa-Bianchi, A. Conconi, and C. Gentile. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50(9):2050-2057, September 2004.
[5] N. Cesa-Bianchi, E. Dichterman, P. Fischer, E. Shamir, and H. Simon. Sample-efficient strategies for learning in the presence of noise. Journal of the ACM, 46(5):684-719, 1999.
[6] N. Cesa-Bianchi and G. Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
[7] A. Flaxman, A. Tauman Kalai, and H. McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of SODA, pages 385-394, 2005.
[8] S. Goldman and R. Sloan. Can PAC learning algorithms tolerate random attribute noise? Algorithmica, 14(1):70-84, 1995.
[9] M. Kearns and M. Li. Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4):807-837, 1993.
[10] N. Littlestone. Redundant noisy attributes, attribute errors, and linear threshold learning using Winnow. In Proceedings of COLT, pages 147-156, 1991.
[11] D. Nettleton, A. Orriols-Puig, and A. Fornells. A study of the effect of different types of noise on the precision of supervised learning techniques. Artificial Intelligence Review, 2010.
[12] P. Bartlett, M. Jordan, and J. McAuliffe. Convexity, classification and risk bounds. Journal of the American Statistical Association, 101(473):138-156, March 2006.
[13] L. Paninski. Estimation of entropy and mutual information. Neural Computation, 15(6):1191-1253, 2003.
[14] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, 2002.
[15] R. Singh. Existence of unbiased estimates. Sankhyā: The Indian Journal of Statistics, 26(1):93-96, 1964.
[16] I. Steinwart and A. Christmann. Support Vector Machines. Springer, 2008.
[17] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of ICML, pages 928-935, 2003.

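A Alternative Notions of Regret

In the online setting, one may consider notions of regret other than (1). One choice is

$$\sum_{t=1}^{T} \ell(\langle w_t, \Psi(\tilde{x}_t) \rangle, y_t) - \min_{w \in W} \sum_{t=1}^{T} \ell(\langle w, \Psi(\tilde{x}_t) \rangle, y_t),$$

but this is too easy, as it reduces to standard online learning with respect to examples which happen to be noisy. Another kind of regret we may want to minimize is

$$\sum_{t=1}^{T} \ell(\langle w_t, \Psi(\tilde{x}_t) \rangle, y_t) - \min_{w \in W} \sum_{t=1}^{T} \ell(\langle w, \Psi(x_t) \rangle, y_t). \qquad (5)$$

This is the kind of regret which is relevant for actually predicting the values $y_t$ well based on the noisy instances. Unfortunately, in general this is too much to hope for. To see why, assume we deal with a linear kernel (so that $\Psi(x) = x$), and assume $\ell(\langle w, x \rangle, y) = (\langle w, x \rangle - y)^2$. Now, suppose that the adversary picks some $w^* \ne 0$ in $W$, which might be even known to the learner, and at each round $t$ provides the example $(w^* / \|w^*\|^2, 1)$. Since $\langle w^*, w^*/\|w^*\|^2 \rangle = 1$, the second term of Eq. (5) is zero, and Eq. (5) in this case equals

$$\sum_{t=1}^{T} \big( \langle w_t, \tilde{x}_t \rangle - 1 \big)^2 - 0.$$

Recall that the learner chooses $w_t$ before $\tilde{x}_t$ is revealed. Therefore, if the noise which leads to $\tilde{x}_t$ has positive variance, it will generally be impossible for the learner to choose $w_t$ such that $\langle w_t, \tilde{x}_t \rangle$ is arbitrarily close to 1. Therefore, the expression above cannot grow sub-linearly with $T$.

B Proof of Thm. 2

The analysis in this section is similar to the one of Subsection 5.2, focusing on Gaussian kernels. Namely, we assume here that the kernel $k(x, x')$ is equal to $e^{-\|x - x'\|^2/\sigma^2}$ for some $\sigma^2 > 0$.

We start by constructing an explicit feature mapping $\Psi(\cdot)$ corresponding to the RKHS induced by our kernel. For any $x, x'$, we have that

$$k(x, x') = e^{-\|x - x'\|^2/\sigma^2} = e^{-\|x\|^2/\sigma^2}\, e^{-\|x'\|^2/\sigma^2}\, e^{2\langle x, x' \rangle/\sigma^2} = e^{-\|x\|^2/\sigma^2}\, e^{-\|x'\|^2/\sigma^2} \sum_{n=0}^{\infty} \frac{(2 \langle x, x' \rangle)^n}{\sigma^{2n}\, n!}$$

$$= e^{-\|x\|^2/\sigma^2}\, e^{-\|x'\|^2/\sigma^2} \sum_{n=0}^{\infty} \sum_{k_1=1}^{d} \cdots \sum_{k_n=1}^{d} \frac{(2/\sigma^2)^n}{n!}\, x_{k_1} \cdots x_{k_n}\, x'_{k_1} \cdots x'_{k_n}.$$

This suggests the following feature representation: for any $x$, $\Psi(x)$ returns an infinite-dimensional vector, indexed by $n$ and $k_1, \ldots, k_n \in \{1, \ldots, d\}$, with the entry corresponding to $n, k_1, \ldots, k_n$ being $e^{-\|x\|^2/\sigma^2} \sqrt{(2/\sigma^2)^n / n!}\; x_{k_1} \cdots x_{k_n}$. The dot product between $\Psi(x)$ and $\Psi(x')$ is similar to a standard dot product between two vectors, and by the derivation above equals $k(x, x')$ as required.

The idea of deriving an unbiased estimate of $\Psi(x)$ is the following: first, we sample $N_1, N_2$ independently according to $P(N_1 = n_1) = (p-1)/p^{n_1+1}$ and $P(N_2 = n_2) = (p-1)/p^{n_2+1}$. Then, we query the oracle for $x$ for $2N_1 + N_2$ times to get $\tilde{x}^{(1)}, \ldots, \tilde{x}^{(2N_1+N_2)}$, and formally define $\tilde{\Psi}(x)$ as

$$\tilde{\Psi}(x) = \frac{(-1)^{N_1}\, p^{N_1+N_2+2}}{(p-1)^2\, N_1!\, \sigma^{2N_1}} \sqrt{\frac{2^{N_2}}{N_2!\, \sigma^{2N_2}}} \left( \prod_{j=1}^{N_1} \langle \tilde{x}^{(2j-1)}, \tilde{x}^{(2j)} \rangle \right) \sum_{k_1, \ldots, k_{N_2} = 1}^{d} \tilde{x}^{(2N_1+1)}_{k_1} \cdots \tilde{x}^{(2N_1+N_2)}_{k_{N_2}}\, e_{N_2, k_1, \ldots, k_{N_2}} \qquad (6)$$

where $e_{N_2, k_1, \ldots, k_{N_2}}$ represents the unit vector in the direction indexed by $N_2, k_1, \ldots, k_{N_2}$ as explained above. Since the oracle calls are i.i.d., it is not hard to verify that the expectation of the expression above is

$$\sum_{n_1=0}^{\infty} \sum_{n_2=0}^{\infty} \frac{(p-1)^2}{p^{n_1+n_2+2}} \cdot \frac{(-1)^{n_1}\, p^{n_1+n_2+2}}{(p-1)^2\, n_1!\, \sigma^{2n_1}} \sqrt{\frac{2^{n_2}}{n_2!\, \sigma^{2n_2}}}\; \langle x, x \rangle^{n_1} \sum_{k_1, \ldots, k_{n_2} = 1}^{d} x_{k_1} \cdots x_{k_{n_2}}\, e_{n_2, k_1, \ldots, k_{n_2}}$$

$$= \left( \sum_{n_1=0}^{\infty} \frac{(-\|x\|^2/\sigma^2)^{n_1}}{n_1!} \right) \sum_{n_2=0}^{\infty} \sqrt{\frac{(2/\sigma^2)^{n_2}}{n_2!}} \sum_{k_1, \ldots, k_{n_2} = 1}^{d} x_{k_1} \cdots x_{k_{n_2}}\, e_{n_2, k_1, \ldots, k_{n_2}}$$

$$= \sum_{n_2=0}^{\infty} \sum_{k_1, \ldots, k_{n_2} = 1}^{d} e^{-\|x\|^2/\sigma^2} \sqrt{\frac{(2/\sigma^2)^{n_2}}{n_2!}}\; x_{k_1} \cdots x_{k_{n_2}}\, e_{n_2, k_1, \ldots, k_{n_2}},$$

which is exactly $\Psi(x)$ as defined above.

To actually store $\tilde{\Psi}(x)$ in memory, we simply keep $N_1, N_2$ and $\tilde{x}^{(1)}, \ldots, \tilde{x}^{(2N_1+N_2)}$. The representation above is used implicitly when we calculate dot products between $\tilde{\Psi}(x)$ and other elements in the RKHS, via the subroutine Prod. We formalize the needed properties of $\tilde{\Psi}(x)$ in the following lemma.

Lemma 5. Assuming the construction of $\tilde{\Psi}(x)$ as in the discussion above, it holds that $E[\tilde{\Psi}(x)] = \Psi(x)$ for all $x$. Moreover, if the noisy sample $\tilde{x}_t$ returned by the oracle $A_t$ satisfies $E[\|\tilde{x}_t\|^2] \le B_{\tilde{x}}$, then

$$E\big[ \|\tilde{\Psi}(x_t)\|^2 \big] \le \Big( \frac{p}{p-1} \Big)^2 e^{(2\sqrt{p}\, B_{\tilde{x}} + 2p B_{\tilde{x}})/\sigma^2}.$$

Proof. The first part of the lemma follows from the discussion above. As to the second part, note that by (6), we have that

$$\|\tilde{\Psi}(x_t)\|^2 = \frac{p^{2N_1+2N_2+4}\, 2^{N_2}}{(p-1)^4\, (N_1!)^2\, N_2!\, \sigma^{4N_1+2N_2}} \prod_{j=1}^{N_1} \langle \tilde{x}^{(2j-1)}, \tilde{x}^{(2j)} \rangle^2 \sum_{k_1, \ldots, k_{N_2} = 1}^{d} \big( \tilde{x}^{(2N_1+1)}_{k_1} \cdots \tilde{x}^{(2N_1+N_2)}_{k_{N_2}} \big)^2$$

$$\le \frac{p^{2N_1+2N_2+4}\, 2^{N_2}}{(p-1)^4\, (N_1!)^2\, N_2!\, \sigma^{4N_1+2N_2}} \prod_{j=1}^{N_1} \|\tilde{x}^{(2j-1)}\|^2 \|\tilde{x}^{(2j)}\|^2 \prod_{j=1}^{N_2} \|\tilde{x}^{(2N_1+j)}\|^2.$$

Since the oracle calls are independent, the expectation of this expression conditioned on $N_1, N_2$ is at most

$$\frac{p^{2N_1+2N_2+4}\, 2^{N_2}}{(p-1)^4\, (N_1!)^2\, N_2!\, \sigma^{4N_1+2N_2}}\; B_{\tilde{x}}^{2N_1} B_{\tilde{x}}^{N_2}.$$

The expectation of this expression over $N_1, N_2$ is equal to

$$\Big( \frac{p}{p-1} \Big)^2 \sum_{n_1=0}^{\infty} \frac{(p B_{\tilde{x}}^2/\sigma^4)^{n_1}}{(n_1!)^2} \sum_{n_2=0}^{\infty} \frac{(2p B_{\tilde{x}}/\sigma^2)^{n_2}}{n_2!} \le \Big( \frac{p}{p-1} \Big)^2 \left( \sum_{n_1=0}^{\infty} \frac{(\sqrt{p}\, B_{\tilde{x}}/\sigma^2)^{n_1}}{n_1!} \right)^2 \sum_{n_2=0}^{\infty} \frac{(2p B_{\tilde{x}}/\sigma^2)^{n_2}}{n_2!} = \Big( \frac{p}{p-1} \Big)^2 e^{(2\sqrt{p}\, B_{\tilde{x}} + 2p B_{\tilde{x}})/\sigma^2}.$$

After the discussion above, the pseudocode for Map_Estimate below should be self-explanatory.

Subroutine 5: Map_Estimate($A_t$, $p$)
  Sample $N_1$ according to $P(N_1 = n_1) = (p-1)/p^{n_1+1}$
  Sample $N_2$ according to $P(N_2 = n_2) = (p-1)/p^{n_2+1}$
  Query $A_t$ for $2N_1 + N_2$ times to get $\tilde{x}^{(1)}_t, \ldots, \tilde{x}^{(2N_1+N_2)}_t$
  Return $\tilde{x}^{(1)}_t, \ldots, \tilde{x}^{(2N_1+N_2)}_t$ as $\tilde{\Psi}(x_t)$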
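The Gaussian-kernel construction can be sanity-checked with a short simulation. This is a sketch under stated assumptions, not the authors' code: the function names, the Gaussian noise oracle, and the test vectors are hypothetical. The inner product with an explicitly known $x'$ is obtained by pairing Eq. (6) with the feature map $\Psi(x')$ entry by entry, which gives the coefficient $(-1)^{N_1} p^{N_1+N_2+2}\, 2^{N_2} / ((p-1)^2 N_1!\, N_2!\, \sigma^{2N_1+2N_2})$ times $e^{-\|x'\|^2/\sigma^2}$ and the products of inner products used below.

```python
import math
import random


def dot(u, v):
    """Standard inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sample_index(p, rng):
    """Sample N with P(N = n) = (p - 1) / p**(n + 1) for n = 0, 1, 2, ..."""
    n = 0
    while rng.random() < 1.0 / p:
        n += 1
    return n


def gaussian_map_estimate(oracle, p, rng):
    """Subroutine 5: return (N1, N2, list of 2*N1 + N2 noisy copies)."""
    n1 = sample_index(p, rng)
    n2 = sample_index(p, rng)
    return n1, n2, [oracle() for _ in range(2 * n1 + n2)]


def prod_with_known(estimate, x_known, sigma_sq, p):
    """Inner product of an implicit estimate with Psi(x_known).

    The first 2*N1 copies are paired to estimate powers of ||x||^2; the
    last N2 copies are each paired with the known vector.
    """
    n1, n2, copies = estimate
    value = (-1) ** n1 * p ** (n1 + n2 + 2) * 2.0 ** n2
    value /= (p - 1) ** 2 * math.factorial(n1) * math.factorial(n2)
    value /= sigma_sq ** (n1 + n2)
    value *= math.exp(-dot(x_known, x_known) / sigma_sq)
    for j in range(n1):
        value *= dot(copies[2 * j], copies[2 * j + 1])
    for j in range(n2):
        value *= dot(copies[2 * n1 + j], x_known)
    return value


def estimate_gaussian_kernel(x, x_known, sigma_sq, p, noise_std, num_samples, seed=0):
    """Monte Carlo average, to be compared with exp(-||x - x'||^2 / sigma^2)."""
    rng = random.Random(seed)

    def oracle():
        return [xi + rng.gauss(0.0, noise_std) for xi in x]

    total = 0.0
    for _ in range(num_samples):
        total += prod_with_known(gaussian_map_estimate(oracle, p, rng), x_known, sigma_sq, p)
    return total / num_samples
```

With small noise and moderate norms the average settles near the true kernel value; larger norms or values of $p$ close to 1 inflate the variance, in line with the bound of Lemma 5.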

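We now turn to the subroutine Prod, which given two elements in the RKHS, returns their dot product. This subroutine comes in two flavors: either as a procedure defined over $\tilde{\Psi}(x), \tilde{\Psi}(x')$ and returning $\langle \tilde{\Psi}(x), \tilde{\Psi}(x') \rangle$ (Subroutine 6); or as a procedure defined over $\tilde{\Psi}(x), x'$ (Subroutine 7), where the second element is an explicitly given vector, and returning $\langle \tilde{\Psi}(x), \Psi(x') \rangle$. This second variant of Prod is needed when we wish to apply the hypothesis on a new known instance $x'$.

Subroutine 6: Prod($\tilde{\Psi}(x)$, $\tilde{\Psi}(x')$)
  Let $x^{(1)}, \ldots, x^{(2n_1+n_2)}$ be the vectors comprising $\tilde{\Psi}(x)$
  Let $x'^{(1)}, \ldots, x'^{(2n'_1+n'_2)}$ be the vectors comprising $\tilde{\Psi}(x')$
  If $n_2 \ne n'_2$ return 0, else return
  $$\frac{(-1)^{n_1+n'_1}\, p^{n_1+n'_1+2n_2+4}\, 2^{n_2}}{(p-1)^4\, n_1!\, n'_1!\, n_2!\, \sigma^{2n_1+2n'_1+2n_2}} \left( \prod_{j=1}^{n_1} \langle x^{(2j-1)}, x^{(2j)} \rangle \right) \left( \prod_{j=1}^{n'_1} \langle x'^{(2j-1)}, x'^{(2j)} \rangle \right) \left( \prod_{j=1}^{n_2} \langle x^{(2n_1+j)}, x'^{(2n'_1+j)} \rangle \right)$$

The proof of the following lemma is a straightforward algebraic exercise, similar to the proof of Lemma 4.

Lemma 6. Prod($\tilde{\Psi}(x)$, $\tilde{\Psi}(x')$) returns $\langle \tilde{\Psi}(x), \tilde{\Psi}(x') \rangle$.

The pseudocode for calculating the dot product $\langle \tilde{\Psi}(x), \Psi(x') \rangle$ (where $x'$ is known) is very similar, and the proof is essentially the same.

Subroutine 7: Prod($\tilde{\Psi}(x)$, $x'$)
  Let $x^{(1)}, \ldots, x^{(2n_1+n_2)}$ be the vectors comprising $\tilde{\Psi}(x)$
  Return
  $$\frac{(-1)^{n_1}\, p^{n_1+n_2+2}\, 2^{n_2}}{(p-1)^2\, n_1!\, n_2!\, \sigma^{2n_1+2n_2}}\; e^{-\|x'\|^2/\sigma^2} \left( \prod_{j=1}^{n_1} \langle x^{(2j-1)}, x^{(2j)} \rangle \right) \left( \prod_{j=1}^{n_2} \langle x^{(2n_1+j)}, x' \rangle \right)$$

We are now ready to prove Thm. 2. First, regarding the expected number of queries, notice that to run Algorithm 1, we invoke Map_Estimate and Grad_Length_Estimate once at round $t$. Map_Estimate uses a random number $2B_1 + B_2$ of queries, where $B_1, B_2$ are independent and distributed as $P(B_1 = n) = P(B_2 = n) = (p-1)/p^{n+1}$. Grad_Length_Estimate invokes Map_Estimate a random number $C$ of times, where $P(C = n) = (p-1)/p^{n+1}$. The total number of queries is therefore $\sum_{j=1}^{C+1} (2B_{j,1} + B_{j,2})$, where $B_{j,1}, B_{j,2}$ are i.i.d. copies of $B_1, B_2$ respectively. The expected value of this expression, using a standard result on the expected value of a sum of a random number of random variables, is equal to $(1 + E[C])(2E[B_{j,1}] + E[B_{j,2}])$, or

$$\Big( 1 + \frac{1}{p-1} \Big) \frac{3}{p-1} = \frac{3p}{(p-1)^2}.$$

In terms of running time, the analysis is completely identical to the one performed in the proof of Thm. 1, and the expected running time is the same up to constants. The regret bound in the theorem follows from Thm. 4, with the expressions for constants following from Lemma 2, Lemma 5, and Lemma 6.

C Proof of Examples 3 and 4

Examples 3 and 4 use the error function $\mathrm{Erf}(a)$ in order to construct smooth approximations of the hinge loss and the absolute loss (see Fig. 1). The error function is useful for our purposes, since it is analytic in all of $\mathbb{R}$, and smoothly interpolates between $-1$ for $a \ll 0$ and $1$ for $a \gg 0$. Thus, it can be used to approximate the derivative of losses which are piecewise linear, such as the hinge loss $\ell(a) = \max\{0, 1 - a\}$ and the absolute loss $\ell(a) = |a|$.

To approximate the absolute loss, we use the antiderivative of $\mathrm{Erf}(sa)$. This function represents a smooth upper bound on the absolute loss, which becomes tighter as $s$ increases. It can be verified that the antiderivative (with the constant free parameter fixed so the function has the desired behavior) is

$$\ell(a) = a\, \mathrm{Erf}(sa) + \frac{e^{-s^2 a^2}}{s \sqrt{\pi}}.$$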
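The smoothed absolute loss is straightforward to evaluate with a standard error-function routine. The sketch below is illustrative; the function names are hypothetical, and it relies only on `math.erf` from the Python standard library. It implements the loss and its derivative $\mathrm{Erf}(sa)$.

```python
import math


def smoothed_abs_loss(a, s):
    """Antiderivative of Erf(s * a) that upper-bounds |a|."""
    return a * math.erf(s * a) + math.exp(-(s * a) ** 2) / (s * math.sqrt(math.pi))


def smoothed_abs_derivative(a, s):
    """Derivative of the smoothed loss, which is all the algorithm evaluates."""
    return math.erf(s * a)


def gap_to_abs(a, s):
    """How far the smooth surrogate sits above the absolute loss at a."""
    return smoothed_abs_loss(a, s) - abs(a)
```

At $a = 0$ the gap equals $1/(s\sqrt{\pi})$, so doubling $s$ halves the largest deviation from $|a|$.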

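While this loss function may seem to have a slightly complex form, we note that our algorithm only needs to calculate the derivative of this loss function at various points (namely $\mathrm{Erf}(sa)$ for various values of $a$), which can be easily done. By a Taylor expansion of the error function, we have that

$$\ell'(a) = \frac{2}{\sqrt{\pi}} \sum_{n=0}^{\infty} \frac{(-1)^n (sa)^{2n+1}}{n!\,(2n+1)}.$$

Therefore, $\ell'_+(a)$ in this case is at most

$$\frac{2}{\sqrt{\pi}} \sum_{n=0}^{\infty} \frac{(sa)^{2n+1}}{n!\,(2n+1)} \le \frac{2as}{\sqrt{\pi}} \sum_{n=0}^{\infty} \frac{(sa)^{2n}}{n!} = \frac{2as}{\sqrt{\pi}}\, e^{s^2 a^2}.$$

We now turn to deal with Example 4. This time, we use the antiderivative of $(\mathrm{Erf}(s(a-1)) - 1)/2$. This function smoothly interpolates between $-1$ for $a \ll 1$ and $0$ for $a \gg 1$. Therefore, its antiderivative with respect to $a$ represents a smooth upper bound on the hinge loss, which becomes tighter as $s$ increases. It can be verified that the antiderivative (with the constant free parameter fixed so the function has the desired behavior) is

$$\ell(a) = \frac{(a-1)\big( \mathrm{Erf}(s(a-1)) - 1 \big)}{2} + \frac{e^{-s^2 (a-1)^2}}{2 \sqrt{\pi}\, s}.$$

By a Taylor expansion of the error function, we have that

$$\ell'(a) = -\frac{1}{2} + \frac{1}{\sqrt{\pi}} \sum_{n=0}^{\infty} \frac{(-1)^n (s(a-1))^{2n+1}}{n!\,(2n+1)}.$$

Expanding each power of $(a-1)$ in powers of $a$ and taking absolute values of the coefficients replaces $(a-1)$ by $(a+1)$. Thus, $\ell'_+(a)$ in this case can be upper bounded by

$$\frac{1}{2} + \frac{1}{\sqrt{\pi}} \sum_{n=0}^{\infty} \frac{(s(a+1))^{2n+1}}{n!\,(2n+1)} \le \frac{1}{2} + \frac{(a+1)s}{\sqrt{\pi}} \sum_{n=0}^{\infty} \frac{(s(a+1))^{2n}}{n!} = \frac{1}{2} + \frac{(a+1)s}{\sqrt{\pi}}\, e^{s^2 (a+1)^2}.$$

D Proof of Thm. 4

Our algorithm corresponds to Zinkevich's algorithm [17] in a finite horizon setting, where we assume the sequence of examples is $\tilde{g}_1 \tilde{\Psi}(x_1), \ldots, \tilde{g}_T \tilde{\Psi}(x_T)$, the cost function is linear, and the learning rate at round $t$ is $\eta/\sqrt{T}$. By a straightforward adaptation of the standard regret bound for that algorithm (see [17]), we have that for any $w$ such that $\|w\|^2 \le B_w$,

$$\sum_{t=1}^{T} \langle w_t, \tilde{g}_t \tilde{\Psi}(x_t) \rangle - \sum_{t=1}^{T} \langle w, \tilde{g}_t \tilde{\Psi}(x_t) \rangle \le \frac{B_w \sqrt{T}}{2\eta} + \frac{\eta}{2\sqrt{T}} \sum_{t=1}^{T} \|\tilde{g}_t \tilde{\Psi}(x_t)\|^2.$$

We now take expectation of both sides in the inequality above. The expectation of the right-hand side is simply

$$E\left[ \frac{B_w \sqrt{T}}{2\eta} + \frac{\eta}{2\sqrt{T}} \sum_{t=1}^{T} E_t\big[ \tilde{g}_t^2 \big]\, E_t\big[ \|\tilde{\Psi}(x_t)\|^2 \big] \right] \le \sqrt{T} \Big( \frac{B_w}{2\eta} + \frac{\eta B_{\tilde{g}} B_{\tilde{\Psi}}}{2} \Big).$$

As to the left-hand side, note that

$$E\left[ \sum_{t=1}^{T} \langle w_t, \tilde{g}_t \tilde{\Psi}(x_t) \rangle \right] = E\left[ \sum_{t=1}^{T} E_t\big[ \langle w_t, \tilde{g}_t \tilde{\Psi}(x_t) \rangle \big] \right] = E\left[ \sum_{t=1}^{T} \langle w_t,\, y_t\, \ell'(y_t \langle w_t, \Psi(x_t) \rangle)\, \Psi(x_t) \rangle \right].$$

Also,

$$E\left[ \sum_{t=1}^{T} \langle w, \tilde{g}_t \tilde{\Psi}(x_t) \rangle \right] = E\left[ \sum_{t=1}^{T} \langle w,\, y_t\, \ell'(y_t \langle w_t, \Psi(x_t) \rangle)\, \Psi(x_t) \rangle \right].$$

Plugging in these expectations and choosing $\eta = \sqrt{B_w / (B_{\tilde{g}} B_{\tilde{\Psi}})}$, we get that for any $w$ such that $\|w\|^2 \le B_w$,

$$E\left[ \sum_{t=1}^{T} \langle w_t - w,\; y_t\, \ell'(y_t \langle w_t, \Psi(x_t) \rangle)\, \Psi(x_t) \rangle \right] \le \sqrt{B_w B_{\tilde{g}} B_{\tilde{\Psi}} T}.$$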
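The choice of $\eta$ in the last step can be verified with a short calculation. The following worked derivation is added for illustration and assumes the right-hand side has the form $\sqrt{T}\,\big(B_w/(2\eta) + \eta B_{\tilde{g}} B_{\tilde{\Psi}}/2\big)$ obtained from the standard regret bound; it shows that the stated $\eta$ minimizes this expression and yields the $\sqrt{B_w B_{\tilde{g}} B_{\tilde{\Psi}} T}$ bound.

```latex
\begin{align*}
f(\eta) &= \sqrt{T}\left(\frac{B_w}{2\eta} + \frac{\eta\, B_{\tilde g} B_{\tilde\Psi}}{2}\right),
\qquad
f'(\eta) = \sqrt{T}\left(-\frac{B_w}{2\eta^2} + \frac{B_{\tilde g} B_{\tilde\Psi}}{2}\right) = 0
\;\Longrightarrow\;
\eta^\ast = \sqrt{\frac{B_w}{B_{\tilde g} B_{\tilde\Psi}}}, \\
f(\eta^\ast) &= \sqrt{T}\left(\frac{\sqrt{B_w B_{\tilde g} B_{\tilde\Psi}}}{2}
 + \frac{\sqrt{B_w B_{\tilde g} B_{\tilde\Psi}}}{2}\right)
 = \sqrt{B_w B_{\tilde g} B_{\tilde\Psi}\, T}.
\end{align*}
```

Since $f$ is convex on $\eta > 0$, the stationary point is the global minimizer, so no other fixed step size gives a smaller bound of this form.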

Confidence Intervals

Confidence Intervals Cofidece Itervals Berli Che Deartmet of Comuter Sciece & Iformatio Egieerig Natioal Taiwa Normal Uiversity Referece: 1. W. Navidi. Statistics for Egieerig ad Scietists. Chater 5 & Teachig Material Itroductio

More information

Online Learning of Noisy Data with Kernels

Online Learning of Noisy Data with Kernels Online Learning of Noisy Data with Kernels Nicolò Cesa-Bianchi Università degli Studi di Milano cesa-bianchi@dsiunimiit Shai Shalev Shwartz The Hebrew University shais@cshujiacil Ohad Shamir The Hebrew

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

tests 17.1 Simple versus compound

tests 17.1 Simple versus compound PAS204: Lecture 17. tests UMP ad asymtotic I this lecture, we will idetify UMP tests, wherever they exist, for comarig a simle ull hyothesis with a comoud alterative. We also look at costructig tests based

More information

13.1 Shannon lower bound

13.1 Shannon lower bound ECE598: Iformatio-theoretic methods i high-dimesioal statistics Srig 016 Lecture 13: Shao lower boud, Fao s method Lecturer: Yihog Wu Scribe: Daewo Seo, Mar 8, 016 [Ed Mar 11] I the last class, we leared

More information

Hybridized Heredity In Support Vector Machine

Hybridized Heredity In Support Vector Machine Hybridized Heredity I Suort Vector Machie May 2015 Hybridized Heredity I Suort Vector Machie Timothy Idowu Yougmi Park Uiversity of Wiscosi-Madiso idowu@stat.wisc.edu yougmi@stat.wisc.edu May 2015 Abstract

More information

10-701/ Machine Learning Mid-term Exam Solution

10-701/ Machine Learning Mid-term Exam Solution 0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig

More information

ECE534, Spring 2018: Solutions for Problem Set #2

ECE534, Spring 2018: Solutions for Problem Set #2 ECE534, Srig 08: s for roblem Set #. Rademacher Radom Variables ad Symmetrizatio a) Let X be a Rademacher radom variable, i.e., X = ±) = /. Show that E e λx e λ /. E e λx = e λ + e λ = + k= k=0 λ k k k!

More information

ECE534, Spring 2018: Final Exam

ECE534, Spring 2018: Final Exam ECE534, Srig 2018: Fial Exam Problem 1 Let X N (0, 1) ad Y N (0, 1) be ideedet radom variables. variables V = X + Y ad W = X 2Y. Defie the radom (a) Are V, W joitly Gaussia? Justify your aswer. (b) Comute

More information

Classification of DT signals

Classification of DT signals Comlex exoetial A discrete time sigal may be comlex valued I digital commuicatios comlex sigals arise aturally A comlex sigal may be rereseted i two forms: jarg { z( ) } { } z ( ) = Re { z ( )} + jim {

More information

A Note on Sums of Independent Random Variables

A Note on Sums of Independent Random Variables Cotemorary Mathematics Volume 00 XXXX A Note o Sums of Ideedet Radom Variables Pawe l Hitczeko ad Stehe Motgomery-Smith Abstract I this ote a two sided boud o the tail robability of sums of ideedet ad

More information

THE INTEGRAL TEST AND ESTIMATES OF SUMS

THE INTEGRAL TEST AND ESTIMATES OF SUMS THE INTEGRAL TEST AND ESTIMATES OF SUMS. Itroductio Determiig the exact sum of a series is i geeral ot a easy task. I the case of the geometric series ad the telescoig series it was ossible to fid a simle

More information

Intro to Learning Theory

Intro to Learning Theory Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple

More information

Putnam Training Exercise Counting, Probability, Pigeonhole Principle (Answers)

Putnam Training Exercise Counting, Probability, Pigeonhole Principle (Answers) Putam Traiig Exercise Coutig, Probability, Pigeohole Pricile (Aswers) November 24th, 2015 1. Fid the umber of iteger o-egative solutios to the followig Diohatie equatio: x 1 + x 2 + x 3 + x 4 + x 5 = 17.

More information

Basics of Inference. Lecture 21: Bayesian Inference. Review - Example - Defective Parts, cont. Review - Example - Defective Parts

Basics of Inference. Lecture 21: Bayesian Inference. Review - Example - Defective Parts, cont. Review - Example - Defective Parts Basics of Iferece Lecture 21: Sta230 / Mth230 Coli Rudel Aril 16, 2014 U util this oit i the class you have almost exclusively bee reseted with roblems where we are usig a robability model where the model

More information

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

Chapter 6: BINOMIAL PROBABILITIES

Chapter 6: BINOMIAL PROBABILITIES Charles Bocelet, Probability, Statistics, ad Radom Sigals," Oxford Uiversity Press, 016. ISBN: 978-0-19-00051-0 Chater 6: BINOMIAL PROBABILITIES Sectios 6.1 Basics of the Biomial Distributio 6. Comutig

More information

REGRESSION WITH QUADRATIC LOSS

REGRESSION WITH QUADRATIC LOSS REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d

More information

Notes on the prime number theorem

Notes on the prime number theorem Notes o the rime umber theorem Keji Kozai May 2, 24 Statemet We begi with a defiitio. Defiitio.. We say that f(x) ad g(x) are asymtotic as x, writte f g, if lim x f(x) g(x) =. The rime umber theorem tells

More information

Regression with quadratic loss

Regression with quadratic loss Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,

More information

A Central Limit Theorem for Belief Functions

A Central Limit Theorem for Belief Functions A Cetral Limit Theorem for Belief Fuctios Larry G. Estei Kyougwo Seo November 7, 2. CLT for Belief Fuctios The urose of this Note is to rove a form of CLT (Theorem.4) that is used i Estei ad Seo (2). More

More information

Machine Learning Theory (CS 6783)

Machine Learning Theory (CS 6783) Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT

More information

YALE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE

YALE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE YALE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE CPSC 467a: Crytograhy ad Comuter Security Notes 16 (rev. 1 Professor M. J. Fischer November 3, 2008 68 Legedre Symbol Lecture Notes 16 ( Let be a odd rime,

More information

Round-off Errors and Computer Arithmetic - (1.2)

Round-off Errors and Computer Arithmetic - (1.2) Roud-off Errors ad Comuter Arithmetic - (1.) 1. Roud-off Errors: Roud-off errors is roduced whe a calculator or comuter is used to erform real umber calculatios. That is because the arithmetic erformed

More information

2 Scalable algorithms, Scheduling, and a glance at All Prefix Sum

2 Scalable algorithms, Scheduling, and a glance at All Prefix Sum CME 323: Distributed Algorithms ad Otimizatio, Srig 2017 htt://staford.edu/~rezab/dao. Istructor: Reza Zadeh, Matroid ad Staford. Lecture 2, 4/5/2017. Scribed by Adreas Satucci. 2 Scalable algorithms,

More information

Distribution of Sample Proportions

Distribution of Sample Proportions Distributio of Samle Proortios Probability ad statistics Aswers & Teacher Notes TI-Nsire Ivestigatio Studet 90 mi 7 8 9 10 11 12 Itroductio From revious activity: This activity assumes kowledge of the

More information

New Definition of Density on Knapsack Cryptosystems

New Definition of Density on Knapsack Cryptosystems Africacryt008@Casablaca 008.06.1 New Defiitio of Desity o Kasac Crytosystems Noboru Kuihiro The Uiversity of Toyo, Jaa 1/31 Kasac Scheme rough idea Public Key: asac: a={a 1, a,, a } Ecrytio: message m=m

More information

Confidence intervals for proportions

Confidence intervals for proportions Cofidece itervals for roortios Studet Activity 7 8 9 0 2 TI-Nsire Ivestigatio Studet 60 mi Itroductio From revious activity This activity assumes kowledge of the material covered i the activity Distributio

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

COMPUTING FOURIER SERIES

COMPUTING FOURIER SERIES COMPUTING FOURIER SERIES Overview We have see i revious otes how we ca use the fact that si ad cos rereset comlete orthogoal fuctios over the iterval [-,] to allow us to determie the coefficiets of a Fourier

More information

ON SUPERSINGULAR ELLIPTIC CURVES AND HYPERGEOMETRIC FUNCTIONS

ON SUPERSINGULAR ELLIPTIC CURVES AND HYPERGEOMETRIC FUNCTIONS ON SUPERSINGULAR ELLIPTIC CURVES AND HYPERGEOMETRIC FUNCTIONS KEENAN MONKS Abstract The Legedre Family of ellitic curves has the remarkable roerty that both its eriods ad its suersigular locus have descritios

More information

( ) = is larger than. the variance of X V

( ) = is larger than. the variance of X V Stat 400, sectio 6. Methods of Poit Estimatio otes by Tim Pilachoski A oit estimate of a arameter is a sigle umber that ca be regarded as a sesible value for The selected statistic is called the oit estimator

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special

More information
