Empirical Bernstein Inequality for Martingales : Application to Online Learning
Thomas Peel, Sandrine Anthoine, Liva Ralaivola

To cite this version: Thomas Peel, Sandrine Anthoine, Liva Ralaivola. Empirical Bernstein Inequality for Martingales: Application to Online Learning. <hal >

HAL Id: hal
Submitted on 5 Nov 2013

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Empirical Bernstein Inequality for Martingales: Application to Online Learning

Thomas Peel 1,2   Sandrine Anthoine 1   Liva Ralaivola 2
1 Aix-Marseille Université - CNRS, LATP, UMR 7353, Marseille, France
2 Aix-Marseille Université - CNRS, LIF, UMR 7279, Marseille, France

Abstract

In this article we present a new empirical Bernstein inequality for bounded martingale difference sequences. This inequality refines the one by Freedman [1975] and is then used to bound the average risk of the hypotheses produced during an online learning process. We provide theoretical and empirical evidence of the tightness of our result compared with the state-of-the-art bound of Cesa-Bianchi and Gentile [2008].

1 INTRODUCTION

The motivation behind this work comes from the wish to analyze the risk of the models (or hypotheses) produced by an online learning algorithm. Such an algorithm works incrementally on a sequence of independent and identically distributed (i.i.d.) random variables. At each step, it receives an example that is used to update the current model parameters. Once this update is done, the performance of the new hypothesis is measured by evaluating its loss on the next example of the sequence, and so on. By averaging these losses, one can define a statistic R̂_n called the empirical instantaneous risk. The risk of a model is simply the expectation of its loss on a new, unseen example, given the sequence of data used in its construction. In their recent works, Cesa-Bianchi et al. [2004] and Cesa-Bianchi and Gentile [2008] show how the statistic R̂_n can be used to select a hypothesis with low risk. The key tool in their analyses is the use of concentration inequalities for martingales (Azuma-Hoeffding, Bernstein). Indeed, the dependencies between the hypotheses that are inherent to online learning processes prevent the use of standard concentration inequalities, which require independence. Bernstein (second-order) inequalities are known to be tighter than their first-order counterparts. However, the variance is in general unknown and needs to be upper bounded.
Recent works in the batch setting have proposed empirical (data-dependent) versions of the Bernstein inequality [Maurer and Pontil, 2009, Peel et al., 2010], where an estimator of the variance is used as the upper bound. However, these inequalities are not applicable to the online learning setting. In this paper, we propose a new Bernstein inequality for bounded martingale difference sequences (Theorem 2) that takes advantage of the statistic V̂_n, an instantaneous estimator of the variance. This inequality is then used to refine the tail bound of Cesa-Bianchi and Gentile [2008]. Briefly, we show that, under the same assumptions they make, the average risk of the hypotheses produced by an online learning algorithm is bounded with high probability by a quantity of the form R̂_n + √(β ln(2/δ))/n-type terms, where β is a function of V̂_n that we detail later. This bound can be applied to any online algorithm, and as an example we show how to use it to characterize the average risk of the hypotheses produced by Pegasos [Shalev-Shwartz et al., 2011], a stochastic method for solving the SVM optimization problem. We want to emphasize that the scope of our new empirical Bernstein inequality for martingales goes far beyond applications to online learning processes.

The paper is organized as follows. In Section 2 we recall a few fundamental notions about martingales and the classical concentration inequalities associated with this kind of random process. Section 3 presents the main result of this paper, a concentration inequality that takes advantage of second-order empirical information in the martingale setting. This result is then applied in Section 4 to obtain a bound on the mean generalization error of the hypotheses learned during an online learning process. This bound substantially improves the results mentioned above. We end this
paper with Section 5, a direct consequence of the previous inequalities that lets us bound the mean risk of the weight vectors generated during a run of the Pegasos algorithm.

2 PRELIMINARIES

This section briefly reviews basic notions of martingale theory and the classical concentration inequalities associated with this kind of stochastic process.

2.1 Martingale and Martingale Difference Sequence

Definition 1 (Martingale). A sequence {M_n : 0 ≤ n < ∞} of random variables is said to be a martingale with respect to the sequence of random variables {X_n : 1 ≤ n < ∞} if the sequence {M_0, ..., M_n} has two basic properties. The first one is that for each n there is a function f_n : R^n → R such that M_n = f_n(X_1, X_2, ..., X_n). The second property is that the sequence {M_n} satisfies, for all n:

    E[|M_n|] < ∞,                         (1)
    E[M_n | X_1, ..., X_{n-1}] = M_{n-1}. (2)

Given this definition of a martingale, we now define a martingale difference sequence.

Definition 2 (Martingale difference sequence). We say that a sequence of random variables {Y_n : 0 ≤ n < ∞} is a martingale difference sequence (mds) if the sequence {Y_n} satisfies the following properties for all n:

    E[|Y_n|] < ∞,                       (3)
    E[Y_n | Y_1, ..., Y_{n-1}] = 0.     (4)

By construction, this implies that if the sequence {M_n} is a martingale, then the sequence {Y_n = M_n - M_{n-1}} is a martingale difference sequence. We now introduce two well-known concentration inequalities for the sum of the increments of an mds that we will use in the next sections.

2.2 Azuma-Hoeffding Inequality

The Azuma-Hoeffding inequality [Hoeffding, 1963, Azuma, 1967] gives a result about the concentration of the values of a martingale with bounded increments around its initial value M_0.

Theorem 1 (Azuma-Hoeffding inequality). Let {M_n} be a martingale and {Y_n = M_n - M_{n-1}} the associated martingale difference sequence, such that |Y_i| ≤ c_i for all i. Then, for all ε > 0,

    P[ Σ_{i=1}^{n} Y_i = M_n - M_0 ≥ ε ] ≤ exp( -ε² / (2 Σ_{i=1}^{n} c_i²) ).   (5)

This result makes it possible to extend the Hoeffding inequality [Hoeffding, 1963] to the case where the random variables of interest are not necessarily independent.

Corollary 1.
Let X_1, ..., X_n be a sequence of random variables such that, for all i, we have |E[X_i | X_1, ..., X_{i-1}] - X_i| ≤ c_i. Set S_n = Σ_{i=1}^{n} X_i; then for all ε > 0,

    P[ Σ_{i=1}^{n} E[X_i | X_1, ..., X_{i-1}] - S_n ≥ ε ] ≤ exp( -ε² / (2 Σ_{i=1}^{n} c_i²) ).   (6)

Proof. A direct application of Theorem 1 to the martingale difference sequence {Y_n} such that Y_i = E[X_i | X_1, ..., X_{i-1}] - X_i gives the result.

2.3 Bernstein Inequality for Martingales

The inequality we recall in the following lemma is a consequence of the Bernstein inequality for martingales given in Freedman [1975]. This lemma extends the classical Bernstein inequality [Bennett, 1962], which requires independence between the random variables X_i under consideration. This limitation is overcome by looking at the martingale difference sequence {Y_n = E[X_n | X_1, ..., X_{n-1}] - X_n}.

Lemma 1 (Bernstein inequality for martingales). Suppose X_1, ..., X_n is a sequence of random variables such that 0 ≤ X_i ≤ 1. Define the martingale difference sequence {Y_n = E[X_n | X_1, ..., X_{n-1}] - X_n} and denote by K_n the sum of the conditional variances:

    K_n = Σ_{i=1}^{n} V[X_i | X_1, ..., X_{i-1}].   (7)

Let S_n = Σ_{i=1}^{n} X_i; then for all ε, k ≥ 0,

    P[ Σ_{i=1}^{n} E[X_i | X_1, ..., X_{i-1}] - S_n ≥ ε, K_n ≤ k ] ≤ exp( -ε² / (2k + 2ε/3) ).   (8)

As we shall see, this lemma is as central to our analysis as it was to the work of Cesa-Bianchi and Gentile [2008].
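As a quick numerical illustration (not from the paper; the values are purely toy choices), the right-hand sides of the two tail bounds above can be evaluated directly, showing how the second-order (Bernstein-type) bound benefits from a small sum of conditional variances:

```python
import math

def azuma_hoeffding_tail(eps, c):
    """Right-hand side of (5): exp(-eps^2 / (2 * sum_i c_i^2))."""
    return math.exp(-eps**2 / (2 * sum(ci**2 for ci in c)))

def bernstein_martingale_tail(eps, k):
    """Right-hand side of (8): exp(-eps^2 / (2k + 2*eps/3))."""
    return math.exp(-eps**2 / (2 * k + 2 * eps / 3))

n, eps = 100, 5.0
c = [1.0] * n   # increments bounded by 1
k = 5.0         # small sum of conditional variances
print(azuma_hoeffding_tail(eps, c))       # first-order bound
print(bernstein_martingale_tail(eps, k))  # second-order bound, tighter here
```

With these values the Bernstein-type probability is markedly smaller, which is exactly the regime the rest of the paper exploits.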
3 EMPIRICAL BERNSTEIN INEQUALITY FOR MARTINGALES

Second-order Bernstein inequalities are known to be tighter than their first-order counterparts thanks to the variance term. However, in practice this term often cannot be evaluated, and it is common to upper bound it by the expectation (under the assumption that the random variables of interest are bounded by 1) in order to compute the whole inequality. We propose another approach, based on an instantaneous estimator of the variance instead of the usual upper bound. By doing so, we hope to obtain a tighter inequality without any a priori assumption on the underlying distribution of the random variables.

This section presents the main result of the paper, a refined version of the Bernstein inequality for martingales recalled above, where the sum of conditional variances is upper bounded using an instantaneous estimator. We first introduce the inequality reversal lemma, which allows us to transform tail inequalities into upper bounds (or confidence intervals). This lemma was used by Peel et al. [2010] to prove their empirical Bernstein inequality for U-statistics.

Lemma 2 (Inequality reversal lemma). Let X be a random variable and a, b > 0, c, d ≥ 0 such that

    ∀ε > 0,  P[X ≥ ε] ≤ a exp( -bε² / (c + dε) ).   (9)

Then, with probability at least 1 - δ,

    X ≤ √( (c/b) ln(a/δ) ) + (d/b) ln(a/δ).   (10)

Proof. Solving for ε such that the right-hand side of (9) is equal to δ gives

    ε = (1/(2b)) ( d ln(a/δ) + √( d² ln²(a/δ) + 4bc ln(a/δ) ) ).

Using √(a + b) ≤ √a + √b gives an upper bound on ε and provides the result.

We use the notation f_{Z_t} to indicate a function determined by the sequence of random variables {Z_t} = {Z_1, ..., Z_t}, i.e. the expression of f_{Z_t} is fixed by the sequence {Z_t}. The next theorem is the main result of this paper.

Theorem 2 (Empirical Bernstein inequality for martingales). Let Z_1, ..., Z_n be a sequence of random variables following the same probability distribution D, such that Z_{t+1} and Z_{t+2} are conditionally independent given {Z_t}, for all t.
Suppose {f_{Z_t}} is a family of functions taking their values in [0, 1]. Then for all 0 < δ < 1, we have with probability at least 1 - δ:

    (1/(n-2)) Σ_{t=1}^{n-2} E[ f_{Z_t}(Z_{t+1}) | Z_1, ..., Z_t ]
        ≤ (1/(n-2)) Σ_{t=1}^{n-2} f_{Z_t}(Z_{t+1}) + (1/(n-2)) ( √( 2β ln(2/δ) ) + (2/3) ln(2/δ) ),   (11)

where

    β = V̂_n + √( ((n-2)/2) ln(2/δ) ),   (12)

and

    V̂_n = (1/2) Σ_{t=1}^{n-2} ( f_{Z_t}(Z_{t+1}) - f_{Z_t}(Z_{t+2}) )².   (13)

In a nutshell, the message carried by this theorem is that it is possible to use an instantaneous variance estimator to quantify the deviation of the sum Σ_{t=1}^{n-2} f_{Z_t}(Z_{t+1}) from its expected value.

In order to prove this concentration inequality, we need an intermediate result about the conditional variance estimator introduced in Equation (13). In essence, the following lemma allows us to quantify the deviation of this estimator from the sum V_n of the conditional variances:

    V_n = Σ_{t=1}^{n-2} V[ f_{Z_t}(Z) | Z_1, ..., Z_t ].   (14)

Lemma 3. Let Z_1, ..., Z_n be a sequence of random variables following the same probability distribution D, such that Z_{t+1} and Z_{t+2} are conditionally independent given {Z_t}, for all t. Suppose {f_{Z_t}} is a family of functions taking their values in [0, 1]. Then for all 0 < δ < 1,

    P[ V_n ≥ V̂_n + √( ((n-2)/2) ln(1/δ) ) ] ≤ δ.   (15)

Proof. We begin this proof by defining the sequence of random variables {M_n} such that, for all t,

    M_t = (1/2) ( f_{Z_t}(Z_{t+1}) - f_{Z_t}(Z_{t+2}) )²,

and the associated martingale difference sequence {A_n = E[M_n | Z_1, ..., Z_{n-1}] - M_n}. Using the fact that the Z_t follow the same distribution and that Z_{t+1}, Z_{t+2} are conditionally independent, we get that

    E[M_t | Z_1, ..., Z_t] = V[ f_{Z_t}(Z) | Z_1, ..., Z_t ].
It follows that

    Σ_t A_t = Σ_t V[ f_{Z_t}(Z) | Z_1, ..., Z_t ] - V̂_n = V_n - V̂_n.

Noting that M_t ∈ [0, 1/2], because f takes its values in [0, 1], entails E[M_t | Z_1, ..., Z_t] ∈ [0, 1/2], and furthermore each term of the sequence {A_n} is bounded: -1/2 ≤ A_t ≤ 1/2. Consequently {A_n} is a bounded martingale difference sequence, to which we can apply the Azuma-Hoeffding inequality (Theorem 1) to obtain

    P[ V_n - V̂_n ≥ ε ] ≤ exp( -2ε² / (n-2) ).

We conclude the proof by using Lemma 2.

Thanks to this first result, we can now prove Theorem 2.

Proof (Theorem 2). Define the sequence of random variables {M_n} such that M_i = f_{Z_i}(Z_{i+1}), and the associated martingale difference sequence {A_n = E[M_n | Z_1, ..., Z_{n-1}] - M_n}. Remark that for β as in Equation (12) and s fixed,

    P[ Σ_t A_t ≥ s ] = P[ Σ_t A_t ≥ s, V_n ≥ β ] + P[ Σ_t A_t ≥ s, V_n < β ].

We need to upper bound the two parts of the right-hand side of the previous equation in order to get the desired bound on the left-hand side. Remark that P[ Σ_t A_t ≥ s, V_n ≥ β ] ≤ P[ V_n ≥ β ]. We use Lemma 3 to bound P[ V_n ≥ β ] and obtain

    P[ Σ_t A_t ≥ s, V_n ≥ β ] ≤ δ/2.   (16)

Then, by using the Bernstein inequality for martingales (Lemma 1) on the martingale difference sequence {A_n}, we have

    P[ Σ_t A_t ≥ s, V_n < b ] ≤ exp( -s² / (2b + 2s/3) ),   (17)

which we can write alternatively, thanks to Lemma 2, as

    P[ Σ_t A_t ≥ √( 2b ln(2/δ) ) + (2/3) ln(2/δ), V_n < b ] ≤ δ/2.   (18)

We conclude the proof by setting b = β in (18) and s = √( 2β ln(2/δ) ) + (2/3) ln(2/δ) in Equation (16).

In the upcoming section, we use Theorem 2 in an online learning setting. More precisely, we employ our result with the intention of characterizing the mean of the risks R(h_t) associated with the hypotheses learned during such a process.

4 APPLICATION TO ONLINE LEARNING

Before stating the main theorem of this section, we recall the online learning setting and define a new instantaneous estimator of the conditional variance well suited to an online learning procedure.

4.1 Online Learning and Instantaneous Conditional Variance Estimator

There is no formal definition of an online learning process, even in reference works such as Littlestone et al. [1995] or Shalev-Shwartz [2007]. One generally defines it as follows.
Consider a dataset Z_n = {z_i}_{i=1}^{n} = {(x_i, y_i)}_{i=1}^{n} of independent and identically distributed random variables drawn from an unknown probability distribution D on the product space X × Y. An online learning algorithm working with the set Z_n produces a set {h_0, ..., h_{n-1}} of hypotheses, where each h_t : X → Ỹ aims at predicting the class of a new example x drawn from D. From an initial hypothesis h_0 and the first datum (x_1, y_1), the algorithm produces a new hypothesis h_1. This new hypothesis is a function of the random variable z_1 = (x_1, y_1) (and the hypothesis h_0). It then uses the next example (x_2, y_2) and the hypothesis h_1 to generate a second hypothesis h_2, and so on. At the end of the learning process, the algorithm outputs the set {h_0, ..., h_{n-1}}, where each hypothesis h_t is constructed from the previous hypothesis h_{t-1} and the example (x_t, y_t). Thus each hypothesis h_t depends on the sequence of random
variables {z_1, ..., z_t}. We use a bounded loss function l : Ỹ × Y → R+ to evaluate the performance of a hypothesis. The risk of the hypothesis h_t, denoted by R(h_t) = E[ l(h_t(X), Y) | z_1, ..., z_t ], is simply the expectation of the loss function l conditionally on the random variables {z_1, ..., z_t}. Obviously, this quantity is unknown since D is unknown. In this article, we assume that the loss function is such that l ∈ [0, 1]^{Ỹ × Y}. It is important to notice that this assumption does not limit the scope of the results presented hereafter.

A common wish in online learning is to characterize the mean risk

    (1/(n-1)) Σ_{t=0}^{n-2} R(h_t) = (1/(n-1)) Σ_{t=0}^{n-2} E[ l(h_t(X), Y) | z_1, ..., z_t ],   (19)

associated with the hypotheses produced by an algorithm, using an online estimator R̂_n such that

    R̂_n = R̂_n(Z_n) = (1/(n-1)) Σ_{t=0}^{n-2} l(h_t(x_{t+1}), y_{t+1}).   (20)

The hypothesis h_{n-1} is discarded for purely technical reasons. The quantity R̂_n is often referred to as the average instantaneous risk. It is central in many online learning analyses (see for example Cesa-Bianchi et al. [2004]). Each term of the previous sum is an estimator of the risk R(h_t) associated with the hypothesis h_t (conditionally on the examples z_1, ..., z_t):

    E[ l(h_t(x_{t+1}), y_{t+1}) | z_1, ..., z_t ] = R(h_t).

The term "instantaneous" comes from the fact that R̂_n only relies on the example (x_{t+1}, y_{t+1}) appearing at iteration t+1 to evaluate the risk of h_t. A state-of-the-art result due to Cesa-Bianchi and Gentile [2008] links R̂_n to the mean risk (19):

Proposition 1. Let h_0, ..., h_{n-1} be the set of hypotheses generated by an online learning algorithm using the bounded loss function l ∈ [0, 1]^{Ỹ × Y}. Then, for all 0 < δ < 1, we have with probability at least 1 - δ:

    (1/(n-1)) Σ_{t=0}^{n-2} R(h_t)
        ≤ R̂_n + 2 √( (R̂_n/(n-1)) ln( ((n-1)R̂_n + 3)/δ ) ) + (36/(n-1)) ln( ((n-1)R̂_n + 3)/δ ).   (21)

Remark. The Gibbs classifier [McAllester, 1999] is a stochastic classifier obtained by randomly selecting a hypothesis from a set of hypotheses, given a probability distribution on these hypotheses. The mean risk (19) can thus be seen as the risk of the Gibbs classifier for a uniform distribution on the set {h_0, ..., h_{n-2}}.
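The average instantaneous risk is straightforward to accumulate during a run: each new example first scores the current hypothesis, then feeds its update. A minimal sketch, where `update`, `predict` and `loss` are hypothetical placeholders (not part of the paper) standing for a learner's update rule and its bounded loss:

```python
def average_instantaneous_risk(losses):
    """Average instantaneous risk: mean of the recorded losses l(h_t(x_{t+1}), y_{t+1})."""
    return sum(losses) / len(losses)

def online_run(data, h0, update, loss):
    """Run an online learner, recording the loss of the current hypothesis
    on each new example *before* that example is used for the update."""
    h, losses = h0, []
    for x, y in data:
        losses.append(loss(h, x, y))  # evaluate h_t on the incoming example
        h = update(h, x, y)           # then learn from it
    return h, losses

# Toy usage: a running-mean predictor under squared loss clipped to [0, 1].
data = [(None, 0.2), (None, 0.4), (None, 0.3), (None, 0.5)]
update = lambda h, x, y: (h[0] + y, h[1] + 1)
predict = lambda h: h[0] / h[1] if h[1] else 0.0
loss = lambda h, x, y: min(1.0, (predict(h) - y) ** 2)
_, losses = online_run(data, (0.0, 0), update, loss)
print(average_instantaneous_risk(losses))
```

The key point mirrored from the text is the ordering: the loss is recorded before the update, so each term estimates the risk of the hypothesis built from strictly earlier examples.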
The key to the result exposed in the previous proposition lies in the use of a second-order concentration inequality for martingales (proposed by Freedman [1975]), which introduces the sum V_n of the conditional variances of the loss of each hypothesis:

    V_n = Σ_{t=0}^{n-3} V[ l(h_t(x_{t+1}), y_{t+1}) | z_1, ..., z_t ].

Like the risks R(h_t), this quantity cannot be computed since the distribution D is unknown. Cesa-Bianchi and Gentile [2008] proposed to upper bound this sum using a stratification process in order to obtain their inequality. In this section we improve the previous bound by employing Theorem 2 together with an online estimator V̂_n of the sum V_n, which allows for a better control of the former. The average empirical instantaneous variance V̂_n is simply defined as

    V̂_n = (1/(2(n-2))) Σ_{t=0}^{n-3} ( l(h_t(x_{t+1}), y_{t+1}) - l(h_t(x_{t+2}), y_{t+2}) )².   (22)

Again, we discard the hypotheses h_{n-1} and h_{n-2} from this quantity for technical reasons. Each term of this sum is an estimator of the conditional variance of l(h_t(x), y):

    E[ ( l(h_t(x_{t+1}), y_{t+1}) - l(h_t(x_{t+2}), y_{t+2}) )² | z_1, ..., z_t ]
        = 2 V[ l(h_t(x), y) | z_1, ..., z_t ].   (23)

V̂_n can easily be computed during an online learning process and plays a central role in the theorem we present here.

4.2 Empirical Bernstein Inequalities for Online Learning

In the following theorem, we use Theorem 2 and the instantaneous estimators R̂_n and V̂_n to bound the mean of the risks of the hypotheses learned by an online algorithm.

Theorem 3 (Empirical Bernstein inequality for online learning). Let h_0, ..., h_{n-1} be the set of hypotheses generated from the sample Z_n = {z_i}_{i=1}^{n} = {(x_i, y_i)}_{i=1}^{n} of i.i.d. random variables by an online learning algorithm using the bounded loss function l ∈ [0, 1]^{Ỹ × Y}. Then, for all 0 < δ < 1, we have with probability at least 1 - δ:

    (1/(n-1)) Σ_{t=0}^{n-2} R(h_t) ≤ R̂_n + (1/(n-1)) ( √( 2β ln(2/δ) ) + (2/3) ln(2/δ) ),   (24)

where

    β = (n-2) V̂_n + √( ((n-2)/2) ln(2/δ) ).   (25)
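Treating the constants above as indicative (the equations are reconstructed from a damaged source), the estimators R̂, V̂ and the resulting bound can be computed from two recorded loss sequences: the instantaneous losses l(h_t(x_{t+1}), y_{t+1}) and, for the variance estimator, the losses of the same hypotheses on the following example, l(h_t(x_{t+2}), y_{t+2}). A Python sketch:

```python
import math

def empirical_bernstein_online_bound(losses_next, losses_next2, delta):
    """Bound of the Theorem 3 form (constants indicative):
    R-hat + (sqrt(2*beta*ln(2/delta)) + (2/3)*ln(2/delta)) / (n-1),
    with beta = (n-2)*V-hat + sqrt(((n-2)/2)*ln(2/delta)).
    losses_next[t]  = l(h_t(x_{t+1}), y_{t+1}), t = 0..n-2, values in [0, 1];
    losses_next2[t] = l(h_t(x_{t+2}), y_{t+2}), t = 0..n-3."""
    m = len(losses_next)          # m = n - 1 recorded instantaneous losses
    r_hat = sum(losses_next) / m
    # average empirical instantaneous variance, Eq. (22)
    v_hat = sum((a - b) ** 2 for a, b in zip(losses_next, losses_next2)) / (2 * (m - 1))
    log_term = math.log(2 / delta)
    beta = (m - 1) * v_hat + math.sqrt((m - 1) / 2 * log_term)
    return r_hat + (math.sqrt(2 * beta * log_term) + 2 * log_term / 3) / m

# Illustrative values only: a learner whose losses shrink over time.
losses_next = [0.9, 0.5, 0.4, 0.2, 0.1, 0.1, 0.0, 0.1]
losses_next2 = [0.8, 0.5, 0.3, 0.2, 0.2, 0.1, 0.1]
print(empirical_bernstein_online_bound(losses_next, losses_next2, delta=0.05))
```

When consecutive losses of a given hypothesis are close, V̂ is small and the bound stays close to R̂, which is the regime the paper's comparison with Proposition 1 targets.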
Proof (Theorem 3). The proof is direct. Consider the set Z_n = {z_i}_{i=1}^{n} = {(x_i, y_i)}_{i=1}^{n} of i.i.d. random variables and the family of functions {l(h_t(·), ·)}_{t=0}^{n-1}, where each function l(h_t(·), ·) only depends on the variables z_1, ..., z_t by definition of h_t. Noting that z_{t+1}, z_{t+2} are independent given z_1, ..., z_t (by definition of Z_n), we simply apply Theorem 2 and adjust the indexes to obtain the result.

We now want to emphasize the comparison with the bound of Cesa-Bianchi and Gentile [2008] (Equation (21)). Our result firstly improves the constants involved in the bound, which is very appreciable when the bound is computed with a small number of hypotheses (when n is small, the last term in the bound cannot be neglected). In order to analyze the behavior of our result when we have a sufficient number of hypotheses to omit the last term, we have to pay attention to

    (1/(n-1)) √( 2β ln(2/δ) ) ≤ √( 2 V̂_n ln(2/δ) / (n-1) ) + ( 2 ln³(2/δ) )^{1/4} / (n-1)^{3/4},   (26)

where we used the fact that √(a + b) ≤ √a + √b to get the upper bound. Thus, omitting the constant terms, our bound tends to R̂_n at least in

    O( √( V̂_n / n ) + 1/n^{3/4} + 1/n ),

whereas the one of Cesa-Bianchi and Gentile [2008] tends to R̂_n in

    O( √( R̂_n ln(n R̂_n) / n ) + ln(n R̂_n) / n ).

In order to study the difference between the two rates of convergence, we need to compare the two terms V̂_n and R̂_n:

    V̂_n = (1/(2(n-2))) Σ_{t=0}^{n-3} ( l(h_t(x_{t+1}), y_{t+1}) - l(h_t(x_{t+2}), y_{t+2}) )²
        ≤ (1/(2(n-2))) Σ_{t=0}^{n-3} ( l(h_t(x_{t+1}), y_{t+1})² + l(h_t(x_{t+2}), y_{t+2})² )
        ≤ (1/(2(n-2))) Σ_{t=0}^{n-3} ( l(h_t(x_{t+1}), y_{t+1}) + l(h_t(x_{t+2}), y_{t+2}) ).

The last inequality is obtained by using l ∈ [0, 1]^{Ỹ × Y}. Suppose that the error made by each hypothesis h_t on the example z_{t+2} is not too different from the error made by the same hypothesis on z_{t+1}: l(h_t(x_{t+2}), y_{t+2}) ≈ l(h_t(x_{t+1}), y_{t+1}). In this case, the previous right-hand side is almost (1/(n-2)) Σ_t l(h_t(x_{t+1}), y_{t+1}) ≈ R̂_n, and thus it follows that V̂_n ≲ R̂_n. A setting studied by Cesa-Bianchi and Gentile [2008] is when the empirical cumulative risk n R̂_n is in O(1), i.e. n R̂_n is bounded.
Their result thus reaches an asymptotic behavior in O(1/n) (the terms involving ln(n R̂_n) vanish as constants). With the assumption that n V̂_n is in O(1) as well, our bound shows a slightly worse rate of convergence, in O(1/n^{3/4}). However, as soon as the cumulative risk n R̂_n increases with n, the bound of Cesa-Bianchi and Gentile [2008] converges at the rate O( √(ln n / n) ), whereas ours reaches a O( √(1/n) ) rate.

Case of a Convex Loss Function

When an online algorithm uses a convex loss function l, we can use Theorem 3 to characterize the risk associated with the mean hypothesis h̄:

    h̄ = (1/(n-1)) Σ_{t=0}^{n-2} h_t.   (27)

When the decision space Ỹ associated with the classifiers h_t : X → Ỹ is convex, the hypothesis h̄ belongs to the same function class as each of the h_t, i.e. h̄ : X → Ỹ. The mean hypothesis is thus a deterministic classifier, as opposed to the Gibbs classifier defined earlier, which shares the same bound on its risk.

Corollary 2. Let h_0, ..., h_{n-1} be the set of all the hypotheses generated by an online learning algorithm using a convex loss function l such that l ∈ [0, 1]^{Ỹ × Y}. Then, for all 0 < δ < 1, with probability at least 1 - δ:

    R(h̄) ≤ R̂_n + (1/(n-1)) ( √( 2β ln(2/δ) ) + (2/3) ln(2/δ) ),   (28)

where

    β = (n-2) V̂_n + √( ((n-2)/2) ln(2/δ) ).

Proof. Using Jensen's inequality and linearity of the
expectation, it is easy to show that

    R(h̄) = E[ l(h̄(X), Y) ] ≤ (1/(n-1)) Σ_{t=0}^{n-2} E[ l(h_t(X), Y) ] = (1/(n-1)) Σ_{t=0}^{n-2} R(h_t).

To conclude the proof, we just need to combine this result with Theorem 3.

5 BOUNDING THE AVERAGE RISK OF PEGASOS

In this section, we use the previous corollary to derive a bound on the mean risk of the hypotheses generated by the Pegasos algorithm [Shalev-Shwartz et al., 2011].

5.1 Pegasos

Pegasos is an algorithm designed to solve the primal SVM problem. Recall that given a sample Z_n = {z_i}_{i=1}^{n} = {(x_i, y_i)}_{i=1}^{n}, the SVM objective function is given by

    F(w) := (λ/2) ||w||² + (1/n) Σ_{i=1}^{n} l_hinge(w, x_i, y_i),   (29)

where l_hinge(w, x, y) = max{0, 1 - y⟨w, x⟩}. Pegasos works in an online fashion by performing a stochastic subgradient descent on the SVM objective function. At step t, Pegasos randomly selects an example z_{i_t} = (x_{i_t}, y_{i_t}) and aims at minimizing the approximation

    f(w_t, z_{i_t}) = (λ/2) ||w_t||² + l_hinge(w_t, x_{i_t}, y_{i_t})

of the SVM objective function. It considers the following subgradient of this function, taken at the point w_t,

    ∇_t = ∇_{w_t} f(w_t, z_{i_t}) = λ w_t - 1[ y_{i_t} ⟨w_t, x_{i_t}⟩ < 1 ] y_{i_t} x_{i_t},

and it updates the current weight vector w_t to w_{t+1} by w_{t+1} ← w_t - η_t ∇_t, using the step size η_t = 1/(λt). So we get at each iteration the vector

    w_{t+1} ← (1 - η_t λ) w_t + η_t 1[ y_{i_t} ⟨w_t, x_{i_t}⟩ < 1 ] y_{i_t} x_{i_t}.

An (optional) projection step, detailed in the sequel, ends iteration t. Pegasos stops when t = T,

Algorithm 1 Pegasos
Require: {(x_i, y_i)}_{i=1}^{n}, λ ≥ 0 and T ≥ 0
Ensure: w_{T+1}
  w_1 ← 0
  for t ← 1 to T do
    Pick i_t uniformly at random in {1, ..., n}
    Define η_t = 1/(λt)
    if y_{i_t} ⟨w_t, x_{i_t}⟩ < 1 then
      w_{t+1} ← (1 - η_t λ) w_t + η_t y_{i_t} x_{i_t}
    else
      w_{t+1} ← (1 - η_t λ) w_t
    end if
    w_{t+1} ← min[ 1, (1/√λ) / ||w_{t+1}||_2 ] w_{t+1}
  end for

where T is the number of iterations given as a parameter. Thus Pegasos can be seen as an online algorithm working with the sequence of examples z_{i_1}, ..., z_{i_T} constructed by randomly picking an example from Z_n at each iteration. Algorithm 1 sums up the different steps of Pegasos.
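The pseudocode of Algorithm 1 translates almost line for line into plain Python. A minimal sketch (dense Python lists instead of a linear-algebra library; data and parameters purely illustrative):

```python
import math
import random

def pegasos(data, lam, T, seed=0):
    """Stochastic subgradient descent on the primal SVM objective,
    with the optional projection onto the ball of radius 1/sqrt(lam).
    `data` is a list of (x, y) with x a list of floats and y in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for t in range(1, T + 1):
        x, y = data[rng.randrange(len(data))]      # pick i_t uniformly at random
        eta = 1.0 / (lam * t)                      # step size 1/(lambda*t)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1 - eta * lam) * wi for wi in w]     # shrink step, always applied
        if margin < 1:                             # hinge-loss subgradient term
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(wi * wi for wi in w)) # projection onto the ball
        if norm > 0:
            scale = min(1.0, (1.0 / math.sqrt(lam)) / norm)
            w = [wi * scale for wi in w]
    return w

# Toy usage: two linearly separable points on the first axis.
data = [([1.0, 0.0], 1), ([-1.0, 0.0], -1)]
w = pegasos(data, lam=0.1, T=200)
print(w)  # the learned separator orients toward the positive class (w[0] > 0)
```

The projection step is what keeps every iterate inside the ball of radius 1/√λ, which is exactly the property used in Section 5.2 to bound the hinge loss.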
5.2 Bounding the Mean Risk of the Hypotheses Generated by Pegasos

In order to apply Theorem 3, we need the loss function to be bounded. It can be shown that w* = argmin_w F(w) satisfies ||w*||_2 ≤ 1/√λ. Thus, we can limit the search space to the ball of radius 1/√λ by incorporating the projection step mentioned above:

    w_{t+1} ← min[ 1, (1/√λ) / ||w_{t+1}||_2 ] w_{t+1}.

With the assumption that ||x||_2 ≤ M, we can bound the hinge loss function:

    l_hinge(w, x, y) ≤ 1 + ||x||_2 ||w||_2 ≤ 1 + M/√λ = C.

Thereby, the loss function used by Pegasos can be rescaled to satisfy the assumption of Theorem 3, and we can use it to prove the following corollary.

Corollary 3. Let w_0, ..., w_T be the sequence of weight vectors generated by the Pegasos algorithm from a sample Z_n where ||x_i||_2 ≤ M for all i. Then, for all 0 < δ < 1, we have with probability at least 1 - δ:

    (1/T) Σ_{t=0}^{T-1} R(w_t) ≤ R̂_T + (C/T) √( 2β ln(2/δ) ) + (2C/(3T)) ln(2/δ),   (30)

where

    β = (T-1) V̂_T / C² + √( ((T-1)/2) ln(2/δ) ).
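A small sketch of the rescaling argument (values purely illustrative): with the projection in place, any observed hinge loss divided by C = 1 + M/√λ lies in [0, 1], as Theorem 3 requires.

```python
import math

def pegasos_loss_bound(lam, M):
    """Upper bound C = 1 + M/sqrt(lam) on the hinge loss when w is kept
    in the ball of radius 1/sqrt(lam) and ||x|| <= M."""
    return 1.0 + M / math.sqrt(lam)

def hinge(margin):
    """Hinge loss as a function of the margin y*<w, x>."""
    return max(0.0, 1.0 - margin)

lam, M = 0.1, 2.0
C = pegasos_loss_bound(lam, M)
# Worst observable case inside the ball: margin = -||w||*||x||, here with ||x|| = 1.9 <= M.
raw_loss = hinge(-(1.0 / math.sqrt(lam)) * 1.9)
print(raw_loss / C <= 1.0)  # prints True
```

Dividing every loss by C before plugging it into the bound, then multiplying the bound back by C, is what produces the C factors in Corollary 3.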
Figure 1: Comparison of the bounds from Proposition 1 and Corollary 3 computed for the Pegasos algorithm on a toy linearly separable dataset; each panel corresponds to a different value of λ and shows the Cesa-Bianchi and Gentile (2008) bound together with the empirical Bernstein bound of this work.

5.3 Proof of Concept

In this section we want to highlight experimentally the performance of our empirical Bernstein inequality applied to online learning. To do so, we compare the bound provided by Corollary 3 for the Pegasos algorithm to the one exposed in Proposition 1. We use a linearly separable toy dataset and compare the convergence of the empirical risk to the mean risk of the hypotheses w_0, ..., w_T. We generate random vectors x_i ∈ [-1, 1]², to which we assign the class y_i = sign(⟨w, x_i⟩) ∈ {+1, -1} for a vector w ∈ [-1, 1]², also randomly generated. We work with a learning sample and report in Figure 1 the values of the right-hand sides appearing in Proposition 1 [Cesa-Bianchi and Gentile, 2008] and in Corollary 3, computed with a confidence of 95% (δ = 0.05). We ran the experiment 20 times for several values of the parameter λ and averaged the results. We can see that our inequality is far tighter than the one of Cesa-Bianchi and Gentile [2008] during the first iterations, as suggested by the theoretical comparison done in Section 4.2. The gap between the two inequalities tightens as the number of hypotheses considered increases, but remains in our favor.

6 CONCLUSION AND OUTLOOKS

In this article, we presented a new empirical Bernstein concentration inequality for martingales. We applied this result to the online learning setting in order to bound the mean risk of the hypotheses learned during such learning processes. Because we introduce a new instantaneous variance estimator, our inequality is well suited to the online learning setting and improves the state of the art.
This improvement is mainly noticeable when the number of hypotheses considered is small, as shown in the empirical section of this work. There are many outlooks opened by this work. First of all, we can think about a new online learning algorithm that aims at minimizing our empirical Bernstein bound, as is done in the batch setting [Variance Penalizing AdaBoost, Shivaswamy and Jebara, 2011], for example. Then, it will be of interest to derive new kinds of bounds for online algorithms taking advantage of our result (for example on the excess risk, as is done in the work of Kakade and Tewari [2009]). The last perspective that we want to mention is the comparison of our bound with the very recent PAC-Bayes-Empirical-Bernstein inequality of Tolstikhin and Seldin [2013].
References

Kazuoki Azuma. Weighted Sums of Certain Dependent Random Variables. Tohoku Mathematical Journal, 19(3), 1967.

George Bennett. Probability Inequalities for the Sum of Independent Random Variables. Journal of the American Statistical Association, 57(297):33-45, 1962.

Nicolò Cesa-Bianchi and Claudio Gentile. Improved Risk Tail Bounds for On-Line Algorithms. IEEE Transactions on Information Theory, 54(1), 2008.

Nicolò Cesa-Bianchi, Alex Conconi, and Claudio Gentile. On the Generalization Ability of On-Line Learning Algorithms. IEEE Transactions on Information Theory, 50(9), 2004.

David A. Freedman. On Tail Probabilities for Martingales. The Annals of Probability, 3(1):100-118, 1975.

Wassily Hoeffding. Probability Inequalities for Sums of Bounded Random Variables. Journal of the American Statistical Association, 58(301):13-30, 1963.

Sham M. Kakade and Ambuj Tewari. On the Generalization Ability of Online Strongly Convex Programming Algorithms. In Advances in Neural Information Processing Systems 21 - NIPS '08, 2008.

Nicholas Littlestone, Philip Long, and Manfred Warmuth. On-line Learning of Linear Functions. Computational Complexity, 5(1):1-23, 1995.

Andreas Maurer and Massimiliano Pontil. Empirical Bernstein Bounds and Sample Variance Penalization. In Proceedings of the 22nd Annual Conference on Learning Theory - COLT '09, 2009.

David A. McAllester. PAC-Bayesian Model Averaging. In Proceedings of the 12th Annual Conference on Computational Learning Theory - COLT '99, pages 164-170, 1999.

Thomas Peel, Sandrine Anthoine, and Liva Ralaivola. Empirical Bernstein Inequalities for U-Statistics. In Advances in Neural Information Processing Systems 23 - NIPS '10, pages 1903-1911, 2010.

Shai Shalev-Shwartz. Online learning: Theory, algorithms, and applications. PhD thesis, Hebrew University, 2007.

Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, and Andrew Cotter. Pegasos: Primal Estimated Sub-Gradient Solver for SVM. Mathematical Programming, 127(1):3-30, 2011.

Panagadatta K. Shivaswamy and Tony Jebara. Variance Penalizing AdaBoost. In Advances in Neural Information Processing Systems 24 - NIPS '11, 2011.
Ilya Tolstikhin and Yevgeny Seldin. PAC-Bayes-Empirical-Bernstein Inequality. In Advances in Neural Information Processing Systems 26 - NIPS '13, 2013.
More information10-701/ Machine Learning Mid-term Exam Solution
0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it
More information5.1 A mutual information bound based on metric entropy
Chapter 5 Global Fao Method I this chapter, we exted the techiques of Chapter 2.4 o Fao s method the local Fao method) to a more global costructio. I particular, we show that, rather tha costructig a local
More informationLecture 2: Concentration Bounds
CSE 52: Desig ad Aalysis of Algorithms I Sprig 206 Lecture 2: Cocetratio Bouds Lecturer: Shaya Oveis Ghara March 30th Scribe: Syuzaa Sargsya Disclaimer: These otes have ot bee subjected to the usual scrutiy
More informationChapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities
Chapter 5 Iequalities 5.1 The Markov ad Chebyshev iequalities As you have probably see o today s frot page: every perso i the upper teth percetile ears at least 1 times more tha the average salary. I other
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationGini Index and Polynomial Pen s Parade
Gii Idex ad Polyomial Pe s Parade Jules Sadefo Kamdem To cite this versio: Jules Sadefo Kamdem. Gii Idex ad Polyomial Pe s Parade. 2011. HAL Id: hal-00582625 https://hal.archives-ouvertes.fr/hal-00582625
More informationInvariant relations between binary Goldbach s decompositions numbers coded in a 4 letters language
Ivariat relatios betwee biary Goldbach s decompositios umbers coded i a letters laguage Deise Vella-Chemla To cite this versio: Deise Vella-Chemla. Ivariat relatios betwee biary Goldbach s decompositios
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS
MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 12
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More informationOn Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities
O Equivalece of Martigale Tail Bouds ad Determiistic Regret Iequalities Sasha Rakhli Departmet of Statistics, The Wharto School Uiversity of Pesylvaia Dec 16, 2015 Joit work with K. Sridhara arxiv:1510.03925
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 21 11/27/2013 Fuctioal Law of Large Numbers. Costructio of the Wieer Measure Cotet. 1. Additioal techical results o weak covergece
More informationDistribution of Random Samples & Limit theorems
STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to
More informationFrequentist Inference
Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for
More informationGlivenko-Cantelli Classes
CS28B/Stat24B (Sprig 2008 Statistical Learig Theory Lecture: 4 Gliveko-Catelli Classes Lecturer: Peter Bartlett Scribe: Michelle Besi Itroductio This lecture will cover Gliveko-Catelli (GC classes ad itroduce
More informationLecture 2: April 3, 2013
TTIC/CMSC 350 Mathematical Toolkit Sprig 203 Madhur Tulsiai Lecture 2: April 3, 203 Scribe: Shubhedu Trivedi Coi tosses cotiued We retur to the coi tossig example from the last lecture agai: Example. Give,
More informationAda Boost, Risk Bounds, Concentration Inequalities. 1 AdaBoost and Estimates of Conditional Probabilities
CS8B/Stat4B Sprig 008) Statistical Learig Theory Lecture: Ada Boost, Risk Bouds, Cocetratio Iequalities Lecturer: Peter Bartlett Scribe: Subhrasu Maji AdaBoost ad Estimates of Coditioal Probabilities We
More informationLecture 15: Learning Theory: Concentration Inequalities
STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that
More informationChapter 3. Strong convergence. 3.1 Definition of almost sure convergence
Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i
More informationLinear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d
Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y
More informationA survey on penalized empirical risk minimization Sara A. van de Geer
A survey o pealized empirical risk miimizatio Sara A. va de Geer We address the questio how to choose the pealty i empirical risk miimizatio. Roughly speakig, this pealty should be a good boud for the
More information1 Introduction to reducing variance in Monte Carlo simulations
Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by
More informationOutput Analysis (2, Chapters 10 &11 Law)
B. Maddah ENMG 6 Simulatio Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should be doe
More informationSelf-normalized deviation inequalities with application to t-statistic
Self-ormalized deviatio iequalities with applicatio to t-statistic Xiequa Fa Ceter for Applied Mathematics, Tiaji Uiversity, 30007 Tiaji, Chia Abstract Let ξ i i 1 be a sequece of idepedet ad symmetric
More informationResampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.
Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator
More informationAdvanced Stochastic Processes.
Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.
More informationSupplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate
Supplemetary Material for Fast Stochastic AUC Maximizatio with O/-Covergece Rate Migrui Liu Xiaoxua Zhag Zaiyi Che Xiaoyu Wag 3 iabao Yag echical Lemmas ized versio of Hoeffdig s iequality, ote that We
More informationThis is an introductory course in Analysis of Variance and Design of Experiments.
1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class
More informationLecture 3 The Lebesgue Integral
Lecture 3: The Lebesgue Itegral 1 of 14 Course: Theory of Probability I Term: Fall 2013 Istructor: Gorda Zitkovic Lecture 3 The Lebesgue Itegral The costructio of the itegral Uless expressly specified
More informationLecture 3: August 31
36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,
More informationSequences and Series of Functions
Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges
More information4. Partial Sums and the Central Limit Theorem
1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems
More informationMachine Learning Theory Tübingen University, WS 2016/2017 Lecture 3
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture 3 Tolstikhi Ilya Abstract I this lecture we will prove the VC-boud, which provides a high-probability excess risk boud for the ERM algorithm whe
More informationElement sampling: Part 2
Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 3 9/11/2013. Large deviations Theory. Cramér s Theorem
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/5.070J Fall 203 Lecture 3 9//203 Large deviatios Theory. Cramér s Theorem Cotet.. Cramér s Theorem. 2. Rate fuctio ad properties. 3. Chage of measure techique.
More informationLecture 01: the Central Limit Theorem. 1 Central Limit Theorem for i.i.d. random variables
CSCI-B609: A Theorist s Toolkit, Fall 06 Aug 3 Lecture 0: the Cetral Limit Theorem Lecturer: Yua Zhou Scribe: Yua Xie & Yua Zhou Cetral Limit Theorem for iid radom variables Let us say that we wat to aalyze
More informationLet us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.
Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,
More informationAgnostic Learning and Concentration Inequalities
ECE901 Sprig 2004 Statistical Regularizatio ad Learig Theory Lecture: 7 Agostic Learig ad Cocetratio Iequalities Lecturer: Rob Nowak Scribe: Aravid Kailas 1 Itroductio 1.1 Motivatio I the last lecture
More informationEntropy and Ergodic Theory Lecture 5: Joint typicality and conditional AEP
Etropy ad Ergodic Theory Lecture 5: Joit typicality ad coditioal AEP 1 Notatio: from RVs back to distributios Let (Ω, F, P) be a probability space, ad let X ad Y be A- ad B-valued discrete RVs, respectively.
More informationProduct measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.
Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the
More informationChapter 6 Sampling Distributions
Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to
More information18.657: Mathematics of Machine Learning
8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 4 Scribe: Cheg Mao Sep., 05 I this lecture, we cotiue to discuss the effect of oise o the rate of the excess risk E(h) = R(h) R(h
More informationLecture 7: October 18, 2017
Iformatio ad Codig Theory Autum 207 Lecturer: Madhur Tulsiai Lecture 7: October 8, 207 Biary hypothesis testig I this lecture, we apply the tools developed i the past few lectures to uderstad the problem
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week Lecture: Cocept Check Exercises Starred problems are optioal. Statistical Learig Theory. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete
More informationEfficient GMM LECTURE 12 GMM II
DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet
More information5.1 Review of Singular Value Decomposition (SVD)
MGMT 69000: Topics i High-dimesioal Data Aalysis Falll 06 Lecture 5: Spectral Clusterig: Overview (cotd) ad Aalysis Lecturer: Jiamig Xu Scribe: Adarsh Barik, Taotao He, September 3, 06 Outlie Review of
More information1 Duality revisited. AM 221: Advanced Optimization Spring 2016
AM 22: Advaced Optimizatio Sprig 206 Prof. Yaro Siger Sectio 7 Wedesday, Mar. 9th Duality revisited I this sectio, we will give a slightly differet perspective o duality. optimizatio program: f(x) x R
More information1 Convergence in Probability and the Weak Law of Large Numbers
36-752 Advaced Probability Overview Sprig 2018 8. Covergece Cocepts: i Probability, i L p ad Almost Surely Istructor: Alessadro Rialdo Associated readig: Sec 2.4, 2.5, ad 4.11 of Ash ad Doléas-Dade; Sec
More informationECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization
ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationOutput Analysis and Run-Length Control
IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%
More information2.2. Central limit theorem.
36.. Cetral limit theorem. The most ideal case of the CLT is that the radom variables are iid with fiite variace. Although it is a special case of the more geeral Lideberg-Feller CLT, it is most stadard
More information32 estimating the cumulative distribution function
32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio
More informationBecause it tests for differences between multiple pairs of means in one test, it is called an omnibus test.
Math 308 Sprig 018 Classes 19 ad 0: Aalysis of Variace (ANOVA) Page 1 of 6 Itroductio ANOVA is a statistical procedure for determiig whether three or more sample meas were draw from populatios with equal
More informationStatistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.
Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized
More informationCHAPTER 10 INFINITE SEQUENCES AND SERIES
CHAPTER 10 INFINITE SEQUENCES AND SERIES 10.1 Sequeces 10.2 Ifiite Series 10.3 The Itegral Tests 10.4 Compariso Tests 10.5 The Ratio ad Root Tests 10.6 Alteratig Series: Absolute ad Coditioal Covergece
More informationTesting the number of parameters with multidimensional MLP
Testig the umber of parameters with multidimesioal MLP Joseph Rykiewicz To cite this versio: Joseph Rykiewicz. Testig the umber of parameters with multidimesioal MLP. ASMDA 2005, 2005, Brest, Frace. pp.561-568,
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationw (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.
2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For
More informationLecture 19: Convergence
Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may
More information1 Inferential Methods for Correlation and Regression Analysis
1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet
More informationStatistics 511 Additional Materials
Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationInfinite Sequences and Series
Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet
More information6.883: Online Methods in Machine Learning Alexander Rakhlin
6.883: Olie Methods i Machie Learig Alexader Rakhli LECTURE 23. SOME CONSEQUENCES OF ONLINE NO-REGRET METHODS I this lecture, we explore some cosequeces of the developed techiques.. Covex optimizatio Wheever
More information6.883: Online Methods in Machine Learning Alexander Rakhlin
6.883: Olie Methods i Machie Learig Alexader Rakhli LECURE 4 his lecture is partly based o chapters 4-5 i [SSBD4]. Let us o give a variat of SGD for strogly covex fuctios. Algorithm SGD for strogly covex
More informationChapter 2 The Monte Carlo Method
Chapter 2 The Mote Carlo Method The Mote Carlo Method stads for a broad class of computatioal algorithms that rely o radom sampligs. It is ofte used i physical ad mathematical problems ad is most useful
More informationMAT1026 Calculus II Basic Convergence Tests for Series
MAT026 Calculus II Basic Covergece Tests for Series Egi MERMUT 202.03.08 Dokuz Eylül Uiversity Faculty of Sciece Departmet of Mathematics İzmir/TURKEY Cotets Mootoe Covergece Theorem 2 2 Series of Real
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationNYU Center for Data Science: DS-GA 1003 Machine Learning and Computational Statistics (Spring 2018)
NYU Ceter for Data Sciece: DS-GA 003 Machie Learig ad Computatioal Statistics (Sprig 208) Brett Berstei, David Roseberg, Be Jakubowski Jauary 20, 208 Istructios: Followig most lab ad lecture sectios, we
More informationCoefficient of variation and Power Pen s parade computation
Coefficiet of variatio ad Power Pe s parade computatio Jules Sadefo Kamdem To cite this versio: Jules Sadefo Kamdem. Coefficiet of variatio ad Power Pe s parade computatio. 20. HAL Id: hal-0058658
More informationComparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading
Topic 15 - Two Sample Iferece I STAT 511 Professor Bruce Craig Comparig Two Populatios Research ofte ivolves the compariso of two or more samples from differet populatios Graphical summaries provide visual
More informationENGI 4421 Confidence Intervals (Two Samples) Page 12-01
ENGI 44 Cofidece Itervals (Two Samples) Page -0 Two Sample Cofidece Iterval for a Differece i Populatio Meas [Navidi sectios 5.4-5.7; Devore chapter 9] From the cetral limit theorem, we kow that, for sufficietly
More informationBasics of Probability Theory (for Theory of Computation courses)
Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.
More informationBinary classification, Part 1
Biary classificatio, Part 1 Maxim Ragisky September 25, 2014 The problem of biary classificatio ca be stated as follows. We have a radom couple Z = (X,Y ), where X R d is called the feature vector ad Y
More informationRandom Variables, Sampling and Estimation
Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig
More informationDISCRETE PREDICTION PROBLEMS: RANDOMIZED PREDICTION
DISCRETE PREDICTION PROBLEMS: RANDOMIZED PREDICTION Csaba Szepesvári Uiversity of Alberta CMPUT 654 E-mail: szepesva@ualberta.ca UofA, October 10-12-14, 2006 OUTLINE 1 DISCRETE PREDICTION PROBLEMS 2 RANDOMIZED
More information