Boosting Revisited

Massimo Santini (*)
Dipartimento di Scienze dell'Informazione
Università degli Studi di Milano
Via Comelico, 39/41 - 20135 Milano (Italy)

Technical Report 205-97

(*) PhD student at the "Università Statale di Milano". E-mail: <santini@dsi.unimi.it>

Abstract

In this paper some boosting and on-line allocation strategies are revisited in order to provide new proofs inspired by Chernoff's bounding techniques taken from common Probability Theory. Even if some of the results presented here are weaker than the best known today, the generalized style of the proofs hopefully offers a better understanding of known results and suggests new strategies.

This paper is a Minor Thesis for the PhD program of "Università Statale di Milano". The referee of this paper is Dr. Nicolò Cesa-Bianchi.

1 Introduction

The topics treated in this paper can be informally summarized as follows:

On-line allocation: for a certain number of rounds a fixed amount of a given resource is to be allocated among a set of possible choices, judging only by the outcomes of previous rounds. Show that, even ignoring any a priori knowledge about the performance of each individual choice, an allocation strategy exists which never performs much worse than the best choice.
Boosting: a set of approximate predictions is to be combined in a simple way in order to obtain a more accurate one. Show that, in the binary classification case, taking a weighted majority over a set of predictions (each one a little smarter than a random guess), the accuracy can be exponentially increased with respect to the number of combined predictions.

Despite the differences between them, these problems can be treated in the same probabilistic framework, and a very standard technique can be adapted to prove both the existence of an "optimal" on-line allocation strategy and of a boosting strategy for prediction algorithms.

1.1 Chernoff's bounding

To introduce Chernoff's bounding technique [Hof63], which is central to this paper, the proof of a very simple inequality is now sketched. Let $X_n$ (with $n = 1, \ldots, N$) be a family of i.i.d. r.v.'s defined on the p.s. $\langle \Omega, \mathcal{F}, P \rangle$ such that $E[X_n] = 0$ and $X_n \in [x, x+1]$ for some real $x$ (an interval of unit length, as required by lemma A.3); then, for any $\varepsilon > 0$ and $\lambda > 0$,

\[
P\Big[\sum_{n=1}^{N} X_n \ge N\varepsilon\Big]
\le e^{-\lambda N\varepsilon}\, E\Big[e^{\lambda \sum_n X_n}\Big]
= e^{-\lambda N\varepsilon} \prod_{n=1}^{N} E\big[e^{\lambda X_n}\big]
\le e^{-\lambda N\varepsilon + N\lambda^2/8}
\]

and the choice $\lambda = 4\varepsilon$ gives the bound $e^{-2N\varepsilon^2}$. Here the first step is due to Markov's inequality (lemma A.1), the second to the stochastic independence of the r.v.'s and the last to a standard bound on the moment generating function (lemma A.3).

This result is usually paraphrased by saying that, under suitable hypotheses, the probability that the empirical mean $\frac{1}{N}\sum_{n=1}^{N} X_n$ exceeds the mean $E[X]$ by an amount $\varepsilon$ decreases exponentially fast both with the amount $\varepsilon$ and with the sample size $N$.

The crucial idea of the on-line allocation and boosting strategies discussed here is how to build a family of r.v.'s representing the "loss" of a certain choice or the "accuracy" of a certain prediction, in such a way that even if these r.v.'s are not stochastically independent it remains possible to "exchange the expectation with the product" as in the previous derivation [CB97b]. Since the sum of these r.v.'s will be related to the performance of the on-line allocation strategy or to the accuracy of the boosted prediction, Chernoff's bounding technique becomes a way to relate overall performance to the performance of a single choice or prediction.

The focus of the next section is on on-line allocation, then boosting for classification predictions is introduced. The last two sections deal with boosting for regression, starting with a reduction from regression to classification and then with a more direct approach.
Each of these sections is organized in three parts: a definition of the topic, a concise collection of formal proofs and a final discussion of the results with some reference to the existing literature. An appendix with some technical lemmas needed to complete the proofs in the previous sections is also provided.

Remark that in the following the term "strategy" is purposely used instead of the more precise term "algorithm", since the presented proofs and constructions are discussed with no particular stress on computational issues; it is however a straightforward task to render as algorithms most of the strategies introduced here.

2 On-line allocation

2.1 Definitions

Given a finite set $\Omega$ of choices, a fixed allocation is a distribution (1) on $\Omega$ such that the images sum up to 1, and a bounded loss is simply a function $\Omega \to [0, 1]$; given a fixed allocation $p$ and a bounded loss $l$, the suffered loss is the amount $\sum_{\omega \in \Omega} p(\omega)\, l(\omega)$.

Considering a sequence of rounds indexed by $t$, if $l_t$ is a sequence of bounded losses, the $\omega$-loss $\sum_t l_t(\omega)$ is the overall loss suffered assuming a fixed allocation with value 1 on $\omega$ constantly on all the rounds, the total loss is just the sum of the suffered losses of all rounds, and the net loss is the difference between the total loss and the minimum $\omega$-loss over all the choices.

An on-line allocation strategy is a way of choosing a sequence of fixed allocations $p_t$ with respect to an arbitrary sequence of bounded losses $l_t$ in such a way that the choice of $p_t$ depends only on $l_{t'}$ with $t' < t$ (but $l_{t'}$ is allowed to depend on any of the $p_t$).

The general aim of an on-line allocation strategy is to provide some bound on the net loss, since the definition of the strategy allows a completely adversarial sequence of losses, thus making any bound on the total loss meaningless.

The formulation of the problem allows one to consider a p.m. over a discrete p.s. as a fixed allocation and a bounded r.v. as a bounded loss; it is also easy to check that the given definition of suffered loss coincides in this respect with an expectation. The following results are therefore stated in this more abstract setting and then discussed in terms of on-line allocation strategies.

2.2 Formal proofs

Consider a p.s. $\langle \Omega, \mathcal{F}, P \rangle$. Let $L_t : \Omega \to [0, 1]$ be a family (2) of bounded r.v.'s on $\langle \Omega, \mathcal{F}, P \rangle$.

(1) a function $\Omega \to [0, 1]$ such that the images sum up to 1.
(2) in the following the index $t$ is intended to vary in $1, \ldots, T$ and the same holds for (not explicitly indexed) sums and products.
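Given a real positive parameter $\eta$, consider the family of p.m.'s on $\langle \Omega, \mathcal{F} \rangle$ such that $P_1 = P$ and

\[
P_{t+1} = \alpha_t\, P_t\, e^{-\eta L_t}
\qquad \text{with normalizing} \qquad
1/\alpha_t = E_{P_t}\big[e^{-\eta L_t}\big].
\]

Observe that the $L_t$ are r.v.'s also on $\langle \Omega, \mathcal{F}, P_t \rangle$; define

\[
\Lambda = \sum_t E_{P_t}[L_t]
\qquad \text{and} \qquad
\Lambda_\omega = \sum_t L_t(\omega).
\]

Theorem 2.1 If $A \in \mathcal{F}$ and $P[A] \ne 0$, then

\[
\Lambda - \max_{\omega \in A} \Lambda_\omega \le \frac{1}{\eta} \ln \frac{1}{P[A]} + \frac{\eta T}{8}.
\]

Proof.

\[
e^{-\eta\Lambda + T\eta^2/8}
= \prod_t e^{-\eta E_{P_t}[L_t] + \eta^2/8}
\ge \prod_t E_{P_t}\big[e^{-\eta L_t}\big]
= E_P\Big[e^{-\eta \sum_t L_t}\Big]
\ge \sum_{\omega \in A} e^{-\eta \sum_t L_t(\omega)}\, P[\omega]
\ge e^{-\eta \max_{\omega \in A} \sum_t L_t(\omega)} \sum_{\omega \in A} P[\omega]
= e^{-\eta \max_{\omega \in A} \Lambda_\omega}\, P[A]
\]

where the "left" derivations follow by definition of $\Lambda$ and by lemmas A.3 and A.2, and the "right" ones by positivity of $e^{-\eta \sum_t L_t(\omega)}$ and by definition of $\Lambda_\omega$. Taking logarithms and dividing by $\eta$ gives the statement. □

Corollary 2.1 If $p = P[\Lambda_\omega \le \beta] \ne 0$, then $\Lambda - \beta \le (1/\eta) \ln(1/p) + \eta T/8$.

Proof. By simply taking $A = \{\omega : \Lambda_\omega \le \beta\}$ in the previous theorem; observe that, in this case, the "right" part of the proof becomes a standard Markov's inequality. □

Corollary 2.2 If $p = P[\arg\min_{\omega \in \Omega} \Lambda_\omega] \ne 0$, then $\Lambda - \min_{\omega \in \Omega} \Lambda_\omega \le (1/\eta) \ln(1/p) + \eta T/8$.

Proof. By simply taking $A = \{\arg\min_{\omega \in \Omega} \Lambda_\omega\}$ in the previous theorem. □

Corollary 2.3 If $P[\omega] = 1/|\Omega|$ for every $\omega$ and $\eta = \sqrt{8 \ln|\Omega| / T}$, then

\[
\frac{\Lambda - \min_{\omega \in \Omega} \Lambda_\omega}{T} \le \sqrt{\frac{2 \ln|\Omega|}{T}}.
\]

Proof. Dividing by $T$ the inequality of the previous corollary, and minimizing the term $(1/\eta) \ln|\Omega| + \eta T/8$ by simply differentiating with respect to $\eta$; the minimum value is $\sqrt{T \ln|\Omega| / 2}$, so the left-hand side is in fact at most $\sqrt{\ln|\Omega| / (2T)} \le \sqrt{2 \ln|\Omega| / T}$. □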
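The update $P_{t+1} = \alpha_t P_t e^{-\eta L_t}$ of section 2.2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the deterministic loss sequence `demo_losses` are assumptions made for the example and do not come from the paper. The code accumulates $\Lambda$ and the $\Lambda_\omega$ and compares the net loss with the right-hand side of corollary 2.2 for a uniform $P_1$.

```python
import math

def exponential_weights(losses, eta):
    """Run P_{t+1} = alpha_t * P_t * exp(-eta * L_t) starting from a uniform P_1.

    losses[t][w] is the loss in [0, 1] of choice w at round t.
    Returns (Lambda, [Lambda_omega for each choice]).
    """
    n = len(losses[0])
    p = [1.0 / n] * n                  # P_1: uniform allocation
    total = 0.0                        # Lambda = sum_t E_{P_t}[L_t]
    per_choice = [0.0] * n             # Lambda_omega = sum_t L_t(omega)
    for round_losses in losses:
        total += sum(pw * lw for pw, lw in zip(p, round_losses))
        for w, lw in enumerate(round_losses):
            per_choice[w] += lw
        unnorm = [pw * math.exp(-eta * lw) for pw, lw in zip(p, round_losses)]
        z = sum(unnorm)                # z = 1 / alpha_t
        p = [u / z for u in unnorm]
    return total, per_choice

def demo_losses(T, n):
    """Hypothetical deterministic loss sequence, for illustration only."""
    return [[((3 * t + 7 * w) % 10) / 10.0 for w in range(n)] for t in range(T)]

T, n = 200, 5
eta = math.sqrt(8.0 * math.log(n) / T)          # the choice of corollary 2.3
total, per_choice = exponential_weights(demo_losses(T, n), eta)
net_loss = total - min(per_choice)
bound = math.log(n) / eta + eta * T / 8.0       # corollary 2.2 with p = 1/n
print(net_loss, bound)
```

With this value of $\eta$ the two terms of the bound coincide, which is why the printed bound equals $\sqrt{T \ln n / 2}$ up to rounding.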
2.3 Discussion

In the following an interpretation of the corollaries is given as an explanation of the very general inequality of the theorem; note that probability plays only the role of a measure and consequently all the following conclusions (except the first, relative to corollary 2.1) are deterministic in the strict sense.

Assume that $P$ models a stochastic behavior of the family of choices (which can be in principle a continuous family instead of a finite one as in the setting assumed here); then $p$ can be interpreted as the probability, with respect to the stochastic behavior of the choices, that the $\omega$-loss is bounded by some $\beta$. The inequality of corollary 2.1 then relates the difference between the total loss and such bound with the logarithm of $1/p$ essentially (3) in the same way as in [Vov]. Corollary 2.2 gives a bound essentially (4) of the form of the one presented in [FS97].

According to the defined setting, $\Lambda$ coincides with the total loss and $\Lambda_\omega$ with the $\omega$-loss of the choice $\omega$, so corollary 2.3 asserts that the on-line allocation strategy corresponding to the sequence of allocations $P_t$ (started with a fixed allocation $p$ that is equal for each choice) is such that the average net loss goes to 0 with $T$ as $O(\sqrt{\ln|\Omega| / T})$; this last result coincides with the one presented in [FS97].

(3) the difference being relative to the constants.
(4) in [FS97] different, and in some sense optimal, constants are present.

3 Binary classification

3.1 Definitions

Classification refers in general to the problem of predicting a label in a finite set $L$ for each element of a set $I$ of instances according to some relationship between instances and labels that can be thought of as an (unknown) deterministic mapping $I \to L$ or as a joint probability distribution over $I \times L$; a prediction is then a mapping $I \to L$ whose error is a measure of the discrepancy between predicted and intended label (according to the unknown mapping or the joint distribution). When the cardinality of $L$ is 2, the classification is said to be binary.

The goal of a boosting strategy is to combine a family of simple predictions in order to obtain a final prediction with smaller error with respect to the simple ones.

For computability reasons, the attention is usually restricted to a sample, i.e. a finite number of pairs $(i, l)$ drawn from $I \times L$ according to some distribution; furthermore some generalization results are usually provided to relate the error on the sample to the error with respect to the whole set of instances.

For these reasons the following formal proofs are restricted to a finite set $\Omega$ representing the instance part of the sample and a deterministic mapping $\Omega \to \{-1, +1\}$ representing the label part of the sample; it is straightforward to verify that various binary classification settings occurring in the literature can be reduced to the one presented here. The p.m. $P$ associated to $\Omega$ can be used either to somehow reproduce over the sample the probability given for the whole set of instances, or just considered uniform to relate to the common empirical estimate over the sample.

3.2 Formal proofs

Consider a p.s. $\langle \Omega, \mathcal{F}, P \rangle$, a function $c : \Omega \to \{-1, +1\}$ and a family of functions $h_t : \Omega \to \{-1, +1\}$; define $h : \Omega \to \mathbb{R}$ as

\[
h(\omega) = \sum_t \lambda_t\, h_t(\omega)
\]

where $\lambda_t$ is a family of real parameters.

Let $L_t$ be a family of r.v.'s on $\langle \Omega, \mathcal{F}, P \rangle$ defined as $L_t(\omega) = h_t(\omega)\, c(\omega)$; it is easy to check that the range of $L_t$ is $\{-1, +1\}$ and that

\[
L_t(\omega) =
\begin{cases}
+1 & h_t(\omega) = c(\omega) \\
-1 & h_t(\omega) \ne c(\omega)
\end{cases}
\]

Consider the family of p.m.'s on $\langle \Omega, \mathcal{F} \rangle$ such that $P_1 = P$ and

\[
P_{t+1} = \alpha_t\, P_t\, e^{-\lambda_t L_t}
\qquad \text{with normalizing} \qquad
1/\alpha_t = E_{P_t}\big[e^{-\lambda_t L_t}\big].
\]

Observe that the $L_t$ are r.v.'s also on $\langle \Omega, \mathcal{F}, P_t \rangle$; define $\varepsilon_t = P_t[h_t \ne c]$, then

\[
E_{P_t}[L_t] = P_t[L_t = +1] - P_t[L_t = -1] = 1 - 2 P_t[h_t \ne c] = 1 - 2\varepsilon_t.
\]

Theorem 3.1 If $\lambda_t = 1 - 2\varepsilon_t$, then $P[hc \le 0] \le \prod_t e^{-2(1/2 - \varepsilon_t)^2}$.

Proof.

\[
P[hc \le 0]
= P\Big[c \sum_t \lambda_t h_t \le 0\Big]
= P\Big[-\sum_t \lambda_t L_t \ge 0\Big]
\le E_P\Big[e^{-\sum_t \lambda_t L_t}\Big]
= \prod_t E_{P_t}\big[e^{-\lambda_t L_t}\big]
\le \prod_t e^{-\lambda_t E_{P_t}[L_t] + \lambda_t^2/2}
\]

where the last three derivations follow from lemmas A.1, A.2 and A.3 respectively. Hence this last product can be minimized by differentiating each positive factor separately with respect to $\lambda_t$, obtaining the value $\lambda_t = E_{P_t}[L_t] = 1 - 2\varepsilon_t$ where the minima are attained. □
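The reweighting of theorem 3.1 can be illustrated on a toy sample. The following is a minimal sketch under stated assumptions: the sample, the labels and the threshold predictors (`stump`) are hypothetical and chosen only for the example, and the simple predictions are taken as a fixed list, whereas the theorem places no restriction on how the family $h_t$ is chosen. The code applies $\lambda_t = 1 - 2\varepsilon_t$ with a uniform $P_1$ and compares $P[hc \le 0]$ with $\prod_t e^{-2(1/2 - \varepsilon_t)^2}$.

```python
import math

def boost(sample, labels, predictors):
    """Combine +/-1 predictors with lambda_t = 1 - 2 * eps_t (theorem 3.1).

    Returns (lambdas, bound) with bound = prod_t exp(-2 * (1/2 - eps_t)^2).
    """
    n = len(sample)
    p = [1.0 / n] * n                                    # P_1: uniform on the sample
    lambdas, bound = [], 1.0
    for h_t in predictors:
        margins = [h_t(x) * c for x, c in zip(sample, labels)]    # L_t(omega)
        eps = sum(pw for pw, m in zip(p, margins) if m < 0)       # eps_t = P_t[h_t != c]
        lam = 1.0 - 2.0 * eps
        lambdas.append(lam)
        bound *= math.exp(-2.0 * (0.5 - eps) ** 2)
        unnorm = [pw * math.exp(-lam * m) for pw, m in zip(p, margins)]
        z = sum(unnorm)                                  # z = 1 / alpha_t
        p = [u / z for u in unnorm]                      # P_{t+1}
    return lambdas, bound

def stump(threshold):
    """Hypothetical simple prediction: +1 above the threshold, -1 otherwise."""
    return lambda x: 1 if x > threshold else -1

sample = [0, 1, 2, 3, 4, 5, 6, 7]
labels = [-1, -1, 1, -1, 1, 1, -1, 1]
predictors = [stump(1.5), stump(3.5), stump(6.5), stump(1.5), stump(3.5)]

lambdas, bound = boost(sample, labels, predictors)
h = [sum(l * h_t(x) for l, h_t in zip(lambdas, predictors)) for x in sample]
error = sum(1 for hx, c in zip(h, labels) if hx * c <= 0) / len(sample)   # P[hc <= 0]
print(error, bound)
```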
Corollary 3.1 If $\gamma > 0$ exists such that $\varepsilon_t \le 1/2 - \gamma$ and $f(\omega) = \mathrm{sgn}(h(\omega))$, then

\[
P[f \ne c] \le e^{-2T\gamma^2}.
\]

Proof. From the definition of $f$ it follows that $P[f \ne c] = P[fc \le 0] = P[hc \le 0]$; by straightforward computation, from the previous theorem, it follows that $P[fc \le 0] \le e^{-2T\gamma^2}$. □

Corollary 3.2 If $\lambda_t = 1 - 2\varepsilon_t \ge 0$, $\sum_t \lambda_t = 1$ and $\theta \ge 0$, then $P[hc \le \theta] \le \prod_t e^{-2((1/2 - \varepsilon_t) - \theta/2)^2}$.

Proof. The proof is essentially the same as for theorem 3.1, replacing the 0 at the right side of the first term with $\theta \sum_t \lambda_t$ and then performing the same derivations. □

3.3 Discussion

Even if addressing apparently different situations, the proofs of theorems 3.1 and 2.1 share a formal similarity which has been justified [FS96b] by a kind of duality between on-line allocation and boosting strategies. This duality has already been exploited to transform the weighted majority algorithm [LW94] into a boosting strategy [FS97].

If the family of mappings $h_t$ is considered as a set of simple predictions and $h$ as the final one, then $\varepsilon_t$ represents the error with respect to probability $P_t$, and the statement of the theorem relates the positivity of $h$ (and ultimately the error of $\mathrm{sgn}(h)$ as a prediction) to the errors of the simple predictions.

To better understand the given inequality, a more optimistic situation is usually assumed: corollary 3.1 asserts that if all the simple predictions have error bounded away from 1/2 with respect to any $P_t$, then the error of the final prediction $f$ goes to 0 exponentially fast with the number of combined simple predictions.

Consider now a probability $\mathcal{P}$ over $I$ and let $\Omega$ represent a sample drawn independently at random from $I$ according to $\mathcal{P}$. In order to provide generalization results the family of simple predictions has to satisfy some (weak) structural property. A family $h_t$ of simple predictions shatters a subset $S \subseteq I$ if for any $T \subseteq S$ an $h_t$ exists such that $T = S \cap h_t^{-1}(+1)$; the Vapnik-Chervonenkis dimension [VC71] of the family is the cardinality of the largest (possibly infinite) shattered subset of $I$.

Under the hypotheses of corollary 3.2, if $P(\omega) = 1/|\Omega|$ and $d < \infty$ is the Vapnik-Chervonenkis dimension of the family of simple predictions, then with probability more than $1 - \delta$ over the random choice of the sample, it is possible [SFBL97] to prove that

\[
\mathcal{P}[hc \le 0] \le P[hc \le \theta] + O\left( \frac{1}{\sqrt{|\Omega|}} \left( \frac{d \ln^2(|\Omega|/d)}{\theta^2} + \ln(1/\delta) \right)^{1/2} \right);
\]

by combining this inequality with the result of corollary 3.2 and at the same time assuming the hypothesis of corollary 3.1, it follows easily that

\[
\mathcal{P}[f \ne c] \le e^{-2T(\gamma - \theta/2)^2} + O\left( \frac{1}{\sqrt{|\Omega|}} \left( \frac{d \ln^2(|\Omega|/d)}{\theta^2} + \ln(1/\delta) \right)^{1/2} \right)
\]

which gives an asymptotic bound on the generalization error in terms of sample size and number of combined simple predictions.

4 Regression, via reduction to classification

4.1 Definitions

Regression as treated in this paper is very similar to classification, the only difference being that the label set is an infinite set endowed with a notion of distance taken as a measure of the discrepancy between the predicted and the intended label for any instance, thus defining a notion of error similar to the one introduced for the classification case.

For the same computability reasons as before, the set $\Omega$ is a finite sample as in the classification case; even in this case it is straightforward to verify that various regression settings occurring in the literature can be reduced to the one presented here.

Given a threshold $\theta > 0$ the regression predictions can be transformed into binary predictions in such a way that the binary prediction answers $+1$ if the regression one answers a label nearer than $\theta$ to the intended one and $-1$ in the opposite case. To predict the correct label with accuracy $\theta$ is then equivalent to predicting the constant label $+1$.

Once the binary classification boosting strategy has returned the family of parameters $\lambda_t$, a final regression prediction can be obtained by a relatively simple function of these values.

4.2 Formal proofs

Consider a p.s. $\langle \Omega, \mathcal{F}, P \rangle$, a function $c : \Omega \to \Gamma$ and a family of functions $h_t : \Omega \to \Gamma$. Let $d : \Gamma \times \Gamma \to \mathbb{R}^+$ be an arbitrary distance on $\Gamma$; given $\theta > 0$ define

\[
\Theta_{\omega,\xi} = \{ t : d(h_t(\omega), \xi) \le \theta \}
\]

and $h : \Omega \to \Gamma$ as

\[
h(\omega) = \arg\max_{\xi \in \Gamma} \sum_{t \in \Theta_{\omega,\xi}} \lambda_t
\]

where $\lambda_t$ is a family of positive real parameters. Hence, for any $\xi$,

\[
\sum_{t \in \Theta_{\omega,\xi}} \lambda_t - \sum_{t \in \Theta_{\omega,h(\omega)}} \lambda_t \le 0.
\]
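Let $\langle \tilde\Omega, \tilde{\mathcal{F}}, \tilde P \rangle$ be a p.s. where $\tilde\Omega = \Omega \times \Gamma$ and (5) $\tilde P = P \times \mu$; let $\tilde c : \tilde\Omega \to \{-1, +1\}$ be such that $\tilde c(\tilde\omega) \equiv +1$ and let $\tilde h_t : \tilde\Omega \to \{-1, +1\}$ be a family of functions defined as

\[
\tilde h_t(\tilde\omega) = \tilde h_t(\omega, \xi) =
\begin{cases}
+1 & d(h_t(\omega), \xi) \le \theta \\
-1 & d(h_t(\omega), \xi) > \theta
\end{cases}
\]

and let $\tilde h(\tilde\omega) = \sum_t \lambda_t \tilde h_t(\tilde\omega)$, that is

\[
\tilde h(\tilde\omega) = \tilde h(\omega, \xi) = \sum_{t \in \Theta_{\omega,\xi}} \lambda_t - \sum_{t \in \Theta^C_{\omega,\xi}} \lambda_t.
\]

(5) where $\mu$ denotes a uniform measure on $\Gamma$.

The family $\langle \tilde\Omega, \tilde{\mathcal{F}}, \tilde P_t \rangle$ of p.m.'s is such that

\[
P_t[d(h_t(\omega), \xi) > \theta] = \tilde P_t\big[\tilde h_t(\omega, \xi) \ne \tilde c(\omega, \xi)\big]
\]

therefore $\lambda_t = 1 - 2 \tilde P_t[\tilde h_t \ne \tilde c] \ge 0$ if and only if $P_t[d(h_t(\omega), \xi) > \theta] \le 1/2$.

Observe that $d(\xi, \xi') > 2\theta$ implies $\Theta_{\omega,\xi} \cap \Theta_{\omega,\xi'} = \emptyset$ or equivalently $\Theta^C_{\omega,\xi} = \Theta_{\omega,\xi'} \cup \Delta$ where $\Delta$ is a (possibly empty) set of indices $t$; since all the $\lambda_t$ are positive it is easy to check that $d(h(\omega), \xi) > 2\theta$ implies

\[
\tilde h(\omega, \xi)
= \sum_{t \in \Theta_{\omega,\xi}} \lambda_t - \Big( \sum_{t \in \Theta_{\omega,h(\omega)}} \lambda_t + \sum_{t \in \Delta} \lambda_t \Big)
\le \sum_{t \in \Theta_{\omega,\xi}} \lambda_t - \sum_{t \in \Theta_{\omega,h(\omega)}} \lambda_t
\le 0.
\]

Theorem 4.1 If $\varepsilon_t = P_t[d(h_t(\omega), \xi) > \theta] \le 1/2$, then

\[
P[d(h(\omega), \xi) > 2\theta] \le \prod_t e^{-2(1/2 - \varepsilon_t)^2}.
\]

Proof. It follows easily from theorem 3.1 by the cited reduction and observing that

\[
P[d(h(\omega), \xi) > 2\theta] \le \tilde P\big[\tilde h \le 0\big] = \tilde P\big[\tilde h \tilde c \le 0\big].
\]
□

Corollary 4.1 If $\gamma > 0$ exists such that $P_t[d(h_t(\omega), \xi) > \theta] \le 1/2 - \gamma$, then

\[
P[d(h(\omega), \xi) > 2\theta] \le e^{-2T\gamma^2}.
\]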
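For real-valued labels with the absolute difference as distance, the combination rule $h(\omega) = \arg\max_\xi \sum_{t \in \Theta_{\omega,\xi}} \lambda_t$ of section 4.2 can be sketched as follows. The restriction to real labels and the function name `combine` are assumptions made for this example; in that special case each simple prediction $y$ votes for the closed interval $[y - \theta, y + \theta]$, and a maximum of the total weight is always attained at the left endpoint of one of these intervals, so only those candidates need to be examined.

```python
def combine(predictions, lambdas, theta):
    """Return a label xi maximising the total weight of {t : |h_t - xi| <= theta}.

    predictions[t] is the real-valued simple prediction h_t(omega) and
    lambdas[t] > 0 its weight; ties are broken in favour of the first candidate.
    """
    def weight(xi):
        return sum(l for y, l in zip(predictions, lambdas) if abs(y - xi) <= theta)

    # Left endpoints of the intervals [y - theta, y + theta].
    candidates = [y - theta for y in predictions]
    return max(candidates, key=weight)

# Two nearby predictions outweigh a single heavier outlier.
final = combine([0.0, 1.5, 10.0], [1.0, 1.0, 1.5], theta=1.0)
print(final)
```

In this usage example the label 0.5 lies within $\theta$ of the first two predictions (total weight 2.0), while no label within $\theta$ of the third prediction collects more than weight 1.5.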
4.3 Discussion

Reducing regression to classification for applying boosting strategies was first suggested in [FS97] for the case $\Gamma = [0, 1] \subset \mathbb{R}$. Instead, the present approach is essentially the same as the one discussed in [BCP] where $\Gamma = [0, 1]^n \subset \mathbb{R}^n$ for $n > 1$, and even in the case of $n = 1$, experimental analysis [FS96a] seems to show that the strategy presented here performs better than the one in [FS97].

Corollary 4.1 asserts that if the probability of the simple predictions being less accurate than $\theta$ is bounded away from 1/2 with respect to any $P_t$, then the probability that the final prediction is less accurate than $2\theta$ goes to 0 exponentially fast with the number of combined simple predictions. Observe that all the other results relate expectations to probability, while here a relation exclusively between probabilities is stated.

5 Regression, a direct approach

5.1 Definitions

The following approach is somehow more general than the previous one, since here the error is given in terms of an abstract loss function which needs only to be in some sense "convex" with respect to the composition chosen for combining the simple predictions; if the label set is a normed vector space on $\mathbb{R}$, the induced distance (which is convex in the usual sense) and an empirical average of the predictions together satisfy the request of the presented setting.

5.2 Formal proofs

Consider a p.s. $\langle \Omega, \mathcal{F}, P \rangle$, a function $c : \Omega \to \Gamma$ and a family of functions $h_t : \Omega \to \Gamma$. Let $\Phi : \Gamma^T \to \Gamma$ and $\phi : \Gamma \times \Gamma \to [l, l + \delta] \subset \mathbb{R}$ be such that (6)

\[
T\, \phi\big(\Phi(h_1(\omega), \ldots, h_T(\omega)), c(\omega)\big) \le \sum_t \phi(h_t(\omega), c(\omega))
\]

for any $\omega \in \Omega$ and define $h : \Omega \to \Gamma$ as

\[
h(\omega) = \Phi(h_1(\omega), \ldots, h_T(\omega)).
\]

Let $L_t$ be a family of r.v.'s on $\langle \Omega, \mathcal{F}, P \rangle$ defined as

\[
L_t(\omega) = \phi(h_t(\omega), c(\omega)).
\]

(6) observe that in the case $\Gamma = \mathbb{R}^n$ the choice of $\Phi(h_1(\omega), \ldots, h_T(\omega)) = \frac{1}{T} \sum_t h_t(\omega)$ and a convex $\phi$ satisfies the definition.
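Given a real positive parameter $\lambda$, consider the family of p.m.'s on $\langle \Omega, \mathcal{F} \rangle$ such that $P_1 = P$ and

\[
P_{t+1} = \alpha_t\, P_t\, e^{\lambda L_t}
\qquad \text{with normalizing} \qquad
1/\alpha_t = E_{P_t}\big[e^{\lambda L_t}\big].
\]

Observe that the $L_t$ are r.v.'s also on $\langle \Omega, \mathcal{F}, P_t \rangle$; define

\[
\Lambda = \frac{1}{T} \sum_t E_{P_t}[L_t].
\]

Theorem 5.1 If $\lambda = 4\rho/\delta^2$, then $P[\phi(h, c) \ge \Lambda + \rho] \le e^{-2T\rho^2/\delta^2}$.

Proof.

\[
P[\phi(h, c) \ge \Lambda + \rho]
\le P\Big[\frac{1}{T} \sum_t \phi(h_t, c) \ge \Lambda + \rho\Big]
= P\Big[-T(\Lambda + \rho) + \sum_t L_t \ge 0\Big]
\le E_P\Big[e^{-\lambda T(\Lambda + \rho) + \lambda \sum_t L_t}\Big]
= e^{-\lambda T(\Lambda + \rho)} \prod_t E_{P_t}\big[e^{\lambda L_t}\big]
\le e^{-\lambda T(\Lambda + \rho)} \prod_t e^{\lambda E_{P_t}[L_t] + \lambda^2\delta^2/8}
\]

where the first derivation follows from the definition of $\Phi$ and the last three derivations follow from lemmas A.1, A.2 and A.3 respectively. Hence, given the definition of $\Lambda$, the last expression simplifies to

\[
e^{-\lambda T(\Lambda + \rho) + T\lambda^2\delta^2/8 + \lambda \sum_t E_{P_t}[L_t]} = e^{T\lambda^2\delta^2/8 - \lambda T\rho}
\]

that can be minimized by differentiating with respect to $\lambda$, obtaining the value $\lambda = 4\rho/\delta^2$ where the minimum is attained. □

5.3 Discussion

With respect to the previous strategy, under suitable restrictions on $\Gamma$, the combination of the simple predictions here can be as simple as an averaged summation while the loss function can still remain a (bounded) distance; in this setting, the statement of theorem 5.1 asserts that [CB97a] the probability that the loss of the final prediction exceeds the averaged total loss by any fixed amount goes to 0 with the number $T$ of the averaged simple predictions.

A less obvious task is how to compare (in the most general setting) the results of theorems 4.1 and 5.1, since even if both give an exponentially decreasing upper bound on the probability of error of the final prediction with respect to the simple predictions, the former is based on the probability of error, while the latter is based on the expectation of error.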
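A Some technical lemmas

Here some technical lemmas which are needed in the previous proofs are given. Even if it is sometimes unnecessarily restrictive, here the set $\Omega$ is assumed to be finite, consequently all the r.v.'s are simple.

Lemma A.1 Let $X$ be a r.v. on $\langle \Omega, \mathcal{F}, P \rangle$, then $P[X \ge 0] = P[e^X \ge 1] \le E[e^X]$.

Proof. By simple application of Markov's inequality. □

The next lemma gives the p.m. transformation which is the inspiring idea of this paper, suggested in [CB97b], as well as the main tool of all the proofs.

Lemma A.2 Let $X_t$ be a family of r.v.'s on $\langle \Omega, \mathcal{F}, P \rangle$ and $P_t$ a family of p.m.'s on $\langle \Omega, \mathcal{F} \rangle$ defined as $P_1 = P$ and

\[
P_{t+1} = \alpha_t\, P_t\, e^{X_t}
\qquad \text{with normalizing} \qquad
1/\alpha_t = E_{P_t}\big[e^{X_t}\big],
\]

then $X_t$ is a family of r.v.'s on $\langle \Omega, \mathcal{F}, P_t \rangle$ and

\[
E_P\Big[\prod_t e^{X_t}\Big] = \prod_t E_{P_t}\big[e^{X_t}\big].
\]

Proof. By definition of $P_t$ it is easy to check that the $X_t$ are r.v.'s defined also on $\langle \Omega, \mathcal{F}, P_t \rangle$, then

\[
E_P\Big[\prod_t e^{X_t}\Big]
= E_P\Big[\prod_t \frac{P_{t+1}}{\alpha_t P_t}\Big]
= E_P\Big[\frac{P_{T+1}}{P_1}\Big] \prod_t \frac{1}{\alpha_t}
= \prod_t E_{P_t}\big[e^{X_t}\big]
\]

where the last step uses $E_P[P_{T+1}/P_1] = \sum_\omega P_{T+1}(\omega) = 1$. □

Lemma A.3 Let $X$ be a r.v. on $\langle \Omega, \mathcal{F}, P \rangle$ such that $x \le X \le x + \delta$ and $E[X] = \mu < \infty$, then

\[
E\big[e^{\lambda X}\big] \le e^{\lambda\mu + \lambda^2\delta^2/8}.
\]

Proof. Assume that $\mu = 0$, then by convexity of the exponential function

\[
E\big[e^{\lambda X}\big] \le \frac{x + \delta}{\delta}\, e^{\lambda x} - \frac{x}{\delta}\, e^{\lambda(x + \delta)} = e^{g(u)}
\]

where $u = \lambda\delta$ and $g(u) = -pu + \log(1 - p + p e^u)$ with $p = -x/\delta$. It is easy to verify that $g(0) = g'(0) = 0$ and that $g''(u) \le 1/4$. Hence by Taylor's expansion, for suitable $\zeta$,

\[
g(u) = g(0) + u\, g'(0) + \frac{u^2}{2}\, g''(\zeta) \le \frac{u^2}{8}.
\]

If now $\mu \ne 0$, $X - \mu$ has 0 mean and by the previous inequality

\[
e^{-\lambda\mu} E\big[e^{\lambda X}\big] = E\big[e^{\lambda(X - \mu)}\big] \le e^{\lambda^2\delta^2/8}.
\]
□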
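The identity of lemma A.2 is easy to check numerically on a small finite space. The following sketch is illustrative only: the probability vector, the random values of $X_t$ and the function name `check_lemma_a2` are assumptions made for the example. It computes $E_P[\prod_t e^{X_t}]$ directly and compares it with the product of the normalizing constants $1/\alpha_t = E_{P_t}[e^{X_t}]$ obtained while updating $P_t$.

```python
import math
import random

def check_lemma_a2(xs, p):
    """Return both sides of lemma A.2 on a finite space.

    xs[t][w] is X_t(omega_w) and p[w] = P[omega_w] > 0.
    """
    # Left-hand side: E_P[ prod_t exp(X_t) ] = E_P[ exp(sum_t X_t) ].
    lhs = sum(pw * math.exp(sum(x_t[w] for x_t in xs)) for w, pw in enumerate(p))

    # Right-hand side: prod_t E_{P_t}[exp(X_t)], updating P_t along the way.
    rhs, p_t = 1.0, list(p)
    for x_t in xs:
        unnorm = [pw * math.exp(x) for pw, x in zip(p_t, x_t)]
        z = sum(unnorm)                  # z = E_{P_t}[exp(X_t)] = 1 / alpha_t
        rhs *= z
        p_t = [u / z for u in unnorm]    # P_{t+1}
    return lhs, rhs

rng = random.Random(0)                   # fixed seed, for reproducibility only
p = [0.1, 0.2, 0.3, 0.4]
xs = [[rng.uniform(-1.0, 1.0) for _ in p] for _ in range(6)]
lhs, rhs = check_lemma_a2(xs, p)
print(lhs, rhs)
```

The two printed values agree up to floating-point rounding even though the $X_t$ are not independent under $P$, which is exactly the "exchange of expectation and product" used in all the proofs.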
References

[BCP] A. Bertoni, P. Campadelli, and M. Parodi. A boosting algorithm for regression. [To appear in ICANN'97].

[CB97a] N. Cesa-Bianchi. A boosting algorithm for regression. [Unpublished manuscript], 17 June 1997.

[CB97b] N. Cesa-Bianchi. Concentration of measure for sums of dependent random variables. [Unpublished manuscript], 6 June 1997.

[FS96a] Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In Proc. 13th International Conference on Machine Learning, pages 148-156. Morgan Kaufmann, 1996.

[FS96b] Y. Freund and R. E. Schapire. Game theory, on-line prediction and boosting. In Proc. 9th Annu. Conf. on Comput. Learning Theory, pages 325-332. ACM Press, New York, NY, 1996.

[FS97] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, August 1997.

[Hof63] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13-30, 1963.

[LW94] N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212-261, 1 February 1994.

[SFBL97] R. E. Schapire, Y. Freund, P. Bartlett, and W. Sun Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. In Proc. 14th International Conference on Machine Learning, pages 322-330. Morgan Kaufmann, 1997.

[VC71] V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264-280, 1971.

[Vov] V. Vovk. Derandomizing stochastic prediction strategies. [To appear in COLT'97].