AT&T Labs Research, Shannon Laboratory, 180 Park Avenue, Room A279, Florham Park, NJ 07932-0971, USA


Machine Learning, 43, 265-291, 2001
© 2001 Kluwer Academic Publishers. Manufactured in The Netherlands.

Drifting Games

ROBERT E. SCHAPIRE    schapire@research.att.com
AT&T Labs Research, Shannon Laboratory, 180 Park Avenue, Room A279, Florham Park, NJ 07932-0971, USA

Editor: Yoram Singer

Abstract. We introduce and study a general, abstract game played between two players called the shepherd and the adversary. The game is played in a series of rounds using a finite set of chips which are moved about in R^n. On each round, the shepherd assigns a desired direction of movement and an importance weight to each of the chips. The adversary then moves the chips in any way that need only be weakly correlated with the desired directions assigned by the shepherd. The shepherd's goal is to cause the chips to be moved to low-loss positions, where the loss of each chip at its final position is measured by a given loss function. We present a shepherd algorithm for this game and prove an upper bound on its performance. We also prove a lower bound showing that the algorithm is essentially optimal for a large number of chips. We discuss computational methods for efficiently implementing our algorithm. We show that our general drifting-game algorithm subsumes some well studied boosting and on-line learning algorithms whose analyses follow as easy corollaries of our general result.

Keywords: boosting, on-line learning algorithms

1. Introduction

We introduce a general, abstract game played between two players called the shepherd¹ and the adversary. The game is played in a series of rounds using a finite set of chips which are moved about in R^n. On each round, the shepherd assigns a desired direction of movement to each of the chips, as well as a nonnegative weight measuring the relative importance that each chip be moved in the desired direction. In response, the adversary moves each chip however it wishes, so long as the relative movements of the chips projected in the directions chosen by the shepherd are at least δ, on average. Here, the average is taken with respect to the importance weights that were selected by the shepherd, and δ ≥ 0 is a given parameter of the game.
Since we think of δ as a small number, the adversary need move the chips in a fashion that is only weakly correlated with the directions desired by the shepherd. The adversary is also restricted to choose relative movements for the chips from a given set B ⊆ R^n. The goal of the shepherd is to force the chips to be moved to low-loss positions, where the loss of each chip at its final position is measured by a given loss function L. A more formal description of the game is given in Section 2.

We present in Section 4 a new algorithm called OS for playing this game in the role of the shepherd, and we analyze the algorithm's performance for any parameterization of the game meeting certain natural conditions. Under the same conditions, we also prove in Section 5 that our algorithm is the best possible when the number of chips becomes large.

As spelled out in Section 3, the drifting game is closely related to boosting, the problem of finding a highly accurate classification rule by combining many weak classifiers or hypotheses. The drifting game and its analysis are generalizations of Freund's (1995) majority-vote game which was used to derive his boost-by-majority algorithm. This latter algorithm is optimal in a certain sense for boosting binary problems using weak hypotheses which are restricted to making binary predictions. However, the boost-by-majority algorithm has never been generalized to multiclass problems, nor to a setting in which weak hypotheses may abstain or give graded predictions between two classes. The general drifting game that we study leads immediately to new boosting algorithms for these settings. By our result on the optimality of the OS algorithm, these new boosting algorithms are also best possible, assuming as we do in this paper that the final hypothesis is restricted in form to a simple majority vote. We do not know if the derived algorithms are optimal without this restriction.

In Section 6, we discuss computational methods for implementing the OS algorithm. We give a useful theorem for handling games in which the loss function enjoys certain monotonicity properties. We also give a more general technique using linear programming for implementing OS in many settings, including the drifting game that corresponds to multiclass boosting. In this latter case, the algorithm runs in polynomial time when the number of classes is held constant.

In Section 7, we discuss the analysis of several drifting games corresponding to previously studied learning problems. For the drifting games corresponding to binary boosting with or without abstaining weak hypotheses, we show how to implement the algorithm efficiently. We also show that there are parameterizations of the drifting game under which OS is equivalent to a simplified version of the AdaBoost algorithm (Freund & Schapire, 1997; Schapire & Singer, 1999), as well as Cesa-Bianchi et al.'s (1996) BW algorithm and Littlestone and Warmuth's (1994) weighted majority algorithm for combining the advice of experts in an on-line learning setting. Analyses of these algorithms follow as easy corollaries of the analysis we give for general drifting games.

2. Drifting games

We begin with a formal description of the drifting game.
An outline of the game is shown in figure 1. There are two players in the game called the shepherd and the adversary. The game is played in T rounds using m chips. On each round t, the shepherd specifies a weight vector w_t^i ∈ R^n for each chip i. The direction of this vector, v_t^i = w_t^i / ||w_t^i||_p, specifies a desired direction of drift, while the length of the vector ||w_t^i||_p specifies the relative importance of moving the chip in the desired direction. In response, the adversary chooses a drift vector z_t^i for each chip i. The adversary is constrained to choose each z_t^i from a fixed set B ⊆ R^n. Moreover, the z_t^i's must satisfy

  Σ_i w_t^i · z_t^i ≥ δ Σ_i ||w_t^i||_p    (1)

Parameters:
  number of rounds T
  dimension of space n
  set B ⊆ R^n of permitted relative movements
  norm ℓ_p where p ≥ 1
  minimum average drift δ ≥ 0
  loss function L : R^n → R
  number of chips m

For t = 1, ..., T:
  - shepherd chooses weight vector w_t^i ∈ R^n for each chip i
  - adversary chooses drift vector z_t^i ∈ B for each chip i so that
      Σ_{i=1}^m w_t^i · z_t^i ≥ δ Σ_{i=1}^m ||w_t^i||_p

The final loss suffered by the shepherd is (1/m) Σ_{i=1}^m L(Σ_{t=1}^T z_t^i).

Figure 1. The drifting game.

or equivalently

  Σ_i (||w_t^i||_p / Σ_{i'} ||w_t^{i'}||_p) v_t^i · z_t^i ≥ δ    (2)

where δ ≥ 0 is a fixed parameter of the game. (Here and throughout the paper, when clear from context, Σ_i denotes Σ_{i=1}^m; likewise, we will shortly use the notation Σ_t for Σ_{t=1}^T.) In words, v_t^i · z_t^i is the amount by which chip i has moved in the desired direction. Thus, the left hand side of Eq. (2) represents a weighted average of the drifts of the chips projected in the desired directions, where chip i's projected drift is weighted by ||w_t^i||_p / Σ_{i'} ||w_t^{i'}||_p. We require that this average projected drift be at least δ.

The position of chip i at time t, denoted by s_t^i, is simply the sum of the drifts of that chip up to that point in time. Thus, s_1^i = 0 and s_{t+1}^i = s_t^i + z_t^i. The final position of chip i at the end of the game is s_{T+1}^i.

At the end of T rounds, we measure the shepherd's performance using a function L of the final positions of the chips; this function is called the loss function. Specifically, the shepherd's goal is to minimize (1/m) Σ_i L(s_{T+1}^i).

Summarizing, we see that a game is specified by several parameters: the number of rounds T; the dimension n of the space; a norm p on R^n; a set B ⊆ R^n; a minimum drift constant δ ≥ 0; a loss function L; and the number of chips m. Since the lengths of weight vectors w are measured using an ℓ_p-norm, it is natural to measure drift vectors z using a dual ℓ_q-norm where 1/p + 1/q = 1. When clear from context, we will generally drop p and q subscripts and write simply ||w|| or ||z||.
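As a concrete illustration (not from the paper), the protocol of figure 1 can be sketched in code. The `shepherd` and `adversary` callables and all names below are our own hypothetical scaffolding; the sketch only enforces the drift constraint (1) and accumulates positions.

```python
import numpy as np

def play_drifting_game(shepherd, adversary, loss, m, n, T, delta, p=2):
    """Play T rounds of the drifting game with m chips in R^n.

    `shepherd` maps chip positions to one weight vector per chip;
    `adversary` returns drift vectors (assumed chosen from B) that must
    satisfy the constraint (1).  Both are hypothetical interfaces.
    """
    s = np.zeros((m, n))                       # positions s_1^i = 0
    for t in range(T):
        w = shepherd(s)                        # weight vector w_t^i per chip
        z = adversary(s, w)                    # drift vector z_t^i per chip
        # constraint (1): sum_i w_t^i . z_t^i >= delta * sum_i ||w_t^i||_p
        lhs = np.sum(np.einsum('ij,ij->i', w, z))
        rhs = delta * np.sum(np.linalg.norm(w, ord=p, axis=1))
        assert lhs >= rhs - 1e-9, "adversary violated the drift constraint"
        s += z                                 # s_{t+1}^i = s_t^i + z_t^i
    # final loss (1/m) sum_i L(s_{T+1}^i)
    return float(np.mean([loss(si) for si in s]))
```

A trivial adversary that always drifts every chip by +1 in one dimension satisfies the constraint for any nonnegative weights whenever δ ≤ 1.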

As an example of a drifting game, suppose that the game is played on the real line and that the shepherd's goal is to get as many chips as possible into the interval [2, 7]. Suppose further that the adversary is constrained to move each chip left or right by one unit, and that, on each round, 10% of the chips (as weighted by the shepherd's chosen distribution over chips) must be moved in the shepherd's desired direction. Then for this game, n = 1, B = {-1, +1} and δ = 0.1. Any norm will do (since we are working in just one dimension), and the loss function is

  L(s) = 0 if s ∈ [2, 7], 1 otherwise.

We will return to this example later in the paper.

Drifting games bear a certain resemblance to the kind of games studied in Blackwell's (1956) celebrated approachability theory. However, it is unclear what the exact relationship is between these two types of games and whether one type is a special case of the other.

3. Relation to boosting

In this section, we describe how the general game of drift relates directly to boosting. In the simplest boosting model, there is a boosting algorithm that has access to a weak learning algorithm that it calls in a series of rounds. There are m given labeled examples (x_1, y_1), ..., (x_m, y_m) where x_i ∈ X and y_i ∈ {-1, +1}. On each round t, the booster chooses a distribution D_t(i) over the examples. The weak learner then must generate a weak hypothesis h_t : X → {-1, +1} whose error is at most 1/2 - γ with respect to distribution D_t. That is,

  Pr_{i~D_t}[y_i ≠ h_t(x_i)] ≤ 1/2 - γ.    (3)

Here, γ > 0 is known a priori to both the booster and the weak learner. After T rounds, the booster outputs a final hypothesis which we here assume is a majority vote of the weak hypotheses:

  H(x) = sign(Σ_t h_t(x)).    (4)

For our purposes, the goal of the booster is to minimize the fraction of errors of the final hypothesis on the given set of examples:

  (1/m) |{i : y_i ≠ H(x_i)}|.    (5)

We can recast boosting as just described as a special-case drifting game; a similar game, called the majority-vote game, was studied by Freund (1995) for this case. The chips are identified with examples, and the game is one-dimensional so that n = 1. The drift of a chip z_t^i is +1 if example i is correctly classified by h_t and -1 otherwise; that is,

  z_t^i = y_i h_t(x_i)

and B = {-1, +1}. The weight w_t^i is formally permitted to be negative, something that does not make sense in the boosting setting; however, for the optimal shepherd described in the next section, this weight will always be nonnegative for this game (by Theorem 7), so we henceforth assume that w_t^i ≥ 0. The distribution D_t(i) corresponds to w_t^i / Σ_{i'} w_t^{i'}. Then the condition in Eq. (3) is equivalent to

  Σ_i (w_t^i / Σ_{i'} w_t^{i'}) · (1 - z_t^i)/2 ≤ 1/2 - γ

or

  Σ_i w_t^i z_t^i ≥ 2γ Σ_i w_t^i.    (6)

This is the same as Eq. (1) if we let δ = 2γ. Finally, if we define the loss function to be

  L(s) = 1 if s ≤ 0, 0 if s > 0    (7)

then

  (1/m) Σ_i L(s_{T+1}^i)    (8)

is exactly equal to Eq. (5).

Our main result on playing drifting games yields in this case exactly Freund's boost-by-majority algorithm (1995). There are numerous variants of this basic boosting setting to which Freund's algorithm has never been generalized and analyzed. For instance, we have so far required weak hypotheses to output values in {-1, +1}. It is natural to generalize this model to allow weak hypotheses to take values in {-1, 0, +1} so that the weak hypotheses may abstain on some examples, or to take values in [-1, +1] so that a whole range of values is possible. These correspond to simple modifications of the drifting game described above in which we simply change B to {-1, 0, +1} or [-1, +1]. As before, we require that Eq. (6) hold for all weak hypotheses and we attempt to design a boosting algorithm which minimizes Eq. (8). For both of these cases, we are able to derive analogs of the boost-by-majority algorithm which we prove are optimal in a particular sense.

Another direction for generalization is to the non-binary multiclass case in which labels y_i belong to a set Y = {1, ..., n}, n > 2. Following generalizations of the boosting algorithm AdaBoost to the multiclass case (Freund & Schapire, 1997; Schapire & Singer, 1999), we allow the booster to assign weights both to examples and labels. That is, on each round t, the booster devises a distribution D_t(i, l) over examples i and labels l ∈ Y. The weak learner then computes a weak hypothesis h_t : X × Y → {-1, +1} which must be correct on a non-trivial fraction of the example-label pairs. That is, if we define

  χ_y(l) = +1 if y = l, -1 otherwise

then we require

  Pr_{(i,l)~D_t}[h_t(x_i, l) ≠ χ_{y_i}(l)] ≤ 1/2 - γ.    (9)

The final hypothesis, we assume, is again a plurality vote of the weak hypotheses:

  H(x) = arg max_{y∈Y} Σ_t h_t(x, y).    (10)

We can cast this multiclass boosting problem as a drifting game as follows. We have n dimensions, one per class. It will be convenient for the first dimension always to correspond to the correct label, with the remaining n - 1 dimensions corresponding to incorrect labels. To do this, let us define a map π_l : R^n → R^n which simply swaps coordinates 1 and l, leaving the other coordinates untouched. The weight vectors w_t^i correspond to the distribution D_t, modulo swapping of coordinates, a correction of sign and normalization:

  D_t(i, l) = |[π_{y_i}(w_t^i)]_l| / Σ_{i'} ||w_t^{i'}||.

The norm used here to measure weight vectors is the ℓ_1-norm. Also, it will follow from Theorem 7 that, for optimal play of this game, the first coordinate of w_t^i is always nonnegative and all other coordinates are nonpositive. The drift vectors z_t^i are derived as before from the weak hypotheses:

  z_t^i = π_{y_i}(h_t(x_i, 1), ..., h_t(x_i, n)).

It can be verified that the condition in Eq. (9) is equivalent to Eq. (1) with δ = 2γ. For binary weak hypotheses, B = {-1, +1}^n. The final hypothesis H makes a mistake on example (x, y) if and only if

  Σ_t h_t(x, y) ≤ max_{l:l≠y} Σ_t h_t(x, l).

Therefore, we can count the fraction of mistakes of the final hypothesis in the drifting game context as

  (1/m) Σ_i L(s_{T+1}^i)

where

  L(s) = 1 if s_1 ≤ max{s_2, ..., s_n}, 0 otherwise.    (11)
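To make the binary reduction of Section 3 concrete, the following numerical check (our own sketch, not the paper's) verifies that with drifts z_i = y_i h(x_i) and nonnegative weights, the weighted average projected drift equals 1 - 2·err, so the weak-learning condition (3) with edge γ is exactly the drift condition (1) with δ = 2γ.

```python
import numpy as np

# Random labels, predictions and weights; any values would do, since the
# identity (w.z)/sum(w) = 1 - 2*err holds pointwise.
rng = np.random.default_rng(0)
y = rng.choice([-1, +1], size=50)           # true labels y_i
h = rng.choice([-1, +1], size=50)           # weak hypothesis predictions h(x_i)
w = rng.random(50)                          # nonnegative chip weights w_i
D = w / w.sum()                             # boosting distribution D(i)

z = y * h                                   # drifts: +1 if correct, -1 if not
err = D[z == -1].sum()                      # Pr_D[y != h(x)]
avg_drift = (w * z).sum() / w.sum()         # weighted average projected drift
# err <= 1/2 - gamma  <=>  avg_drift >= 2*gamma
assert np.isclose(avg_drift, 1 - 2 * err)
```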

Thus, by giving an algorithm for the general drifting game, we also obtain a generalization of the boost-by-majority algorithm for multiclass problems. The algorithm can be implemented in this case in polynomial time for a constant number of classes n, and the algorithm is provably best possible in a particular sense.

We note also that a simplified form of the AdaBoost algorithm (Freund & Schapire, 1997; Schapire & Singer, 1999) can be derived as an instance of the OS algorithm simply by changing the loss function L in Eq. (7) to an exponential

  L(s) = exp(-ηs)

for some η > 0. More details on this game are given in Section 7.2.

Besides boosting problems, the drifting game also generalizes the problem of learning on-line with a set of experts (Cesa-Bianchi et al., 1997; Littlestone & Warmuth, 1994). In particular, the BW algorithm of Cesa-Bianchi et al. (1996) and the weighted majority algorithm of Littlestone and Warmuth (1994) can be derived as special cases of our main algorithm for a particular natural parameterization of the drifting game. Details are given in Section 7.3.

4. The algorithm and its analysis

We next describe our algorithm for playing the general drifting game of Section 2. Like Freund's boost-by-majority algorithm (1995), the algorithm we present here uses a potential function which is central both to the workings of the algorithm and its analysis. This function can be thought of as a guess of the loss that we expect to suffer for a chip at a particular position and at a particular point in time. We denote the potential of a chip at position s on round t by φ_t(s). The final potential is the actual loss so that φ_T = L. The potential functions φ_t for earlier time steps are defined inductively:

  φ_{t-1}(s) = min_{w∈R^n} sup_{z∈B} (φ_t(s + z) + w · z - δ ||w||_p).    (12)

We will show later that, under natural conditions, the minimum above actually exists. Moreover, the minimizing vector w is the one used by the shepherd for the algorithm we now present. We call our shepherd algorithm OS for "optimal shepherd." The weight vector w_t^i chosen by OS for chip i is any vector w which minimizes

  sup_{z∈B} (φ_t(s_t^i + z) + w · z - δ ||w||_p).

Returning to the example at the end of Section 2, figure 2 shows the potential function φ_t and the weights that would be selected by OS as a function of the position of each chip for various choices of t.
For this figure, T = 20.

We will need some natural assumptions to analyze this algorithm. The first assumption states merely that the allowed drift vectors in B are bounded; for convenience, we assume they have norm at most one.

Assumption 1. sup_{z∈B} ||z||_q ≤ 1.

We next assume that the loss function L is bounded.

7 R. E. SCHAPIRE Fgure. Plos of he poenal funcon (op curve n each fgure) and he weghs seleced by OS (boo curves) as a funcon of he poson of a chp n he exaple gae a he end of Secon for varous choces of and wh T = 0. The vercal doed lnes show he boundary of he goal nerval [, 7]. Curves are only eanngful a neger values.

DRIFTING GAMES 73 Assupon. There exs fne L n and L ax such ha L n L(s) L ax for all s R n. In fac, hs assupon need only hold for all s wh s q T snce posons ousde hs range are never reached, gven Assupon 1. Fnally, we assue ha, for any drecon v, s possble o choose a drf whose proecon ono v s ore han δ by a consan aoun. Assupon 3. There exss a nuber µ>0such ha for all w R n here exss z B wh w z (δ + µ) w. Lea 1. Gven Assupons 1, and 3, for all = 0,...,T: 1. he nu n Eq. (1) exss; and. L n φ (s) L ax for all s R n. Proof: By backwards nducon on. The base cases are rval. Le us fx s and le F(z) = φ (s + z). Le H(w) = sup(f(z) + w z δ w ). z B Usng Assupon 1, for any w, w : H(w ) H(w) sup (F(z) + w z δ w ) (F(z) + w z δ w ) z B = sup (w w ) z + δ( w w ) z B (1+δ) w w. Therefore, H s connuous. Moreover, for w R n, by Assupons and 3 (as well as our nducve hypohess), Snce H(w) L n + (δ + µ) w δ w =L n + µ w. (13) H(0) L ax, (14) follows ha H(w) >H(0)f w >(L ax L n )/µ. Thus, for copung he nu of H,we only need consder pons n he copac se { w : w L } ax L n. µ Snce a connuous funcon over a copac se has a nu, hs proves Par 1. Par follows edaely fro Eqs. (13) and (14).

We next prove an upper bound on the loss suffered by a shepherd employing the OS algorithm against any adversary. This is the main result of this section. We will shortly see that this bound is essentially best possible for any algorithm. It is important to note that these theorems tell us much more than the almost obvious point that the optimal thing to do is whatever is best in a minmax sense. These theorems prove the nontrivial fact that (nearly) minmax behavior can be obtained without the simultaneous consideration of all of the chips at once. Rather, we can compute each weight vector w_t^i merely as a function of the position of chip i, without consideration of the positions of any of the other chips.

Theorem 2. Under the conditions of Assumptions 1-3, the final loss suffered by the OS algorithm against any adversary is at most φ_0(0), where the functions φ_t are defined above.

Proof: Following Freund's analysis (1995), we show that the total potential never increases. That is, we prove by induction that

  Σ_i φ_t(s_{t+1}^i) ≤ Σ_i φ_{t-1}(s_t^i).    (15)

This implies, through repeated application of Eq. (15), that

  (1/m) Σ_i L(s_{T+1}^i) = (1/m) Σ_i φ_T(s_{T+1}^i) ≤ (1/m) Σ_i φ_0(s_1^i) = φ_0(0)

as claimed. The definition of φ_{t-1} given in Eq. (12) implies that for w_t^i chosen by the OS algorithm, and for all z ∈ B and all s ∈ R^n:

  φ_t(s + z) + w_t^i · z - δ||w_t^i|| ≤ φ_{t-1}(s).

Therefore,

  Σ_i φ_t(s_{t+1}^i) = Σ_i φ_t(s_t^i + z_t^i) ≤ Σ_i (φ_{t-1}(s_t^i) - w_t^i · z_t^i + δ||w_t^i||) ≤ Σ_i φ_{t-1}(s_t^i)

where the last inequality follows from Eq. (1).

Returning again to the example at the end of Section 2, figure 3 shows a plot of the bound φ_0(0) as a function of the total number of rounds T. It is rather curious that the bound is not monotonic in T (even discounting the jagged nature of the curve caused by the difference between even and odd length games). Apparently, for this game, having more time to get the chips into the goal region can actually hurt the shepherd.
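For the one-dimensional example game (goal interval [2, 7], B = {-1, +1}, δ = 0.1, T = 20), the recurrence (12) and the bound φ_0(0) of Theorem 2 can be evaluated numerically by backward induction. The brute-force grid over candidate weights w below is our own discretization for illustration, not part of the OS algorithm.

```python
import numpy as np

# Backward induction for the example game of Section 2.
T, delta = 20, 0.1
loss = lambda s: 0.0 if 2 <= s <= 7 else 1.0
positions = np.arange(-T, T + 1)
phi = {T: {int(s): loss(s) for s in positions}}   # phi_T = L
ws = np.linspace(-2.0, 2.0, 801)                  # candidate weights (incl. 0)
for t in range(T, 0, -1):
    phi[t - 1] = {}
    for s in positions:
        prev = phi[t]
        up = prev.get(s + 1, 1.0)                 # off-grid states: worst loss
        dn = prev.get(s - 1, 1.0)
        # max over z in {-1,+1} of phi_t(s+z) + w*z - delta*|w|, min over w
        vals = np.maximum(dn - ws - delta * np.abs(ws),
                          up + ws - delta * np.abs(ws))
        phi[t - 1][int(s)] = float(vals.min())
print(round(phi[0][0], 3))                        # bound phi_0(0) of Theorem 2
```

Sweeping T in this sketch reproduces the qualitative shape of figure 3, including the even/odd jaggedness.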

Figure 3. A plot of the loss bound φ_0(0) as a function of the total number of rounds T for the example game at the end of Section 2. The jagged nature of the curve is due to the difference between a game with an odd or an even number of steps.

5. A lower bound

In this section, we prove that the OS algorithm is essentially optimal in the sense that, for any shepherd algorithm, there exists an adversary capable of forcing a loss matching the upper bound of Theorem 2 in the limit of a large number of chips. Specifically, we prove the following theorem, the main result of this section:

Theorem 3. Let A be any shepherd algorithm for playing a drifting game satisfying Assumptions 1-3 where all parameters of the game are fixed, except the number m of chips. Let φ_t be as defined above. Then for any ε > 0, there exists an adversary such that for m sufficiently large, the loss suffered by algorithm A is at least φ_0(0) - ε.

To prove the theorem, we will need two lemmas. The first gives an abstract result on computing a minmax of the kind appearing in Eq. (12). The second lemma uses the first to prove a characterization of φ_t in a form amenable to use in the proof of Theorem 3.

Lemma 4. Let S be any nonempty, bounded subset of R^2. Let C be the convex hull of S. Then

  inf_{α∈R} sup{y + αx : (x, y) ∈ S} = sup{y : (0, y) ∈ C̄}

where C̄ is the closure of C.

Proof: First, for any α ∈ R,

  sup{y + αx : (x, y) ∈ S} = sup{y + αx : (x, y) ∈ C} = sup{y + αx : (x, y) ∈ C̄}.    (16)

The first equality follows from the fact that, if (x, y) ∈ C then

  (x, y) = Σ_{j=1}^N p_j (x_j, y_j)

for some positive integer N, p_j ∈ [0, 1], Σ_j p_j = 1, (x_j, y_j) ∈ S. But then

  y + αx = Σ_{j=1}^N p_j (y_j + αx_j) ≤ max_j (y_j + αx_j).

The second equality in Eq. (16) follows simply because the supremum of a continuous function on any set is equal to its supremum over the closure of the set. For this same reason,

  sup{y : (0, y) ∈ C} = sup{y : (0, y) ∈ C̄}.    (17)

Because C̄ is closed, convex and bounded, and because the function y + αx is continuous, concave in (x, y) and convex in α, we can reverse the order of the inf sup (see, for instance, Corollary 37.3.2 of Rockafellar (1970)). That is,

  inf_{α∈R} sup_{(x,y)∈C̄} (y + αx) = sup_{(x,y)∈C̄} inf_{α∈R} (y + αx).    (18)

Clearly, if x ≠ 0 then

  inf_{α∈R} (y + αx) = -∞.

Thus, the right hand side of Eq. (18) is equal to sup{y : (0, y) ∈ C̄}. Combining with Eqs. (16) and (17) immediately gives the result.

Lemma 5. Under the conditions of Assumptions 1-3, and for φ_t as defined above,

  φ_{t-1}(s) = inf_{v : ||v||=1} sup Σ_{j=1}^N d_j φ_t(s + z_j)

where the supremum is taken over all positive integers N, all z_1, ..., z_N ∈ B and all nonnegative d_1, ..., d_N satisfying Σ_j d_j = 1 and Σ_j d_j v · z_j = δ.

Proof: To simplify notation, let us fix t and s. Let F and H be as defined in the proof of Lemma 1. For ||v|| = 1, let

  G(v) = sup Σ_{j=1}^N d_j F(z_j)    (19)

where again the supremum is taken over d_j's and z_j's as in the statement of the lemma. Note that by Assumption 3, this supremum cannot be vacuous. Throughout this proof, we use v to denote a vector of norm one, while w is a vector of unrestricted norm. Our goal is to show that

  inf_v G(v) = inf_w H(w).    (20)

Let us fix v momentarily. Let

  S = {(v · z - δ, F(z)) : z ∈ B}.

Then S is bounded by Assumptions 1-3 (and part 2 of Lemma 1), so we can apply Lemma 4 which gives

  inf_{α∈R} sup_{z∈B} (F(z) + α(v · z - δ)) = G(v).    (21)

Note that

  inf_{α≥0} H(αv) = inf_{α≥0} sup_{z∈B} (F(z) + αv · z - αδ)
               ≥ inf_{α∈R} sup_{z∈B} (F(z) + αv · z - αδ)
               ≥ inf_{α∈R} sup_{z∈B} (F(z) + αv · z - |α|δ) = inf_{α∈R} H(αv)

(where the second inequality uses α ≤ |α|). Combining with Eq. (21) gives

  inf_v inf_{α≥0} H(αv) ≥ inf_v G(v) ≥ inf_v inf_{α∈R} H(αv).

Since the left and right terms are both equal to inf_w H(w), this implies Eq. (20) and completes the proof.

Proof of Theorem 3: We will show that, for m sufficiently large, on round t, the adversary can choose the z_t^i's so that

  (1/m) Σ_i φ_t(s_{t+1}^i) ≥ (1/m) Σ_i φ_{t-1}(s_t^i) - ε/T.    (22)

Repeatedly applying Eq. (22) implies that

  (1/m) Σ_i L(s_{T+1}^i) = (1/m) Σ_i φ_T(s_{T+1}^i) ≥ (1/m) Σ_i φ_0(s_1^i) - ε = φ_0(0) - ε

proving the theorem.

Fix t. We use a random construction to show that there exist z_t^i's with the desired properties. For each weight vector w_t^i chosen by the shepherd, let d_1^i, ..., d_N^i ∈ [0, 1] and z_1^i, ..., z_N^i ∈ B be such that Σ_j d_j^i = 1,

  Σ_j d_j^i w_t^i · z_j^i = δ ||w_t^i||

and

  Σ_j d_j^i φ_t(s_t^i + z_j^i) ≥ φ_{t-1}(s_t^i) - ε/(2T).

Such d_j^i's and z_j^i's must exist by Lemma 5. Using Assumption 3, let z_0^i be such that w_t^i · z_0^i ≥ (δ + µ)||w_t^i||. Finally, let Z_i be a random variable that is z_0^i with probability α and z_j^i with probability (1 - α)d_j^i (independent of the other Z_i's). Here,

  α = ε / (4T(L_max - L_min)).

Let v_t^i = w_t^i / ||w_t^i||, and let a_i = ||w_t^i|| / Σ_{i'} ||w_t^{i'}||. By Assumption 1, |v_t^i · Z_i| ≤ 1. Also,

  E[v_t^i · Z_i] ≥ (1 - α)δ + α(δ + µ) = δ + αµ.

Thus, by Hoeffding's inequality (1963),

  Pr[Σ_i a_i v_t^i · Z_i < δ] ≤ exp(-α²µ² / (2 Σ_i a_i²)) ≤ e^{-α²µ²/2}.    (23)

Let S = (1/m) Σ_i φ_t(s_t^i + Z_i). Then

  E[S] ≥ (1/m) Σ_i [(φ_{t-1}(s_t^i) - ε/(2T))(1 - α) + α φ_t(s_t^i + z_0^i)]
      ≥ (1/m) Σ_i [φ_{t-1}(s_t^i) + α(φ_t(s_t^i + z_0^i) - φ_{t-1}(s_t^i))] - (1 - α) ε/(2T)
      ≥ (1/m) Σ_i φ_{t-1}(s_t^i) - α(L_max - L_min) - ε/(2T).    (24)

By Hoeffding's inequality (1963), since L_min ≤ φ_t(s + Z_i) ≤ L_max,

  Pr[S < E[S] - α(L_max - L_min)] ≤ e^{-2α²m}.    (25)

Now let m be so large that e^{-2α²m} + e^{-α²µ²/2} < 1. Then by Eqs. (23) and (25), there exists a choice of the z_t^i's such that

  Σ_i w_t^i · z_t^i = (Σ_{i'} ||w_t^{i'}||) Σ_i a_i v_t^i · z_t^i ≥ δ Σ_i ||w_t^i||

and such that

  (1/m) Σ_i φ_t(s_{t+1}^i) = (1/m) Σ_i φ_t(s_t^i + z_t^i) ≥ E[S] - α(L_max - L_min) ≥ (1/m) Σ_i φ_{t-1}(s_t^i) - ε/T

by Eq. (24) and our choice of α.

6. Computational methods

In this section, we discuss general computational methods for implementing the OS algorithm.

6.1. Unate loss functions

We first note that, for loss functions L with certain monotonicity properties, the quadrant in which the minimizing weight vectors are to be found can be determined a priori. This often simplifies the search for minima. To be more precise, for σ ∈ {-1, +1}^n and x, y ∈ R^n,

let us write x ≤_σ y if σ_j x_j ≤ σ_j y_j for all 1 ≤ j ≤ n. We say that a function f : R^n → R is unate with sign vector σ ∈ {-1, +1}^n if f(x) ≤ f(y) whenever x ≤_σ y.

Lemma 6. If the loss function L is unate with sign vector σ ∈ {-1, +1}^n, then so is φ_t (as defined above) for t = 0, ..., T.

Proof: By backwards induction on t. The base case is immediate. Let x ≤_σ y. Then for any z ∈ B and w ∈ R^n, x + z ≤_σ y + z, and so

  φ_t(x + z) + w · z - δ||w|| ≤ φ_t(y + z) + w · z - δ||w||

by the inductive hypothesis. Therefore, φ_{t-1}(x) ≤ φ_{t-1}(y), and so φ_{t-1} is also unate.

For the main theorem of this subsection, we need one more assumption:

Assumption 4. If z ∈ B and if z' is such that |z'_j| = |z_j| for all j, then z' ∈ B.

Theorem 7. Under the conditions of Assumptions 1-4, if L is unate with sign vector σ ∈ {-1, +1}^n, then for any s ∈ R^n, there is a vector w which minimizes

  sup_{z∈B} (φ_t(s + z) + w · z - δ||w||)

and for which w ≤_σ 0.

Proof: Let F and H be as in the proof of Lemma 1. By Lemma 6, F is unate. Let w ∈ R^n have some coordinate j for which σ_j w_j > 0 so that w ≤_σ 0 fails. Let w' be such that

  w'_{j'} = -w_{j'} if j' = j, w_{j'} otherwise.

We show that H(w') ≤ H(w). Let z ∈ B. If σ_j z_j ≥ 0 then

  F(z) + w' · z - δ||w'|| ≤ F(z) + w · z - δ||w||.

If σ_j z_j < 0 then let z' be defined analogously to w'. By Assumption 4, z' ∈ B. Then z ≤_σ z' and so F(z) ≤ F(z'). Thus,

  F(z) + w' · z - δ||w'|| ≤ F(z') + w · z' - δ||w||.

Hence, H(w') ≤ H(w). Applying this argument repeatedly, we can derive a vector w̃ with w̃ ≤_σ 0 and such that H(w̃) ≤ H(w). This proves the theorem.

Note that the loss functions for all of the games in Section 3 are unate (and also satisfy Assumptions 1-4). The same will be true of all of the games discussed in Section 7. Thus,

for all of these games, we can determine a priori the signs of each of the coordinates of the minimizing vectors used by the OS algorithm.

6.2. A general technique using linear programming

In many cases, we can use linear programming to implement OS. In particular, let us assume that we measure weight vectors w using the ℓ_1 norm (i.e., p = 1). Also, let us assume that B is finite. Then given φ_t and s, computing

  φ_{t-1}(s) = min_{w∈R^n} max_{z∈B} (φ_t(s + z) + w · z - δ||w||)

can be rewritten as an optimization problem:

  variables: w ∈ R^n, b ∈ R
  minimize: b
  subject to: ∀z ∈ B : φ_t(s + z) + w · z - δ||w|| ≤ b.

The minimizing value of b is the desired value of φ_{t-1}(s). Note that, with respect to the variables w and b, this problem is almost a linear program, if not for the norm operator. However, when L is unate with sign vector σ, and when the other conditions of Theorem 7 hold, we can restrict w so that w ≤_σ 0. This allows us to write

  ||w||_1 = -Σ_{j=1}^n σ_j w_j.

Adding w ≤_σ 0 as a constraint (or rather, a set of n constraints), we now have derived a linear program with n + 1 variables and |B| + n constraints. It can be solved in polynomial time.

Thus, for instance, this technique can be applied to the multiclass boosting problem discussed in Section 3. In this case, B = {-1, +1}^n. So, for any s, φ_{t-1}(s) can be computed from φ_t in time polynomial in 2^n which may be reasonable for small n. In addition, φ_t must be computed at each reachable position s in an n-dimensional integer grid of radius t, i.e., for all s ∈ {-t, -t+1, ..., t-1, t}^n. This involves computation of φ_t at (2t + 1)^n points, giving an overall running time for the algorithm which is polynomial in (2T + 1)^n. Again, this may be reasonable for very small n. It is an open problem to find a way to implement the algorithm more efficiently.

7. Deriving old and new algorithms

In this section, we show how a number of old and new boosting and on-line learning algorithms can be derived and analyzed as instances of the OS algorithm for appropriately chosen drifting games.
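A minimal sketch of the linear program of Section 6.2, assuming p = 1, a finite B, and a known unateness sign vector σ from Theorem 7 (so that ||w||_1 = -Σ_j σ_j w_j under the constraint w ≤_σ 0). The encoding into `scipy.optimize.linprog` is our own; only the optimization problem itself comes from the text.

```python
import numpy as np
from scipy.optimize import linprog

def os_step(phi_next, s, B, delta, sigma):
    """One OS step via the LP of Section 6.2.

    phi_next: callable giving phi_t at a position; B: finite list of drift
    vectors; sigma: sign vector with sigma_j * w_j <= 0 for the minimizer.
    Returns (phi_{t-1}(s), minimizing weight vector w).
    """
    n = len(s)
    sigma = np.asarray(sigma, dtype=float)
    c = np.zeros(n + 1); c[n] = 1.0            # variables (w, b); minimize b
    A, rhs = [], []
    for z in B:
        z = np.asarray(z, dtype=float)
        # phi_t(s+z) + w.z - delta*||w||_1 <= b, with ||w||_1 = -sigma.w
        A.append(np.append(z + delta * sigma, -1.0))
        rhs.append(-phi_next(np.asarray(s) + z))
    for j in range(n):                          # sign constraints sigma_j w_j <= 0
        row = np.zeros(n + 1); row[j] = sigma[j]
        A.append(row); rhs.append(0.0)
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(rhs),
                  bounds=[(None, None)] * (n + 1), method="highs")
    return res.fun, res.x[:n]
```

For instance, for the one-dimensional exponential-loss game of Section 7.2 (with η = 1, δ = 0.1, σ = -1 so that w ≥ 0), the LP value agrees with the closed-form one-step recurrence ((1+δ)/2)φ_t(s+1) + ((1-δ)/2)φ_t(s-1).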

7.1. Boost-by-majority and variants

We begin with the drifting game described in Section 3 corresponding to binary boosting with B = {−1,+1}. For this game,

    φ_{t−1}(s) = min_{w≥0} max{φ_t(s−1) − w − δw, φ_t(s+1) + w − δw}

where we know from Theorem 7 that only nonnegative values of w need to be considered. It can be argued that the minimum must occur when

    φ_t(s−1) − w − δw = φ_t(s+1) + w − δw,

i.e., when

    w = (φ_t(s−1) − φ_t(s+1))/2.   (26)

This gives

    φ_{t−1}(s) = ((1+δ)/2) φ_t(s+1) + ((1−δ)/2) φ_t(s−1).

Solving gives

    φ_t(s) = Σ_{0 ≤ k ≤ (T−t−s)/2} C(T−t, k) ((1+δ)/2)^k ((1−δ)/2)^{T−t−k}

(where we follow the convention that C(n, k) = 0 if k < 0 or k > n). Weighting examples using Eq. (26) gives exactly Freund's (1995) boost-by-majority algorithm (the boosting-by-resampling version).

When B = {−1, 0, +1}, a similar but more involved analysis gives

    φ_{t−1}(s) = max{(1−δ) φ_t(s) + δ φ_t(s+1),
                     ((1+δ)/2) φ_t(s+1) + ((1−δ)/2) φ_t(s−1)}   (27)

and the corresponding choice of w_t is φ_t(s) − φ_t(s+1) or (φ_t(s−1) − φ_t(s+1))/2, depending on whether the maximum in Eq. (27) is realized by the first or second quantity.

We do not know how to solve the recurrence in Eq. (27) so that the bound φ_0(0) given in Theorem 2 can be put in explicit form. Nevertheless, this bound can easily be evaluated numerically, and the algorithm can certainly be implemented efficiently in its present form.

We have thus far been unable to solve the recurrence for the case that B = [−1, +1], even to a point at which the algorithm can be implemented. However, this case can be approximated by the case in which B = {i/N : i = −N, ..., N} for a moderate value

Figure 4. A comparison of the bound φ_0(0) for the drifting games associated with AdaBoost (Section 7.2) and boost-by-majority (Sections 3 and 7.1). For AdaBoost, η is set as in Eq. (28). For boost-by-majority, the bound is plotted when B is {−1,+1}, {−1,0,+1} and [−1,+1]. (The latter case is approximated by B = {i/100 : i = −100, ..., 100}.) The bound is plotted as a function of the number of rounds T. The drift parameter is fixed to δ = 0.2. (The jagged nature of the B = {−1,+1} curve is due to the fact that games with an even number of rounds, in which ties count as a loss for the shepherd so that L(0) = 1, are harder than games with an odd number of rounds.)

of N. In the latter case, the potential function and associated weights can be computed numerically. For instance, linear programming can be used as discussed in Section 6.2. Alternatively, it can be shown that Lemma 5 combined with Theorem 7 implies that

    φ_{t−1}(s) = max{p φ_t(s+z₁) + (1−p) φ_t(s+z₂) :
                     z₁, z₂ ∈ B, p ∈ [0,1], p z₁ + (1−p) z₂ = δ}

which can be evaluated using a simple search over all pairs z₁, z₂ (since B is finite).

Figure 4 compares the bound φ_0(0) for the drifting games associated with boost-by-majority and variants in which B is {−1,+1}, {−1,0,+1} and [−1,+1] (using the approximation just mentioned), as well as AdaBoost (discussed in the next section). These bounds are plotted as a function of the number of rounds T.

7.2. AdaBoost and variants

As mentioned in Section 3, a simplified, non-adaptive version of AdaBoost can be derived as an instance of OS. To do this, we simply replace the loss function (Eq. (7)) in the binary boosting game of Section 3 with an exponential loss function

    L(s) = e^{−ηs}

where η > 0 is a parameter of the game. As a special case of the discussion below, it will follow that

    φ_t(s) = κ^{T−t} e^{−ηs}

where κ is the constant

    κ = ((1−δ)/2) e^η + ((1+δ)/2) e^{−η}.

Also, the weight given to a chip at position s on round t is

    κ^{T−t} ((e^η − e^{−η})/2) e^{−ηs}

which is proportional to e^{−ηs} (in other words, the weighting function is effectively unchanged from round to round). This weighting is the same as the one used by a non-adaptive version of AdaBoost in which all weak hypotheses are given equal weight. Since e^{−ηs} is an upper bound on the loss function of Eq. (7), Theorem 2 implies an upper bound on the fraction of mistakes of the final hypothesis of φ_0(0) = κ^T. When

    η = (1/2) ln((1+δ)/(1−δ))   (28)

so that κ is minimized, this gives an upper bound of

    (1 − δ²)^{T/2} = (1 − 4γ²)^{T/2}

which is equivalent to a non-adaptive version of Freund and Schapire's (1997) analysis.

We next consider a more general drifting game in n dimensions whose loss function is a sum of exponentials

    L(s) = Σ_{j=1}^k b_j exp(−η_j u_j·s)   (29)

where the b_j's, η_j's and u_j's are parameters with b_j > 0, η_j > 0, ‖u_j‖₁ = 1 and σ_i u_{j,i} ≥ 0 for all i, for some sign vector σ. For this game, B = [−1,+1]^n and p = 1. Many (non-adaptive) variants of AdaBoost correspond to special cases of this game. For instance, AdaBoost.M2 (Freund & Schapire, 1997), a multiclass version of AdaBoost, essentially uses the loss function

    L(s) = Σ_{i=2}^n e^{−(η/2)(s_1 − s_i)}

where we follow the multiclass setup of Section 3 so that n is the number of classes, and the first component in the drifting game is identified with the correct class. (As before, we

only consider a non-adaptive game in which η > 0 is a fixed, tunable parameter.) Likewise, AdaBoost.MH (Schapire & Singer, 1999), another multiclass version of AdaBoost, uses the loss function

    L(s) = e^{−ηs_1} + Σ_{i=2}^n e^{ηs_i}.

Note that both loss functions upper bound the true loss for multiclass boosting given in Eq. (11). Moreover, both functions clearly have the form given in Eq. (29).

We claim that, for the general game with loss function as in Eq. (29),

    φ_t(s) = Σ_{j=1}^k b_j κ_j^{T−t} exp(−η_j u_j·s)   (30)

where

    κ_j = ((1−δ)/2) e^{η_j} + ((1+δ)/2) e^{−η_j}.

Proof of Eq. (30) is by backwards induction on t. For fixed t and s, let

    w = Σ_{j=1}^k b_j κ_j^{T−t} ((e^{η_j} − e^{−η_j})/2) u_j exp(−η_j u_j·s).

We will show that this is the minimizing weight vector that gets used by OS for a chip at position s at time t. Let

    b′_j = b_j κ_j^{T−t} exp(−η_j u_j·s).

Note that

    φ_t(s+z) + w·z = Σ_j b′_j (exp(−η_j u_j·z) + ((e^{η_j} − e^{−η_j})/2) u_j·z)
                   ≤ Σ_j b′_j ((e^{η_j} + e^{−η_j})/2)   (31)

since

    e^{−ηx} ≤ (e^η + e^{−η})/2 − ((e^η − e^{−η})/2) x

for all η ∈ R and x ∈ [−1,+1] by convexity of e^{−ηx}. Also, by our assumptions on the b_j's, u_j's and η_j's, we can compute

    ‖w‖₁ = Σ_j b′_j ((e^{η_j} − e^{−η_j})/2).   (32)

Thus, combining Eqs. (31) and (32) gives

    φ_{t−1}(s) ≤ sup_{z∈B} (φ_t(s+z) + w·z − δ‖w‖₁)
              ≤ Σ_j b′_j κ_j = Σ_j b_j κ_j^{T−t+1} exp(−η_j u_j·s).

This gives the needed upper bound on φ_{t−1}(s). For the lower bound, using Theorem 7 (since L is unate with sign vector σ), we have

    φ_{t−1}(s) ≥ min_{w: σ_i w_i ≥ 0} max_{z ∈ {−σ, σ}} (φ_t(s+z) + w·z − δ‖w‖₁)
              = min_{c ≥ 0} max{Σ_j b′_j e^{η_j} − c − δc, Σ_j b′_j e^{−η_j} + c − δc}

where we have used u_j·σ = 1 and w·σ = ‖w‖₁ (since σ_i u_{j,i} ≥ 0 and σ_i w_i ≥ 0), and we have identified c with ‖w‖₁. Solving the min-max expression gives the desired lower bound. This completes the proof of Eq. (30).

7.3. On-line learning algorithms

In this section, we show how Cesa-Bianchi et al.'s (1996) BW algorithm for combining expert advice can be derived as an instance of OS. We will also see how their algorithm can be generalized, and how Littlestone and Warmuth's (1994) weighted majority algorithm can also be derived and analyzed.

Suppose that we have access to m experts. On each round, each expert i provides a prediction ξ_i ∈ {−1,+1}. A master algorithm combines their predictions into its own prediction ψ ∈ {−1,+1}. An outcome y ∈ {−1,+1} is then observed. The master makes a mistake if ψ ≠ y, and similarly for expert i if ξ_i ≠ y. The goal of the master is to minimize how many mistakes it makes relative to the best expert. We will consider master algorithms which use a weighted majority vote to form their predictions; that is,

    ψ = sign(Σ_{i=1}^m w_i ξ_i).

The problem is to derive a good choice of weights w_i. We also assume that the master algorithm is conservative in the sense that rounds on which the master's predictions are correct are effectively ignored (so that the weights w_i depend only upon previous rounds on which mistakes were made).
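The conservative weighted-majority protocol just described can be sketched in a few lines. This is an illustrative sketch rather than the BW algorithm itself: it uses the exponential weights w_i = β^{M_i} of Littlestone and Warmuth's weighted majority algorithm (which is derived as an instance of OS later in this section), with β = e^{−2η}; the function names are ours.

```python
import math

def weighted_majority(expert_preds, outcomes, eta):
    """Conservative weighted-majority master.

    expert_preds[t][i] and outcomes[t] are in {-1, +1}.  Weights are
    w_i = beta**M_i with beta = exp(-2*eta), where M_i counts expert i's
    mistakes on previous rounds on which the *master* erred (conservative:
    rounds where the master was right are ignored).  Returns the number
    of mistakes made by the master.
    """
    m = len(expert_preds[0])
    beta = math.exp(-2.0 * eta)
    M = [0] * m
    mistakes = 0
    for xs, y in zip(expert_preds, outcomes):
        vote = sum(beta ** M[i] * xs[i] for i in range(m))
        psi = 1 if vote >= 0 else -1
        if psi != y:
            mistakes += 1
            for i in range(m):          # conservative update
                if xs[i] != y:
                    M[i] += 1
    return mistakes

def mistake_bound(m, k, eta):
    """Weighted-majority mistake bound (2*eta*k + ln m) / ln(2/(1+e^{-2*eta})),
    valid when some expert makes at most k mistakes."""
    return (2.0 * eta * k + math.log(m)) / \
           math.log(2.0 / (1.0 + math.exp(-2.0 * eta)))
```

For example, with three experts, one always correct and two always wrong, the master errs once, while the bound for m = 3, k = 0, η = 0.5 evaluates to about 2.9.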

Let us suppose that there is one expert that makes at most k mistakes. We will (re)derive an algorithm (namely, BW) and a bound on the number of mistakes made by the master, given this assumption. Since we restrict our attention to conservative algorithms, we can assume without loss of generality that a mistake occurs on every round and simply proceed to bound the total number of rounds.

To set up the problem as a drifting game, we identify one chip with each of the m experts. The problem is one-dimensional so n = 1. The weights w_t^i selected by the master are the same as those chosen by the shepherd. Since we assume that the master makes a mistake on each round t, we have for all t that

    Σ_{i=1}^m y_t w_t^i ξ_t^i ≤ 0.   (33)

Thus, if we define the drift z_t^i to be −y_t ξ_t^i, then Σ_i w_t^i z_t^i ≥ 0. Setting δ = 0, we see that Eq. (33) is equivalent to Eq. (1). Also, B = {−1,+1}.

Let M_t^i be the number of mistakes made by expert i on rounds 1, ..., t−1. Then by definition of z_t^i,

    s_t^i = 2 M_t^i − t + 1.

Let the loss function L be

    L(s) = 1 if s ≤ 2k − T, 0 otherwise.   (34)

Then L(s_{T+1}^i) = 1 if and only if expert i makes a total of k or fewer mistakes in T rounds. Thus, our assumption that the best expert makes at most k mistakes implies that

    1/m ≤ (1/m) Σ_{i=1}^m L(s_{T+1}^i).   (35)

On the other hand, Theorem 2 implies that

    (1/m) Σ_{i=1}^m L(s_{T+1}^i) ≤ φ_0(0).   (36)

By an analysis similar to the one given in Section 7.1, it can be seen that

    φ_{t−1}(s) = (1/2)(φ_t(s+1) + φ_t(s−1)).

Solving this recurrence gives

    φ_t(s) = 2^{t−T} C(T−t, ≤ k − (t+s)/2)

where

    C(n, ≤k) = Σ_{0 ≤ k′ ≤ k} C(n, k′).

In particular,

    φ_0(0) = 2^{−T} C(T, ≤k).   (37)

Combining Eqs. (35)–(37) gives

    1/m ≤ 2^{−T} C(T, ≤k).   (38)

In other words, the number of mistakes T of the master algorithm must satisfy Eq. (38) and so must be at most

    max{q ∈ N : q ≤ lg m + lg C(q, ≤k)},

the same bound given by Cesa-Bianchi et al. (1996). The weighting function obtained is also equivalent to theirs since, by an argument similar to that used in Section 7.1, OS gives

    w_t^i = (1/2)(φ_t(s_t^i − 1) − φ_t(s_t^i + 1))
          = 2^{t−T−1} C(T−t, k − (t + s_t^i − 1)/2)
          = 2^{t−T−1} C(T−t, k − M_t^i).

Note that this argument can be generalized to the case in which the experts' predictions are not restricted to {−1,+1} but instead may be all of [−1,+1], or a subset of this interval, such as {−1, 0, +1}. The performance of each expert is then measured on each round using the absolute loss (1/2)|ξ_t^i − y_t| rather than whether or not it made a mistake. In this case, as in the analogous extension of boost-by-majority given in Section 3, we only need to replace B by [−1,+1] or {−1, 0, +1}. The resulting bound on the number of mistakes of the master is

then the largest T for which 1/m ≤ φ_0(0) (note that φ_0(0) depends implicitly on T). The resulting master algorithm simply uses the weights computed by OS for the appropriate drifting game. It is an open problem to determine if this generalized algorithm enjoys strong optimality properties similar to those of BW (Cesa-Bianchi et al., 1996).

Littlestone and Warmuth's (1994) weighted majority algorithm can also be derived as an instance of OS. To do this, we simply replace the loss function L in the game above with

    L(s) = exp(−η(s − 2k + T))

for some parameter η > 0. This loss function upper bounds the one in Eq. (34). We assume that experts are permitted to output predictions in [−1,+1] so that B = [−1,+1]. From the results of Section 7.2 applied to this drifting game,

    φ_t(s) = κ^{T−t} exp(−η(s − 2k + T))

where

    κ = (e^η + e^{−η})/2.

Therefore, because one expert suffers loss at most k,

    1/m ≤ φ_0(0) = κ^T e^{η(2k−T)}.

Equivalently, the number of mistakes T is at most

    (2ηk + ln m) / ln(2/(1 + e^{−2η})),

exactly the bound given by Littlestone and Warmuth (1994). The algorithm is also the same as theirs since the weight given to an expert (chip) at position s at time t is

    κ^{T−t} ((e^η − e^{−η})/2) exp(−η(s − 2k + T))

which is proportional to exp(−2η M_t^i).

8. Open problems

This paper represents the first work on general drifting games. As such, there are many open problems.

We have presented closed-form solutions of the potential function for just a few special cases. Are there other cases in which such closed-form solutions are possible? In particular, can the boosting games of Section 3 corresponding to B = {−1, 0, +1} and B = [−1, +1] be put into closed form?
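While no closed form is known for the B = {−1, 0, +1} recurrence of Section 7.1, the bound φ_0(0) is easy to evaluate numerically, as noted there. A minimal sketch of the dynamic program follows; the function name and interface are ours, and the terminal loss is the binary boosting loss with ties counting against the shepherd (L(s) = 1 for s ≤ 0), as in Figure 4.

```python
def phi0_ternary(T, delta):
    """Evaluate phi_0(0) for the binary boosting game with B = {-1, 0, +1}
    by backwards dynamic programming on the recurrence

      phi_{t-1}(s) = max{ (1-d)*phi_t(s) + d*phi_t(s+1),
                          (1+d)/2*phi_t(s+1) + (1-d)/2*phi_t(s-1) }.

    Terminal loss: L(s) = 1 if s <= 0 else 0.  Positions reachable after
    t rounds lie in {-t, ..., t}.
    """
    phi = {s: (1.0 if s <= 0 else 0.0) for s in range(-T, T + 1)}
    for t in range(T, 0, -1):
        phi = {
            s: max((1 - delta) * phi[s] + delta * phi[s + 1],
                   (1 + delta) / 2 * phi[s + 1] + (1 - delta) / 2 * phi[s - 1])
            for s in range(-(t - 1), t)   # positions reachable after t-1 rounds
        }
    return phi[0]
```

For one round and δ = 0.2 this gives max{0.8, 0.4} = 0.8, compared with 0.4 for B = {−1,+1}: allowing the weak learner to abstain makes the game harder for the shepherd, consistent with the curves in Figure 4.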

For games in which a closed form is not possible, is there nevertheless a general method of characterizing the loss bound φ_0(0), say, as the number of rounds T gets large?

Side products of our work include new versions of boost-by-majority for the multiclass case, as well as for binary cases in which the weak hypotheses have range {−1, 0, +1} or [−1, +1]. However, the optimality proof for the drifting game only carries over to the boosting setting if the final hypothesis has the restricted forms given in Eqs. (4) and (10). Are the resulting boosting algorithms also optimal (for instance, in the sense proved by Freund (1995) for boost-by-majority) without these restrictions? Likewise, can the extensions of the BW algorithm in Section 7.3 be shown to be optimal? Can this algorithm be extended using drifting games to the multiclass case, or to the case in which the master is allowed to output predictions in [−1, +1] (suffering absolute loss)?

The OS algorithm is non-adaptive in the sense that δ must be known ahead of time. To what extent can OS be made adaptive? For instance, can Freund's (2001) recent technique for making boost-by-majority adaptive be carried over to the general drifting-game setting? Similarly, what happens if the number of rounds T is not known in advance?

Finally, are there other interesting drifting games for entirely different learning problems such as regression or density estimation?

Acknowledgments

Many thanks to Yoav Freund for very helpful discussions which led to this research.

Notes

1. In an earlier version of this paper, the shepherd was called the drifter, a term that was found by some readers to be confusing. The name of the main algorithm has also been changed from Shepherd to OS.
2. Of course, the real goal of a boosting algorithm is to find a hypothesis with low generalization error. In this paper, we focus only on the simplified problem of minimizing error on the given training examples.

References

Blackwell, D. (1956). An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6:1, 1–8.
Cesa-Bianchi, N., Freund, Y., Haussler, D., Helmbold, D. P., Schapire, R. E., & Warmuth, M. K. (1997). How to use expert advice. Journal of the Association for Computing Machinery, 44:3, 427–485.
Cesa-Bianchi, N., Freund, Y., Helmbold, D. P., & Warmuth, M. K. (1996). On-line prediction and conversion strategies. Machine Learning, 25, 71–110.
Freund, Y. (1995). Boosting a weak learning algorithm by majority. Information and Computation, 121:2, 256–285.
Freund, Y. (2001). An adaptive version of the boost by majority algorithm. Machine Learning, 43:3, 293–318.
Freund, Y. & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:1, 119–139.
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:301, 13–30.
Littlestone, N. & Warmuth, M. K. (1994). The weighted majority algorithm. Information and Computation, 108, 212–261.

Rockafellar, R. T. (1970). Convex Analysis. Princeton, NJ: Princeton University Press.
Schapire, R. E. & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37:3, 297–336.

Received October 8, 1999
Revised October 8, 1999
Accepted June 1, 2000
Final manuscript July 31, 2000