A decision-theoretic generalization of on-line learning and an application to boosting. AT&T Bell Laboratories, 600 Mountain Avenue


A decision-theoretic generalization of on-line learning and an application to boosting

Yoav Freund    Robert E. Schapire
AT&T Bell Laboratories
600 Mountain Avenue, Room {2B-428, 2A-424}
Murray Hill, NJ
{yoav, schapire}@research.att.com

September 20, 1995

Abstract

In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weight-update rule of Littlestone and Warmuth [15] can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multiple-outcome prediction, repeated games and prediction of points in R^n. In the second part of the paper we apply the multiplicative weight-update technique to derive a new boosting algorithm. This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm. We also study generalizations of the new boosting algorithm to the problem of learning functions whose range, rather than being binary, is an arbitrary finite set or a bounded segment of the real line.

1 Introduction

A gambler, frustrated by persistent horse-racing losses and envious of his friends' winnings, decides to allow a group of his fellow gamblers to make bets on his behalf. He decides he will wager a fixed sum of money in every race, but that he will apportion his money among his friends based on how well they are doing. Certainly, if he knew psychically ahead of time which of his friends would win the most, he would naturally have that friend handle all his wagers. Lacking such clairvoyance, however, he attempts to allocate each race's wager in such a way that his total winnings for the season will be reasonably close to what he would have won had he bet everything with the luckiest of his friends.

In this paper, we describe a simple algorithm for solving such dynamic allocation problems, and we show that our solution can be applied to a great assortment of learning problems.

(An extended abstract of this work appeared in the Proceedings of the Second European Conference on Computational Learning Theory, Barcelona, March, 1995. This draft was submitted for journal publication.)

Perhaps the most surprising of these applications is the derivation of a new algorithm for "boosting", i.e., for converting a "weak" PAC learning algorithm that performs just slightly better than random guessing into one with arbitrarily high accuracy.

We formalize our on-line allocation model as follows. The allocation agent A has N options or strategies to choose from; we number these using the integers 1, ..., N. At each time step t = 1, 2, ..., T, the allocator A decides on a distribution p^t over the strategies; that is, p_i^t ≥ 0 is the amount allocated to strategy i, and Σ_{i=1}^N p_i^t = 1. Each strategy i then suffers some loss ℓ_i^t which is determined by the (possibly adversarial) "environment." The loss suffered by A is then Σ_{i=1}^N p_i^t ℓ_i^t = p^t · ℓ^t, i.e., the average loss of the strategies with respect to A's chosen allocation rule. We call this loss function the mixture loss.

In this paper, we always assume that the loss suffered by any strategy is bounded so that, without loss of generality, ℓ_i^t ∈ [0, 1]. Besides this condition, we make no assumptions about the form of the loss vectors ℓ^t, or about the manner in which they are generated; indeed, the adversary's choice for ℓ^t may even depend on the allocator's chosen mixture p^t.

The goal of the algorithm A is to minimize its cumulative loss relative to the loss suffered by the best strategy. That is, A attempts to minimize its net loss

    L_A − min_i L_i

where L_A = Σ_{t=1}^T p^t · ℓ^t is the total cumulative loss suffered by algorithm A on the first T trials, and L_i = Σ_{t=1}^T ℓ_i^t is strategy i's cumulative loss.

In Section 2, we show that Littlestone and Warmuth's [15] "weighted majority" algorithm can be generalized to handle this problem, and we prove a number of bounds on the net loss. For instance, one of our results shows that the net loss of our algorithm can be bounded by O(√(T ln N)) or, put another way, that the average per-trial net loss is decreasing at the rate O(√((ln N)/T)). Thus, as T increases, this difference decreases to zero.

Our results for the on-line allocation model can be applied to a wide variety of learning problems, as we describe in Section 3. In particular, we generalize the results of Littlestone and Warmuth [15] and Cesa-Bianchi et al. [2] for the problem of predicting a binary sequence using the advice of a team of "experts." Whereas these authors proved worst-case bounds for making on-line randomized decisions over a binary decision and outcome space with a {0, 1}-valued discrete loss, we prove (slightly weaker) bounds that are applicable to any bounded loss function over any decision and outcome spaces. Our bounds express explicitly the rate at which the loss of the learning algorithm approaches that of the best expert.

Related generalizations of the expert prediction model were studied by Vovk [19], Kivinen and Warmuth [14], and Haussler, Kivinen and Warmuth [11]. Like us, these authors focused primarily on multiplicative weight-update algorithms. Chung [3] also presented a generalization, giving the problem a game-theoretic treatment.

1.1 Boosting

Returning to the horse-racing story, suppose now that the gambler grows weary of choosing among the experts and instead wishes to create a computer program that will accurately predict the winner of a horse race based on the usual information (number of races recently won by each horse, betting odds for each horse, etc.). To create such a program, he asks his favorite expert to explain his betting strategy. Not surprisingly, the expert is unable to articulate a grand set of rules for selecting a horse. On the other hand, when presented with the data for a specific set of races, the expert has no trouble coming up with a "rule-of-thumb" for that set of races (such as, "Bet on the horse that has recently won the most races" or "Bet on the horse with the most favored odds"). Although such a rule-of-thumb, by itself, is obviously very rough and inaccurate, it is not unreasonable to expect it to provide predictions that are at least a little bit better than random guessing. Furthermore, by repeatedly asking the expert's opinion on different collections of races, the gambler is able to extract many rules-of-thumb.

In order to use these rules-of-thumb to maximum advantage, there are two problems faced by the gambler: First, how should he choose the collections of races presented to the expert so as to extract rules-of-thumb from the expert that will be the most useful? Second, once he has collected many rules-of-thumb, how can they be combined into a single, highly accurate prediction rule?

Boosting refers to this general problem of producing a very accurate prediction rule by combining rough and moderately inaccurate rules-of-thumb. In the second part of the paper, we present and analyze a new boosting algorithm inspired by the methods we used for solving the on-line allocation problem.

Formally, boosting proceeds as follows: The booster is provided with a set of labeled training examples (x_1, y_1), ..., (x_N, y_N), where y_i is the label associated with instance x_i; for instance, in the horse-racing example, x_i might be the observable data associated with a particular horse race, and y_i the outcome (winning horse) of that race. On each round t = 1, ..., T, the booster devises a distribution D_t over the set of examples, and requests (from an unspecified oracle) a weak hypothesis (or rule-of-thumb) h_t with low error ε_t with respect to D_t (that is, ε_t = Pr_{i∼D_t}[h_t(x_i) ≠ y_i]). Thus, distribution D_t specifies the relative importance of each example for the current round. After T rounds, the booster must combine the weak hypotheses into a single prediction rule.

Unlike the previous boosting algorithms of Freund [8, 9] and Schapire [16], the new algorithm needs no prior knowledge of the accuracies of the weak hypotheses. Rather, it adapts to these accuracies and generates a weighted majority hypothesis in which the weight of each weak hypothesis is a function of its accuracy. For binary prediction problems, we prove in Section 4 that the error of this final hypothesis (with respect to the given set of examples) is bounded by exp(−2 Σ_{t=1}^T γ_t²) where ε_t = 1/2 − γ_t is the error of the t-th weak hypothesis. Since a hypothesis that makes entirely random guesses has error 1/2, γ_t measures the accuracy of the t-th weak hypothesis relative to random guessing. Thus, this bound shows that if we can consistently find weak hypotheses that are slightly better than random guessing, then the error of the final hypothesis drops exponentially fast.

Note that the bound on the accuracy of the final hypothesis improves when any of the weak hypotheses is improved. This is in contrast with previous boosting algorithms whose performance bound depended only on the accuracy of the least accurate weak hypothesis. At the same time, if the weak hypotheses all have the same accuracy, the performance of the new algorithm is very close to that achieved by the best of the known boosting algorithms. In Section 5, we give two extensions of our boosting algorithm to multi-class prediction problems in which each example belongs to one of several possible classes (rather than just two). We also give an extension to regression problems in which the goal is to estimate a real-valued function.

Algorithm Hedge(β)
Parameters: β ∈ [0, 1]
            initial weight vector w^1 ∈ [0, 1]^N with Σ_{i=1}^N w_i^1 = 1
            number of trials T
Do for t = 1, 2, ..., T
  1. Choose allocation p^t = w^t / Σ_{i=1}^N w_i^t
  2. Receive loss vector ℓ^t ∈ [0, 1]^N from environment.
  3. Suffer loss p^t · ℓ^t.
  4. Set the new weights vector to be w_i^{t+1} = w_i^t β^{ℓ_i^t}

Figure 1: The on-line allocation algorithm.

2 The on-line allocation algorithm and its analysis

In this section, we present our algorithm, called Hedge(β), for the on-line allocation problem. The algorithm and its analysis are direct generalizations of Littlestone and Warmuth's weighted majority algorithm [15].

The pseudo-code for Hedge(β) is shown in Figure 1. The algorithm maintains a weight vector whose value at time t is denoted w^t = (w_1^t, ..., w_N^t). At all times, all weights will be nonnegative. All of the weights of the initial weight vector w^1 must be nonnegative and sum to one, so that Σ_{i=1}^N w_i^1 = 1. Besides these conditions, the initial weight vector may be arbitrary, and may be viewed as a "prior" over the set of strategies. Since our bounds are strongest for those strategies receiving the greatest initial weight, we will want to choose the initial weights so as to give the most weight to those strategies which we expect are most likely to perform the best. Naturally, if we have no reason to favor any of the strategies, we can set all of the initial weights equally so that w_i^1 = 1/N. Note that the weights on future trials need not sum to one.

Our algorithm allocates among the strategies using the current weight vector, after normalizing. That is, at time t, Hedge(β) chooses the distribution vector

    p^t = w^t / Σ_{i=1}^N w_i^t.     (1)

After the loss vector ℓ^t has been received, the weight vector w^t is updated using the multiplicative rule

    w_i^{t+1} = w_i^t β^{ℓ_i^t}.     (2)
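To make the allocation loop concrete, here is a minimal sketch of Hedge(β) in Python/NumPy. It is an illustration only: the uniform initial weights and the randomly generated loss vectors in the usage lines are assumptions made for the example, not part of the algorithm's specification.

    import numpy as np

    def hedge(loss_vectors, beta, w=None):
        """Run Hedge(beta) on loss_vectors, an array of shape (T, N) with entries in [0, 1].

        Returns the cumulative mixture loss sum_t p^t . l^t."""
        T, N = loss_vectors.shape
        if w is None:
            w = np.full(N, 1.0 / N)      # uniform "prior" over the N strategies
        total_loss = 0.0
        for t in range(T):
            p = w / w.sum()              # step 1: choose allocation p^t
            loss = loss_vectors[t]       # step 2: receive loss vector
            total_loss += p @ loss       # step 3: suffer mixture loss
            w = w * beta ** loss         # step 4: multiplicative weight update
        return total_loss

    # Illustrative use: 5 strategies, 100 trials of synthetic losses.
    rng = np.random.default_rng(0)
    losses = rng.random((100, 5))
    print(hedge(losses, beta=0.8))
    print(losses.sum(axis=0).min())      # cumulative loss of the best strategy, for comparison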

More generally, it can be shown that our analysis is applicable with only minor modification to an alternative update rule of the form

    w_i^{t+1} = w_i^t U_β(ℓ_i^t)

where U_β : [0, 1] → [0, 1] is any function, parameterized by β ∈ [0, 1], satisfying

    β^r ≤ U_β(r) ≤ 1 − (1 − β)r

for all r ∈ [0, 1].

2.1 Analysis

The analysis of Hedge(β) mimics directly that given by Littlestone and Warmuth [15]. The main idea is to derive upper and lower bounds on Σ_{i=1}^N w_i^{T+1} which, together, imply an upper bound on the loss of the algorithm. We begin with an upper bound.

Lemma 1  For any sequence of loss vectors ℓ^1, ..., ℓ^T,

    ln( Σ_{i=1}^N w_i^{T+1} ) ≤ −(1 − β) L_{Hedge(β)}.

Proof: By a convexity argument, it can be shown that

    β^r ≤ 1 − (1 − β)r     (3)

for β ≥ 0 and r ∈ [0, 1]. Combined with Equations (1) and (2), this implies

    Σ_{i=1}^N w_i^{t+1} = Σ_{i=1}^N w_i^t β^{ℓ_i^t} ≤ Σ_{i=1}^N w_i^t (1 − (1 − β)ℓ_i^t) = ( Σ_{i=1}^N w_i^t ) (1 − (1 − β) p^t · ℓ^t).     (4)

Applying this repeatedly for t = 1, ..., T yields

    Σ_{i=1}^N w_i^{T+1} ≤ ∏_{t=1}^T (1 − (1 − β) p^t · ℓ^t) ≤ exp( −(1 − β) Σ_{t=1}^T p^t · ℓ^t )

since 1 + x ≤ e^x for all x. The lemma follows immediately.

Thus,

    L_{Hedge(β)} ≤ −ln( Σ_{i=1}^N w_i^{T+1} ) / (1 − β).     (5)

Note that, from Equation (2),

    w_i^{T+1} = w_i^1 ∏_{t=1}^T β^{ℓ_i^t} = w_i^1 β^{L_i}.     (6)

This is all that is needed to complete our analysis.

Theorem 2  For any sequence of loss vectors ℓ^1, ..., ℓ^T, and for any i ∈ {1, ..., N}, we have

    L_{Hedge(β)} ≤ ( −ln(w_i^1) − L_i ln β ) / (1 − β).     (7)

More generally, for any nonempty set S ⊆ {1, ..., N}, we have

    L_{Hedge(β)} ≤ ( −ln( Σ_{i∈S} w_i^1 ) − (ln β) max_{i∈S} L_i ) / (1 − β).     (8)

Proof: We prove the more general statement (8) since Equation (7) follows in the special case that S = {i}. From Equation (6),

    Σ_{i=1}^N w_i^{T+1} ≥ Σ_{i∈S} w_i^{T+1} = Σ_{i∈S} w_i^1 β^{L_i} ≥ β^{max_{i∈S} L_i} Σ_{i∈S} w_i^1.

The theorem now follows immediately from Equation (5).

The simpler bound (7) states that Hedge(β) does not perform "too much worse" than the best strategy i for the sequence. The difference in loss depends on our choice of β and on the initial weight w_i^1 of each strategy. If each weight is set equally so that w_i^1 = 1/N, then this bound becomes

    L_{Hedge(β)} ≤ ( min_i L_i ln(1/β) + ln N ) / (1 − β).     (9)

Since it depends only logarithmically on N, this bound is reasonable even for a very large number of strategies.

The more complicated bound (8) is a generalization of the simpler bound that is especially applicable when the number of strategies is infinite. Naturally, for uncountable collections of strategies, the sum appearing in Equation (8) can be replaced by an integral, and the maximum by a supremum.

The bound given in Equation (9) can be written as

    L_{Hedge(β)} ≤ c min_i L_i + a ln N,     (10)

where c = ln(1/β)/(1 − β) and a = 1/(1 − β). Vovk [18] analyzes prediction algorithms that have performance bounds of this form, and proves tight upper and lower bounds for the achievable values of c and a. Using Vovk's results, we can show that the constants a and c achieved by Hedge(β) are optimal.

Theorem 3  Let B be an algorithm for the on-line allocation problem with an arbitrary number of strategies. Suppose that there exist positive real numbers a and c such that for any number of strategies N and for any sequence of loss vectors ℓ^1, ..., ℓ^T

    L_B ≤ c min_i L_i + a ln N.

Then for all β ∈ (0, 1), either

    c ≥ ln(1/β)/(1 − β)   or   a ≥ 1/(1 − β).

The proof is given in the appendix.

2.2 How to choose β

So far, we have analyzed Hedge(β) for a given choice of β, and we have proved reasonable bounds for any choice of β. In practice, we will often want to choose β so as to maximally exploit any prior knowledge we may have about the specific problem at hand. The following lemma will be helpful for choosing β using the bounds derived above.

Lemma 4  Suppose 0 ≤ L ≤ L̃ and 0 < R ≤ R̃. Let β = g(L̃/R̃) where g(z) = 1/(1 + √(2/z)). Then

    ( −L ln β + R ) / (1 − β) ≤ L + √(2 L̃ R̃) + R.

Proof: (Sketch) It can be shown that −ln β ≤ (1 − β²)/(2β) for β ∈ (0, 1]. Applying this approximation and the given choice of β yields the result.

Lemma 4 can be applied to any of the bounds above since all of these bounds have the form given in the lemma. For example, suppose we have N strategies, and we also know a prior bound L̃ on the loss of the best strategy. Then, combining Equation (9) and Lemma 4, we have

    L_{Hedge(β)} ≤ min_i L_i + √(2 L̃ ln N) + ln N     (11)

for β = g(L̃ / ln N). In general, if we know ahead of time the number of trials T, then we can use L̃ = T as an upper bound on the cumulative loss of each strategy i.

Dividing both sides of Equation (11) by T, we obtain an explicit bound on the rate at which the average per-trial loss of Hedge(β) approaches the average loss for the best strategy:

    L_{Hedge(β)}/T ≤ min_i L_i/T + √(2 L̃ ln N)/T + (ln N)/T.     (12)

Since L̃ ≤ T, this gives a worst case rate of convergence of O(√((ln N)/T)). However, if L̃ is close to zero, then the rate of convergence will be much faster, roughly, O((ln N)/T).

Lemma 4 can also be applied to the other bounds given in Theorem 2 to obtain analogous results.
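As a small numerical illustration of this tuning rule, the sketch below computes β = g(L̃/ln N) and the right-hand side of Equation (11); the particular values of N, T and L̃ are assumptions invented for the example.

    import math

    def g(z):
        # Lemma 4: beta = g(L_tilde / R_tilde) with g(z) = 1 / (1 + sqrt(2/z))
        return 1.0 / (1.0 + math.sqrt(2.0 / z))

    def hedge_bound(L_tilde, N):
        # Right-hand side of Equation (11), with min_i L_i replaced by its prior bound L_tilde
        return L_tilde + math.sqrt(2.0 * L_tilde * math.log(N)) + math.log(N)

    N, T = 10, 1000
    L_tilde = T                          # trivial prior bound on the best strategy's loss
    beta = g(L_tilde / math.log(N))
    print(beta, hedge_bound(L_tilde, N))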

3 Applications

The framework described up to this point is quite general and can be applied in a wide variety of learning problems. Consider the following set-up used by Chung [3]. We are given a decision space Δ, a space of outcomes Ω, and a bounded loss function λ : Δ × Ω → [0, 1]. (Actually, our results require only that λ be bounded, but, by rescaling, we can assume that its range is [0, 1].) At every time step t, the learning algorithm selects a decision δ_t ∈ Δ, receives an outcome ω_t ∈ Ω, and suffers loss λ(δ_t, ω_t). More generally, we may allow the learner to select a distribution D_t over the space of decisions, in which case it suffers the expected loss of a decision randomly selected according to D_t; that is, its expected loss is Λ(D_t, ω_t) where

    Λ(D, ω) = E_{δ∼D}[λ(δ, ω)].

To decide on distribution D_t, we assume that the learner has access to a set of N experts. At every time step t, expert i produces its own distribution E_i^t on Δ, and suffers loss Λ(E_i^t, ω_t).

The goal of the learner is to combine the distributions produced by the experts so as to suffer expected loss "not much worse" than that of the best expert.

The results of Section 2 provide a method for solving this problem. Specifically, we run algorithm Hedge(β), treating each expert as a strategy. At every time step t, Hedge(β) produces a distribution p^t on the set of experts which is used to construct the mixture distribution

    D_t = Σ_{i=1}^N p_i^t E_i^t.

For any outcome ω_t, the loss suffered by Hedge(β) will then be

    Λ(D_t, ω_t) = Σ_{i=1}^N p_i^t Λ(E_i^t, ω_t).

Thus, if we define ℓ_i^t = Λ(E_i^t, ω_t) then the loss suffered by the learner is p^t · ℓ^t, i.e., exactly the mixture loss that was analyzed in Section 2. Hence, the bounds of Section 2 can be applied to our current framework. For instance, applying Equation (11), we obtain the following:

Theorem 5  For any loss function λ, for any set of experts, and for any sequence of outcomes, the expected loss of Hedge(β) if used as described above is at most

    Σ_{t=1}^T Λ(D_t, ω_t) ≤ min_i Σ_{t=1}^T Λ(E_i^t, ω_t) + √(2 L̃ ln N) + ln N

where L̃ ≤ T is an assumed bound on the expected loss of the best expert, and β = g(L̃ / ln N).

Example 1. In the k-ary prediction problem, Δ = Ω = {1, 2, ..., k}, and λ(δ, ω) is 1 if δ ≠ ω and 0 otherwise. In other words, the problem is to predict a sequence of letters over an alphabet of size k. The loss function λ is 1 if a mistake was made, and 0 otherwise. Thus, Λ(D, ω) is the probability (with respect to D) of a prediction that disagrees with ω. The cumulative loss of the learner, or of any expert, is therefore the expected number of mistakes on the entire sequence. So, in this case, Theorem 2 states that the expected number of mistakes of the learning algorithm will exceed the expected number of mistakes of the best expert by at most O(√(T ln N)), or possibly much less if the loss of the best expert can be bounded ahead of time.

Bounds of this type were previously proved in the binary case (k = 2) by Littlestone and Warmuth [15] using the same algorithm. Their algorithm was later improved by Vovk [19] and Cesa-Bianchi et al. [2]. The main result of this section is a proof that such bounds can be shown to hold for any bounded loss function.

Example 2. The loss function λ may represent an arbitrary matrix game, such as "rock, paper, scissors." Here, Δ = Ω = {R, P, S}, and the loss function is defined by the matrix:

                        ω
                  R     P     S
            R    1/2    1     0
      δ     P     0    1/2    1
            S     1     0    1/2

The decision δ represents the learner's play, and the outcome ω is the adversary's play; then λ(δ, ω), the learner's loss, is 1 if the learner loses the round, 0 if it wins the round, and 1/2 if the round is tied. (For instance, λ(S, P) = 0 since "scissors cut paper.") So the cumulative loss of the learner (or an expert) is the expected number of losses in a series of rounds of game play (counting ties as half a loss). Our results show then that, in repeated play, the expected number of rounds lost by our algorithm will converge quickly to the expected number that would have been lost by the best of the experts (for the particular sequence of moves that were actually played by the adversary).

Example 3. Suppose that Δ and Ω are finite, and that λ represents a game matrix as in the last example. Suppose further that we create one expert for each decision δ ∈ Δ (i.e., an expert that always recommends playing δ). In this case, Theorem 2 implies that the learner's average per-round loss on a sequence of repeated plays of the game will converge, at worst, to the value of the game, i.e., to the loss that would have been suffered had the learner used the minimax "optimal" strategy for the game. Moreover, this holds true even if the learner knows nothing at all about the game that is being played (so that λ is unknown to the learner), and even if the adversarial opponent has complete knowledge both of the game that is being played and the algorithm that is being used by the learner. (See the related work of Hannan [10].)

Example 4. Suppose that Δ = Ω is the unit ball in R^n, and that λ(δ, ω) = ||δ − ω||. Thus, the problem here is to predict the location of a point ω, and the loss suffered is the Euclidean distance between the predicted point δ and the actual outcome ω. Theorem 2 can be applied if probabilistic predictions are allowed. However, in this setting it is more natural to require that the learner and each expert predict a single point (rather than a measure on the space of possible points). Essentially, this is the problem of "tracking" a sequence of points ω_1, ..., ω_T where the loss function measures the distance to the predicted point.

To see how to handle the problem of finding deterministic predictions, notice that the loss function λ(δ, ω) is convex with respect to δ:

    ||(a δ_1 + (1 − a) δ_2) − ω|| ≤ a ||δ_1 − ω|| + (1 − a) ||δ_2 − ω||     (13)

for any a ∈ [0, 1] and any ω ∈ Ω. Thus we can do as follows. At time t, the learner predicts with the weighted average of the experts' predictions: δ_t = Σ_{i=1}^N p_i^t ξ_i^t where ξ_i^t ∈ R^n is the prediction of the i-th expert at time t. Regardless of the outcome ω_t, Equation (13) implies that

    ||δ_t − ω_t|| ≤ Σ_{i=1}^N p_i^t ||ξ_i^t − ω_t||.

Since Theorem 2 provides an upper bound on the right hand side of this inequality, we also obtain upper bounds for the left hand side. Thus, our results in this case give explicit bounds on the total error (i.e., distance between predicted and observed points) for the learner relative to the best of a team of experts.

In the one-dimensional case (n = 1), this case was previously analyzed by Littlestone and Warmuth [15], and later improved upon by Kivinen and Warmuth [14]. This result depends only on the convexity and the bounded range of the loss function λ(δ, ω) with respect to δ. Thus, it can also be applied, for example, to the squared-distance loss function λ(δ, ω) = ||δ − ω||², as well as the log loss function λ(δ, ω) = −ln(δ · ω) used by Cover [4] for the design of "universal" investment portfolios. (In this last case, Δ is the set of probability vectors on n points, and Ω = [1/B, B]^n for some constant B > 1.)
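The deterministic tracking scheme of Example 4 is easy to state in code. The sketch below is a rough illustration rather than anything from the paper: it runs the Hedge(β) update on the experts' distances (rescaled to [0, 1] by the diameter of the unit ball, an assumption of this sketch) and predicts with the weighted average of their points.

    import numpy as np

    def track_points(expert_preds, outcomes, beta=0.9):
        """expert_preds: array (T, N, n) of expert predictions; outcomes: array (T, n).

        Returns the total Euclidean loss of the weighted-average predictions."""
        T, N, _ = expert_preds.shape
        w = np.full(N, 1.0 / N)
        total = 0.0
        for t in range(T):
            p = w / w.sum()
            prediction = p @ expert_preds[t]          # weighted average, justified by Equation (13)
            total += np.linalg.norm(prediction - outcomes[t])
            dists = np.linalg.norm(expert_preds[t] - outcomes[t], axis=1)
            w = w * beta ** (dists / 2.0)             # Hedge update; losses rescaled to [0, 1]
        return total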

In many of the cases listed above, superior algorithms or analyses are known. Although weaker in specific cases, it should be emphasized that our results are far more general, and can be applied in settings that exhibit considerably less structure, such as the horse-racing example described in the introduction.

4 Boosting

In this section we show how the algorithm presented in Section 2 for the on-line allocation problem can be modified to boost the performance of weak learning algorithms.

We very briefly review the PAC learning model (see, for instance, Kearns and Vazirani [13] for a more detailed description). Let X be a set called the domain. A concept is a Boolean function c : X → {0, 1}. A concept class C is a collection of concepts. The learner has access to an oracle which provides labeled examples of the form (x, c(x)) where x is chosen randomly according to some fixed but unknown and arbitrary distribution D on the domain X, and c ∈ C is the target concept. After some amount of time, the learner must output a hypothesis h : X → [0, 1]. The value h(x) can be interpreted as a randomized prediction of the label of x that is 1 with probability h(x) and 0 with probability 1 − h(x). (Although we assume here that we have direct access to the bias of this prediction, our results can be extended to the case that h is instead a random mapping into {0, 1}.) The error of the hypothesis h is the expected value E_{x∼D}(|h(x) − c(x)|) where x is chosen according to D. If h(x) is interpreted as a stochastic prediction, then this is simply the probability of an incorrect prediction.

A strong PAC-learning algorithm is an algorithm that, given ε, δ > 0 and access to random examples, outputs with probability 1 − δ a hypothesis with error at most ε. Further, the running time must be polynomial in 1/ε, 1/δ and other relevant parameters (namely, the "size" of the examples received, and the "size" or "complexity" of the target concept). A weak PAC-learning algorithm satisfies the same conditions but only for ε ≥ 1/2 − γ where γ > 0 is either a constant, or decreases as 1/p where p is a polynomial in the relevant parameters. We use WeakLearn to denote a generic weak learning algorithm.

Schapire [16] showed that any weak learning algorithm can be efficiently transformed or "boosted" into a strong learning algorithm. Later, Freund [8, 9] presented the "boost-by-majority" algorithm that is considerably more efficient than Schapire's. Both algorithms work by calling a given weak learning algorithm WeakLearn multiple times, each time presenting it with a different distribution over the domain X, and finally combining all of the generated hypotheses into a single hypothesis. The intuitive idea is to alter the distribution over the domain X in a way that increases the probability of the "harder" parts of the space, thus forcing the weak learner to generate new hypotheses that make fewer mistakes on these parts.

An important, practical deficiency of the boost-by-majority algorithm is the requirement that the bias γ of the weak learning algorithm WeakLearn be known ahead of time. Not only is this worst-case bias usually unknown in practice, but the bias that can be achieved by WeakLearn will typically vary considerably from one distribution to the next. Unfortunately, the boost-by-majority algorithm cannot take advantage of hypotheses computed by WeakLearn with error significantly smaller than the presumed worst-case bias of 1/2 − γ.

In this section, we present a new boosting algorithm which was derived from the on-line allocation algorithm of Section 2. This new algorithm is very nearly as efficient as boost-by-majority. However, unlike boost-by-majority, the accuracy of the final hypothesis produced by the new algorithm depends on the accuracy of all the hypotheses returned by WeakLearn, and so is able to more fully exploit the power of the weak learning algorithm.

Also, this new algorithm gives a clean method for handling real-valued hypotheses which often are produced by neural networks and other learning algorithms.

4.1 The new boosting algorithm

Although boosting has its roots in the PAC model, for the remainder of the paper, we adopt a more general learning framework in which the learner receives examples (x_i, y_i) chosen randomly according to some fixed but unknown distribution P on X × Y, where Y is a set of possible labels. As usual, the goal is to learn to predict the label y given an instance x.

We start by describing our new boosting algorithm in the simplest case that the label set Y consists of just two possible labels, Y = {0, 1}. In later sections, we give extensions of the algorithm for more general label sets.

Freund [9] describes two frameworks in which boosting can be applied: boosting by filtering and boosting by sampling. In this paper, we use the boosting by sampling framework, which is the natural framework for analyzing "batch" learning, i.e., learning using a fixed training set which is stored in the computer's memory.

We assume that a sequence of N training examples (labeled instances) (x_1, y_1), ..., (x_N, y_N) is drawn randomly from X × Y according to distribution P. We use boosting to find a hypothesis h_f which is consistent with most of the sample (i.e., h_f(x_i) = y_i for most 1 ≤ i ≤ N). In general, a hypothesis which is accurate on the training set might not be accurate on examples outside the training set; this problem is sometimes referred to as "over-fitting." Often, however, over-fitting can be avoided by restricting the hypothesis to be simple. We will come back to this problem in Section 4.3.

The new boosting algorithm is described in Figure 2. The goal of the algorithm is to find a final hypothesis with low error relative to a given distribution D over the training examples. Unlike the distribution P which is over X × Y and is set by "nature," the distribution D is only over the instances in the training set and is controlled by the learner. Ordinarily, this distribution will be set to be uniform so that D(i) = 1/N. The algorithm maintains a set of weights w^t over the training examples. On iteration t a distribution p^t is computed by normalizing these weights. This distribution is fed to the weak learner WeakLearn which generates a hypothesis h_t that (we hope) has small error with respect to the distribution.¹ Using the new hypothesis h_t, the boosting algorithm generates the next weight vector w^{t+1}, and the process repeats. After T such iterations, the final hypothesis h_f is output. The hypothesis h_f combines the outputs of the T weak hypotheses using a weighted majority vote.

We call the algorithm AdaBoost because, unlike previous algorithms, it adjusts adaptively to the errors of the weak hypotheses returned by WeakLearn. If WeakLearn is a PAC weak learning algorithm in the sense defined above, then ε_t ≤ 1/2 − γ for all t (assuming the examples have been generated appropriately with y_i = c(x_i) for some c ∈ C). However, such a bound on the error need not be known ahead of time. Our results hold for any ε_t ∈ [0, 1], and depend only on the performance of the weak learner on those distributions that are actually generated during the boosting process.

The parameter β_t is chosen as a function of ε_t and is used for updating the weight vector. The update rule reduces the probability assigned to those examples on which the hypothesis makes a good prediction and increases the probability of the examples on which the prediction is poor.²

¹ Some learning algorithms can be generalized to use a given distribution directly. For instance, gradient based algorithms can use the probability associated with each example to scale the update step size which is based on the example. If the algorithm cannot be generalized in this way, the training sample can be re-sampled to generate a new set of training examples that is distributed according to the given distribution. The computation required to generate each re-sampled example takes O(log N) time.

Algorithm AdaBoost
Input: sequence of N labeled examples (x_1, y_1), ..., (x_N, y_N)
       distribution D over the N examples
       weak learning algorithm WeakLearn
       integer T specifying number of iterations
Initialize the weight vector: w_i^1 = D(i) for i = 1, ..., N.
Do for t = 1, 2, ..., T
  1. Set p^t = w^t / Σ_{i=1}^N w_i^t
  2. Call WeakLearn, providing it with the distribution p^t; get back a hypothesis h_t : X → [0, 1].
  3. Calculate the error of h_t: ε_t = Σ_{i=1}^N p_i^t |h_t(x_i) − y_i|.
  4. Set β_t = ε_t / (1 − ε_t).
  5. Set the new weights vector to be
         w_i^{t+1} = w_i^t β_t^{1 − |h_t(x_i) − y_i|}
Output the hypothesis
    h_f(x) = 1 if Σ_{t=1}^T (log 1/β_t) h_t(x) ≥ (1/2) Σ_{t=1}^T log(1/β_t)
             0 otherwise.

Figure 2: The adaptive boosting algorithm.
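The following is a minimal sketch of the AdaBoost loop of Figure 2 in Python/NumPy, assuming labels in {0, 1} and weak hypotheses with outputs in [0, 1]. The decision-stump weak learner is a hypothetical stand-in for WeakLearn, and the sketch assumes every weighted error falls strictly between 0 and 1/2.

    import numpy as np

    def adaboost(X, y, weak_learn, T, D=None):
        """X: (N, d) instances; y: (N,) labels in {0, 1};
        weak_learn(X, y, p) returns a callable h with h(X) in [0, 1]^N."""
        N = len(y)
        w = np.full(N, 1.0 / N) if D is None else np.array(D, dtype=float)  # w^1_i = D(i)
        hypotheses, betas = [], []
        for t in range(T):
            p = w / w.sum()                                # step 1: normalize weights
            h = weak_learn(X, y, p)                        # step 2: call WeakLearn
            err = np.dot(p, np.abs(h(X) - y))              # step 3: weighted error
            beta = err / (1.0 - err)                       # step 4
            w = w * beta ** (1.0 - np.abs(h(X) - y))       # step 5: reweight examples
            hypotheses.append(h)
            betas.append(beta)
        alphas = np.log(1.0 / np.array(betas))
        def h_final(Xq):                                   # weighted majority vote
            votes = sum(a * h(Xq) for a, h in zip(alphas, hypotheses))
            return (votes >= 0.5 * alphas.sum()).astype(int)
        return h_final

    def stump_learner(X, y, p):
        """Hypothetical weak learner: best threshold and sign on the first feature."""
        best = None
        for thresh in np.unique(X[:, 0]):
            for sign in (1.0, -1.0):
                pred = (sign * (X[:, 0] - thresh) >= 0).astype(float)
                err = np.dot(p, np.abs(pred - y))
                if best is None or err < best[0]:
                    best = (err, thresh, sign)
        _, thresh, sign = best
        return lambda Xq: (sign * (Xq[:, 0] - thresh) >= 0).astype(float)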

Note that AdaBoost, unlike boost-by-majority, combines the weak hypotheses by summing their probabilistic predictions. Drucker, Schapire and Simard [7], in experiments they performed using boosting to improve the performance of a real-valued neural network, observed that summing the outcomes of the networks and then selecting the best prediction performs better than selecting the best prediction of each network and then combining them with a majority rule. It is interesting that the new boosting algorithm's final hypothesis uses the same combination rule that was observed to be better in practice, but which previously lacked theoretical justification. Experiments are needed to measure whether the new algorithm has an advantage in real world applications.

4.2 Analysis

Comparing Figures 1 and 2, there is an obvious similarity between the algorithms Hedge(β) and AdaBoost. This similarity reflects a surprising "dual" relationship between the on-line allocation model and the problem of boosting. Put another way, there is a direct mapping or reduction of the boosting problem to the on-line allocation problem. In such a reduction, one might naturally expect a correspondence relating the strategies to the weak hypotheses and the trials (and associated loss vectors) to the examples in the training set. However, the reduction we have used is reversed: the "strategies" correspond to the examples, and the trials are associated with the weak hypotheses. Another reversal is in the definition of the loss: in Hedge(β) the loss ℓ_i^t is small if the i-th strategy suggests a good action on the t-th trial while in AdaBoost the "loss" ℓ_i^t = 1 − |h_t(x_i) − y_i| appearing in the weight-update rule (Step 5) is small if the t-th hypothesis suggests a bad prediction on the i-th example. The reason is that in Hedge(β) the weight associated with a strategy is increased if the strategy is successful while in AdaBoost the weight associated with an example is increased if the example is "hard."

The main technical difference between the two algorithms is that in AdaBoost the parameter β is no longer fixed ahead of time but rather changes at each iteration according to ε_t. If we are given ahead of time the information that ε_t ≤ 1/2 − γ for some γ > 0 and for all t = 1, ..., T, then we could instead directly apply algorithm Hedge(β) and its analysis as follows: Fix β to be 1 − γ, and set ℓ_i^t = 1 − |h_t(x_i) − y_i|, and h_f as in AdaBoost, but with equal weight assigned to all T hypotheses. Then p^t · ℓ^t is exactly the accuracy of h_t on distribution p^t, which, by assumption, is at least 1/2 + γ. Also, letting S = {i : h_f(x_i) ≠ y_i}, it is straightforward to show that if i ∈ S then

    L_i/T = (1/T) Σ_{t=1}^T ℓ_i^t = 1 − (1/T) Σ_{t=1}^T |y_i − h_t(x_i)| ≤ 1 − |y_i − (1/T) Σ_{t=1}^T h_t(x_i)| ≤ 1/2

by h_f's definition, and since y_i ∈ {0, 1}. Thus, by Theorem 2,

    T(1/2 + γ) ≤ Σ_{t=1}^T p^t · ℓ^t ≤ ( −ln( Σ_{i∈S} D(i) ) + (γ + γ²)(T/2) ) / γ

since −ln(β) = −ln(1 − γ) ≤ γ + γ² for γ ∈ [0, 1/2]. This implies that the error ε = Σ_{i∈S} D(i) of h_f is at most e^{−Tγ²/2}.

² Furthermore, if h_t is Boolean (with range {0, 1}), then it can be shown that this update rule exactly removes the advantage of the last hypothesis. That is, the error of h_t on distribution p^{t+1} is exactly 1/2.

The boosting algorithm AdaBoost has two advantages over this direct application of Hedge(β). First, by giving a more refined analysis and choice of β, we obtain a significantly superior bound on the error ε. Second, the algorithm does not require prior knowledge of the accuracy of the hypotheses that WeakLearn will generate. Instead, it measures the accuracy of h_t at each iteration and sets its parameters accordingly. The update factor β_t decreases with ε_t which causes the difference between the distributions p^t and p^{t+1} to increase. Decreasing β_t also increases the weight ln(1/β_t) which is associated with h_t in the final hypothesis. This makes intuitive sense: more accurate hypotheses cause larger changes in the generated distributions and have more influence on the outcome of the final hypothesis.

We now give our analysis of the performance of AdaBoost. Note that this theorem applies also if, for some hypotheses, ε_t ≥ 1/2.

Theorem 6  Suppose the weak learning algorithm WeakLearn, when called by AdaBoost, generates hypotheses with errors ε_1, ..., ε_T (as defined in Step 3 of Figure 2). Then the error ε = Pr_{i∼D}[h_f(x_i) ≠ y_i] of the final hypothesis h_f output by AdaBoost is bounded above by

    ε ≤ 2^T ∏_{t=1}^T √(ε_t (1 − ε_t)).     (14)

Proof: We adapt the main arguments from Lemma 1 and Theorem 2. We use p^t and w^t as they are defined in Figure 2.

Similar to Equation (4), the update rule given in Step 5 in Figure 2 implies that

    Σ_{i=1}^N w_i^{t+1} = Σ_{i=1}^N w_i^t β_t^{1−|h_t(x_i)−y_i|} ≤ Σ_{i=1}^N w_i^t (1 − (1 − β_t)(1 − |h_t(x_i) − y_i|)) = ( Σ_{i=1}^N w_i^t ) (1 − (1 − β_t)(1 − ε_t)).     (15)

Combining this inequality over t = 1, ..., T, we get that

    Σ_{i=1}^N w_i^{T+1} ≤ ∏_{t=1}^T (1 − (1 − β_t)(1 − ε_t)).     (16)

The final hypothesis h_f, as defined in Figure 2, makes a mistake on instance i only if

    ∏_{t=1}^T β_t^{−|h_t(x_i)−y_i|} ≥ ( ∏_{t=1}^T β_t )^{−1/2}     (17)

(since y_i ∈ {0, 1}). The final weight of any instance i is

    w_i^{T+1} = D(i) ∏_{t=1}^T β_t^{1−|h_t(x_i)−y_i|}.     (18)

Combining Equations (17) and (18) we can lower bound the sum of the final weights by the sum of the final weights of the examples on which h_f is incorrect:

    Σ_{i=1}^N w_i^{T+1} ≥ Σ_{i : h_f(x_i)≠y_i} w_i^{T+1} ≥ ( Σ_{i : h_f(x_i)≠y_i} D(i) ) ( ∏_{t=1}^T β_t )^{1/2} = ε ( ∏_{t=1}^T β_t )^{1/2}     (19)

where ε is the error of h_f. Combining Equations (16) and (19), we get that

    ε ≤ ∏_{t=1}^T (1 − (1 − β_t)(1 − ε_t)) / √(β_t).     (20)

As all the factors in the product are positive, we can minimize the right hand side by minimizing each factor separately. Setting the derivative of the t-th factor to zero, we find that the choice of β_t which minimizes the right hand side is β_t = ε_t/(1 − ε_t). Plugging this choice of β_t into Equation (20) we get Equation (14), completing the proof.

The bound on the error ε given in Theorem 6 can also be written in the form

    ε ≤ ∏_{t=1}^T √(1 − 4γ_t²) = exp( −Σ_{t=1}^T KL(1/2 || 1/2 − γ_t) ) ≤ exp( −2 Σ_{t=1}^T γ_t² )     (21)

where KL(a || b) = a ln(a/b) + (1 − a) ln((1 − a)/(1 − b)) is the Kullback-Leibler divergence, and where ε_t has been replaced by 1/2 − γ_t. In the case where the errors of all the hypotheses are equal to 1/2 − γ, Equation (21) simplifies to

    ε ≤ (1 − 4γ²)^{T/2} = exp( −T · KL(1/2 || 1/2 − γ) ) ≤ exp( −2Tγ² ).     (22)

This is a form of the Chernoff bound for the probability that less than T/2 coin flips turn out "heads" in T tosses of a random coin whose probability for "heads" is 1/2 − γ. This bound has the same asymptotic behavior as the bound given for the boost-by-majority algorithm [9]. From Equation (22) we get that the number of iterations of the boosting algorithm that is sufficient to achieve error ε of h_f is

    T = ⌈ (1/KL(1/2 || 1/2 − γ)) ln(1/ε) ⌉ ≤ ⌈ (1/(2γ²)) ln(1/ε) ⌉.     (23)

Note, however, that when the errors of the hypotheses generated by WeakLearn are not uniform, Theorem 6 implies that the final error depends on the error of all of the weak hypotheses. Previous bounds on the errors of boosting algorithms depended only on the maximal error of the weakest hypothesis and ignored the advantage that can be gained from the hypotheses whose errors are smaller. This advantage seems to be very relevant to practical applications of boosting, because there one expects the error of the learning algorithm to increase as the distributions fed to WeakLearn shift more and more away from the target distribution.
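As a quick numerical check of these bounds, the sketch below evaluates the product form of Equation (14) and the two relaxations in Equation (21) for a list of weak-hypothesis errors; the sample errors are invented for illustration.

    import math

    def adaboost_error_bounds(errors):
        """Return (Eq. 14 product form, KL form, Chernoff-style form) of the training-error bound."""
        prod_form = (2 ** len(errors)) * math.prod(math.sqrt(e * (1 - e)) for e in errors)
        def kl(a, b):
            return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))
        gammas = [0.5 - e for e in errors]
        kl_form = math.exp(-sum(kl(0.5, 0.5 - g) for g in gammas))
        chernoff_form = math.exp(-2.0 * sum(g * g for g in gammas))
        return prod_form, kl_form, chernoff_form

    # The first two values coincide; the third is the weaker exp(-2 sum gamma_t^2) bound.
    print(adaboost_error_bounds([0.3, 0.35, 0.4, 0.25]))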

4.3 Generalization error

We now come back to discussing the error of the final hypothesis outside the training set. Theorem 6 guarantees that the error of h_f on the sample is small; however, the quantity that interests us is the generalization error of h_f, which is the error of h_f over the whole instance space X; that is, ε_g = Pr_{(x,y)∼P}[h_f(x) ≠ y]. In order to make ε_g close to the empirical error ε̂ on the training set, we have to restrict the choice of h_f in some way. One natural way of doing this in the context of boosting is to restrict the weak learner to choose its hypotheses from some simple class of functions and restrict T, the number of weak hypotheses that are combined to make h_f. The choice of the class of weak hypotheses is specific to the learning problem at hand and should reflect our knowledge about the properties of the unknown concept. As for the choice of T, various general methods can be devised. One popular method is to use an upper bound on the VC-dimension of the concept class. This method is sometimes called "structural risk minimization." See Vapnik's book [17] for an extensive discussion of the theory of structural risk minimization. For our purposes, we quote Vapnik's Theorem 6.7:

Theorem 7 (Vapnik)  Let H be a class of binary functions over some domain X. Let d be the VC-dimension of H. Let P be a distribution over the pairs X × {0, 1}. For h ∈ H, define the (generalization) error of h with respect to P to be

    ε_g(h) := Pr_{(x,y)∼P}[h(x) ≠ y].

Let S = {(x_1, y_1), ..., (x_N, y_N)} be a sample (training set) of N independent random examples drawn from X × {0, 1} according to P. Define the empirical error of h with respect to the sample S to be

    ε̂(h) := |{i : h(x_i) ≠ y_i}| / N.

Then, for any δ > 0 we have that

    Pr[ ∃ h ∈ H : |ε̂(h) − ε_g(h)| > 2 √( (d (ln(2N/d) + 1) + ln(9/δ)) / N ) ] ≤ δ

where the probability is computed with respect to the random choice of the sample S.

Let Θ : R → {0, 1} be defined by

    Θ(x) = 1 if x ≥ 0, and 0 otherwise

and, for any class H of functions, let Θ_T(H) be the class of all functions defined as a linear threshold of T functions in H:

    Θ_T(H) = { Θ( Σ_{t=1}^T a_t h_t − b ) : b, a_1, ..., a_T ∈ R; h_1, ..., h_T ∈ H }.

Clearly, if all hypotheses generated by WeakLearn belong to some class H, then the final hypothesis of AdaBoost, after T rounds of boosting, belongs to Θ_T(H). Thus, the next theorem provides an upper bound on the VC-dimension of the class of final hypotheses generated by AdaBoost in terms of the weak hypothesis class.

Theorem 8  Let H be a class of binary functions of VC-dimension d ≥ 2. Then the VC-dimension of Θ_T(H) is at most 2(d + 1)(T + 1) log₂(e(T + 1)).

Therefore, if the hypotheses generated by WeakLearn are chosen from a class of VC-dimension d ≥ 2, then the final hypotheses generated by AdaBoost after T iterations belong to a class of VC-dimension at most 2(d + 1)(T + 1) log₂[e(T + 1)].

Proof: We use a result about the VC-dimension of computation networks proved by Baum and Haussler [1]. We can view the final hypothesis output by AdaBoost as a function that is computed by a two-layer feed-forward network where the computation units of the first layer are the weak hypotheses and the computation unit of the second layer is the linear threshold function which combines the weak hypotheses. The VC-dimension of the set of linear threshold functions over R^T is T + 1 [20].

Thus the sum over all computation units of the VC-dimensions of the classes of functions associated with each unit is Td + (T + 1) < (T + 1)(d + 1). Baum and Haussler's Theorem 1 [1] implies that the number of different functions that can be realized by h ∈ Θ_T(H) when the domain is restricted to a set of size m is at most ((T + 1)em / ((T + 1)(d + 1)))^{(T+1)(d+1)}. If d ≥ 2, T ≥ 1 and we set m = ⌈2(T + 1)(d + 1) log₂[e(T + 1)]⌉, then the number of realizable functions is smaller than 2^m which implies that the VC-dimension of Θ_T(H) is smaller than m.

Following the guidelines of structural risk minimization we can do the following (assuming we know a reasonable upper bound on the VC-dimension of the class of weak hypotheses). Let h_f^T be the hypothesis generated by running AdaBoost for T iterations. By combining the observed empirical error of h_f^T with the bounds given in Theorems 7 and 8, we can compute an upper bound on the generalization error of h_f^T for all T. We would then select the hypothesis h_f^T that minimizes the guaranteed upper bound.

While structural risk minimization is a mathematically sound method, the upper bounds on ε_g that are generated in this way might be larger than the actual value and so the chosen number of iterations T might be much smaller than the optimal value, leading to inferior performance. A simple alternative is to use "cross-validation" in which a fraction of the training set is left outside the set used to generate h_f as the so-called "validation" set. The value of T is then chosen to be the one for which the error of the final hypothesis on the validation set is minimized. (For an extensive analysis of the relations between different methods for selecting model complexity in learning, see Kearns et al. [12].)

Some initial experiments using AdaBoost on real-world problems conducted by ourselves and Drucker and Cortes [6] indicate that AdaBoost tends not to over-fit; on many problems, even after hundreds of rounds of boosting, the generalization error continues to drop, or at least does not increase. On problems where over-fitting can occur, cross validation seems to be a reasonable method for finding a good value of T.

4.4 A Bayesian interpretation

The final hypothesis generated by AdaBoost is closely related to one suggested by a Bayesian analysis. As usual, we assume that examples (x, y) are being generated according to some distribution P on X × {0, 1}; all probabilities in this section are taken with respect to P. Suppose we are given a set of {0, 1}-valued hypotheses h_1, ..., h_T and that our goal is to combine the predictions of these hypotheses in the optimal way. Then, given an instance x and the hypothesis predictions h_t(x), the Bayes optimal decision rule says that we should predict the label with the highest likelihood, given the hypothesis values, i.e., we should predict 1 if

    Pr[y = 1 | h_1(x), ..., h_T(x)] > Pr[y = 0 | h_1(x), ..., h_T(x)],

and otherwise we should predict 0. This rule is especially easy to compute if we assume that the errors of the different hypotheses are independent of one another and of the target concept, that is, if we assume that the event h_t(x) ≠ y is conditionally independent of the actual label y and the predictions of all the other hypotheses h_1(x), ..., h_{t−1}(x), h_{t+1}(x), ..., h_T(x). In this case, by applying Bayes rule, we can rewrite the Bayes optimal decision rule in a particularly simple form in which we predict 1 if

    Pr[y = 1] ∏_{t : h_t(x)=0} ε_t ∏_{t : h_t(x)=1} (1 − ε_t) > Pr[y = 0] ∏_{t : h_t(x)=0} (1 − ε_t) ∏_{t : h_t(x)=1} ε_t,

and 0 otherwise. Here ε_t = Pr[h_t(x) ≠ y]. We add to the set of hypotheses the trivial hypothesis h_0 which always predicts the value 1. We can then replace Pr[y = 0] by ε_0. Taking the logarithm of both sides in this inequality and rearranging the terms, we find that the Bayes optimal decision rule is identical to the combination rule that is generated by AdaBoost.

If the errors of the different hypotheses are dependent, then the Bayes optimal decision rule becomes much more complicated. However, in practice, it is common to use the simple rule described above even when there is no justification for assuming independence. (This is sometimes called "naive Bayes.") An interesting and more principled alternative to this practice would be to use the algorithm AdaBoost to find a combination rule which, by Theorem 6, has a guaranteed non-trivial accuracy.

4.5 Improving the error bound

We show in this section how the bound given in Theorem 6 can be improved by a factor of two. The main idea of this improvement is to replace the "hard" {0, 1}-valued decision used by h_f by a "soft" threshold.

To be more precise, let

    r(x_i) = ( Σ_{t=1}^T (log 1/β_t) h_t(x_i) ) / ( Σ_{t=1}^T log(1/β_t) )

be a weighted average of the weak hypotheses h_t. We will here consider final hypotheses of the form h_f(x_i) = F(r(x_i)) where F : [0, 1] → [0, 1]. For the version of AdaBoost given in Figure 2, F(r) is the hard threshold that equals 1 if r ≥ 1/2 and 0 otherwise. In this section, we will instead use soft threshold functions that take values in [0, 1]. As mentioned above, when h_f(x_i) ∈ [0, 1], we can interpret h_f as a randomized hypothesis and h_f(x_i) as the probability of predicting 1. Then the error E_{i∼D}[|h_f(x_i) − y_i|] is simply the probability of an incorrect prediction.

Theorem 9  Let ε_1, ..., ε_T be as in Theorem 6, and let r(x_i) be as defined above. Let the modified final hypothesis be defined by h_f = F(r(x_i)) where F satisfies the following for r ∈ [0, 1]:

    F(1 − r) = 1 − F(r);   and   F(r) ≤ (1/2) ( ∏_{t=1}^T β_t )^{1/2 − r}.

Then the error ε of h_f is bounded above by

    ε ≤ 2^{T−1} ∏_{t=1}^T √(ε_t (1 − ε_t)).

For instance, it can be shown that the sigmoid function F(r) = ( 1 + ∏_{t=1}^T β_t^{2r−1} )^{−1} satisfies the conditions of the theorem.

Proof: By our assumptions on F, the error of h_f is

    ε = Σ_{i=1}^N D(i) |F(r(x_i)) − y_i| = Σ_{i=1}^N D(i) F(|r(x_i) − y_i|) ≤ (1/2) Σ_{i=1}^N D(i) ( ∏_{t=1}^T β_t )^{1/2 − |r(x_i) − y_i|}.

Since y_i ∈ {0, 1} and by definition of r(x_i), this implies that

    ε ≤ (1/2) Σ_{i=1}^N D(i) ( ∏_{t=1}^T β_t^{1 − |h_t(x_i) − y_i|} ) ( ∏_{t=1}^T β_t )^{−1/2}
      = (1/2) ( Σ_{i=1}^N w_i^{T+1} ) ( ∏_{t=1}^T β_t )^{−1/2}
      ≤ (1/2) ∏_{t=1}^T (1 − (1 − β_t)(1 − ε_t)) β_t^{−1/2}.

The last two steps follow from Equations (18) and (16), respectively. The theorem now follows from our choice of β_t.

5 Boosting for multi-class and regression problems

So far, we have restricted our attention to binary classification problems in which the set of labels Y contains only two elements. In this section, we describe two possible extensions of AdaBoost to the multi-class case in which Y is any finite set of class labels. We also give an extension for a regression problem in which Y is a real bounded interval.

We start with the multiple-label classification problem. Let Y = {1, 2, ..., k} be the set of possible labels. The boosting algorithms we present output hypotheses h_f : X → Y, and the error of the final hypothesis is measured in the usual way as the probability of an incorrect prediction.

The first extension of AdaBoost, which we call AdaBoost.M1, is the most direct. The weak learner generates hypotheses which assign to each instance one of the k possible labels. We require that each weak hypothesis have prediction error less than 1/2 (with respect to the distribution on which it was trained). Provided this requirement can be met, we are able to prove that the error of the combined final hypothesis decreases exponentially, as in the binary case. Intuitively, however, this requirement on the performance of the weak learner is stronger than might be desired. In the binary case (k = 2), a random guess will be correct with probability 1/2, but when k > 2, the probability of a correct random prediction is only 1/k < 1/2. Thus, our requirement that the accuracy of the weak hypothesis be greater than 1/2 is significantly stronger than simply requiring that the weak hypothesis perform better than random guessing.

In fact, when the performance of the weak learner is measured only in terms of error rate, this difficulty is unavoidable as is shown by the following informal example (also presented by Schapire [16]): Consider a learning problem where Y = {0, 1, 2} and suppose that it is "easy" to predict whether the label is 2 but "hard" to predict whether the label is 0 or 1. Then a hypothesis which predicts correctly whenever the label is 2 and otherwise guesses randomly between 0 and 1 is guaranteed to be correct at least half of the time (significantly beating the 1/3 accuracy achieved by guessing entirely at random). On the other hand, boosting this learner to an arbitrary accuracy is infeasible since we assumed that it is hard to distinguish 0- and 1-labeled instances.

As a more natural example of this problem, consider classification of handwritten digits in an OCR application. It may be easy for the weak learner to tell that a particular image of a "7" is not a "0" but hard to tell for sure if it is a "7" or a "9".

20 \7" s no a \0" bu hard o ell for sure f s a \7" or a \9". Par of he problem here s ha, alhough he boosng algorhm can focus he aenon of he weak learner on he harder examples, has no way of forcng he weak learner o dscrmnae beween parcular labels ha may be especally hard o dsngush. In our second verson of mul-class boosng, we aemp o overcome hs dculy by exendng he communcaon beween he boosng algorhm and he weak learner. Frs, we allow he weak learner o generae more expressve hypoheses whose oupu s a vecor n [0; ] k, raher han a sngle label n Y. Inuvely, he yh componen of hs vecor represens a \degree of belef" ha he correc label s y. The componens wh large values (close o ) correspond o hose labels consdered o be plausble. Lkewse, labels consdered mplausble are assgned a small value (near 0), and quesonable labels may be assgned a value near =2. If several labels are consdered plausble (or mplausble), hen hey all may be assgned large (or small) values. Whle we gve he weak learnng algorhm more expressve power, we also place a more complex requremen on he performance of he weak hypoheses. Raher han usng he usual predcon error, we ask ha he weak hypoheses do well wh respec o a more sophscaed error measure ha we call he pseudo-loss. Ths pseudo-loss vares from example o example, and from one round o he nex. On each eraon, he pseudo-loss funcon s suppled o he weak learner by he boosng algorhm, along wh he dsrbuon on he examples. By manpulang he pseudo-loss funcon, he boosng algorhm can focus he weak learner on he labels ha are hardes o dscrmnae. The boosng algorhm AdaBoos.M2, descrbed n Secon 5.2, s based on hese deas and acheves boosng f each weak hypohess has pseudoloss slghly beer han random guessng (wh respec o he pseudo-loss measure ha was suppled o he weak learner). In addon o he wo exensons descrbed n hs paper, we menon an alernave, sandard approach whch would be o conver he gven mul-class problem no several bnary problems, and hen o use boosng separaely on each of he bnary problems. There are several sandard ways of makng such a converson, one of he mos successful beng he errorcorrecng oupu codng approach advocaed by Deerch and Bakr [5]. Fnally, n Secon 5.3 we exend AdaBoos o boosng regresson algorhms. In hs case Y = [0; ], and he error of a hypohess s dened as E (x;y)p (h(x)? y) 2. We descrbe a boosng algorhm AdaBoos.R whch, usng mehods smlar o hose used n AdaBoos.M2, booss he performance of a weak regresson algorhm. 5. Frs mul-class exenson In our rs and mos drec exenson o he mul-class case, he goal of he weak learner s o generae on round a hypohess h : X! Y wh low classcaon error : = Prp [h (x ) 6= y ]. Our exended boosng algorhm, called AdaBoos.M, s shown n Fgure 3, and ders only slghly from AdaBoos. The man derence s n he replacemen of he error jh (x )? y j for he bnary case by [h (x ) 6= y ] where, for any predcae, we dene [ ] o be f holds and 0 oherwse. Also, he nal hypohess h f, for a gven nsance x, now oupus he label y ha maxmzes he sum of he weghs of he weak hypoheses predcng ha label. In he case of bnary classcaon (k = 2), a weak hypohess h wh error sgncanly larger han =2 s of equal value o one wh error sgncanly less han =2 snce h can be replaced by? h. However, for k > 2, a hypohess h wh error =2 s useless o he boosng algorhm. If such a weak hypohess s reurned by he weak learner, our algorhm 20

Algorithm AdaBoost.M1
Input: sequence of N examples (x_1, y_1), ..., (x_N, y_N) with labels y_i ∈ Y = {1, ..., k}
       distribution D over the examples
       weak learning algorithm WeakLearn
       integer T specifying number of iterations
Initialize the weight vector: w_i^1 = D(i) for i = 1, ..., N.
Do for t = 1, 2, ..., T
  1. Set p^t = w^t / Σ_{i=1}^N w_i^t
  2. Call WeakLearn, providing it with the distribution p^t; get back a hypothesis h_t : X → Y.
  3. Calculate the error of h_t: ε_t = Σ_{i=1}^N p_i^t [h_t(x_i) ≠ y_i].
     If ε_t > 1/2, then set T = t − 1 and abort loop.
  4. Set β_t = ε_t / (1 − ε_t).
  5. Set the new weights vector to be
         w_i^{t+1} = w_i^t β_t^{1 − [h_t(x_i) ≠ y_i]}
Output the hypothesis
    h_f(x) = arg max_{y∈Y} Σ_{t=1}^T (log 1/β_t) [h_t(x) = y].

Figure 3: A first multi-class extension of AdaBoost.
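A minimal sketch of the AdaBoost.M1 loop of Figure 3, again in Python/NumPy. The weak learner is assumed to be supplied by the caller and to return a hypothesis mapping instances to labels in Y; the abort test and the arg-max vote follow Step 3 and the output rule above, and the sketch assumes every retained error satisfies 0 < ε_t ≤ 1/2.

    import numpy as np

    def adaboost_m1(X, y, weak_learn, T, labels):
        """y holds labels drawn from `labels`; weak_learn(X, y, p) returns h with h(X) giving labels."""
        N = len(y)
        w = np.full(N, 1.0 / N)
        hypotheses, alphas = [], []
        for t in range(T):
            p = w / w.sum()
            h = weak_learn(X, y, p)
            miss = (h(X) != y).astype(float)
            err = np.dot(p, miss)
            if err > 0.5:                            # Step 3: abort if the hypothesis is too weak
                break
            beta = err / (1.0 - err)                 # Step 4
            w = w * beta ** (1.0 - miss)             # Step 5
            hypotheses.append(h)
            alphas.append(np.log(1.0 / beta))
        def h_final(Xq):                             # arg-max weighted vote over the labels
            votes = np.zeros((len(Xq), len(labels)))
            for a, h in zip(alphas, hypotheses):
                pred = h(Xq)
                for j, lab in enumerate(labels):
                    votes[:, j] += a * (pred == lab)
            return np.asarray(labels)[votes.argmax(axis=1)]
        return h_final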

Theorem 10  Suppose the weak learning algorithm WeakLearn, when called by AdaBoost.M1, generates hypotheses with errors ε_1, ..., ε_T, where ε_t is as defined in Figure 3. Assume each ε_t ≤ 1/2. Then the error ε = Pr_{i∼D}[h_f(x_i) ≠ y_i] of the final hypothesis h_f output by AdaBoost.M1 is bounded above by

    ε ≤ 2^T ∏_{t=1}^T √(ε_t (1 − ε_t)).

Proof: To prove this theorem, we reduce our setup for AdaBoost.M1 to an instantiation of AdaBoost, and then apply Theorem 6. For clarity, we mark with tildes variables in the reduced AdaBoost space.

For each of the given examples (x_i, y_i), we define an AdaBoost example (x̃_i, ỹ_i) in which x̃_i = i and ỹ_i = 0. We define the AdaBoost distribution D̃ over examples to be equal to the AdaBoost.M1 distribution D. On the t-th round, we provide AdaBoost with a hypothesis h̃_t defined by the rule

    h̃_t(i) = [h_t(x_i) ≠ y_i]

in terms of the t-th hypothesis h_t which was returned to AdaBoost.M1 by WeakLearn.

Given this setup, it can be easily proved by induction on the number of rounds that the weight vectors, distributions and errors computed by AdaBoost and AdaBoost.M1 are identical so that w̃^t = w^t, p̃^t = p^t, ε̃_t = ε_t and β̃_t = β_t.

Suppose that AdaBoost.M1's final hypothesis h_f makes a mistake on instance i so that h_f(x_i) ≠ y_i. Then, by definition of h_f,

    Σ_{t=1}^T α_t [h_t(x_i) = y_i] ≤ Σ_{t=1}^T α_t [h_t(x_i) = h_f(x_i)]

where α_t = ln(1/β_t). This implies

    Σ_{t=1}^T α_t [h_t(x_i) = y_i] ≤ (1/2) Σ_{t=1}^T α_t

using the fact that each α_t ≥ 0 since ε_t ≤ 1/2. By definition of h̃_t, this implies

    Σ_{t=1}^T α_t h̃_t(i) ≥ (1/2) Σ_{t=1}^T α_t

so h̃_f(i) = 1 by definition of the final AdaBoost hypothesis. Therefore,

    Pr_{i∼D}[h_f(x_i) ≠ y_i] ≤ Pr_{i∼D̃}[h̃_f(i) = 1].

Since each AdaBoost instance has a 0-label, Pr_{i∼D̃}[h̃_f(i) = 1] is exactly the error of h̃_f. Applying Theorem 6, we can obtain a bound on this error, completing the proof.

It is possible, for this version of the boosting algorithm, to allow hypotheses which generate for each x, not only a predicted class label h(x) ∈ Y, but also a "confidence" η(x) ∈ [0, 1]. The learner then suffers loss 1/2 − η(x)/2 if its prediction is correct and 1/2 + η(x)/2 otherwise. (Details omitted.)

5.2 Second multi-class extension

In this section we describe a second alternative extension of AdaBoost to the case where the label space Y is finite. This extension requires more elaborate communication between the boosting algorithm and the weak learning algorithm. The advantage of doing this is that it gives the weak learner more flexibility in making its predictions. In particular, it sometimes enables the weak learner to make useful contributions to the accuracy of the final hypothesis even when the weak hypothesis does not predict the correct label with probability greater than 1/2.

As described above, the weak learner generates hypotheses which have the form h : X × Y → [0, 1]. Roughly speaking, h(x, y) measures the degree to which it is believed that y is the correct label associated with instance x. If, for a given x, h(x, y) attains the same value for all y then we say that the hypothesis is uninformative on instance x. On the other hand, any deviation from strict equality is potentially informative, because it predicts some labels to be more plausible than others. As will be seen, any such information is potentially useful for the boosting algorithm.

Below, we formalize the goal of the weak learner by defining a pseudo-loss which measures the goodness of the weak hypotheses. To motivate our definition, we first consider the following setup. For a fixed training example (x_i, y_i), we use a given hypothesis h to answer k − 1 binary questions. For each of the incorrect labels y ≠ y_i we ask the question: "Which is the label of x_i: y_i or y?" In other words, we ask that the correct label y_i be discriminated from the incorrect label y.

Assume momentarily that h only takes values in {0, 1}. Then if h(x_i, y) = 0 and h(x_i, y_i) = 1, we interpret h's answer to the question above to be y_i (since h deems y_i to be a plausible label for x_i, but y is considered implausible). Likewise, if h(x_i, y) = 1 and h(x_i, y_i) = 0 then the answer is y. If h(x_i, y) = h(x_i, y_i), then one of the two answers is chosen uniformly at random.

In the more general case that h takes values in [0, 1], we interpret h(x, y) as a randomized decision for the procedure above. That is, we first choose a random bit b(x, y) which is 1 with probability h(x, y) and 0 otherwise. We then apply the above procedure to the stochastically chosen binary function b. The probability of choosing the incorrect answer y to the question above is

    Pr[b(x_i, y_i) = 0 ∧ b(x_i, y) = 1] + (1/2) Pr[b(x_i, y_i) = b(x_i, y)] = (1/2)(1 − h(x_i, y_i) + h(x_i, y)).

If the answers to all k − 1 questions are considered equally important, then it is natural to define the loss of the hypothesis to be the average, over all k − 1 questions, of the probability of an incorrect answer:

    (1/(k − 1)) Σ_{y≠y_i} (1/2)(1 − h(x_i, y_i) + h(x_i, y)) = (1/2)( 1 − h(x_i, y_i) + (1/(k − 1)) Σ_{y≠y_i} h(x_i, y) ).     (24)

However, as was discussed in the introduction to Section 5, different discrimination questions are likely to have different importance in different situations. For example, considering the OCR problem described earlier, it might be that at some point during the boosting process, some example of the digit "7" has been recognized as being either a "7" or a "9". At this stage the question that discriminates between "7" (the correct label) and "9" is clearly much more important than the other eight questions that discriminate "7" from the other digits.
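A tiny sketch of the plain (equally weighted) pseudo-loss of Equation (24) for a single example, assuming the hypothesis is given as a vector of scores h(x_i, y) indexed by label; the weighted variant actually used by AdaBoost.M2 is introduced later in Section 5.2.

    import numpy as np

    def plain_pseudo_loss(scores, correct_label):
        """scores: array of h(x_i, y) for y = 0, ..., k-1; returns Equation (24)."""
        wrong = np.delete(scores, correct_label)
        return 0.5 * (1.0 - scores[correct_label] + wrong.mean())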


More information

THE PREDICTION OF COMPETITIVE ENVIRONMENT IN BUSINESS

THE PREDICTION OF COMPETITIVE ENVIRONMENT IN BUSINESS THE PREICTION OF COMPETITIVE ENVIRONMENT IN BUSINESS INTROUCTION The wo dmensonal paral dfferenal equaons of second order can be used for he smulaon of compeve envronmen n busness The arcle presens he

More information

Outline. Probabilistic Model Learning. Probabilistic Model Learning. Probabilistic Model for Time-series Data: Hidden Markov Model

Outline. Probabilistic Model Learning. Probabilistic Model Learning. Probabilistic Model for Time-series Data: Hidden Markov Model Probablsc Model for Tme-seres Daa: Hdden Markov Model Hrosh Mamsuka Bonformacs Cener Kyoo Unversy Oulne Three Problems for probablsc models n machne learnng. Compung lkelhood 2. Learnng 3. Parsng (predcon

More information

Robust and Accurate Cancer Classification with Gene Expression Profiling

Robust and Accurate Cancer Classification with Gene Expression Profiling Robus and Accurae Cancer Classfcaon wh Gene Expresson Proflng (Compuaonal ysems Bology, 2005) Auhor: Hafeng L, Keshu Zhang, ao Jang Oulne Background LDA (lnear dscrmnan analyss) and small sample sze problem

More information

Epistemic Game Theory: Online Appendix

Epistemic Game Theory: Online Appendix Epsemc Game Theory: Onlne Appendx Edde Dekel Lucano Pomao Marcano Snscalch July 18, 2014 Prelmnares Fx a fne ype srucure T I, S, T, β I and a probably µ S T. Le T µ I, S, T µ, βµ I be a ype srucure ha

More information

Graduate Macroeconomics 2 Problem set 5. - Solutions

Graduate Macroeconomics 2 Problem set 5. - Solutions Graduae Macroeconomcs 2 Problem se. - Soluons Queson 1 To answer hs queson we need he frms frs order condons and he equaon ha deermnes he number of frms n equlbrum. The frms frs order condons are: F K

More information

Computing Relevance, Similarity: The Vector Space Model

Computing Relevance, Similarity: The Vector Space Model Compung Relevance, Smlary: The Vecor Space Model Based on Larson and Hears s sldes a UC-Bereley hp://.sms.bereley.edu/courses/s0/f00/ aabase Managemen Sysems, R. Ramarshnan ocumen Vecors v ocumens are

More information

Lecture VI Regression

Lecture VI Regression Lecure VI Regresson (Lnear Mehods for Regresson) Conens: Lnear Mehods for Regresson Leas Squares, Gauss Markov heorem Recursve Leas Squares Lecure VI: MLSC - Dr. Sehu Vjayakumar Lnear Regresson Model M

More information

On One Analytic Method of. Constructing Program Controls

On One Analytic Method of. Constructing Program Controls Appled Mahemacal Scences, Vol. 9, 05, no. 8, 409-407 HIKARI Ld, www.m-hkar.com hp://dx.do.org/0.988/ams.05.54349 On One Analyc Mehod of Consrucng Program Conrols A. N. Kvko, S. V. Chsyakov and Yu. E. Balyna

More information

FTCS Solution to the Heat Equation

FTCS Solution to the Heat Equation FTCS Soluon o he Hea Equaon ME 448/548 Noes Gerald Reckenwald Porland Sae Unversy Deparmen of Mechancal Engneerng gerry@pdxedu ME 448/548: FTCS Soluon o he Hea Equaon Overvew Use he forward fne d erence

More information

12d Model. Civil and Surveying Software. Drainage Analysis Module Detention/Retention Basins. Owen Thornton BE (Mech), 12d Model Programmer

12d Model. Civil and Surveying Software. Drainage Analysis Module Detention/Retention Basins. Owen Thornton BE (Mech), 12d Model Programmer d Model Cvl and Surveyng Soware Dranage Analyss Module Deenon/Reenon Basns Owen Thornon BE (Mech), d Model Programmer owen.hornon@d.com 4 January 007 Revsed: 04 Aprl 007 9 February 008 (8Cp) Ths documen

More information

Dynamic Team Decision Theory. EECS 558 Project Shrutivandana Sharma and David Shuman December 10, 2005

Dynamic Team Decision Theory. EECS 558 Project Shrutivandana Sharma and David Shuman December 10, 2005 Dynamc Team Decson Theory EECS 558 Proec Shruvandana Sharma and Davd Shuman December 0, 005 Oulne Inroducon o Team Decson Theory Decomposon of he Dynamc Team Decson Problem Equvalence of Sac and Dynamc

More information

Robustness Experiments with Two Variance Components

Robustness Experiments with Two Variance Components Naonal Insue of Sandards and Technology (NIST) Informaon Technology Laboraory (ITL) Sascal Engneerng Dvson (SED) Robusness Expermens wh Two Varance Componens by Ana Ivelsse Avlés avles@ns.gov Conference

More information

( ) [ ] MAP Decision Rule

( ) [ ] MAP Decision Rule Announcemens Bayes Decson Theory wh Normal Dsrbuons HW0 due oday HW o be assgned soon Proec descrpon posed Bomercs CSE 90 Lecure 4 CSE90, Sprng 04 CSE90, Sprng 04 Key Probables 4 ω class label X feaure

More information

( t) Outline of program: BGC1: Survival and event history analysis Oslo, March-May Recapitulation. The additive regression model

( t) Outline of program: BGC1: Survival and event history analysis Oslo, March-May Recapitulation. The additive regression model BGC1: Survval and even hsory analyss Oslo, March-May 212 Monday May 7h and Tuesday May 8h The addve regresson model Ørnulf Borgan Deparmen of Mahemacs Unversy of Oslo Oulne of program: Recapulaon Counng

More information

Lecture 11 SVM cont

Lecture 11 SVM cont Lecure SVM con. 0 008 Wha we have done so far We have esalshed ha we wan o fnd a lnear decson oundary whose margn s he larges We know how o measure he margn of a lnear decson oundary Tha s: he mnmum geomerc

More information

Volatility Interpolation

Volatility Interpolation Volaly Inerpolaon Prelmnary Verson March 00 Jesper Andreasen and Bran Huge Danse Mares, Copenhagen wan.daddy@danseban.com brno@danseban.com Elecronc copy avalable a: hp://ssrn.com/absrac=69497 Inro Local

More information

Time-interval analysis of β decay. V. Horvat and J. C. Hardy

Time-interval analysis of β decay. V. Horvat and J. C. Hardy Tme-nerval analyss of β decay V. Horva and J. C. Hardy Work on he even analyss of β decay [1] connued and resuled n he developmen of a novel mehod of bea-decay me-nerval analyss ha produces hghly accurae

More information

Cubic Bezier Homotopy Function for Solving Exponential Equations

Cubic Bezier Homotopy Function for Solving Exponential Equations Penerb Journal of Advanced Research n Compung and Applcaons ISSN (onlne: 46-97 Vol. 4, No.. Pages -8, 6 omoopy Funcon for Solvng Eponenal Equaons S. S. Raml *,,. Mohamad Nor,a, N. S. Saharzan,b and M.

More information

Econ107 Applied Econometrics Topic 5: Specification: Choosing Independent Variables (Studenmund, Chapter 6)

Econ107 Applied Econometrics Topic 5: Specification: Choosing Independent Variables (Studenmund, Chapter 6) Econ7 Appled Economercs Topc 5: Specfcaon: Choosng Independen Varables (Sudenmund, Chaper 6 Specfcaon errors ha we wll deal wh: wrong ndependen varable; wrong funconal form. Ths lecure deals wh wrong ndependen

More information

TSS = SST + SSE An orthogonal partition of the total SS

TSS = SST + SSE An orthogonal partition of the total SS ANOVA: Topc 4. Orhogonal conrass [ST&D p. 183] H 0 : µ 1 = µ =... = µ H 1 : The mean of a leas one reamen group s dfferen To es hs hypohess, a basc ANOVA allocaes he varaon among reamen means (SST) equally

More information

Existence and Uniqueness Results for Random Impulsive Integro-Differential Equation

Existence and Uniqueness Results for Random Impulsive Integro-Differential Equation Global Journal of Pure and Appled Mahemacs. ISSN 973-768 Volume 4, Number 6 (8), pp. 89-87 Research Inda Publcaons hp://www.rpublcaon.com Exsence and Unqueness Resuls for Random Impulsve Inegro-Dfferenal

More information

Department of Economics University of Toronto

Department of Economics University of Toronto Deparmen of Economcs Unversy of Torono ECO408F M.A. Economercs Lecure Noes on Heeroskedascy Heeroskedascy o Ths lecure nvolves lookng a modfcaons we need o make o deal wh he regresson model when some of

More information

Online Supplement for Dynamic Multi-Technology. Production-Inventory Problem with Emissions Trading

Online Supplement for Dynamic Multi-Technology. Production-Inventory Problem with Emissions Trading Onlne Supplemen for Dynamc Mul-Technology Producon-Invenory Problem wh Emssons Tradng by We Zhang Zhongsheng Hua Yu Xa and Baofeng Huo Proof of Lemma For any ( qr ) Θ s easy o verfy ha he lnear programmng

More information

HEAT CONDUCTION PROBLEM IN A TWO-LAYERED HOLLOW CYLINDER BY USING THE GREEN S FUNCTION METHOD

HEAT CONDUCTION PROBLEM IN A TWO-LAYERED HOLLOW CYLINDER BY USING THE GREEN S FUNCTION METHOD Journal of Appled Mahemacs and Compuaonal Mechancs 3, (), 45-5 HEAT CONDUCTION PROBLEM IN A TWO-LAYERED HOLLOW CYLINDER BY USING THE GREEN S FUNCTION METHOD Sansław Kukla, Urszula Sedlecka Insue of Mahemacs,

More information

Math 128b Project. Jude Yuen

Math 128b Project. Jude Yuen Mah 8b Proec Jude Yuen . Inroducon Le { Z } be a sequence of observed ndependen vecor varables. If he elemens of Z have a on normal dsrbuon hen { Z } has a mean vecor Z and a varancecovarance marx z. Geomercally

More information

Advanced Machine Learning & Perception

Advanced Machine Learning & Perception Advanced Machne Learnng & Percepon Insrucor: Tony Jebara SVM Feaure & Kernel Selecon SVM Eensons Feaure Selecon (Flerng and Wrappng) SVM Feaure Selecon SVM Kernel Selecon SVM Eensons Classfcaon Feaure/Kernel

More information

Including the ordinary differential of distance with time as velocity makes a system of ordinary differential equations.

Including the ordinary differential of distance with time as velocity makes a system of ordinary differential equations. Soluons o Ordnary Derenal Equaons An ordnary derenal equaon has only one ndependen varable. A sysem o ordnary derenal equaons consss o several derenal equaons each wh he same ndependen varable. An eample

More information

General Weighted Majority, Online Learning as Online Optimization

General Weighted Majority, Online Learning as Online Optimization Sascal Technques n Robocs (16-831, F10) Lecure#10 (Thursday Sepember 23) General Weghed Majory, Onlne Learnng as Onlne Opmzaon Lecurer: Drew Bagnell Scrbe: Nahanel Barshay 1 1 Generalzed Weghed majory

More information

1 Review of Zero-Sum Games

1 Review of Zero-Sum Games COS 5: heoreical Machine Learning Lecurer: Rob Schapire Lecure #23 Scribe: Eugene Brevdo April 30, 2008 Review of Zero-Sum Games Las ime we inroduced a mahemaical model for wo player zero-sum games. Any

More information

Li An-Ping. Beijing , P.R.China

Li An-Ping. Beijing , P.R.China A New Type of Cpher: DICING_csb L An-Png Bejng 100085, P.R.Chna apl0001@sna.com Absrac: In hs paper, we wll propose a new ype of cpher named DICING_csb, whch s derved from our prevous sream cpher DICING.

More information

Lecture 18: The Laplace Transform (See Sections and 14.7 in Boas)

Lecture 18: The Laplace Transform (See Sections and 14.7 in Boas) Lecure 8: The Lalace Transform (See Secons 88- and 47 n Boas) Recall ha our bg-cure goal s he analyss of he dfferenal equaon, ax bx cx F, where we emloy varous exansons for he drvng funcon F deendng on

More information

Should Exact Index Numbers have Standard Errors? Theory and Application to Asian Growth

Should Exact Index Numbers have Standard Errors? Theory and Application to Asian Growth Should Exac Index umbers have Sandard Errors? Theory and Applcaon o Asan Growh Rober C. Feensra Marshall B. Rensdorf ovember 003 Proof of Proposon APPEDIX () Frs, we wll derve he convenonal Sao-Vara prce

More information

Let s treat the problem of the response of a system to an applied external force. Again,

Let s treat the problem of the response of a system to an applied external force. Again, Page 33 QUANTUM LNEAR RESPONSE FUNCTON Le s rea he problem of he response of a sysem o an appled exernal force. Agan, H() H f () A H + V () Exernal agen acng on nernal varable Hamlonan for equlbrum sysem

More information

Lecture 2 L n i e n a e r a M od o e d l e s

Lecture 2 L n i e n a e r a M od o e d l e s Lecure Lnear Models Las lecure You have learned abou ha s machne learnng Supervsed learnng Unsupervsed learnng Renforcemen learnng You have seen an eample learnng problem and he general process ha one

More information

Fall 2010 Graduate Course on Dynamic Learning

Fall 2010 Graduate Course on Dynamic Learning Fall 200 Graduae Course on Dynamc Learnng Chaper 4: Parcle Flers Sepember 27, 200 Byoung-Tak Zhang School of Compuer Scence and Engneerng & Cognve Scence and Bran Scence Programs Seoul aonal Unversy hp://b.snu.ac.kr/~bzhang/

More information

J i-1 i. J i i+1. Numerical integration of the diffusion equation (I) Finite difference method. Spatial Discretization. Internal nodes.

J i-1 i. J i i+1. Numerical integration of the diffusion equation (I) Finite difference method. Spatial Discretization. Internal nodes. umercal negraon of he dffuson equaon (I) Fne dfference mehod. Spaal screaon. Inernal nodes. R L V For hermal conducon le s dscree he spaal doman no small fne spans, =,,: Balance of parcles for an nernal

More information

Comb Filters. Comb Filters

Comb Filters. Comb Filters The smple flers dscussed so far are characered eher by a sngle passband and/or a sngle sopband There are applcaons where flers wh mulple passbands and sopbands are requred Thecomb fler s an example of

More information

Bayes rule for a classification problem INF Discriminant functions for the normal density. Euclidean distance. Mahalanobis distance

Bayes rule for a classification problem INF Discriminant functions for the normal density. Euclidean distance. Mahalanobis distance INF 43 3.. Repeon Anne Solberg (anne@f.uo.no Bayes rule for a classfcaon problem Suppose we have J, =,...J classes. s he class label for a pxel, and x s he observed feaure vecor. We can use Bayes rule

More information

Appendix to Online Clustering with Experts

Appendix to Online Clustering with Experts A Appendx o Onlne Cluserng wh Expers Furher dscusson of expermens. Here we furher dscuss expermenal resuls repored n he paper. Ineresngly, we observe ha OCE (and n parcular Learn- ) racks he bes exper

More information

Mechanics Physics 151

Mechanics Physics 151 Mechancs Physcs 5 Lecure 0 Canoncal Transformaons (Chaper 9) Wha We Dd Las Tme Hamlon s Prncple n he Hamlonan formalsm Dervaon was smple δi δ Addonal end-pon consrans pq H( q, p, ) d 0 δ q ( ) δq ( ) δ

More information

P R = P 0. The system is shown on the next figure:

P R = P 0. The system is shown on the next figure: TPG460 Reservor Smulaon 08 page of INTRODUCTION TO RESERVOIR SIMULATION Analycal and numercal soluons of smple one-dmensonal, one-phase flow equaons As an nroducon o reservor smulaon, we wll revew he smples

More information

Appendix H: Rarefaction and extrapolation of Hill numbers for incidence data

Appendix H: Rarefaction and extrapolation of Hill numbers for incidence data Anne Chao Ncholas J Goell C seh lzabeh L ander K Ma Rober K Colwell and Aaron M llson 03 Rarefacon and erapolaon wh ll numbers: a framewor for samplng and esmaon n speces dversy sudes cology Monographs

More information

CHAPTER 2: Supervised Learning

CHAPTER 2: Supervised Learning HATER 2: Supervsed Learnng Learnng a lass from Eamples lass of a famly car redcon: Is car a famly car? Knowledge eracon: Wha do people epec from a famly car? Oupu: osve (+) and negave ( ) eamples Inpu

More information

Chapter 6: AC Circuits

Chapter 6: AC Circuits Chaper 6: AC Crcus Chaper 6: Oulne Phasors and he AC Seady Sae AC Crcus A sable, lnear crcu operang n he seady sae wh snusodal excaon (.e., snusodal seady sae. Complee response forced response naural response.

More information

Ordinary Differential Equations in Neuroscience with Matlab examples. Aim 1- Gain understanding of how to set up and solve ODE s

Ordinary Differential Equations in Neuroscience with Matlab examples. Aim 1- Gain understanding of how to set up and solve ODE s Ordnary Dfferenal Equaons n Neuroscence wh Malab eamples. Am - Gan undersandng of how o se up and solve ODE s Am Undersand how o se up an solve a smple eample of he Hebb rule n D Our goal a end of class

More information

Hidden Markov Models Following a lecture by Andrew W. Moore Carnegie Mellon University

Hidden Markov Models Following a lecture by Andrew W. Moore Carnegie Mellon University Hdden Markov Models Followng a lecure by Andrew W. Moore Carnege Mellon Unversy www.cs.cmu.edu/~awm/uorals A Markov Sysem Has N saes, called s, s 2.. s N s 2 There are dscree meseps, 0,, s s 3 N 3 0 Hdden

More information

Tight results for Next Fit and Worst Fit with resource augmentation

Tight results for Next Fit and Worst Fit with resource augmentation Tgh resuls for Nex F and Wors F wh resource augmenaon Joan Boyar Leah Epsen Asaf Levn Asrac I s well known ha he wo smple algorhms for he classc n packng prolem, NF and WF oh have an approxmaon rao of

More information

Machine Learning Linear Regression

Machine Learning Linear Regression Machne Learnng Lnear Regresson Lesson 3 Lnear Regresson Bascs of Regresson Leas Squares esmaon Polynomal Regresson Bass funcons Regresson model Regularzed Regresson Sascal Regresson Mamum Lkelhood (ML)

More information

2 Aggregate demand in partial equilibrium static framework

2 Aggregate demand in partial equilibrium static framework Unversy of Mnnesoa 8107 Macroeconomc Theory, Sprng 2009, Mn 1 Fabrzo Perr Lecure 1. Aggregaon 1 Inroducon Probably so far n he macro sequence you have deal drecly wh represenave consumers and represenave

More information

SOME NOISELESS CODING THEOREMS OF INACCURACY MEASURE OF ORDER α AND TYPE β

SOME NOISELESS CODING THEOREMS OF INACCURACY MEASURE OF ORDER α AND TYPE β SARAJEVO JOURNAL OF MATHEMATICS Vol.3 (15) (2007), 137 143 SOME NOISELESS CODING THEOREMS OF INACCURACY MEASURE OF ORDER α AND TYPE β M. A. K. BAIG AND RAYEES AHMAD DAR Absrac. In hs paper, we propose

More information

Testing a new idea to solve the P = NP problem with mathematical induction

Testing a new idea to solve the P = NP problem with mathematical induction Tesng a new dea o solve he P = NP problem wh mahemacal nducon Bacground P and NP are wo classes (ses) of languages n Compuer Scence An open problem s wheher P = NP Ths paper ess a new dea o compare he

More information

Machine Learning 2nd Edition

Machine Learning 2nd Edition INTRODUCTION TO Lecure Sldes for Machne Learnng nd Edon ETHEM ALPAYDIN, modfed by Leonardo Bobadlla and some pars from hp://www.cs.au.ac.l/~aparzn/machnelearnng/ The MIT Press, 00 alpaydn@boun.edu.r hp://www.cmpe.boun.edu.r/~ehem/mle

More information

EEL 6266 Power System Operation and Control. Chapter 5 Unit Commitment

EEL 6266 Power System Operation and Control. Chapter 5 Unit Commitment EEL 6266 Power Sysem Operaon and Conrol Chaper 5 Un Commmen Dynamc programmng chef advanage over enumeraon schemes s he reducon n he dmensonaly of he problem n a src prory order scheme, here are only N

More information

Comparison of Differences between Power Means 1

Comparison of Differences between Power Means 1 In. Journal of Mah. Analyss, Vol. 7, 203, no., 5-55 Comparson of Dfferences beween Power Means Chang-An Tan, Guanghua Sh and Fe Zuo College of Mahemacs and Informaon Scence Henan Normal Unversy, 453007,

More information

F-Tests and Analysis of Variance (ANOVA) in the Simple Linear Regression Model. 1. Introduction

F-Tests and Analysis of Variance (ANOVA) in the Simple Linear Regression Model. 1. Introduction ECOOMICS 35* -- OTE 9 ECO 35* -- OTE 9 F-Tess and Analyss of Varance (AOVA n he Smple Lnear Regresson Model Inroducon The smple lnear regresson model s gven by he followng populaon regresson equaon, or

More information

Reactive Methods to Solve the Berth AllocationProblem with Stochastic Arrival and Handling Times

Reactive Methods to Solve the Berth AllocationProblem with Stochastic Arrival and Handling Times Reacve Mehods o Solve he Berh AllocaonProblem wh Sochasc Arrval and Handlng Tmes Nsh Umang* Mchel Berlare* * TRANSP-OR, Ecole Polyechnque Fédérale de Lausanne Frs Workshop on Large Scale Opmzaon November

More information

Notes on the stability of dynamic systems and the use of Eigen Values.

Notes on the stability of dynamic systems and the use of Eigen Values. Noes on he sabl of dnamc ssems and he use of Egen Values. Source: Macro II course noes, Dr. Davd Bessler s Tme Seres course noes, zarads (999) Ineremporal Macroeconomcs chaper 4 & Techncal ppend, and Hamlon

More information

2.1 Constitutive Theory

2.1 Constitutive Theory Secon.. Consuve Theory.. Consuve Equaons Governng Equaons The equaons governng he behavour of maerals are (n he spaal form) dρ v & ρ + ρdv v = + ρ = Conservaon of Mass (..a) d x σ j dv dvσ + b = ρ v& +

More information

Introduction ( Week 1-2) Course introduction A brief introduction to molecular biology A brief introduction to sequence comparison Part I: Algorithms

Introduction ( Week 1-2) Course introduction A brief introduction to molecular biology A brief introduction to sequence comparison Part I: Algorithms Course organzaon Inroducon Wee -2) Course nroducon A bref nroducon o molecular bology A bref nroducon o sequence comparson Par I: Algorhms for Sequence Analyss Wee 3-8) Chaper -3, Models and heores» Probably

More information

A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window

A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window A Deermnsc Algorhm for Summarzng Asynchronous Sreams over a Sldng ndow Cosas Busch Rensselaer Polyechnc Insue Srkana Trhapura Iowa Sae Unversy Oulne of Talk Inroducon Algorhm Analyss Tme C Daa sream: 3

More information

CH.3. COMPATIBILITY EQUATIONS. Continuum Mechanics Course (MMC) - ETSECCPB - UPC

CH.3. COMPATIBILITY EQUATIONS. Continuum Mechanics Course (MMC) - ETSECCPB - UPC CH.3. COMPATIBILITY EQUATIONS Connuum Mechancs Course (MMC) - ETSECCPB - UPC Overvew Compably Condons Compably Equaons of a Poenal Vecor Feld Compably Condons for Infnesmal Srans Inegraon of he Infnesmal

More information

Political Economy of Institutions and Development: Problem Set 2 Due Date: Thursday, March 15, 2019.

Political Economy of Institutions and Development: Problem Set 2 Due Date: Thursday, March 15, 2019. Polcal Economy of Insuons and Developmen: 14.773 Problem Se 2 Due Dae: Thursday, March 15, 2019. Please answer Quesons 1, 2 and 3. Queson 1 Consder an nfne-horzon dynamc game beween wo groups, an ele and

More information

ON THE WEAK LIMITS OF SMOOTH MAPS FOR THE DIRICHLET ENERGY BETWEEN MANIFOLDS

ON THE WEAK LIMITS OF SMOOTH MAPS FOR THE DIRICHLET ENERGY BETWEEN MANIFOLDS ON THE WEA LIMITS OF SMOOTH MAPS FOR THE DIRICHLET ENERGY BETWEEN MANIFOLDS FENGBO HANG Absrac. We denfy all he weak sequenal lms of smooh maps n W (M N). In parcular, hs mples a necessary su cen opologcal

More information

Satisficing in Gaussian bandit problems

Satisficing in Gaussian bandit problems Sasfcng n Gaussan band problems Paul Reverdy and Naom E. Leonard Absrac We propose a sasfcng objecve for he mularmed band problem,.e., where he objecve s o acheve performance above a gven hreshold. We

More information

Mechanics Physics 151

Mechanics Physics 151 Mechancs Physcs 5 Lecure 9 Hamlonan Equaons of Moon (Chaper 8) Wha We Dd Las Tme Consruced Hamlonan formalsm H ( q, p, ) = q p L( q, q, ) H p = q H q = p H = L Equvalen o Lagrangan formalsm Smpler, bu

More information

This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore.

This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore. Ths documen s downloaded from DR-NTU, Nanyang Technologcal Unversy Lbrary, Sngapore. Tle A smplfed verb machng algorhm for word paron n vsual speech processng( Acceped verson ) Auhor(s) Foo, Say We; Yong,

More information

Density Matrix Description of NMR BCMB/CHEM 8190

Density Matrix Description of NMR BCMB/CHEM 8190 Densy Marx Descrpon of NMR BCMBCHEM 89 Operaors n Marx Noaon Alernae approach o second order specra: ask abou x magnezaon nsead of energes and ranson probables. If we say wh one bass se, properes vary

More information

Advanced Macroeconomics II: Exchange economy

Advanced Macroeconomics II: Exchange economy Advanced Macroeconomcs II: Exchange economy Krzyszof Makarsk 1 Smple deermnsc dynamc model. 1.1 Inroducon Inroducon Smple deermnsc dynamc model. Defnons of equlbrum: Arrow-Debreu Sequenal Recursve Equvalence

More information

Mechanics Physics 151

Mechanics Physics 151 Mechancs Physcs 5 Lecure 9 Hamlonan Equaons of Moon (Chaper 8) Wha We Dd Las Tme Consruced Hamlonan formalsm Hqp (,,) = qp Lqq (,,) H p = q H q = p H L = Equvalen o Lagrangan formalsm Smpler, bu wce as

More information

e-journal Reliability: Theory& Applications No 2 (Vol.2) Vyacheslav Abramov

e-journal Reliability: Theory& Applications No 2 (Vol.2) Vyacheslav Abramov June 7 e-ournal Relably: Theory& Applcaons No (Vol. CONFIDENCE INTERVALS ASSOCIATED WITH PERFORMANCE ANALYSIS OF SYMMETRIC LARGE CLOSED CLIENT/SERVER COMPUTER NETWORKS Absrac Vyacheslav Abramov School

More information

[Link to MIT-Lab 6P.1 goes here.] After completing the lab, fill in the following blanks: Numerical. Simulation s Calculations

[Link to MIT-Lab 6P.1 goes here.] After completing the lab, fill in the following blanks: Numerical. Simulation s Calculations Chaper 6: Ordnary Leas Squares Esmaon Procedure he Properes Chaper 6 Oulne Cln s Assgnmen: Assess he Effec of Sudyng on Quz Scores Revew o Regresson Model o Ordnary Leas Squares () Esmaon Procedure o he

More information

A HIERARCHICAL KALMAN FILTER

A HIERARCHICAL KALMAN FILTER A HIERARCHICAL KALMAN FILER Greg aylor aylor Fry Consulng Acuares Level 8, 3 Clarence Sree Sydney NSW Ausrala Professoral Assocae, Cenre for Acuaral Sudes Faculy of Economcs and Commerce Unversy of Melbourne

More information

Chapter Lagrangian Interpolation

Chapter Lagrangian Interpolation Chaper 5.4 agrangan Inerpolaon Afer readng hs chaper you should be able o:. dere agrangan mehod of nerpolaon. sole problems usng agrangan mehod of nerpolaon and. use agrangan nerpolans o fnd deraes and

More information

Ensamble methods: Boosting

Ensamble methods: Boosting Lecure 21 Ensamble mehods: Boosing Milos Hauskrech milos@cs.pi.edu 5329 Senno Square Schedule Final exam: April 18: 1:00-2:15pm, in-class Term projecs April 23 & April 25: a 1:00-2:30pm in CS seminar room

More information

January Examinations 2012

January Examinations 2012 Page of 5 EC79 January Examnaons No. of Pages: 5 No. of Quesons: 8 Subjec ECONOMICS (POSTGRADUATE) Tle of Paper EC79 QUANTITATIVE METHODS FOR BUSINESS AND FINANCE Tme Allowed Two Hours ( hours) Insrucons

More information

arxiv: v4 [math.pr] 30 Sep 2016

arxiv: v4 [math.pr] 30 Sep 2016 arxv:1511.05094v4 [mah.pr] 30 Sep 2016 The Unfed Approach for Bes Choce Modelng Appled o Alernave-Choce Selecon Problems Rém Dendevel Absrac The objecve of hs paper s o show ha he so-called unfed approach

More information

2. SPATIALLY LAGGED DEPENDENT VARIABLES

2. SPATIALLY LAGGED DEPENDENT VARIABLES 2. SPATIALLY LAGGED DEPENDENT VARIABLES In hs chaper, we descrbe a sascal model ha ncorporaes spaal dependence explcly by addng a spaally lagged dependen varable y on he rgh-hand sde of he regresson equaon.

More information

AT&T Labs Research, Shannon Laboratory, 180 Park Avenue, Room A279, Florham Park, NJ , USA

AT&T Labs Research, Shannon Laboratory, 180 Park Avenue, Room A279, Florham Park, NJ , USA Machne Learnng, 43, 65 91, 001 c 001 Kluwer Acadec Publshers. Manufacured n The Neherlands. Drfng Gaes ROBERT E. SCHAPIRE schapre@research.a.co AT&T Labs Research, Shannon Laboraory, 180 Park Avenue, Roo

More information

Learning Objectives. Self Organization Map. Hamming Distance(1/5) Introduction. Hamming Distance(3/5) Hamming Distance(2/5) 15/04/2015

Learning Objectives. Self Organization Map. Hamming Distance(1/5) Introduction. Hamming Distance(3/5) Hamming Distance(2/5) 15/04/2015 /4/ Learnng Objecves Self Organzaon Map Learnng whou Exaples. Inroducon. MAXNET 3. Cluserng 4. Feaure Map. Self-organzng Feaure Map 6. Concluson 38 Inroducon. Learnng whou exaples. Daa are npu o he syse

More information

On elements with index of the form 2 a 3 b in a parametric family of biquadratic elds

On elements with index of the form 2 a 3 b in a parametric family of biquadratic elds On elemens wh ndex of he form a 3 b n a paramerc famly of bquadrac elds Bora JadrevĆ Absrac In hs paper we gve some resuls abou prmve negral elemens p(c p n he famly of bcyclc bquadrac elds L c = Q ) c;

More information