Introduction to Boosting
Cynthia Rudin
PACM, Princeton University
Advisors: Ingrid Daubechies and Robert Schapire
Say you have a database of news articles: +, +, -, -, +, +, -, -, +, +, -, -, +, +, -, +, where articles are labeled + if the category is entertainment, and - otherwise. Your goal is: given a new article, find its label. This is not easy: datasets are noisy and high dimensional.
Examples of Statistical Learning Tasks:
- Optical Character Recognition (OCR): post office, banks; object recognition in images
- Bioinformatics: analysis of gene array data for tumor detection, protein classification, etc.
- Webpage classification (search engines), email filtering, document retrieval
- Semantic classification for speech, automatic .mp3 sorting
- Time-series prediction (regression)
Huge number of applications, but all have high dimensional data.
Examples of classification algorithms:
- SVMs (Support Vector Machines): large margin classifiers
- Neural Networks
- Decision Trees / Decision Stumps (CART)
- RBF Networks
- Nearest Neighbors
- Bayes Net
Which is the best? It depends on the amount and type of data, and on the application! It is a tie between SVMs and Boosted Decision Trees/Stumps for general applications. One can always find a problem where a particular algorithm is the best. Boosted convolutional neural nets are the best for OCR (Yann LeCun et al.).
Training Data: {(x_i, y_i)}_{i=1,...,m}, where each (x_i, y_i) is chosen iid from an unknown probability distribution on X × {-1, +1} (the space of all possible articles × labels).
Huge Question: Given a new random example x, can we predict its correct label with high probability? That is, can we generalize from our training data?
[Figure: + and - training points in the space X, plus a new point marked "?"]
Huge Question: Given a new random example x, can we predict its correct label with high probability? That is, can we generalize from our training data? Yes!!! That is what the field of statistical learning is all about. The goal of statistical learning is to characterize points from an unknown probability distribution when given a representative sample from that distribution.
How do we construct a classifier? Divide the space X into two sections, based on the sign of a function f : X → R. The decision boundary is the zero-level set of f.
[Figure: the space X with + points where f(x) > 0 and - points where f(x) < 0, separated by the decision boundary]
Classifiers divide the space into two pieces for binary classification. Multiclass classification can always be reduced to binary.
Overview of Talk
- The Statistical Learning Problem (done)
- Introduction to Boosting and AdaBoost
- AdaBoost as Coordinate Descent
- The Margin Theory and Generalization
Say we have a weak learning algorithm: A weak learning algorithm produces weak classifiers. (Think of a weak classifier as a "rule of thumb".) Examples of weak classifiers for the entertainment application:
h_1 = +1 if the article contains the term "movie", -1 otherwise
h_2 = +1 if the article contains the term "actor", -1 otherwise
h_3 = +1 if the article contains the term "drama", -1 otherwise
Wouldn't it be nice to combine the weak classifiers?
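A minimal sketch of these rules of thumb in Python; the set-of-words article representation and the helper name make_term_stump are illustrative assumptions, not from the talk:

```python
# Hypothetical term-presence weak classifiers; an article is modeled
# here as a set of lowercase words (an assumption for illustration).
def make_term_stump(term):
    """Return a weak classifier: +1 if the article contains `term`, else -1."""
    return lambda article: 1 if term in article else -1

h1 = make_term_stump("movie")
h2 = make_term_stump("actor")
h3 = make_term_stump("drama")

print(h1({"movie", "drama"}))  # +1: the article contains "movie"
print(h2({"movie", "drama"}))  # -1: the article lacks "actor"
```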
Boosting algorithms combine weak classifiers in a meaningful way. Example:
f = sign(0.4 h_1 + 0.3 h_2 + 0.3 h_3)
If the article contains the term "movie" and the word "drama", but not the word "actor", the value of f is sign(0.4 - 0.3 + 0.3), so we label it +.
So a boosting algorithm takes as input:
- the weak learning algorithm which produces the weak classifiers
- a large training database
and outputs:
- the coefficients of the weak classifiers to make the combined classifier
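A sketch of this combined classifier, reusing the stumps h1, h2, h3 from the sketch above; the coefficients 0.4, 0.3, 0.3 come from the slide, the code itself is illustrative:

```python
# Weighted vote of the three stumps: f = sign(0.4*h1 + 0.3*h2 + 0.3*h3)
def f(article):
    score = 0.4 * h1(article) + 0.3 * h2(article) + 0.3 * h3(article)
    return 1 if score > 0 else -1

# Contains "movie" and "drama" but not "actor": sign(0.4 - 0.3 + 0.3) = +1
print(f({"movie", "drama"}))  # +1, i.e., entertainment
```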
Two ways to use a Boosting Algorithm:
- As a way to increase the performance of already "strong" classifiers. Ex: neural networks, decision trees.
- On their own, with a really basic weak classifier. Ex: decision stumps.
AdaBoost (Freund and Schapire '95)
- Start with a uniform distribution ("weights") over training examples. (The weights tell the weak learning algorithm which examples are important.)
- Request a weak classifier from the weak learning algorithm, h_t : X → {-1, +1}.
- Increase the weights on the training examples that were misclassified.
- Repeat.
At the end, make (carefully!) a linear combination of the weak classifiers obtained at all iterations:
f_final(x) = sign(λ_1 h_1(x) + ... + λ_n h_n(x))
AdaBoost. Define three important things:
d_t ∈ R^m : distribution (weights) over examples at time t, e.g., d_t = [.25  .3  .2  .25] over examples i = 1, 2, 3, 4.
AdaBoost. Define three important things:
λ_t ∈ R^n : coefficients of the weak classifiers for the linear combination,
f_t(x) = sign(λ_{t,1} h_1(x) + ... + λ_{t,n} h_n(x))
AdaBoost. Define:
M ∈ R^{m×n} : matrix of hypotheses and data, with one column per weak classifier (h_1 = "movie", h_2 = "actor", h_3 = "drama", ...). Enumerate every possible weak classifier which can be produced by the weak learning algorithm; m = # of data points.
M_{ij} = h_j(x_i) y_i = +1 if weak classifier h_j classifies point x_i correctly, -1 otherwise.
The matrix M has too many columns to actually be enumerated. M acts as the only input to AdaBoost.
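A sketch of how M could be formed for a small, explicitly enumerated set of weak classifiers (the names build_M and stumps are assumptions; in practice, as the slide says, M is never built):

```python
import numpy as np

def build_M(X, y, weak_classifiers):
    """M[i, j] = h_j(x_i) * y_i: +1 iff weak classifier h_j labels x_i correctly."""
    m, n = len(X), len(weak_classifiers)
    M = np.empty((m, n))
    for j, h in enumerate(weak_classifiers):
        for i in range(m):
            M[i, j] = h(X[i]) * y[i]
    return M

# Two toy articles and the three term-presence stumps:
X = [{"movie", "drama"}, {"sports", "score"}]
y = [+1, -1]
stumps = [lambda a, t=t: 1 if t in a else -1 for t in ("movie", "actor", "drama")]
M = build_M(X, y, stumps)  # shape (2, 3), entries in {-1, +1}
```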
[Diagram: the matrix M is the only input to AdaBoost, which maintains d_t and λ_t internally and outputs λ_final]
AdaBoost (Freund and Schapire '95)

λ_1 = 0                                      (initialize coeffs to 0)
for t = 1, ..., T:
    d_{t,i} = e^{-(M λ_t)_i} / Σ_{i'} e^{-(M λ_t)_{i'}}  for all i = 1, ..., m
                                             (calculate normalized distribution)
    j_t ∈ argmax_j (d_t^T M)_j               (request weak classifier from weak learning algorithm)
    r_t = (d_t^T M)_{j_t}
    α_t = (1/2) ln((1 + r_t) / (1 - r_t))
    λ_{t+1} = λ_t + α_t e_{j_t}              (update linear combo of weak classifiers)
end
f_final = sign(Σ_j λ_{T+1,j} h_j)

Here r_t is the edge (or correlation) of the weak classifier:
(d_t^T M)_j = Σ_i d_{t,i} y_i h_j(x_i) = E_{i~d_t}[ y_i h_j(x_i) ].
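A runnable sketch of this loop in Python/NumPy, taking the fully enumerated matrix M as its only input, per the slide; the early exit when r_t = 1 (a perfect weak classifier, for which the step would be infinite) is an added guard:

```python
import numpy as np

def adaboost(M, T):
    """AdaBoost in the slide's matrix notation: M[i, j] = h_j(x_i) * y_i."""
    m, n = M.shape
    lam = np.zeros(n)                      # lambda_1 = 0
    for t in range(T):
        d = np.exp(-(M @ lam))
        d /= d.sum()                       # d_t: normalized distribution over examples
        edges = d @ M                      # (d_t^T M)_j for every weak classifier j
        j = int(np.argmax(edges))          # request the weak classifier with the largest edge
        r = edges[j]                       # r_t, the edge
        if r >= 1.0:                       # h_j alone is perfect on the training set;
            break                          # alpha_t would be infinite, so stop here
        alpha = 0.5 * np.log((1 + r) / (1 - r))
        lam[j] += alpha                    # lambda_{t+1} = lambda_t + alpha_t * e_{j_t}
    return lam                             # f_final(x) = sign(sum_j lam[j] * h_j(x))
```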
AdaBoost as Coordinate Descent
Breiman, Mason et al., Duffy and Helmbold, etc. noticed that AdaBoost is a coordinate descent algorithm. Coordinate descent is a minimization algorithm like gradient descent, except that we only move along coordinates. (We cannot calculate the gradient because of the high dimensionality of the space!)
coordinates = weak classifiers
distance to move in that direction = the update α_t
AdaBoost minimizes the following function via coordinate descent:
F(λ) := Σ_{i=1}^m e^{-(Mλ)_i}
Choose a direction:  j_t ∈ argmax_j (d_t^T M)_j
Choose a distance to move in that direction:  α_t = (1/2) ln((1 + r_t) / (1 - r_t)),  λ_{t+1} = λ_t + α_t e_{j_t}
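A sketch making the correspondence concrete: F below is the exponential loss, the chosen coordinate is the one with the most negative partial derivative of F, and α_t is the exact line-search minimizer along that coordinate (the closed form relies on the entries of M being ±1):

```python
import numpy as np

def F(lam, M):
    """Exponential loss F(lambda) = sum_i exp(-(M lambda)_i)."""
    return np.exp(-(M @ lam)).sum()

def coordinate_descent_step(lam, M):
    """One AdaBoost iteration, viewed as coordinate descent on F."""
    d = np.exp(-(M @ lam))
    d /= d.sum()
    edges = d @ M                            # dF/d lam_j = -F(lam) * edges[j]
    j = int(np.argmax(edges))                # steepest descent coordinate
    r = edges[j]
    alpha = 0.5 * np.log((1 + r) / (1 - r))  # exact line search along e_j
    new_lam = lam.copy()
    new_lam[j] += alpha
    return new_lam                           # F(new_lam, M) <= F(lam, M)
```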
The function F(λ) := Σ_{i=1}^m e^{-(Mλ)_i} is convex.
1. If the data is non-separable by the weak classifiers, the minimizer of F occurs at a λ of finite size. This case is ok: AdaBoost converges to something we understand.
2. If the data is separable, the infimum of F is 0, but it is only approached as the size of λ grows to infinity (and rescaling λ never changes the classifier sign(Σ_j λ_j h_j)). This case is confusing!
The original paper suggested that AdaBoost would probably overfit. But it didn't in practice! Why not? The margin theory!
Boosting and Margins
We want the boosted classifier (defined via λ) to generalize well, i.e., we want it to perform well on data that is not in the training set.
The margin theory: The margin of a boosted classifier indicates whether it will generalize well (Schapire et al. '98). "Large margin classifiers work well in practice," but there is more to this story.
Think of the margin as the confidence of a prediction.
Generalization Ability of Boosted Classifiers
Can we guess whether a boosted classifier f generalizes well? We cannot calculate Pr(error of f), the probability that classifier f makes an error on a random point x ∈ X, so we minimize the rhs of a loose inequality such as this one (Schapire et al.): when there are no training errors, with probability at least 1 - δ,

Pr(error of f) ≤ O( sqrt( [ d log²(m/d) / μ(f)² + log(1/δ) ] / m ) )

m = # of training examples, d = VC dim. of hyp. space (d ≤ m), μ(f) = margin of f.
The margin theory: When there are no training errors, with high probability (Schapire et al. '98):

Pr(error of f) ≤ Õ( (1/μ(f)) sqrt(d/m) )

d = VC dim. of hyp. space, m = # of training examples, μ(f) = margin of f.
Large margin → better generalization (smaller probability of error).
For Boosting, the margin of the combined classifier f_λ, where f_λ := sign(λ_1 h_1 + ... + λ_n h_n), is defined by

margin:  μ(f_λ) := min_i (Mλ)_i / ||λ||_1
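A one-line sketch of this margin in NumPy (assuming a finite λ with ||λ||_1 > 0):

```python
import numpy as np

def margin(lam, M):
    """mu(f_lambda) = min_i (M lambda)_i / ||lambda||_1."""
    return (M @ lam).min() / np.abs(lam).sum()
```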
Does AdaBoost produce maximum margin classifiers? (AdaBoost was invented before the margin theory.)
Grove and Schuurmans '98 - yes, empirically.
Schapire et al. '98 - proved AdaBoost achieves at least half the maximum possible margin.
Rätsch and Warmuth '03 - yes, empirically; improved the bound.
Rudin, Daubechies, Schapire '04 - no, it doesn't.
AdaBoost performs mysteriously well! AdaBoost performs better than algorithms which are designed to maximize the margin.
Still open:
- Why does AdaBoost work so well?
- Does AdaBoost converge?
- Better / more predictable boosting algorithms!