
Boosting Revisited

Massimo Santini*
Dipartimento di Scienze dell'Informazione
Università degli Studi di Milano
Via Comelico, 39/41 - 20135 Milano (Italy)

Technical Report 205-97

Abstract
In this paper some boosting and on-line allocation strategies are revisited in order to provide new proofs inspired by Chernoff's bounding techniques taken from common Probability Theory. Even if some of the results presented here are weaker than the best known today, the generalized style of the proofs hopefully offers a better understanding of known results and suggests new strategies.

This paper is a Minor Thesis for the PhD program of "Università Statale di Milano". The referee of this paper is Dr. Nicolò Cesa-Bianchi.

1 Introduction
The topics treated in this paper can be informally summarized as follows.

On-line allocation: for a certain number of rounds, a fixed amount of a given resource is to be allocated among a set of possible choices, judging only by the outcomes of previous rounds. Show that, even ignoring any a priori knowledge about the performance of each individual choice, an allocation strategy exists which never performs much worse than the best choice.

* PhD student at the "Università Statale di Milano". E-mail: <santini@dsi.unimi.it>

Boosting: a set of approximate predictions are to be combined in a simple way in order to obtain a more accurate one. Show that, in the binary classification case, taking a weighted majority over a set of predictions (each one a little smarter than a random guess), the accuracy can be exponentially increased with respect to the number of combined predictions.

Despite the differences between them, these problems can be treated in the same probabilistic framework, and a very standard technique can be adapted to prove both the existence of an "optimal" on-line allocation strategy and of a boosting strategy for prediction algorithms.

1.1 Chernoff's bounding
To introduce Chernoff's bounding technique [Hof63], which is central to this paper, the proof of a very simple inequality is now sketched. Let $X_n$, with $n = 1, \dots, N$, be a family of i.i.d. r.v.'s defined on the p.s. $\langle \Omega, \mathcal{F}, P \rangle$ such that $E[X_n] = \mu$ and $X_n \in [0, 1]$; then, for any $\varepsilon > 0$ and $\lambda > 0$,

$$P\left[\frac{1}{N}\sum_{n=1}^{N} X_n - \mu \ge \varepsilon\right] \le e^{-\lambda N(\mu + \varepsilon)}\, E\left[e^{\lambda \sum_{n=1}^{N} X_n}\right] = e^{-\lambda N(\mu + \varepsilon)} \prod_{n=1}^{N} E\left[e^{\lambda X_n}\right] \le e^{-\lambda N \varepsilon + N \lambda^2 / 8},$$

and choosing $\lambda = 4\varepsilon$ the right-hand side becomes $e^{-2N\varepsilon^2}$. Here the first step is due to Markov's inequality (lemma A.1), the second to the stochastic independence of the r.v.'s and the last to a standard bound on the moment generating function (lemma A.3). This result is usually paraphrased by saying that, under suitable hypotheses, the probability that the empirical mean $\frac{1}{N}\sum_{n=1}^{N} X_n$ exceeds the mean $E[X] = \mu$ by an amount $\varepsilon$ decreases exponentially fast both with $\varepsilon$ and with the sample size $N$.

The crucial idea of the on-line allocation and boosting strategies discussed here is how to build a family of r.v.'s representing the "loss" of a certain choice or the "accuracy" of a certain prediction in such a way that, even if these r.v.'s are not stochastically independent, it remains possible to "exchange the expectation with the product" as in the previous derivation [CB97b]. Since the sum of these r.v.'s will be related to the performance of the on-line allocation strategy or to the accuracy of the boosted prediction, Chernoff's bounding technique becomes a way to relate the overall performance to the performance of a single choice or prediction.

The next section focuses on on-line allocation; then boosting for classification predictions is introduced. The last two sections deal with boosting for regression, starting with a reduction from regression to classification and then with a more direct approach.
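A quick Monte Carlo check may help fix ideas about the inequality of section 1.1. The sketch below assumes Bernoulli(1/2) variables (which take values in $[0, 1]$, as required) and compares the observed frequency of the event $\frac{1}{N}\sum_n X_n - \mu \ge \varepsilon$ with $e^{-2N\varepsilon^2}$; the function names, the seed and the number of trials are illustrative choices, not part of the original text.

```python
import math
import random


def tail_frequency(n, eps, trials=5000, seed=0):
    """Estimate P[(1/N) sum X_n - mu >= eps] for N = n i.i.d. Bernoulli(1/2) r.v.'s."""
    rng = random.Random(seed)
    mu = 0.5
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < mu for _ in range(n)) / n
        # small tolerance so that, e.g., 6/10 - 0.5 counts as reaching eps = 0.1
        if mean - mu >= eps - 1e-12:
            hits += 1
    return hits / trials


def chernoff_bound(n, eps):
    """The bound e^{-2 N eps^2} obtained above with lambda = 4 eps."""
    return math.exp(-2 * n * eps ** 2)


if __name__ == "__main__":
    for n in (10, 50, 100, 200):
        print(n, tail_frequency(n, 0.1), chernoff_bound(n, 0.1))
```

The estimated frequencies stay below the bound and both shrink as $N$ grows; for small $N$ the bound is quite loose, as one expects from an inequality that must hold for every distribution on $[0, 1]$.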
Each of these sections is organized in three parts: a definition of the topic, a concise collection of formal proofs and a final discussion of the results with some reference to the existing literature. An appendix with some technical lemmas needed to complete the proofs in the previous sections is also provided.

Remark that in the following the term "strategy" is purposely used instead of the more precise term "algorithm", since the presented proofs and constructions are discussed with no particular stress on computational issues; it is however a straightforward task to render most of the strategies introduced here as algorithms.

2 On-line allocation

2.1 Definitions
Given a finite set $\Omega$ of choices, a fixed allocation is a distribution¹ on $\Omega$, and a bounded loss is simply a function $\Omega \mapsto [0, 1]$; given a fixed allocation $p$ and a bounded loss $l$, the suffered loss is the amount $\sum_{\omega \in \Omega} p(\omega) l(\omega)$. Considering a sequence of rounds indexed by $t$, if $l_t$ is a sequence of bounded losses, the $\omega$-loss $\sum_t l_t(\omega)$ is the overall loss suffered assuming a fixed allocation with value 1 on $\omega$ constantly on all the rounds; the total loss is just the sum of the losses suffered in all rounds, and the net loss is the difference between the total loss and the minimum $\omega$-loss over all the choices.

An on-line allocation strategy is a way of choosing a sequence of fixed allocations $p_t$ with respect to an arbitrary sequence of bounded losses $l_t$, in such a way that the choice of $p_t$ depends only on $l_{t'}$ with $t' < t$ (but $l_{t'}$ is allowed to depend on any of the $p_t$). The general aim of an on-line allocation strategy is to provide some bound on the net loss, since the definition of the strategy allows a completely adversarial sequence of losses, thus making any bound on the total loss meaningless.

The formulation of the problem allows one to consider a p.m. over a discrete p.s. as a fixed allocation and a bounded r.v. as a bounded loss; it is also easy to check that, in this respect, the given definition of suffered loss coincides with an expectation. The following results are therefore stated in this more abstract setting and then discussed in terms of on-line allocation strategies.

2.2 Formal proofs
Consider a p.s. $\langle \Omega, \mathcal{F}, P \rangle$. Let $L_t : \Omega \to [0, 1]$ be a family² of bounded r.v.'s on $\langle \Omega, \mathcal{F}, P \rangle$.

¹ A function $\Omega \mapsto [0, 1]$ such that the images sum up to 1.
² In the following the index $t$ is intended to vary in $1, \dots, T$, and the same holds for (not explicitly indexed) sums and products.

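Given a real positive parameter $\eta$, consider the family of p.m.'s on $\langle \Omega, \mathcal{F} \rangle$ such that $P_1 = P$ and $P_{t+1} = \beta_t P_t e^{-\eta L_t}$, with normalizing factor $1/\beta_t = E_{P_t}\left[e^{-\eta L_t}\right]$. Observe that the $L_t$ are r.v.'s also on $\langle \Omega, \mathcal{F}, P_t \rangle$; define

$$\mathcal{L} = \sum_t E_{P_t}[L_t] \qquad \text{and} \qquad \mathcal{L}_\omega = \sum_t L_t(\omega).$$

Theorem 2.1 If $A \in \mathcal{F}$ and $P[A] \neq 0$, then

$$\mathcal{L} - \max_{\omega \in A} \mathcal{L}_\omega \le \frac{1}{\eta} \ln \frac{1}{P[A]} + \frac{\eta T}{8}.$$

Proof.

$$e^{-\eta \mathcal{L} + \eta^2 T / 8} = \prod_t e^{-\eta E_{P_t}[L_t] + \eta^2 / 8} \ge \prod_t E_{P_t}\left[e^{-\eta L_t}\right] = E_P\left[e^{-\eta \sum_t L_t}\right] \ge \sum_{\omega \in A} e^{-\eta \sum_t L_t(\omega)} P[\omega] \ge e^{-\eta \max_{\omega \in A} \mathcal{L}_\omega} P[A],$$

where the first three steps follow from the definition of $\mathcal{L}$ and from lemmas A.3 and A.2 respectively, and the last two from the positivity of $e^{-\eta \sum_t L_t(\omega)}$ and from the definition of $\mathcal{L}_\omega$. Taking logarithms and dividing by $\eta$ gives the claim. □

Corollary 2.1 If $p = P[\mathcal{L}_\omega \le \kappa] \neq 0$, then $\mathcal{L} - \kappa \le (1/\eta) \ln(1/p) + \eta T / 8$.

Proof. Simply take $A = \{\omega : \mathcal{L}_\omega \le \kappa\}$ in the previous theorem; observe that, in this case, the last part of the proof becomes a standard Markov's inequality. □

Corollary 2.2 If $p = P[\arg\min_{\omega \in \Omega} \mathcal{L}_\omega] \neq 0$, then $\mathcal{L} - \min_{\omega \in \Omega} \mathcal{L}_\omega \le (1/\eta) \ln(1/p) + \eta T / 8$.

Proof. Simply take $A = \{\arg\min_{\omega \in \Omega} \mathcal{L}_\omega\}$ in the previous theorem. □

Corollary 2.3 If $P[\omega] \ge 1/|\Omega|$ and $\eta = \sqrt{8 \ln|\Omega| / T}$, then

$$\frac{\mathcal{L} - \min_{\omega \in \Omega} \mathcal{L}_\omega}{T} \le \sqrt{\frac{\ln|\Omega|}{2T}}.$$

Proof. Divide by $T$ the inequality of the previous corollary, where now $\ln(1/p) \le \ln|\Omega|$, and minimize the term $(1/\eta) \ln|\Omega| + \eta T / 8$ by simply differentiating with respect to $\eta$. □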
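Read as an on-line allocation strategy, the sequence $P_t$ above is an exponential-weights allocation, closely related to the Hedge algorithm of [FS97]. The following is a minimal sketch, assuming a finite $\Omega$, $P$ uniform and losses given as plain Python lists; the names `allocate` and `average_net_loss_bound` and the random loss sequence are hypothetical. It computes the total loss $\mathcal{L}$ and the $\omega$-losses $\mathcal{L}_\omega$ and compares the average net loss with the bound of corollary 2.3.

```python
import math
import random


def allocate(losses, eta):
    """Allocations P_1 = uniform, P_{t+1} = beta_t P_t e^{-eta L_t} (section 2.2).

    losses: T rounds, each a list with the bounded loss l_t(omega) in [0, 1]
    of every choice omega.  Returns (total loss, list of omega-losses)."""
    n = len(losses[0])
    p = [1.0 / n] * n                                   # P_1: equal for each choice
    total = 0.0
    for l in losses:
        total += sum(pw * lw for pw, lw in zip(p, l))  # suffered loss E_{P_t}[L_t]
        w = [pw * math.exp(-eta * lw) for pw, lw in zip(p, l)]
        z = sum(w)                                      # 1/beta_t = E_{P_t}[e^{-eta L_t}]
        p = [wi / z for wi in w]
    omega_losses = [sum(l[i] for l in losses) for i in range(n)]
    return total, omega_losses


def average_net_loss_bound(n, T):
    """Right-hand side of corollary 2.3: sqrt(ln|Omega| / (2T))."""
    return math.sqrt(math.log(n) / (2 * T))


if __name__ == "__main__":
    rng = random.Random(1)
    n, T = 5, 1000
    # hypothetical loss sequence in which choice 0 is slightly better on average
    losses = [[rng.random() * (0.8 if i == 0 else 1.0) for i in range(n)]
              for _ in range(T)]
    eta = math.sqrt(8 * math.log(n) / T)                # eta from corollary 2.3
    total, omega = allocate(losses, eta)
    print((total - min(omega)) / T, average_net_loss_bound(n, T))
```

Since the corollary holds for every loss sequence, the printed average net loss never exceeds the bound, whatever losses are fed in; note that $\eta$ depends on the horizon $T$, which must therefore be known in advance.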

2.3 Discussion
In the following, an interpretation of the corollaries is given as an explanation of the very general inequality of the theorem; note that probability plays only the role of a measure, and consequently all the following conclusions (except the first, relative to corollary 2.1) are deterministic in the strict sense.

Assume that $P$ models a stochastic behavior of the family of choices (which can in principle be a continuous family instead of a finite one as in the setting assumed here); then $p$ can be interpreted as the probability, with respect to the stochastic behavior of the choices, that the $\omega$-loss is bounded by some $\kappa$. The inequality of corollary 2.1 then relates the difference between the total loss and such a bound to the logarithm of $1/p$, essentially³ in the same way as in [Vov]. Corollary 2.2 gives a bound essentially⁴ of the form of the one presented in [FS97].

According to the defined setting, $\mathcal{L}$ coincides with the total loss and $\mathcal{L}_\omega$ with the $\omega$-loss of the choice $\omega$, so corollary 2.3 asserts that the on-line allocation strategy corresponding to the sequence of allocations $P_t$ (started with a fixed allocation $p$ that is equal for each choice) is such that the average net loss goes to 0 with $T$ as $O(\sqrt{\ln|\Omega| / T})$; this last result coincides with the one presented in [FS97].

3 Binary classification

3.1 Definitions
Classification refers in general to the problem of predicting a label in a finite set $L$ for each element of a set $I$ of instances, according to some relationship between instances and labels that can be thought of as an (unknown) deterministic mapping $I \mapsto L$ or as a joint probability distribution over $I \times L$; a prediction is then a mapping $I \mapsto L$ whose error is a measure of the discrepancy between the predicted and the intended label (according to the unknown mapping or to the joint distribution). When the cardinality of $L$ is 2, the classification is said to be binary. The goal of a boosting strategy is to combine a family of simple predictions in order to obtain a final prediction with smaller error than the simple ones.

For computability reasons, attention is usually restricted to a sample, i.e. a finite number of pairs $(i, l)$ drawn from $I \times L$ according to some distribution; furthermore, some generalization results are usually provided to relate the error on the sample to the error with respect to the whole set of instances.
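For these reasons the following formal proofs are restricted to a finite set $\Omega$ representing the instance part of the sample and a deterministic mapping $\Omega \mapsto \{-1, +1\}$ representing the label part of the sample; it is straightforward to verify that various binary classification settings occurring in the literature can be reduced to the one presented here. The p.m. $P$ associated to $\Omega$ can either be used to somehow reproduce over the sample the probability given for the whole set of instances, or simply be considered uniform, to relate to common empirical estimates over the sample.

³ The difference being relative to the constants.
⁴ In [FS97] different, and in some sense optimal, constants are present.

3.2 Formal proofs
Consider a p.s. $\langle \Omega, \mathcal{F}, P \rangle$, a function $c : \Omega \to \{-1, +1\}$ and a family of functions $h_t : \Omega \to \{-1, +1\}$; define $h : \Omega \to \mathbb{R}$ as

$$h(\omega) = \sum_t \alpha_t h_t(\omega),$$

where $\alpha_t$ is a family of real parameters. Let $L_t$ be a family of r.v.'s on $\langle \Omega, \mathcal{F}, P \rangle$ defined as $L_t(\omega) = h_t(\omega) c(\omega)$; it is easy to check that the range of $L_t$ is $\{-1, +1\}$ and that

$$L_t(\omega) = \begin{cases} +1 & h_t(\omega) = c(\omega) \\ -1 & h_t(\omega) \neq c(\omega). \end{cases}$$

Consider the family of p.m.'s on $\langle \Omega, \mathcal{F} \rangle$ such that $P_1 = P$ and $P_{t+1} = \beta_t P_t e^{-\alpha_t L_t}$, with normalizing factor $1/\beta_t = E_{P_t}\left[e^{-\alpha_t L_t}\right]$. Observe that the $L_t$ are r.v.'s also on $\langle \Omega, \mathcal{F}, P_t \rangle$; define $\varepsilon_t = P_t[h_t \neq c]$, then

$$E_{P_t}[L_t] = P_t[L_t = +1] - P_t[L_t = -1] = 1 - 2 P_t[h_t \neq c] = 1 - 2\varepsilon_t.$$

Theorem 3.1 If $\alpha_t = 1 - 2\varepsilon_t$, then $P[hc \le 0] \le \prod_t e^{-2(1/2 - \varepsilon_t)^2}$.

Proof.

$$P[hc \le 0] = P\left[c \sum_t \alpha_t h_t \le 0\right] = P\left[-\sum_t \alpha_t L_t \ge 0\right] \le E_P\left[e^{-\sum_t \alpha_t L_t}\right] = \prod_t E_{P_t}\left[e^{-\alpha_t L_t}\right] \le \prod_t e^{-\alpha_t E_{P_t}[L_t] + \alpha_t^2 / 2},$$

where the last three derivations follow from lemmas A.1, A.2 and A.3 respectively (in lemma A.3, $L_t$ ranges over an interval of length $\Delta = 2$). Hence this last product can be minimized by differentiating each positive factor separately with respect to $\alpha_t$, obtaining the value $\alpha_t = E_{P_t}[L_t] = 1 - 2\varepsilon_t$ at which the minima are attained; for this value each factor equals $e^{-(1 - 2\varepsilon_t)^2 / 2} = e^{-2(1/2 - \varepsilon_t)^2}$. □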
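A minimal sketch of the strategy of section 3.2 on a finite sample may make the update concrete. It assumes a hypothetical pool of threshold predictions on ten points of a line, together with their negations (so that some member always has $\varepsilon_t \le 1/2$), picks at each round the member with smallest $\varepsilon_t$, and uses $\alpha_t = 1 - 2\varepsilon_t$ as in theorem 3.1; this is not the logarithmic weight $\ln\frac{1 - \varepsilon_t}{\varepsilon_t}$ of AdaBoost in [FS97]. The names `boost` and `combined`, the data and the pool are illustrative only.

```python
import math


def boost(c, pool, P, T):
    """Boosting as in section 3.2 on a finite Omega.

    c:    labels c(omega) in {-1, +1}
    pool: candidate simple predictions (lists in {-1, +1}); at each round the one
          with the smallest weighted error is taken as h_t
    P:    initial p.m. on Omega (list summing to 1)
    Returns the chosen h_t, the alpha_t, the eps_t and the normalizers 1/beta_t."""
    m = len(c)
    Pt = list(P)
    hs, alphas, epss, zs = [], [], [], []
    for _ in range(T):
        errs = [sum(Pt[i] for i in range(m) if h[i] != c[i]) for h in pool]
        k = min(range(len(pool)), key=lambda j: errs[j])
        h, eps = pool[k], errs[k]                    # eps_t = P_t[h_t != c]
        alpha = 1 - 2 * eps                          # alpha_t of theorem 3.1
        # P_{t+1} = beta_t P_t e^{-alpha_t L_t}, with L_t = h_t c
        w = [Pt[i] * math.exp(-alpha * h[i] * c[i]) for i in range(m)]
        z = sum(w)                                   # 1/beta_t
        Pt = [wi / z for wi in w]
        hs.append(h)
        alphas.append(alpha)
        epss.append(eps)
        zs.append(z)
    return hs, alphas, epss, zs


def combined(hs, alphas, i):
    """h(omega_i) = sum_t alpha_t h_t(omega_i)."""
    return sum(a * h[i] for h, a in zip(hs, alphas))


if __name__ == "__main__":
    xs = list(range(10))                             # hypothetical instances
    c = [1, 1, -1, -1, 1, 1, 1, -1, -1, 1]           # no single threshold fits these
    stumps = [[1 if x >= s else -1 for x in xs] for s in range(11)]
    pool = stumps + [[-v for v in h] for h in stumps]
    P = [1 / len(xs)] * len(xs)                      # uniform P on the sample
    hs, alphas, epss, zs = boost(c, pool, P, T=30)
    error = sum(P[i] for i in range(len(xs)) if combined(hs, alphas, i) * c[i] <= 0)
    bound = math.prod(math.exp(-2 * (0.5 - e) ** 2) for e in epss)
    print(error, bound)
```

The weighted error $P[hc \le 0]$ of the vote stays below the product bound of theorem 3.1, and the product of the returned normalizers coincides with $E_P[e^{-\sum_t \alpha_t L_t}]$, which is exactly the "exchange of expectation and product" of lemma A.2. As a worked number for corollary 3.1: with $\gamma = 0.1$ the bound $e^{-2T\gamma^2}$ drops below $0.01$ once $T \ge 231$, since $\ln 100 / (2 \cdot 0.01) \approx 230.3$.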

Corollary 3.1 If $\gamma > 0$ exists such that $\varepsilon_t \le 1/2 - \gamma$ and $f(\omega) = \mathrm{sgn}(h(\omega))$, then $P[f \neq c] \le e^{-2T\gamma^2}$.

Proof. From the definition of $f$ it follows that $P[f \neq c] = P[fc \le 0] = P[hc \le 0]$; by straightforward computation, from the previous theorem it follows that $P[hc \le 0] \le e^{-2T\gamma^2}$. □

Corollary 3.2 If $\alpha_t = 1 - 2\varepsilon_t \ge 0$, $\sum_t \alpha_t = 1$ and $\theta \ge 0$, then $P[hc \le \theta] \le \prod_t e^{-2((1/2 - \varepsilon_t) - \theta/2)^2}$.

Proof. The proof is essentially the same as for theorem 3.1, replacing the 0 at the right side of the first term with $\theta \sum_t \alpha_t$ and then performing the same derivations. □

3.3 Discussion
Even if they address apparently different situations, the proofs of theorems 3.1 and 2.1 share a formal similarity, which has been justified [FS96b] by a kind of duality between on-line allocation and boosting strategies. This duality has already been exploited to transform the weighted majority algorithm [LW94] into a boosting strategy [FS97].

If the family of mappings $h_t$ is considered as a set of simple predictions and $h$ as the final one, then $\varepsilon_t$ represents the error with respect to the probability $P_t$, and the statement of the theorem relates the positivity of $hc$ (and ultimately the error of $\mathrm{sgn}(h)$ as a prediction) to the errors of the simple predictions. To better understand the given inequality, a more optimistic situation is usually assumed: corollary 3.1 asserts that if all the simple predictions have error bounded away from $1/2$ with respect to any $P_t$, then the error of the final prediction $f$ goes to 0 exponentially fast with the number of combined simple predictions.

Consider now a probability $P$ over $I$ and let $\Omega$ represent a sample drawn independently at random from $I$ according to $P$. In order to provide generalization results, the family of simple predictions has to satisfy some (weak) structural property. A family $h_t$ of simple predictions shatters a subset $S \subseteq I$ if for any $S' \subseteq S$ an $h_t$ exists such that $S' = h_t^{-1}(+1) \cap S$; the Vapnik-Chervonenkis dimension [VC71] of the family is the cardinality of the largest (possibly infinite) shattered subset of $I$. Under the hypotheses of corollary 3.2, if $P(\omega) \ge 1/|\Omega|$ and $d < \infty$ is the Vapnik-Chervonenkis dimension of the family of simple predictions, then with probability more than $1 - \delta$ over the random choice of the sample it is possible [SFBL97] to prove that

$$P[hc \le 0] \le P[hc \le \theta] + O\left(\frac{1}{\sqrt{|\Omega|}}\left(\frac{d \ln^2(|\Omega|/d)}{\theta^2} + \ln(1/\delta)\right)^{1/2}\right);$$

by combining this inequality with the result of corollary 3.2 and at the same time assuming the hypotheses of corollary 3.1, it follows easily that

$$P[f \neq c] \le e^{-2T(\gamma - \theta/2)^2} + O\left(\frac{1}{\sqrt{|\Omega|}}\left(\frac{d \ln^2(|\Omega|/d)}{\theta^2} + \ln(1/\delta)\right)^{1/2}\right),$$

which gives an asymptotic bound on the generalization error in terms of the sample size and the number of combined simple predictions.

4 Regression, via reduction to classification

4.1 Definitions
Regression as treated in this paper is very similar to classification, the only difference being that the label set $\Lambda$ is an infinite set endowed with a notion of distance, taken as a measure of the discrepancy between the predicted and the intended label for any instance, thus defining a notion of error similar to the one introduced for the classification case. For the same computability reasons as before, the set $\Omega$ is a finite sample as in the classification case; in this case too it is straightforward to verify that various regression settings occurring in the literature can be reduced to the one presented here.

Given a threshold $\theta > 0$, the regression predictions can be transformed into binary predictions in such a way that the binary prediction answers $+1$ if the regression one answers a label nearer than $\theta$ to the intended one, and $-1$ in the opposite case. Predicting the correct label with accuracy $\theta$ is then equivalent to predicting the constant label $+1$. Once the binary classification boosting strategy has returned the family of parameters $\alpha_t$, a final regression prediction can be obtained by a relatively simple function of these values.

4.2 Formal proofs
Consider a p.s. $\langle \Omega, \mathcal{F}, P \rangle$, a function $c : \Omega \to \Lambda$ and a family of functions $h_t : \Omega \to \Lambda$. Let $d : \Lambda \times \Lambda \to \mathbb{R}^+$ be an arbitrary distance on $\Lambda$; given $\theta > 0$, define $\Gamma_{\omega,\lambda}$ and $h : \Omega \to \Lambda$ as

$$\Gamma_{\omega,\lambda} = \{t : d(h_t(\omega), \lambda) \le \theta\}, \qquad h(\omega) = \arg\max_{\lambda \in \Lambda} \sum_{t \in \Gamma_{\omega,\lambda}} \alpha_t,$$

where $\alpha_t$ is a family of positive real parameters. Hence, for any $\lambda$ it holds that

$$\sum_{t \in \Gamma_{\omega,\lambda}} \alpha_t - \sum_{t \in \Gamma_{\omega,h(\omega)}} \alpha_t \le 0.$$
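The final regression prediction $h(\omega)$ of section 4.2 is a weighted vote over balls of radius $\theta$. A minimal sketch, assuming $\Lambda = \mathbb{R}$ and $d(a, b) = |a - b|$ (the function name `ball_vote` and the tolerance `tol` are hypothetical), uses the fact that the total weight of the closed intervals $[h_t(\omega) - \theta, h_t(\omega) + \theta]$ containing a point is maximal at some left endpoint, so only those endpoints need to be tried.

```python
def ball_vote(preds, alphas, theta, tol=1e-12):
    """h(omega) = argmax_y of the sum of alpha_t over t in Gamma_{omega,y},
    for Lambda = R and d(a, b) = |a - b|.

    preds:  the values h_t(omega) of the simple predictions at one instance
    alphas: the positive parameters alpha_t
    The weight of y is maximal at some left endpoint h_t(omega) - theta, so only
    those candidates are tried; tol only absorbs floating-point rounding."""
    def weight(y):
        return sum(a for p, a in zip(preds, alphas) if abs(p - y) <= theta + tol)

    return max((p - theta for p in preds), key=weight)


if __name__ == "__main__":
    # hypothetical predictions at one instance: three agree around 1.0, one is off
    print(ball_vote([1.0, 1.1, 5.0, 0.95], [0.3, 0.3, 0.5, 0.2], theta=0.1))
```

In the example the vote returns a label close to 1.0, where three predictions of total weight 0.8 agree within $\theta$, rather than 5.0, even though the single prediction 5.0 has the largest individual weight.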

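Let $\langle \tilde\Omega, \tilde{\mathcal{F}}, \tilde P \rangle$ be a p.s. where $\tilde\Omega = \Omega \times \Lambda$ and $\tilde P = P \times \mu$⁵; let $\tilde c : \tilde\Omega \to \{-1, +1\}$ be such that $\tilde c(\tilde\omega) \equiv +1$, and let $\tilde h_t : \tilde\Omega \to \{-1, +1\}$ be a family of functions defined as

$$\tilde h_t(\tilde\omega) = \tilde h_t(\omega, \lambda) = \begin{cases} +1 & d(h_t(\omega), \lambda) \le \theta \\ -1 & d(h_t(\omega), \lambda) > \theta \end{cases}$$

and let $\tilde h(\tilde\omega) = \sum_t \alpha_t \tilde h_t(\tilde\omega)$, that is

$$\tilde h(\tilde\omega) = \tilde h(\omega, \lambda) = \sum_{t \in \Gamma_{\omega,\lambda}} \alpha_t - \sum_{t \in \complement\Gamma_{\omega,\lambda}} \alpha_t.$$

The family $\langle \tilde\Omega, \tilde{\mathcal{F}}, \tilde P_t \rangle$ of p.m.'s is such that

$$P_t[d(h_t(\omega), \lambda) > \theta] = \tilde P_t[\tilde h_t(\omega, \lambda) \neq \tilde c(\omega, \lambda)],$$

therefore $\alpha_t = 1 - 2\tilde P_t[\tilde h_t \neq \tilde c] \ge 0$ if and only if $P_t[d(h_t(\omega), \lambda) > \theta] \le 1/2$.

Observe that $d(\lambda, \lambda') > 2\theta$ implies $\Gamma_{\omega,\lambda} \cap \Gamma_{\omega,\lambda'} = \emptyset$, or equivalently $\complement\Gamma_{\omega,\lambda} = \Gamma_{\omega,\lambda'} \cup \Xi$, where $\Xi$ is a (possibly empty) set of indices; since all the $\alpha_t$ are positive, it is easy to check that $d(h(\omega), \lambda) > 2\theta$ implies

$$\tilde h(\omega, \lambda) = \sum_{t \in \Gamma_{\omega,\lambda}} \alpha_t - \left(\sum_{t \in \Gamma_{\omega,h(\omega)}} \alpha_t + \sum_{t \in \Xi} \alpha_t\right) \le 0.$$

Theorem 4.1 If $\varepsilon_t = P_t[d(h_t(\omega), \lambda) > \theta] \le 1/2$, then

$$P[d(h(\omega), \lambda) > 2\theta] \le \prod_t e^{-2(1/2 - \varepsilon_t)^2}.$$

Proof. It follows easily from theorem 3.1 by the cited reduction, observing that

$$P[d(h(\omega), \lambda) > 2\theta] \le \tilde P[\tilde h \le 0] = \tilde P[\tilde h \tilde c \le 0]. \quad □$$

Corollary 4.1 If $\gamma > 0$ exists such that $P_t[d(h_t(\omega), \lambda) > \theta] \le 1/2 - \gamma$, then $P[d(h(\omega), \lambda) > 2\theta] \le e^{-2T\gamma^2}$.

⁵ Where $\mu$ denotes a uniform measure on $\Lambda$.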
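The key step of the reduction, that $d(h(\omega), \lambda) > 2\theta$ forces $\tilde h(\omega, \lambda) \le 0$, can be checked numerically. The sketch below again assumes $\Lambda = \mathbb{R}$ with $d(a, b) = |a - b|$, repeats a hypothetical `ball_vote` helper so that it is self-contained, and adds a hypothetical `reduced_margin` computing $\tilde h(\omega, \lambda) = \sum_t \alpha_t \tilde h_t(\omega, \lambda)$; integer-valued predictions keep every distance comparison exact.

```python
def ball_vote(preds, alphas, theta):
    """h(omega): a label maximizing the total alpha_t of the predictions within
    theta (Lambda = R, d(a, b) = |a - b|); left endpoints h_t - theta suffice."""
    def weight(y):
        return sum(a for p, a in zip(preds, alphas) if abs(p - y) <= theta)

    return max((p - theta for p in preds), key=weight)


def reduced_margin(preds, alphas, theta, lam):
    """tilde h(omega, lambda) = sum_t alpha_t tilde h_t(omega, lambda), where
    tilde h_t = +1 if d(h_t(omega), lambda) <= theta and -1 otherwise."""
    return sum(a if abs(p - lam) <= theta else -a for p, a in zip(preds, alphas))


if __name__ == "__main__":
    # integer-valued hypothetical predictions at one instance omega
    preds, alphas, theta = [3, 4, 4, 10, 11, 17], [0.5, 0.3, 0.4, 0.6, 0.2, 0.1], 1
    h = ball_vote(preds, alphas, theta)
    for lam in range(0, 21):
        if abs(h - lam) > 2 * theta:
            print(lam, reduced_margin(preds, alphas, theta, lam))  # never positive
```

The printed margins are never positive: the balls of radius $\theta$ around $h(\omega)$ and around a label farther than $2\theta$ are disjoint, and $h(\omega)$ collects at least as much weight as any other label.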

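4.3 Discussion
Reducing regression to classification in order to apply boosting strategies was first suggested in [FS97] for the case $\Lambda = [0, 1] \subseteq \mathbb{R}$. Instead, the present approach is essentially the same as the one discussed in [BCP], where $\Lambda = [0, 1]^n \subseteq \mathbb{R}^n$ for $n > 1$, and even in the case $n = 1$ experimental analysis [FS96a] seems to show that the strategy presented here performs better than the one in [FS97].

Corollary 4.1 asserts that if the probability that the simple predictions are less accurate than $\theta$ is bounded away from $1/2$ with respect to any $P_t$, then the probability that the final prediction is less accurate than $2\theta$ goes to 0 exponentially fast with the number of combined simple predictions. Observe that all the other results relate expectations to probabilities, while here a relation exclusively between probabilities is stated.

5 Regression, a direct approach

5.1 Definitions
The following approach is somewhat more general than the previous one, since here the error is given in terms of an abstract loss function which only needs to be, in some sense, "convex" with respect to the composition chosen for combining the simple predictions; if the label set is a normed vector space on $\mathbb{R}$, the induced distance (which is convex in the usual sense) and an empirical average of the predictions together satisfy the requirements of the presented setting.

5.2 Formal proofs
Consider a p.s. $\langle \Omega, \mathcal{F}, P \rangle$, a function $c : \Omega \to \Lambda$ and a family of functions $h_t : \Omega \to \Lambda$. Let $\Phi : \Lambda^T \to \Lambda$ and $\ell : \Lambda \times \Lambda \to [l, l + \Delta] \subseteq \mathbb{R}$ be such that⁶

$$\ell(\Phi(h_1(\omega), \dots, h_T(\omega)), c(\omega)) \le \frac{1}{T} \sum_t \ell(h_t(\omega), c(\omega))$$

for any $\omega \in \Omega$, and define $h : \Omega \to \Lambda$ as $h(\omega) = \Phi(h_1(\omega), \dots, h_T(\omega))$. Let $L_t$ be a family of r.v.'s on $\langle \Omega, \mathcal{F}, P \rangle$ defined as $L_t(\omega) = \ell(h_t(\omega), c(\omega))$.

⁶ Observe that in the case $\Lambda = \mathbb{R}^n$ the choice $\Phi(h_1(\omega), \dots, h_T(\omega)) = \frac{1}{T} \sum_t h_t(\omega)$ together with a convex $\ell$ satisfies the definition.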

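Given a real positive parameter $\lambda$, consider the family of p.m.'s on $\langle \Omega, \mathcal{F} \rangle$ such that $P_1 = P$ and $P_{t+1} = \beta_t P_t e^{\lambda L_t}$, with normalizing factor $1/\beta_t = E_{P_t}\left[e^{\lambda L_t}\right]$. Observe that the $L_t$ are r.v.'s also on $\langle \Omega, \mathcal{F}, P_t \rangle$; define

$$\mu = \frac{1}{T} \sum_t E_{P_t}[L_t].$$

Theorem 5.1 If $\lambda = 4\varepsilon / \Delta^2$, then $P[\ell(h, c) \ge \mu + \varepsilon] \le e^{-2T\varepsilon^2 / \Delta^2}$.

Proof.

$$P[\ell(h, c) \ge \mu + \varepsilon] \le P\left[\frac{1}{T} \sum_t \ell(h_t, c) \ge \mu + \varepsilon\right] = P\left[-\lambda T(\mu + \varepsilon) + \lambda \sum_t L_t \ge 0\right] \le E_P\left[e^{-\lambda T(\mu + \varepsilon) + \lambda \sum_t L_t}\right] = e^{-\lambda T(\mu + \varepsilon)} \prod_t E_{P_t}\left[e^{\lambda L_t}\right] \le e^{-\lambda T(\mu + \varepsilon)} \prod_t e^{\lambda E_{P_t}[L_t] + \lambda^2 \Delta^2 / 8},$$

where the first derivation follows from the definition of $\ell$ and the last three derivations follow from lemmas A.1, A.2 and A.3 respectively. Hence, given the definition of $\mu$, the last expression simplifies to

$$e^{-\lambda T(\mu + \varepsilon) + T\lambda^2 \Delta^2 / 8 + \lambda \sum_t E_{P_t}[L_t]} = e^{T\lambda^2 \Delta^2 / 8 - \lambda T \varepsilon},$$

which can be minimized by differentiating with respect to $\lambda$, obtaining the value $\lambda = 4\varepsilon / \Delta^2$ at which the minimum $e^{-2T\varepsilon^2 / \Delta^2}$ is attained. □

5.3 Discussion
With respect to the previous strategy, under suitable restrictions on $\ell$, the combination of the simple predictions can here be as simple as an averaged summation, while the loss function can still remain a (bounded) distance; in this setting, the statement of theorem 5.1 asserts [CB97a] that the probability that the loss of the final prediction exceeds the averaged total loss by any fixed amount goes to 0 with the number $T$ of averaged simple predictions.

A less obvious task is how to compare (in the most general setting) the results of theorems 4.1 and 5.1: even if both give an exponentially decreasing upper bound on the probability of error of the final prediction with respect to the simple predictions, the former is based on the probability of error, while the latter is based on the expectation of error.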
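Theorem 5.1 can be checked numerically in the setting suggested by footnote 6. A minimal sketch, assuming $\Lambda = [0, 1]$, $\Phi$ the plain average and $\ell(a, b) = |a - b|$ (convex in $a$, with values in $[0, 1]$, so $l = 0$ and $\Delta = 1$); the noisy simple predictions and the names `loss` and `theorem_5_1` are hypothetical. Note that $\mu$ is computed along the reweighted measures $P_t$, which put more mass where the losses are large.

```python
import math
import random


def loss(a, b):
    """l(a, b) = |a - b|: convex in a and, for a, b in [0, 1], valued in [0, 1]."""
    return abs(a - b)


def theorem_5_1(c, preds, P, eps, delta=1.0):
    """Return (P[l(h, c) >= mu + eps], e^{-2 T eps^2 / Delta^2}, mu) with Phi = average.

    c: target values c(omega); preds: T lists with the values h_t(omega);
    P: initial p.m. on the finite Omega."""
    T, m = len(preds), len(c)
    lam = 4 * eps / delta ** 2                               # lambda of theorem 5.1
    Pt, mu = list(P), 0.0
    for h in preds:
        L = [loss(h[i], c[i]) for i in range(m)]
        mu += sum(Pt[i] * L[i] for i in range(m)) / T        # (1/T) sum_t E_{P_t}[L_t]
        w = [Pt[i] * math.exp(lam * L[i]) for i in range(m)]
        z = sum(w)                                           # 1/beta_t = E_{P_t}[e^{lam L_t}]
        Pt = [wi / z for wi in w]                            # P_{t+1} = beta_t P_t e^{lam L_t}
    final = [sum(h[i] for h in preds) / T for i in range(m)]  # h = Phi(h_1, ..., h_T)
    prob = sum(P[i] for i in range(m) if loss(final[i], c[i]) >= mu + eps)
    return prob, math.exp(-2 * T * eps ** 2 / delta ** 2), mu


if __name__ == "__main__":
    rng = random.Random(2)
    m, T = 200, 40
    c = [rng.random() for _ in range(m)]
    # hypothetical simple predictions: noisy copies of c, clipped to [0, 1]
    preds = [[min(1.0, max(0.0, ci + rng.gauss(0, 0.3))) for ci in c] for _ in range(T)]
    print(theorem_5_1(c, preds, [1 / m] * m, eps=0.1))
```

Because every step of the proof is an inequality valid for any predictions, the first returned value never exceeds the second, whatever data are used.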

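A Some technical lemmas
Here some technical lemmas needed in the previous proofs are given. Even if it is sometimes unnecessarily restrictive, here the set $\Omega$ is assumed to be finite; consequently all the r.v.'s are simple.

Lemma A.1 Let $X$ be a r.v. on $\langle \Omega, \mathcal{F}, P \rangle$; then, for any $\lambda > 0$, $P[X \ge 0] = P\left[e^{\lambda X} \ge 1\right] \le E\left[e^{\lambda X}\right]$.

Proof. By simple application of Markov's inequality. □

The next lemma gives the p.m. transformation suggested in [CB97b], which is the inspiring idea of this paper as well as the main tool of all the proofs.

Lemma A.2 Let $X_t$ be a family of r.v.'s on $\langle \Omega, \mathcal{F}, P \rangle$ and $P_t$ a family of p.m.'s on $\langle \Omega, \mathcal{F} \rangle$ defined as $P_1 = P$ and $P_{t+1} = \beta_t P_t e^{X_t}$, with normalizing factor $1/\beta_t = E_{P_t}\left[e^{X_t}\right]$; then $X_t$ is a family of r.v.'s on $\langle \Omega, \mathcal{F}, P_t \rangle$ and

$$E_P\left[\prod_t e^{X_t}\right] = \prod_t E_{P_t}\left[e^{X_t}\right].$$

Proof. By definition of $P_t$ it is easy to check that the $X_t$ are r.v.'s defined also on $\langle \Omega, \mathcal{F}, P_t \rangle$; then

$$E_P\left[\prod_t e^{X_t}\right] = E_P\left[\prod_t \frac{P_{t+1}}{\beta_t P_t}\right] = E_P\left[\frac{P_{T+1}}{P_1}\right] \prod_t \frac{1}{\beta_t} = \prod_t E_{P_t}\left[e^{X_t}\right],$$

since $E_P[P_{T+1}/P_1] = \sum_\omega P_{T+1}[\omega] = 1$. □

Lemma A.3 Let $X$ be a r.v. on $\langle \Omega, \mathcal{F}, P \rangle$ such that $x \le X \le x + \Delta$ and $E[X] = \mu < \infty$; then, for any real $\lambda$,

$$E\left[e^{\lambda X}\right] \le e^{\lambda \mu + \lambda^2 \Delta^2 / 8}.$$

Proof. Assume that $\mu = 0$; then, by convexity of the exponential function,

$$E\left[e^{\lambda X}\right] \le \frac{x + \Delta}{\Delta} e^{\lambda x} - \frac{x}{\Delta} e^{\lambda(x + \Delta)} = e^{g(u)},$$

where $u = \lambda \Delta$ and $g(u) = -pu + \ln(1 - p + p e^u)$ with $p = -x / \Delta$. It is easy to verify that $g(0) = g'(0) = 0$ and that $g''(u) \le 1/4$. Hence, by Taylor's expansion, for a suitable $\xi$,

$$g(u) = g(0) + u g'(0) + \frac{u^2}{2} g''(\xi) \le \frac{u^2}{8} = \frac{\lambda^2 \Delta^2}{8}.$$

If now $\mu \neq 0$, $X - \mu$ has zero mean and, by the previous inequality,

$$e^{-\lambda \mu} E\left[e^{\lambda X}\right] = E\left[e^{\lambda (X - \mu)}\right] \le e^{\lambda^2 \Delta^2 / 8}. \quad □$$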
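Both lemmas that carry the proofs can be checked numerically on a finite $\Omega$. A minimal sketch, with hypothetical function names and random data: `lemma_a2_sides` returns the two sides of the identity of lemma A.2 computed along the measures $P_t$, and `lemma_a3_sides` returns $E[e^{\lambda X}]$ together with the bound of lemma A.3, taking $x = \min X$ and $\Delta = \max X - \min X$.

```python
import math
import random


def lemma_a2_sides(P, X):
    """Both sides of lemma A.2 on a finite Omega.

    P: p.m. as a list; X: list of T r.v.'s, each a list of reals indexed like P.
    Returns (E_P[prod_t e^{X_t}], prod_t E_{P_t}[e^{X_t}])."""
    m = len(P)
    lhs = sum(P[i] * math.exp(sum(x[i] for x in X)) for i in range(m))
    Pt, rhs = list(P), 1.0
    for x in X:
        z = sum(Pt[i] * math.exp(x[i]) for i in range(m))       # 1/beta_t
        rhs *= z
        Pt = [Pt[i] * math.exp(x[i]) / z for i in range(m)]     # P_{t+1} = beta_t P_t e^{X_t}
    return lhs, rhs


def lemma_a3_sides(P, X, lam):
    """(E[e^{lam X}], e^{lam mu + lam^2 Delta^2 / 8}) for one r.v. X given as a list,
    with x = min X and Delta = max X - min X."""
    mu = sum(p * v for p, v in zip(P, X))
    delta = max(X) - min(X)
    mgf = sum(p * math.exp(lam * v) for p, v in zip(P, X))
    return mgf, math.exp(lam * mu + lam ** 2 * delta ** 2 / 8)


if __name__ == "__main__":
    rng = random.Random(4)
    w = [rng.random() for _ in range(6)]
    P = [wi / sum(w) for wi in w]
    X = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(5)]
    print(lemma_a2_sides(P, X))                 # the two values agree
    for lam in (-3.0, -1.0, 0.5, 2.0, 5.0):
        print(lam, lemma_a3_sides(P, X[0], lam))  # first value <= second
```

The identity of lemma A.2 holds even though the $X_t$ are arbitrary and not independent; independence is exactly what the change of measure replaces.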

References

[BCP] A. Bertoni, P. Campadelli, and M. Parodi. A boosting algorithm for regression. [To appear in ICANN'97].

[CB97a] N. Cesa-Bianchi. A boosting algorithm for regression. [Unpublished manuscript], 17 June 1997.

[CB97b] N. Cesa-Bianchi. Concentration of measure for sums of dependent random variables. [Unpublished manuscript], 6 June 1997.

[FS96a] Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In Proc. 13th International Conference on Machine Learning, pages 148-156. Morgan Kaufmann, 1996.

[FS96b] Y. Freund and R. E. Schapire. Game theory, on-line prediction and boosting. In Proc. 9th Annu. Conf. on Comput. Learning Theory, pages 325-332. ACM Press, New York, NY, 1996.

[FS97] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, August 1997.

[Hof63] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13-30, 1963.

[LW94] N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212-261, 1 February 1994.

[SFBL97] R. E. Schapire, Y. Freund, P. Bartlett, and W. Sun Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. In Proc. 14th International Conference on Machine Learning, pages 322-330. Morgan Kaufmann, 1997.

[VC71] V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264-280, 1971.

[Vov] V. Vovk. Derandomizing stochastic prediction strategies. [To appear in COLT'97].