Entropy Regularized LPBoost


1 Entropy Regularized LPBoost
Manfred K. Warmuth, Karen Glocer, S.V.N. Vishwanathan (pretty slides from Gunnar Rätsch)
Updated: October 13, 2008

2 Protocol of Boosting [FS97]
Maintain a distribution on N ±1-labeled examples.
At iteration t = 1, ..., T:
- receive a weak hypothesis $h^t$
- update $d^{t-1}$ to $d^t$: put more weight on hard examples
Output a convex combination of the weak hypotheses, $\sum_{t=1}^T w_t h^t(x)$.
Two sets of weights:
- a distribution d on the examples
- a distribution w on the hypotheses
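
A minimal Python sketch of this protocol (my illustration, not code from the slides): the weak-learner oracle, the multiplicative update, and the uniform final weighting are placeholder assumptions, since each algorithm in these slides instantiates those steps differently.

```python
import numpy as np

def boost(oracle, N, T, eta=0.5):
    """Skeleton of the boosting protocol (a sketch, not a specific algorithm).

    oracle(d) stands for a hypothetical weak learner: given a distribution d
    over the N examples it returns the vector u with u[n] = y_n * h^t(x_n),
    the per-example "goodness" of the hypothesis it found.
    """
    d = np.full(N, 1.0 / N)       # d^0: uniform distribution on the examples
    U = []                        # goodness vectors u^t of past hypotheses
    for t in range(T):
        u = oracle(d)             # receive a weak hypothesis h^t
        U.append(u)
        d = d * np.exp(-eta * u)  # put more weight on hard examples
        d /= d.sum()              # renormalize to a distribution
    w = np.full(T, 1.0 / T)       # placeholder convex combination of hypotheses
    return np.array(U), w
```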

3-5 Setup for Boosting
11 apples (examples), labeled +1 if natural and -1 if artificial; we want to classify the apples.


6-7 Weak hypothesis
The weak hypotheses are decision stumps along the two features: examples = rows, weak hypotheses = possible columns.


8 Examples and Hypotheses
[Table: the examples (rows) with their ±1 labels; hypothesis $h^1$ (the redness stump) makes two mistakes.]

9-10 Boosting: 1st Iteration
First hypothesis:
error: $\frac{2}{11} = \sum_{n=1}^N d_n^0\, I(h^1(x_n) \neq y_n)$
edge: $\frac{9}{22} = \sum_{n=1}^N \underbrace{y_n h^1(x_n)}_{\text{goodness on ex. } n} d_n^0$ (the average goodness; edge $= 1 - 2 \cdot$ error)

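Under these definitions, error and edge are each one line of numpy. The goodness vector below is made up (the apple data appears only as pictures); the final identity, edge = 1 - 2·error, holds for any ±1-valued hypothesis under any distribution.

```python
import numpy as np

u = np.array([1] * 9 + [-1] * 2)   # made-up goodness: 9 correct, 2 mistakes
d0 = np.full(11, 1 / 11)           # initial uniform distribution on 11 examples

error = float(np.dot(d0, u < 0))   # weighted fraction of mistakes: 2/11
edge = float(np.dot(d0, u))        # average goodness: 7/11

assert np.isclose(edge, 1 - 2 * error)
```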

11 Before 2nd Iteration
Hard examples get high weight.

12 Boosting: 2nd Hypothesis
Pick hypotheses with high edge.

13 Update Distribution
After the update, the edges of all chosen hypotheses should be small.

14 Boosting: 3rd Hypothesis

15 Boosting: 4th Hypothesis

16 All Hypotheses

17 Decision: $f_w(x) = \sum_{t=1}^T w_t h^t(x) > 0$?

18-19 Edge vs. margin [Br99]
Edge: a measure of the goodness of a hypothesis w.r.t. a distribution. The edge of a hypothesis h for a distribution d on the examples is
$$\sum_{n=1}^N \underbrace{y_n h(x_n)}_{\text{goodness on ex. } n} d_n \qquad \text{(average goodness)}$$
Margin: a measure of confidence in the prediction for a hypothesis weighting. The margin of example n for the current hypothesis weighting w is
$$y_n \sum_{t=1}^T h^t(x_n)\, w_t$$

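Both quantities reduce to inner products. A small numpy sketch in the slides' notation $u^t_n = y_n h^t(x_n)$ (the array names are mine):

```python
import numpy as np

def edge(d, u):
    """Edge of one hypothesis with goodness vector u (u[n] = y_n h(x_n))
    for a distribution d on the examples: the average goodness."""
    return float(np.dot(d, u))

def margins(U, w):
    """Margin of every example for hypothesis weighting w:
    y_n * sum_t h^t(x_n) w_t = sum_t u^t_n w_t, with U[t, n] = u^t_n."""
    return U.T @ w

# toy usage with made-up values:
U = np.array([[+1, +1, -1],
              [+1, -1, +1]])
print(edge(np.full(3, 1 / 3), U[0]))       # 1/3
print(margins(U, np.array([0.5, 0.5])))    # [1., 0., 0.]
```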

20 Objectives
Edge: the edges of past hypotheses should be small after the update, so minimize the maximum edge of the past hypotheses.
Margin: choose the convex combination of weak hypotheses that maximizes the minimum margin.
Which margin? The 2-norm margin (SVMs) or the 1-norm margin (boosting)?
What is the connection between the two objectives?

21 Edge vs. margin
By Linear Programming duality,
$$\min_{d \in S^N} \ \max_{q=1,\dots,t-1} \ \underbrace{\sum_{n=1}^N y_n h^q(x_n)\, d_n}_{\text{edge of hypothesis } q} \;=\; \max_{w \in S^{t-1}} \ \min_{n=1,\dots,N} \ \underbrace{\sum_{q=1}^{t-1} y_n h^q(x_n)\, w_q}_{\text{margin of example } n}$$

22 Min-max theorem for the inseparable case
Slack variables in the w domain = capping in the d domain:
$$\max_{w \in S^t,\ \psi \ge 0} \Bigg[ \min_{n=1,\dots,N} \Big( \underbrace{\sum_{q=1}^{t} u_n^q w_q}_{\text{margin of example } n} + \psi_n \Big) - \frac{1}{\nu} \sum_{n=1}^N \psi_n \Bigg] \;=\; \min_{d \in S^N,\ d \le \frac{1}{\nu}\mathbf{1}} \ \max_{q=1,\dots,t} \ \underbrace{u^q \cdot d}_{\text{edge of hypothesis } q}$$
Notation: $u_n^q = y_n h^q(x_n)$.

23 Boosting = greedy method for increasing the margin
Converges to the optimum margin w.r.t. all hypotheses.

24 Assumption on the next weak hypothesis
For the current weighting of the examples, the oracle returns a hypothesis with edge at least g.
Goal: for a given ε, produce a convex combination of weak hypotheses with soft margin at least g - ε.
Number of iterations: $O\!\left(\frac{\ln(N/\nu)}{\epsilon^2}\right)$.

25 LPBoost [GS98,RSS+00,DBST02]
Choose the distribution that minimizes the maximum edge via an LP:
$$\min_{\sum_n d_n = 1,\ d \le \frac{1}{\nu}\mathbf{1}} \ \max_{q=1,\dots,t} \ u^q \cdot d$$
A good practical boosting algorithm, but all the weight is put on examples with minimum soft margin.
Brittle: the iteration bound can be linear in N on malign artificial data sets [WGR07].
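
This distribution problem is a standard LP once the max is rewritten with an auxiliary variable γ. A sketch with scipy (the variable layout and helper name are mine):

```python
import numpy as np
from scipy.optimize import linprog

def lpboost_distribution(U, nu):
    """min_{d,gamma} gamma  s.t.  u^q . d <= gamma for every past hypothesis q,
    sum_n d_n = 1,  0 <= d_n <= 1/nu.  U is (t, N) with rows u^q."""
    t, N = U.shape
    c = np.r_[np.zeros(N), 1.0]                   # objective: minimize gamma
    A_ub = np.hstack([U, -np.ones((t, 1))])       # u^q . d - gamma <= 0
    b_ub = np.zeros(t)
    A_eq = np.r_[np.ones(N), 0.0].reshape(1, -1)  # sum_n d_n = 1
    b_eq = [1.0]
    bounds = [(0.0, 1.0 / nu)] * N + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:N], res.x[-1]                   # distribution d, max edge
```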

26 Entropy Regularized LPBoost
$$\min_{\sum_n d_n = 1,\ d \le \frac{1}{\nu}\mathbf{1}} \ \Big[ \max_{q=1,\dots,t} \ u^q \cdot d \ +\ \frac{1}{\eta}\,\Delta(d, d^0) \Big]$$
where $\Delta$ is the relative entropy. The resulting weights are a soft min:
$$d_n = \frac{\exp(-\eta \cdot \text{soft margin of example } n)}{Z}$$
Gets within ε of the maximum soft margin in $O\!\left(\frac{\log(N/\nu)}{\epsilon^2}\right)$ iterations.
The above form of the weights first appeared in the ν-Arc algorithm [RSS+00].
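
The stated form of the weights takes a couple of lines of numpy. This sketch (names mine) assumes a uniform prior $d^0$ and uses plain margins, omitting the slack part of the soft margin for brevity:

```python
import numpy as np

def erlpboost_weights(U, w, eta):
    """d_n proportional to exp(-eta * margin of example n), normalized by Z.
    U[t, n] = u^t_n = y_n h^t(x_n); w is the current hypothesis weighting."""
    m = U.T @ w               # margin of each example
    d = np.exp(-eta * m)      # soft min: small margin -> large weight
    return d / d.sum()        # Z normalizes d to a distribution
```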

27 The effect of entropy regularization
Different distributions on the examples: LPBoost produces lots of zeros, while ERLPBoost produces a smoother distribution.

28 AdaBoost [FS97]
$$d_n^t := \frac{d_n^{t-1} \exp(-w_t u_n^t)}{\sum_n d_n^{t-1} \exp(-w_t u_n^t)}, \quad \text{where } w_t \text{ is chosen s.t. } \sum_n d_n^{t-1} \exp(-w\, u_n^t) \text{ is minimized}$$
Easy to implement. Gets within half of the optimal hard margin, but only in the limit [RSD07].
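
One update step in numpy. The slide only says that $w_t$ minimizes $\sum_n d_n^{t-1} \exp(-w\, u_n^t)$; for ±1-valued hypotheses that minimizer has the standard closed form used below:

```python
import numpy as np

def adaboost_step(d_prev, u):
    """One AdaBoost distribution update (a sketch).
    d_prev is d^{t-1}; u[n] = u^t_n = y_n h^t(x_n) in {-1, +1}."""
    edge = float(np.dot(d_prev, u))
    w_t = 0.5 * np.log((1 + edge) / (1 - edge))   # minimizes the exp-sum
    d = d_prev * np.exp(-w_t * u)
    return d / d.sum(), w_t                       # normalized d^t and w_t
```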

29 Corrective versus totally corrective
Processing the last hypothesis versus all past hypotheses.
- Corrective: AdaBoost, LogitBoost, AdaBoost*, [SS, Colt08]
- Totally corrective: LPBoost, TotalBoost, SoftBoost, ERLPBoost

30 Myths about boosting
- "LPBoost does the trick in practice most of the time." For safety, add relative entropy regularization.
- Corrective algorithms are sometimes easy to code and fast per iteration; totally corrective algorithms need a smaller number of iterations and are nevertheless faster in overall time.
- Weak versus strong oracle makes a big difference: a weak oracle returns a hypothesis with edge larger than some guarantee g, while a strong oracle returns a hypothesis of maximum edge.

31 Soft margin objective vs. the number of iterations, on a single run for the Banana data set with ε = 0.01 and ν/N = 0.1. For ERLPBoost, $\eta = \frac{2}{\epsilon} \log \frac{N}{\nu}$.
- LPBoost is indistinguishable from ERLPBoost.
- SoftBoost's margin begins increasing much later than the others'.
- The corrective algorithm converges more slowly than the totally corrective ones.

32 Corrective vs. Totally Corrective
Margin vs. time for a single run of each algorithm on Titanic and Ringnorm. Titanic is the smallest dataset we used; Ringnorm is the largest.

33 Conclusion
Adding relative entropy regularization to LPBoost leads to a good boosting algorithm.
Boosting is an instantiation of the MaxEnt and MinxEnt principles [Jaynes 57, Kullback 59].
Is sparsity necessary for good generalization, or is relative entropy regularization sufficient?

34 From AdaBoost to SoftBoost
AdaBoost (as interpreted in [KW99,La99]):
Primal: $\min_d \Delta(d, d^{t-1})$ s.t. $d \cdot u^{t-1} = 0,\ d \cdot \mathbf{1} = 1$
Dual: $\max_w -\ln \sum_n d_n^{t-1} \exp(-u_n^{t-1} w_{t-1})$ s.t. $w \ge 0$
Achieves half of the optimum hard margin in the limit.
AdaBoost* [RW05]:
Primal: $\min_d \Delta(d, d^{t-1})$ s.t. $d \cdot u^{t-1} \le \gamma_{t-1},\ d \cdot \mathbf{1} = 1$
Dual: $\max_w -\ln \sum_n d_n^{t-1} \exp(-u_n^{t-1} w_{t-1}) - \gamma_{t-1} \|w\|_1$ s.t. $w \ge 0$
where the edge bound $\gamma_t$ is adjusted downward by a heuristic. Good iteration bound for reaching the optimum hard margin.

35 SoftBoost [WGR07]
Primal: $\min_d \Delta(d, d^0)$ s.t. $d \cdot \mathbf{1} = 1,\ d \le \frac{1}{\nu}\mathbf{1},\ d \cdot u^q \le \gamma_{t-1}$ for $1 \le q \le t-1$
Dual: $\min_{w,\psi} \ \ln \sum_n d_n^0 \exp\Big(-\eta \big( \sum_{q=1}^{t-1} u_n^q w_q + \psi_n \big)\Big) + \frac{1}{\nu}\,\mathbf{1} \cdot \psi + \gamma_{t-1}\,\|w\|_1$ s.t. $w \ge 0,\ \psi \ge 0$
where the edge bound $\gamma_{t-1}$ is adjusted downward by a heuristic.
ERLPBoost [WGV08]
Primal: $\min_{d,\gamma} \ \gamma + \frac{1}{\eta}\,\Delta(d, d^0)$ s.t. $d \cdot \mathbf{1} = 1,\ d \le \frac{1}{\nu}\mathbf{1},\ d \cdot u^q \le \gamma$ for $1 \le q \le t-1$
Dual: $\min_{w,\psi} \ \frac{1}{\eta} \ln \sum_n d_n^0 \exp\Big(-\eta \big( \sum_{q=1}^{t-1} u_n^q w_q + \psi_n \big)\Big) + \frac{1}{\nu}\,\mathbf{1} \cdot \psi$ s.t. $w \ge 0,\ \|w\|_1 = 1,\ \psi \ge 0$
where for the iteration bound η is fixed to $\max\!\big(\frac{2}{\epsilon} \ln \frac{N}{\nu},\ \frac{1}{2}\big)$.
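
For concreteness, here is the ERLPBoost dual objective as reconstructed above, evaluated in numpy (the reconstruction is my reading of the garbled slide, so treat this as a sketch):

```python
import numpy as np

def erlpboost_dual_objective(w, psi, U, d0, eta, nu):
    """(1/eta) ln sum_n d0_n exp(-eta (sum_q u^q_n w_q + psi_n)) + (1/nu) sum_n psi_n,
    for w >= 0 with ||w||_1 = 1 and psi >= 0 (constraints enforced by the caller).
    U is (t-1, N) with U[q, n] = u^q_n."""
    inner = U.T @ w + psi
    return (1.0 / eta) * np.log(np.sum(d0 * np.exp(-eta * inner))) + psi.sum() / nu
```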
