Totally Corrective Boosting Algorithms that Maximize the Margin
Manfred K. Warmuth¹, Jun Liao¹, Gunnar Rätsch²
¹ University of California, Santa Cruz
² Friedrich Miescher Laboratory, Tübingen, Germany
Last update: March 2, 2007
Outline
1 Basics
2 Large margins
3 Games
4 Convergence
5 Illustrative Experiments
Basics
Protocol of Boosting
Maintain a distribution d^t on the examples.
At iteration t = 1, ..., T:
1 Receive a weak hypothesis h_t
2 Update d^t to d^{t+1}, putting more weight on hard examples
Output a convex combination of the weak hypotheses: f_α(x) = Σ_{t=1}^T α_t h_t(x)
[Freund & Schapire, 1995]
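As a concrete, hedged illustration of this protocol, the following minimal AdaBoost-style sketch uses decision stumps as weak hypotheses. The toy data, the exhaustive stump search, and the particular update rule are assumptions of this example, not the algorithms analyzed later in these slides:

```python
import numpy as np

def weak_learner(X, y, d):
    """Return the decision stump (feature index, sign) with the largest edge under d."""
    best = None
    for j in range(X.shape[1]):
        for s in (+1, -1):
            h = s * np.sign(X[:, j])          # predictions in {-1, +1}
            edge = np.sum(d * y * h)          # edge = sum_n d_n y_n h(x_n)
            if best is None or edge > best[0]:
                best = (edge, j, s)
    return best[1], best[2]

def boost(X, y, T=10):
    """Minimal boosting loop: maintain d, receive h_t, reweight hard examples."""
    N = X.shape[0]
    d = np.full(N, 1.0 / N)                   # uniform initial distribution
    hyps, alphas = [], []
    for t in range(T):
        j, s = weak_learner(X, y, d)
        h = s * np.sign(X[:, j])
        gamma = np.sum(d * y * h)             # edge of the new hypothesis
        if gamma >= 1.0:                      # perfect hypothesis: stop
            hyps.append((j, s)); alphas.append(1.0)
            break
        alpha = 0.5 * np.log((1 + gamma) / (1 - gamma))
        d = d * np.exp(-alpha * y * h)        # more weight on hard examples
        d /= d.sum()
        hyps.append((j, s)); alphas.append(alpha)
    return hyps, np.array(alphas)

def predict(X, hyps, alphas):
    """Sign of the (unnormalized) convex combination of the weak hypotheses."""
    F = sum(a * s * np.sign(X[:, j]) for (j, s), a in zip(hyps, alphas))
    return np.sign(F)
```

On a small separable toy set, a few rounds combine several individually weak stumps into a perfect classifier, which is the point of the protocol.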
Boosting: 1st Iteration
First hypothesis (see figure).
Error rate: ε_t = Σ_{n=1}^N d_n^t I(h_t(x_n) ≠ y_n)
Edge: γ_t = Σ_{n=1}^N d_n^t y_n h_t(x_n) = 1 − 2ε_t
(Values in the figure: error rate 2/11, edge 9/22.)
Update Distribution
Misclassified examples get increased weights.
After the update: error rate ε(h_t, d^{t+1}) = 1/2, edge γ(h_t, d^{t+1}) = 0
Ada-Boost as Entropy Projection
Minimize the relative entropy to the last distribution, subject to the constraint that the edge of h_t is zero:
min_{d ∈ P^N} Δ(d, d^t)  s.t.  Σ_{n=1}^N d_n y_n h_t(x_n) = 0
where Δ(d, d^t) = Σ_{n=1}^N d_n ln(d_n / d_n^t) and P^N is the N-dimensional probability simplex.
[Lafferty, 1999; Kivinen & Warmuth, 1999]
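This projection has a minimizer of exponential form, d'_n ∝ d_n^t exp(−η y_n h_t(x_n)), and solving for the Lagrange multiplier η numerically recovers the familiar AdaBoost step. A small sketch; the bisection routine and the toy numbers are illustrative assumptions:

```python
import numpy as np

def entropy_project_zero_edge(d, u):
    """Entropy projection of d onto {d' in the simplex : d'.u = 0}.
    The minimizer has the form d'_n ∝ d_n exp(-eta * u_n); we find eta by
    bisection, since the resulting edge is decreasing in eta.
    Assumes feasibility, i.e. min(u) < 0 < max(u)."""
    def reweight(eta):
        w = d * np.exp(-eta * u)
        return w / w.sum()
    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reweight(mid) @ u > 0:     # edge still positive: increase eta
            lo = mid
        else:
            hi = mid
    return reweight(0.5 * (lo + hi))

# toy case: uniform d, hypothesis correct on 3 of 5 examples (u_n = y_n h_t(x_n))
d = np.full(5, 0.2)
u = np.array([1.0, 1.0, 1.0, -1.0, -1.0])   # initial edge = 0.2
d_new = entropy_project_zero_edge(d, u)
```

After the projection the edge is (numerically) zero and the misclassified examples carry more weight; for ±1-valued u the optimal multiplier equals the AdaBoost step size ½ ln((1+γ)/(1−γ)).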
Before 2nd Iteration
Boosting: 2nd Hypothesis
AdaBoost assumption: edge γ > ν
Update Distribution
Edge γ = 0: the AdaBoost update sets the edge of the last hypothesis to 0.
Number of iterations: 2 ln N / ν²
Which constraints?
Corrective (Ada-Boost): a single constraint
min_{d ∈ P^N} Δ(d, d^t)  s.t.  Σ_{n=1}^N d_n y_n h_t(x_n) = 0
Totally corrective: one constraint per past weak hypothesis
min_{d ∈ P^N} Δ(d, d^1)  s.t.  Σ_{n=1}^N d_n y_n h_q(x_n) = 0  for q = 1, ..., t
[Lafferty, 1999; Kivinen & Warmuth, 1999]
Boosting: 3rd Hypothesis
Boosting: 4th Hypothesis
All Hypotheses
Decision: f_α(x) = Σ_{t=1}^T α_t h_t(x) > 0?
Large margins
Large margins in addition to correct classification
Margin of the combined hypothesis f_α for example (x_n, y_n):
ρ_n(α) = y_n f_α(x_n) = y_n Σ_{t=1}^T α_t h_t(x_n)   (α ∈ P^T)
The margin of a set of examples is the minimum over the examples:
ρ(α) := min_n ρ_n(α)
[Freund, Schapire, Bartlett & Lee, 1998]
Large Margin and Linear Separation
Input space X, feature space F, feature map Φ(x) = (h_1(x), h_2(x), ...)ᵀ with H = {h_1, h_2, ...}
Linear separation in F is nonlinear separation in X.
[Mangasarian, 1999; G.R., Mika, Schölkopf & Müller, 2002]
Margin vs. edge
Margin: a measure of confidence in the prediction, for a hypothesis weighting. Margin of example n for the current hypothesis weighting α ∈ P^T:
ρ_n(α) = y_n f_α(x_n) = y_n Σ_{t=1}^T α_t h_t(x_n)
Edge: a measure of the goodness of a hypothesis w.r.t. a distribution. Edge of a hypothesis h for a distribution d ∈ P^N on the examples:
γ_h(d) = Σ_{n=1}^N d_n y_n h(x_n)
What is the connection?
[Breiman, 1999]
von Neumann's Minimax Theorem
Set of examples S = {(x_1, y_1), ..., (x_N, y_N)} and hypothesis set H_t = {h_1, ..., h_t}:
minimum edge: γ_t = min_{d ∈ P^N} max_{q=1,...,t} Σ_{n=1}^N d_n y_n h_q(x_n)   (edge of H_t)
maximum margin: ρ_t = max_{α ∈ P^t} min_n y_n Σ_{q=1}^t α_q h_q(x_n)   (margin of S)
Duality: γ_t = ρ_t
[von Neumann, 1928]
Games
Two-player Zero-Sum Game
Rock, Paper, Scissors: the row player plays a mixed strategy d over rows R, P, S and the column player plays a mixed strategy α over columns R, P, S of the gain matrix

        R    P    S
   R    0    1   -1
   P   -1    0    1
   S    1   -1    0

A single row is a pure strategy of the row player, and d is a mixed strategy; a single column is a pure strategy of the column player, and α is a mixed strategy.
The row player minimizes and the column player maximizes the payoff dᵀMα = Σ_{i,j} d_i M_{i,j} α_j.
[Freund and Schapire, 1997]
Optimum Strategy
Min-max theorem:
min_d max_α dᵀMα = min_d max_j dᵀM e_j = max_α min_d dᵀMα = max_α min_i e_iᵀMα = value of the game (0 in the example)
Here e_j denotes a pure strategy.
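The value of such a game can be approximated without a linear program by exactly the multiplicative-weights machinery that underlies boosting: the row player runs a Hedge update while the column player best-responds. A sketch; the step size, the horizon, and the standard Rock-Paper-Scissors loss matrix are assumptions of this illustration:

```python
import numpy as np

# row player's loss matrix for Rock-Paper-Scissors (row minimizes, column maximizes)
M = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])

def approx_value(M, T=2000, eta=0.05):
    """Approximate the game value by repeated play: the row player updates
    multiplicative weights (Hedge) against a best-responding column player."""
    w = np.ones(M.shape[0])
    total = 0.0
    for _ in range(T):
        d = w / w.sum()                  # row player's current mixed strategy
        j = int(np.argmax(d @ M))        # column player's best pure response
        total += d @ M[:, j]             # payoff suffered this round
        w *= np.exp(-eta * M[:, j])      # shift weight toward low-loss rows
    return total / T                     # average payoff approaches the value
```

The running average converges to the game value (0 for Rock-Paper-Scissors) at a rate governed by the Hedge regret bound; this repeated-game view is the one exploited by boosting.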
Connection to Boosting?
Rows are the examples, columns the weak hypotheses: M_{i,j} = y_i h_j(x_i)
α-weighted row sum: margin of an example
d-weighted column sum: edge of a weak hypothesis
Value of the game: γ = ρ
New column added: boosting
Adding a column (a new weak hypothesis) to the game matrix can only help the column player: in the example the value of the game increases from 0 to 0.11.
Row added: on-line learning
Adding a row (a new example) can only help the row player: in the example the value of the game decreases from 0 to −0.11.
Boosting: maximize margin incrementally
In each iteration, solve an optimization problem to update d; then the column player adds a new column (a weak hypothesis). Some assumptions will be needed on the edge of the added hypothesis.
Value non-decreasing
γ*, ρ*: edge/margin w.r.t. the set of all hypotheses
Duality gap
For any non-optimal d ∈ P^N and α ∈ P^t:
γ(d) ≥ γ_t = ρ_t ≥ ρ(α)
How Large is the Maximal Margin?
Assumption on the weak learner: for any distribution d on the examples, the weak learner returns a hypothesis h with edge γ_h(d) at least g. Best case: g = ρ* = γ*.
Implication from the minimax theorem: there exists α such that ρ(α) ≥ g.
Idea, iteratively solve the LP (LPBoost): add the best hypothesis h = argmax_h γ_h(d^t) to H_t and re-solve
d^{t+1} = argmin_{d ∈ P^N} max_{h ∈ H_t} γ_h(d)
[Breiman, 1999; Bennett et al.; G.R. et al., 2001; Rudin et al., 2004]
Convergence?
LPBoost: no iteration bounds known.
AdaBoost: may oscillate, and does not find the margin-maximizing α (there are counterexamples). But there are some guarantees: ρ(α^t) ≥ 0 after 2 ln N / g² iterations, and ρ(α^t) ≥ g/2 in the limit.
[Breiman, 1999; Bennett et al.; G.R. et al., 2001; Rudin et al., 2004]
How to maximize the margin?
Modify AdaBoost for maximizing the margin:
Arc-GV asymptotically maximizes the margin; quite slow, no convergence rates.
LPBoost uses a linear programming solver; often very fast in practice, but no convergence rates.
AdaBoost*_ν requires 2 log(N)/ν² iterations to get ρ_t ∈ [ρ* − ν, ρ*]; slow in practice, i.e. not faster than the theory predicts.
TotalBoost_ν requires 2 log(N)/ν² iterations to get ρ_t ∈ [ρ* − ν, ρ*]; fast in practice. A combination of the benefits.
[G.R. & Warmuth, 2004; Warmuth et al., 2006]
Convergence
Want margin ≥ g − ν
Assumption: γ_t ≥ g
Estimate of the target: γ̂_t = (min_{q=1,...,t} γ_q) − ν
Idea: projections to γ̂_t instead of 0
Corrective (AdaBoost*_ν): a single constraint
min_{d ∈ P^N} Δ(d, d^t)  s.t.  Σ_{n=1}^N d_n y_n h_t(x_n) ≤ γ̂_t
Totally corrective (TotalBoost_ν): one constraint per past weak hypothesis
min_{d ∈ P^N} Δ(d, d^1)  s.t.  Σ_{n=1}^N d_n y_n h_q(x_n) ≤ γ̂_t  for q = 1, ..., t
TotalBoost_ν
1 Input: S = ⟨(x_1, y_1), ..., (x_N, y_N)⟩, desired accuracy ν
2 Initialize: d_n^1 = 1/N for all n = 1, ..., N
3 Do for t = 1, ...:
  1 Train a classifier on {S, d^t}, obtaining hypothesis h_t : x → [−1, 1]; let u_n^t = y_n h_t(x_n)
  2 Calculate the edge of h_t: γ_t = d^t · u^t
  3 Set γ̂_t = (min_{q=1,...,t} γ_q) − ν and solve
    d^{t+1} = argmin_{d ∈ C_t} Δ(d, d^1), where C_t := {d ∈ P^N : d · u^q ≤ γ̂_t for 1 ≤ q ≤ t}
  4 If the above is infeasible, or d^{t+1} contains a zero, then set T = t and break
4 Output: f_α(x) = Σ_{t=1}^T α_t h_t(x), where the coefficients α_t maximize the margin over the hypothesis set {h_1, ..., h_T}.
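The projection in step 3, onto the intersection C_t of the edge constraints, can be sketched via coordinate ascent on the entropy dual, where the solution is an exponential family over the constraint rows. This is an illustrative sketch under assumed toy data, not the solver used in the paper's experiments:

```python
import numpy as np

def project_uniform_onto_constraints(U, gamma_hat, sweeps=100):
    """Entropy projection of the uniform distribution d^1 onto
    C = {d in the simplex : d.u_q <= gamma_hat for every row u_q of U}.
    By Lagrangian duality the solution has the form
    d_n ∝ exp(-sum_q eta_q U[q, n]) with eta_q >= 0; we maximize the dual
    by coordinate ascent, tightening one constraint at a time."""
    t, N = U.shape
    eta = np.zeros(t)

    def dist(e):
        logw = -(e @ U)
        logw -= logw.max()              # numerical stability
        w = np.exp(logw)
        return w / w.sum()

    for _ in range(sweeps):
        for q in range(t):
            def edge(x):                # edge of constraint q as a function of eta_q
                e = eta.copy(); e[q] = x
                return float(dist(e) @ U[q])
            if edge(0.0) <= gamma_hat:  # constraint inactive: optimal eta_q is 0
                eta[q] = 0.0
                continue
            lo, hi = 0.0, 1.0
            while edge(hi) > gamma_hat: # grow the bracket (assumes feasibility)
                hi *= 2.0
            for _ in range(60):         # bisection: the edge is decreasing in eta_q
                mid = 0.5 * (lo + hi)
                if edge(mid) > gamma_hat:
                    lo = mid
                else:
                    hi = mid
            eta[q] = hi
    return dist(eta)

# toy example: two past hypotheses; rows of U hold u^q_n = y_n h_q(x_n)
U = np.array([[ 1.0,  1.0, -1.0, -1.0,  1.0],
              [-1.0,  1.0,  1.0, -1.0, -1.0]])
d = project_uniform_onto_constraints(U, gamma_hat=0.1)
```

Note how the resulting distribution stays strictly positive: the break condition in step 3.4 (a zero component) can only occur in the limit of the constraints becoming infeasible.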
Effect of entropy regularization
High weight is given to hard examples: a softmin of the margins,
d_n^{t+1} ∝ exp(− y_n f_{α^{t+1}}(x_n))
where y_n f_{α^{t+1}}(x_n) is the margin of example n and α^{t+1} are the current coefficients of the hypotheses.
Lower Bounding the Progress
Theorem. For d^t, d^{t+1} ∈ P^N and u^t ∈ [−1, 1]^N, if Δ(d^{t+1}, d^t) is finite and d^{t+1} · u^t ≤ d^t · u^t, then
Δ(d^{t+1}, d^t) > (d^{t+1} · u^t − d^t · u^t)² / 2
Since d^t · u^t = γ_t and d^{t+1} · u^t ≤ γ̂_t ≤ γ_t − ν, we get d^t · u^t − d^{t+1} · u^t ≥ ν, and thus
Δ(d^{t+1}, d^t) > ν² / 2
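The first inequality is a Pinsker-type bound: the relative entropy dominates the squared difference of expectations of any u ∈ [−1, 1]^N, because |p·u − q·u| ≤ ‖p − q‖₁ and Δ(p, q) ≥ ‖p − q‖₁²/2. A quick numerical spot-check on random distributions (the sampling setup is just an illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def rel_ent(p, q):
    """Relative entropy Delta(p, q) = sum_n p_n ln(p_n / q_n)."""
    return float(np.sum(p * np.log(p / q)))

# check Delta(p, q) >= (p.u - q.u)^2 / 2 for random p, q and u in [-1, 1]^N
holds = True
for _ in range(1000):
    p = rng.dirichlet(np.ones(6))
    q = rng.dirichlet(np.ones(6))
    u = rng.uniform(-1.0, 1.0, size=6)
    if rel_ent(p, q) < ((p - q) @ u) ** 2 / 2 - 1e-12:
        holds = False
```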
Generalized Pythagorean Theorem
C_t = {d ∈ P^N : d · u^q ≤ γ̂_t, 1 ≤ q ≤ t}, C_0 = P^N, C_t ⊆ C_{t−1}
d^t is the projection of d^1 onto C_{t−1} at iteration t − 1:
d^t = argmin_{d ∈ C_{t−1}} Δ(d, d^1)
Δ(d^{t+1}, d^1) ≥ Δ(d^t, d^1) + Δ(d^{t+1}, d^t)
[Herbster & Warmuth, 2001]
Sketch of Proof
1: Δ(d^2, d^1) − Δ(d^1, d^1) ≥ Δ(d^2, d^1) > ν²/2
2: Δ(d^3, d^1) − Δ(d^2, d^1) ≥ Δ(d^3, d^2) > ν²/2
...
t: Δ(d^{t+1}, d^1) − Δ(d^t, d^1) ≥ Δ(d^{t+1}, d^t) > ν²/2
...
T−2: Δ(d^{T−1}, d^1) − Δ(d^{T−2}, d^1) ≥ Δ(d^{T−1}, d^{T−2}) > ν²/2
T−1: Δ(d^T, d^1) − Δ(d^{T−1}, d^1) ≥ Δ(d^T, d^{T−1}) > ν²/2
[Warmuth, Liao & G.R., 2006]
Cancellation
Summing the T − 1 inequalities, the left-hand sides telescope: Δ(d^1, d^1) = 0 and Δ(d^T, d^1) ≤ ln N, so
ln N ≥ Δ(d^T, d^1) − Δ(d^1, d^1) > (T − 1) ν²/2
Therefore, T ≤ 2 ln N / ν².
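To get a feel for the magnitude of this bound, we can plug in numbers; N = 125 matches the Cox-1 experiments later in the slides, and ν = 0.1 is an assumed accuracy for the sake of the example:

```python
import math

def totalboost_iteration_bound(N, nu):
    """Upper bound T <= 2 ln(N) / nu^2 from the telescoping argument above."""
    return 2.0 * math.log(N) / nu ** 2

T_max = totalboost_iteration_bound(125, 0.1)   # roughly 966 iterations
```

The bound is only logarithmic in the number of examples but quadratic in 1/ν, which is consistent with the later observation that the initial speed crucially depends on ν.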
Overview
(Comparison table of iteration bounds omitted.)
[Long & Wu, 2002]
Iteration Bounds for Other Variants
Using the same techniques, we can prove iteration bounds for:
TotalBoost_ν which optimizes the divergence Δ(d, d^t) to the last distribution d^t
TotalBoost_ν which uses the binary relative entropy Δ₂(d, d^1) or Δ₂(d, d^t) as the divergence
the variant of AdaBoost*_ν which terminates when γ_t < γ̂_t
TotalBoost_ν which optimizes Δ(d, d^t)
d^{t+1} is the projection of d^t onto C_t at iteration t:
d^{t+1} = argmin_{d ∈ C_t} Δ(d, d^t)
Δ(d^T, d^t) ≥ Δ(d^T, d^{t+1}) + Δ(d^{t+1}, d^t)
Sketch of Proof
1: Δ(d^T, d^1) − Δ(d^T, d^2) ≥ Δ(d^2, d^1) > ν²/2
2: Δ(d^T, d^2) − Δ(d^T, d^3) ≥ Δ(d^3, d^2) > ν²/2
...
t: Δ(d^T, d^t) − Δ(d^T, d^{t+1}) ≥ Δ(d^{t+1}, d^t) > ν²/2
...
T−2: Δ(d^T, d^{T−2}) − Δ(d^T, d^{T−1}) ≥ Δ(d^{T−1}, d^{T−2}) > ν²/2
T−1: Δ(d^T, d^{T−1}) − Δ(d^T, d^T) ≥ Δ(d^T, d^{T−1}) > ν²/2
Cancellation
Summing, the left-hand sides telescope: Δ(d^T, d^1) ≤ ln N and Δ(d^T, d^T) = 0, so
ln N ≥ Δ(d^T, d^1) − Δ(d^T, d^T) > (T − 1) ν²/2
Therefore, T ≤ 2 ln N / ν².
Illustrative Experiments
Illustrative Experiments
Cox-1 dataset from Telik Inc.: a relatively small drug-design data set with 125 binary labeled examples and 3888 binary features. We compare the convergence of the margin versus the number of iterations.
[Warmuth, Liao & G.R., 2006]
Illustrative Experiments: Cox-1 (ν = 0.01)
Compared: AdaBoost*_ν, LPBoost, TotalBoost_ν (margin vs. number of iterations; figure omitted).
Results: the corrective algorithms are very slow; LPBoost and TotalBoost_ν need few iterations; the initial speed crucially depends on ν.
LPBoost May Perform Much Worse Than TotalBoost
We identified cases where LPBoost converges considerably slower than TotalBoost_ν. The data is a series of artificial datasets of 1000 examples with a varying number of features, created as follows. First generate N_1 random ±1-valued features x_1, ..., x_{N_1} and set the label of each example to y = sign(x_1 + x_2 + x_3 + x_4 + x_5). Then duplicate each feature N_2 times, perturb the features by Gaussian noise with σ = 0.1, and clip the feature values so that they lie in the interval [−1, 1]. Different N_1, N_2 are considered; the total number of features is N_1 N_2.
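The construction above can be sketched as follows; the function name, the RNG seeding, and the smaller toy sizes are assumptions of this illustration:

```python
import numpy as np

def make_redundant_data(N1, N2, n_examples=1000, sigma=0.1, seed=0):
    """Artificial data as described above: N1 random ±1 features, label
    y = sign(x_1 + ... + x_5), each feature duplicated N2 times,
    perturbed by Gaussian noise (sigma) and clipped to [-1, 1]."""
    assert N1 >= 5                                    # the label uses the first 5 features
    rng = np.random.default_rng(seed)
    X = rng.choice([-1.0, 1.0], size=(n_examples, N1))
    # sum of an odd number of ±1 terms is never 0, so y is always in {-1, +1}
    y = np.sign(X[:, :5].sum(axis=1))
    X = np.repeat(X, N2, axis=1)                      # duplicate each feature N2 times
    X = X + rng.normal(0.0, sigma, size=X.shape)      # perturb by Gaussian noise
    X = np.clip(X, -1.0, 1.0)                         # clip to [-1, 1]
    return X, y
```

The duplicated, slightly perturbed columns are the "redundant features"; the total feature count is N_1 · N_2, matching the dimensions quoted in the experiments.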
LPBoost performs worse for high-dimensional data with many redundant features
LPBoost vs. TotalBoost_ν on two 100,000-dimensional datasets: [left] many redundant features (N_1 = 1,000, N_2 = 100) and [right] independent features (N_1 = 100,000, N_2 = 1). The plots show margin vs. number of iterations.
Bound Not True for LPBoost
A counterexample (Table 4.1, a toy dataset): each row corresponds to an example and each column to a hypothesis; '+' marks a correct prediction and '−' an incorrect one. TotalBoost averages hypotheses 1 and 6 (i.e., 2 iterations) to achieve the maximum margin 0.
Bound Not True for LPBoost
Table 4.2 shows the first five iterations of LPBoost: the distribution on the examples and the selected hypothesis at each iteration. Simplex-based LPBoost uses 5 iterations ((N+1)/2 iterations) to achieve margin 0.
Regularized LPBoost
LPBoost makes the edge constraints as tight as possible; picking solutions at the corners can lead to slow convergence. Interior point methods avoid corners.
Regularized LPBoost: pick the solution that minimizes the relative entropy to the initial distribution. It is identical to TotalBoost_ν, except that the latter algorithm uses a higher edge bound.
Open problem: find an example where all versions of LPBoost need O(N) iterations.
Algorithms for Feature Selection
We test, for each algorithm: whether the selected hypotheses are redundant, and the size of the final hypothesis.
Redundancy of Selected Hypotheses
We test which algorithms select redundant hypotheses. Dataset 2 was created by expanding Rudin's dataset: it has 100 blocks of features, where each block contains one original feature and 99 mutations of it. Each mutation feature inverts the feature value of one randomly chosen example (with replacement). The mutation features are considered redundant hypotheses; a good algorithm would avoid repeatedly selecting hypotheses from the same block.
Experiment on Redundancy of Selected Hypotheses
Shown: number of selected blocks vs. number of selected hypotheses (figure omitted).
Redundancy of the selected hypotheses: AdaBoost > TotalBoost_ν (ν = 0.01) > LPBoost.
If ρ* is known, TotalBoost^g_ν selects one hypothesis per block (not shown).
Size of Final Hypothesis
Shown: margin vs. number of used hypotheses (nonzero weight).
TotalBoost_ν (ν = 0.01) and LPBoost use a small number of hypotheses in the final hypothesis.
Summary
AdaBoost can be viewed as an entropy projection. TotalBoost projects based on all previous hypotheses: it provably maximizes the margin; in theory it is as fast as AdaBoost*_ν, and in practice much faster (comparable to LPBoost). Versatile techniques for proving iteration bounds; the experiments corroborate our theory. Good for feature selection. LPBoost may have problems maximizing the margin.
Future: extension to the soft margin case.
Iteration Bound for a Variant of AdaBoost*_ν
Learning, Games, and Networks Abhishek Sinha Laboratory for Information and Decision Systems MIT ML Talk Series @CNRG December 12, 2016 1 / 44 Outline 1 Prediction With Experts Advice 2 Application to
More informationLearning theory. Ensemble methods. Boosting. Boosting: history
Learning theory Probability distribution P over X {0, 1}; let (X, Y ) P. We get S := {(x i, y i )} n i=1, an iid sample from P. Ensemble methods Goal: Fix ɛ, δ (0, 1). With probability at least 1 δ (over
More informationOnline Kernel PCA with Entropic Matrix Updates
Dima Kuzmin Manfred K. Warmuth Computer Science Department, University of California - Santa Cruz dima@cse.ucsc.edu manfred@cse.ucsc.edu Abstract A number of updates for density matrices have been developed
More informationCOS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 2018
COS 5: heoretical Machine Learning Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 208 Review of Game heory he games we will discuss are two-player games that can be modeled by a game matrix
More informationInfinite Ensemble Learning with Support Vector Machinery
Infinite Ensemble Learning with Support Vector Machinery Hsuan-Tien Lin and Ling Li Learning Systems Group, California Institute of Technology ECML/PKDD, October 4, 2005 H.-T. Lin and L. Li (Learning Systems
More informationFrank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c /9/9 page 331 le-tex
Frank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c15 2013/9/9 page 331 le-tex 331 15 Ensemble Learning The expression ensemble learning refers to a broad class
More informationBoosting. Acknowledgment Slides are based on tutorials from Robert Schapire and Gunnar Raetsch
. Machine Learning Boosting Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut für Informatik Technische Fakultät Albert-Ludwigs-Universität Freiburg riedmiller@informatik.uni-freiburg.de
More informationCOMS 4771 Lecture Boosting 1 / 16
COMS 4771 Lecture 12 1. Boosting 1 / 16 Boosting What is boosting? Boosting: Using a learning algorithm that provides rough rules-of-thumb to construct a very accurate predictor. 3 / 16 What is boosting?
More informationPrecise Statements of Convergence for AdaBoost and arc-gv
Contemporary Mathematics Precise Statements of Convergence for AdaBoost and arc-gv Cynthia Rudin, Robert E Schapire, and Ingrid Daubechies We wish to dedicate this paper to Leo Breiman Abstract We present
More informationThe Boosting Approach to. Machine Learning. Maria-Florina Balcan 10/31/2016
The Boosting Approach to Machine Learning Maria-Florina Balcan 10/31/2016 Boosting General method for improving the accuracy of any given learning algorithm. Works by creating a series of challenge datasets
More informationCS7267 MACHINE LEARNING
CS7267 MACHINE LEARNING ENSEMBLE LEARNING Ref: Dr. Ricardo Gutierrez-Osuna at TAMU, and Aarti Singh at CMU Mingon Kang, Ph.D. Computer Science, Kennesaw State University Definition of Ensemble Learning
More informationBoosting: Foundations and Algorithms. Rob Schapire
Boosting: Foundations and Algorithms Rob Schapire Example: Spam Filtering problem: filter out spam (junk email) gather large collection of examples of spam and non-spam: From: yoav@ucsd.edu Rob, can you
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationAdaBoost. S. Sumitra Department of Mathematics Indian Institute of Space Science and Technology
AdaBoost S. Sumitra Department of Mathematics Indian Institute of Space Science and Technology 1 Introduction In this chapter, we are considering AdaBoost algorithm for the two class classification problem.
More informationVoting (Ensemble Methods)
1 2 Voting (Ensemble Methods) Instead of learning a single classifier, learn many weak classifiers that are good at different parts of the data Output class: (Weighted) vote of each classifier Classifiers
More informationBackground. Adaptive Filters and Machine Learning. Bootstrap. Combining models. Boosting and Bagging. Poltayev Rassulzhan
Adaptive Filters and Machine Learning Boosting and Bagging Background Poltayev Rassulzhan rasulzhan@gmail.com Resampling Bootstrap We are using training set and different subsets in order to validate results
More informationCS229 Supplemental Lecture notes
CS229 Supplemental Lecture notes John Duchi 1 Boosting We have seen so far how to solve classification (and other) problems when we have a data representation already chosen. We now talk about a procedure,
More informationMatrix Exponentiated Gradient Updates for On-line Learning and Bregman Projection
Matrix Exponentiated Gradient Updates for On-line Learning and Bregman Projection Koji Tsuda, Gunnar Rätsch and Manfred K. Warmuth Max Planck Institute for Biological Cybernetics Spemannstr. 38, 72076
More informationLarge-Margin Thresholded Ensembles for Ordinal Regression
Large-Margin Thresholded Ensembles for Ordinal Regression Hsuan-Tien Lin and Ling Li Learning Systems Group, California Institute of Technology, U.S.A. Conf. on Algorithmic Learning Theory, October 9,
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationBoosting Methods for Regression
Machine Learning, 47, 153 00, 00 c 00 Kluwer Academic Publishers. Manufactured in The Netherlands. Boosting Methods for Regression NIGEL DUFFY nigeduff@cse.ucsc.edu DAVID HELMBOLD dph@cse.ucsc.edu Computer
More information2 Upper-bound of Generalization Error of AdaBoost
COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #10 Scribe: Haipeng Zheng March 5, 2008 1 Review of AdaBoost Algorithm Here is the AdaBoost Algorithm: input: (x 1,y 1 ),...,(x m,y
More informationLecture 8. Instructor: Haipeng Luo
Lecture 8 Instructor: Haipeng Luo Boosting and AdaBoost In this lecture we discuss the connection between boosting and online learning. Boosting is not only one of the most fundamental theories in machine
More informationAdvanced Machine Learning
Advanced Machine Learning Deep Boosting MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Model selection. Deep boosting. theory. algorithm. experiments. page 2 Model Selection Problem:
More informationCSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18
CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H
More informationMonte Carlo Theory as an Explanation of Bagging and Boosting
Monte Carlo Theory as an Explanation of Bagging and Boosting Roberto Esposito Università di Torino C.so Svizzera 185, Torino, Italy esposito@di.unito.it Lorenza Saitta Università del Piemonte Orientale
More informationLinear Programming Boosting via Column Generation
Linear Programming Boosting via Column Generation Ayhan Demiriz demira@rpi.edu Dept. of Decision Sciences and Engineering Systems, Rensselaer Polytechnic Institute, Troy, NY 12180 USA Kristin P. Bennett
More informationAn Introduction to Boosting and Leveraging
An Introduction to Boosting and Leveraging Ron Meir 1 and Gunnar Rätsch 2 1 Department of Electrical Engineering, Technion, Haifa 32000, Israel rmeir@ee.technion.ac.il, http://www-ee.technion.ac.il/ rmeir
More informationBoosting. March 30, 2009
Boosting Peter Bühlmann buhlmann@stat.math.ethz.ch Seminar für Statistik ETH Zürich Zürich, CH-8092, Switzerland Bin Yu binyu@stat.berkeley.edu Department of Statistics University of California Berkeley,
More informationSupport Vector Machines
Support Vector Machines Hypothesis Space variable size deterministic continuous parameters Learning Algorithm linear and quadratic programming eager batch SVMs combine three important ideas Apply optimization
More informationBoosting Methods: Why They Can Be Useful for High-Dimensional Data
New URL: http://www.r-project.org/conferences/dsc-2003/ Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003) March 20 22, Vienna, Austria ISSN 1609-395X Kurt Hornik,
More informationReducing the Overfitting of AdaBoost by Controlling its Data Distribution Skewness
International Journal of Pattern Recognition and Artificial Intelligence c World Scientific Publishing Company Reducing the Overfitting of AdaBoost by Controlling its Data Distribution Skewness Yijun Sun,
More informationAdaptive Boosting of Neural Networks for Character Recognition
Adaptive Boosting of Neural Networks for Character Recognition Holger Schwenk Yoshua Bengio Dept. Informatique et Recherche Opérationnelle Université de Montréal, Montreal, Qc H3C-3J7, Canada fschwenk,bengioyg@iro.umontreal.ca
More informationData Warehousing & Data Mining
13. Meta-Algorithms for Classification Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 13.
More informationGeneralization Bounds
Generalization Bounds Here we consider the problem of learning from binary labels. We assume training data D = x 1, y 1,... x N, y N with y t being one of the two values 1 or 1. We will assume that these
More informationOpen Problem: A (missing) boosting-type convergence result for ADABOOST.MH with factorized multi-class classifiers
JMLR: Workshop and Conference Proceedings vol 35:1 8, 014 Open Problem: A (missing) boosting-type convergence result for ADABOOST.MH with factorized multi-class classifiers Balázs Kégl LAL/LRI, University
More informationStatistics and learning: Big Data
Statistics and learning: Big Data Learning Decision Trees and an Introduction to Boosting Sébastien Gadat Toulouse School of Economics February 2017 S. Gadat (TSE) SAD 2013 1 / 30 Keywords Decision trees
More informationDeep Boosting. Joint work with Corinna Cortes (Google Research) Umar Syed (Google Research) COURANT INSTITUTE & GOOGLE RESEARCH.
Deep Boosting Joint work with Corinna Cortes (Google Research) Umar Syed (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Ensemble Methods in ML Combining several base classifiers
More informationLarge-Margin Thresholded Ensembles for Ordinal Regression
Large-Margin Thresholded Ensembles for Ordinal Regression Hsuan-Tien Lin (accepted by ALT 06, joint work with Ling Li) Learning Systems Group, Caltech Workshop Talk in MLSS 2006, Taipei, Taiwan, 07/25/2006
More informationMulticlass Boosting with Repartitioning
Multiclass Boosting with Repartitioning Ling Li Learning Systems Group, Caltech ICML 2006 Binary and Multiclass Problems Binary classification problems Y = { 1, 1} Multiclass classification problems Y
More informationAdaBoost. Lecturer: Authors: Center for Machine Perception Czech Technical University, Prague
AdaBoost Lecturer: Jan Šochman Authors: Jan Šochman, Jiří Matas Center for Machine Perception Czech Technical University, Prague http://cmp.felk.cvut.cz Motivation Presentation 2/17 AdaBoost with trees
More informationCS 231A Section 1: Linear Algebra & Probability Review
CS 231A Section 1: Linear Algebra & Probability Review 1 Topics Support Vector Machines Boosting Viola-Jones face detector Linear Algebra Review Notation Operations & Properties Matrix Calculus Probability
More informationChapter 14 Combining Models
Chapter 14 Combining Models T-61.62 Special Course II: Pattern Recognition and Machine Learning Spring 27 Laboratory of Computer and Information Science TKK April 3th 27 Outline Independent Mixing Coefficients
More informationNovel Distance-Based SVM Kernels for Infinite Ensemble Learning
Novel Distance-Based SVM Kernels for Infinite Ensemble Learning Hsuan-Tien Lin and Ling Li htlin@caltech.edu, ling@caltech.edu Learning Systems Group, California Institute of Technology, USA Abstract Ensemble
More informationI D I A P. Online Policy Adaptation for Ensemble Classifiers R E S E A R C H R E P O R T. Samy Bengio b. Christos Dimitrakakis a IDIAP RR 03-69
R E S E A R C H R E P O R T Online Policy Adaptation for Ensemble Classifiers Christos Dimitrakakis a IDIAP RR 03-69 Samy Bengio b I D I A P December 2003 D a l l e M o l l e I n s t i t u t e for Perceptual
More informationAnalysis of the Performance of AdaBoost.M2 for the Simulated Digit-Recognition-Example
Analysis of the Performance of AdaBoost.M2 for the Simulated Digit-Recognition-Example Günther Eibl and Karl Peter Pfeiffer Institute of Biostatistics, Innsbruck, Austria guenther.eibl@uibk.ac.at Abstract.
More informationECE 5424: Introduction to Machine Learning
ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationMachine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /
Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Combining Classifiers Empirical view Theoretical
More informationEnsemble learning 11/19/13. The wisdom of the crowds. Chapter 11. Ensemble methods. Ensemble methods
The wisdom of the crowds Ensemble learning Sir Francis Galton discovered in the early 1900s that a collection of educated guesses can add up to very accurate predictions! Chapter 11 The paper in which
More informationCSE 151 Machine Learning. Instructor: Kamalika Chaudhuri
CSE 151 Machine Learning Instructor: Kamalika Chaudhuri Ensemble Learning How to combine multiple classifiers into a single one Works well if the classifiers are complementary This class: two types of
More informationBoosting: Algorithms and Applications
Boosting: Algorithms and Applications Lecture 11, ENGN 4522/6520, Statistical Pattern Recognition and Its Applications in Computer Vision ANU 2 nd Semester, 2008 Chunhua Shen, NICTA/RSISE Boosting Definition
More informationStochastic Gradient Descent
Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular
More informationSupport Vector Machine (continued)
Support Vector Machine continued) Overlapping class distribution: In practice the class-conditional distributions may overlap, so that the training data points are no longer linearly separable. We need
More informationConvex Repeated Games and Fenchel Duality
Convex Repeated Games and Fenchel Duality Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci. & Eng., he Hebrew University, Jerusalem 91904, Israel 2 Google Inc. 1600 Amphitheater Parkway,
More informationContents of this Course Part I: Introduction to Boosting 1. Basic Issues, PAC Learning 2. The Idea of Boosting 3. AdaBoost s Behavior on the Sample Pa
An Introduction to Boosting Part I Gunnar Rätsch Friedrich Miescher Laboratory of the Max Planck Society http://www.fml.tuebingen.mpg.de/~raetsch Help with earlier presentations: Ron Meir, Klaus-R. Müller
More informationLogistic Regression, AdaBoost and Bregman Distances
Proceedings of the Thirteenth Annual Conference on ComputationalLearning Theory, Logistic Regression, AdaBoost and Bregman Distances Michael Collins AT&T Labs Research Shannon Laboratory 8 Park Avenue,
More informationComputational and Statistical Learning Theory
Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 8: Boosting (and Compression Schemes) Boosting the Error If we have an efficient learning algorithm that for any distribution
More informationCS 231A Section 1: Linear Algebra & Probability Review. Kevin Tang
CS 231A Section 1: Linear Algebra & Probability Review Kevin Tang Kevin Tang Section 1-1 9/30/2011 Topics Support Vector Machines Boosting Viola Jones face detector Linear Algebra Review Notation Operations
More informationAccelerating Stochastic Optimization
Accelerating Stochastic Optimization Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem and Mobileye Master Class at Tel-Aviv, Tel-Aviv University, November 2014 Shalev-Shwartz
More informationData Mining und Maschinelles Lernen
Data Mining und Maschinelles Lernen Ensemble Methods Bias-Variance Trade-off Basic Idea of Ensembles Bagging Basic Algorithm Bagging with Costs Randomization Random Forests Boosting Stacking Error-Correcting
More informationCS534 Machine Learning - Spring Final Exam
CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the
More informationThe No-Regret Framework for Online Learning
The No-Regret Framework for Online Learning A Tutorial Introduction Nahum Shimkin Technion Israel Institute of Technology Haifa, Israel Stochastic Processes in Engineering IIT Mumbai, March 2013 N. Shimkin,
More informationFINAL: CS 6375 (Machine Learning) Fall 2014
FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for
More informationA ROBUST ENSEMBLE LEARNING USING ZERO-ONE LOSS FUNCTION
Journal of the Operations Research Society of Japan 008, Vol. 51, No. 1, 95-110 A ROBUST ENSEMBLE LEARNING USING ZERO-ONE LOSS FUNCTION Natsuki Sano Hideo Suzuki Masato Koda Central Research Institute
More informationOutline: Ensemble Learning. Ensemble Learning. The Wisdom of Crowds. The Wisdom of Crowds - Really? Crowd wiser than any individual
Outline: Ensemble Learning We will describe and investigate algorithms to Ensemble Learning Lecture 10, DD2431 Machine Learning A. Maki, J. Sullivan October 2014 train weak classifiers/regressors and how
More informationCOMS 4721: Machine Learning for Data Science Lecture 13, 3/2/2017
COMS 4721: Machine Learning for Data Science Lecture 13, 3/2/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University BOOSTING Robert E. Schapire and Yoav
More informationConvex Repeated Games and Fenchel Duality
Convex Repeated Games and Fenchel Duality Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci. & Eng., he Hebrew University, Jerusalem 91904, Israel 2 Google Inc. 1600 Amphitheater Parkway,
More informationNotes on Machine Learning for and
Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Choosing Hypotheses Generally want the most probable hypothesis given the training data Maximum a posteriori
More informationThe Boosting Approach to Machine Learning. Rob Schapire Princeton University. schapire
The Boosting Approach to Machine Learning Rob Schapire Princeton University www.cs.princeton.edu/ schapire Example: Spam Filtering problem: filter out spam (junk email) gather large collection of examples
More informationIntroduction to Boosting and Joint Boosting
Introduction to Boosting and Learning Systems Group, Caltech 2005/04/26, Presentation in EE150 Boosting and Outline Introduction to Boosting 1 Introduction to Boosting Intuition of Boosting Adaptive Boosting
More informationGame Theory. Greg Plaxton Theory in Programming Practice, Spring 2004 Department of Computer Science University of Texas at Austin
Game Theory Greg Plaxton Theory in Programming Practice, Spring 2004 Department of Computer Science University of Texas at Austin Bimatrix Games We are given two real m n matrices A = (a ij ), B = (b ij
More informationLeaving The Span Manfred K. Warmuth and Vishy Vishwanathan
Leaving The Span Manfred K. Warmuth and Vishy Vishwanathan UCSC and NICTA Talk at NYAS Conference, 10-27-06 Thanks to Dima and Jun 1 Let s keep it simple Linear Regression 10 8 6 4 2 0 2 4 6 8 8 6 4 2
More informationReducing Multiclass to Binary: A Unifying Approach for Margin Classifiers
Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers Erin Allwein, Robert Schapire and Yoram Singer Journal of Machine Learning Research, 1:113-141, 000 CSE 54: Seminar on Learning
More informationWhat makes good ensemble? CS789: Machine Learning and Neural Network. Introduction. More on diversity
What makes good ensemble? CS789: Machine Learning and Neural Network Ensemble methods Jakramate Bootkrajang Department of Computer Science Chiang Mai University 1. A member of the ensemble is accurate.
More informationBarrier Boosting. Abstract. 1 Introduction. 2 Boosting and Convex Programming
Barrier Boosting Gunnar Rätsch, Manfred Warmuth, Sebastian Mika, Takashi Onoda Ý, Steven Lemm and Klaus-Robert Müller GMD FIRST, Kekuléstr. 7, 12489 Berlin, Germany Computer Science Department, UC Santa
More informationEnsembles. Léon Bottou COS 424 4/8/2010
Ensembles Léon Bottou COS 424 4/8/2010 Readings T. G. Dietterich (2000) Ensemble Methods in Machine Learning. R. E. Schapire (2003): The Boosting Approach to Machine Learning. Sections 1,2,3,4,6. Léon
More informationMachine Learning. Support Vector Machines. Manfred Huber
Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationBoosting & Deep Learning
Boosting & Deep Learning Ensemble Learning n So far learning methods that learn a single hypothesis, chosen form a hypothesis space that is used to make predictions n Ensemble learning à select a collection
More informationZero-Sum Games Public Strategies Minimax Theorem and Nash Equilibria Appendix. Zero-Sum Games. Algorithmic Game Theory.
Public Strategies Minimax Theorem and Nash Equilibria Appendix 2013 Public Strategies Minimax Theorem and Nash Equilibria Appendix Definition Definition A zero-sum game is a strategic game, in which for
More information