Manfred K. Warmuth (UCSC) and S.V.N. Vishwanathan (Purdue & Microsoft Research). Updated: March 23, 2010. ICML 09 Boosting Tutorial, 62 slides.
1 ICML 2009 Tutorial: Survey of Boosting from an Optimization Perspective
Part I: Entropy Regularized LPBoost
Part II: Boosting from an Optimization Perspective
Manfred K. Warmuth (UCSC), S.V.N. Vishwanathan (Purdue & Microsoft Research)
Updated: March 23, 2010
2 Outline
1. Introduction to Boosting
2. What is Boosting?
3. Entropy Regularized LPBoost
4. Overview of Boosting algorithms
5. Conclusion and Open Problems
3 Outline (section divider): Introduction to Boosting
4 Setup for Boosting [giants of the field: Schapire, Freund]. Examples: 11 apples, labeled +1 if artificial and -1 if natural. Goal: classification.
5 Setup for Boosting: +1/-1 examples with weights d_n (shown by size); the data is separable. [figure: feature 1 vs feature 2]
6 Weak hypotheses: decision stumps on the two features. A single stump can't do it; the goal is to find a convex combination of weak hypotheses that classifies all examples. [figure: feature 1 vs feature 2]
7 Boosting: 1st iteration. First hypothesis: error 1/11, edge 9/11; low error = high edge, since edge = 1 - 2 · error. [figure: feature 1 vs feature 2]
8 Update after 1st hypothesis: misclassified examples get increased weights, so after the update the edge of the hypothesis has decreased. [figure: feature 1 vs feature 2]
9 Before 2nd iteration: hard examples have high weight. [figure: feature 1 vs feature 2]
10 Boosting: 2nd hypothesis. Pick hypotheses with high (weighted) edge. [figure: feature 1 vs feature 2]
11 Update after 2nd: after the update, the edges of all past hypotheses should be small. [figure: feature 1 vs feature 2]
12 3rd hypothesis [figure: feature 1 vs feature 2]
13 Update after 3rd [figure: feature 1 vs feature 2]
14 4th hypothesis [figure: feature 1 vs feature 2]
15 Update after 4th [figure: feature 1 vs feature 2]
16 Final convex combination of all hypotheses. Decision: Σ_{t=1}^T w_t h_t(x) ≥ 0? [figure: feature 1 vs feature 2, positive total weight vs negative total weight]
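The decision rule on this slide is just a weighted vote. A minimal pure-Python sketch (the stumps, weights, and example point are illustrative, not from the tutorial):

```python
def combined_predict(hypotheses, weights, x):
    """Boosted classifier: sign of the weighted vote sum_t w_t * h_t(x)."""
    total = sum(w * h(x) for w, h in zip(weights, hypotheses))
    return 1 if total >= 0 else -1

# Two toy decision stumps on a 2-feature example x = (feature1, feature2).
stump1 = lambda x: 1 if x[0] > 0.5 else -1   # thresholds feature 1
stump2 = lambda x: 1 if x[1] > 0.5 else -1   # thresholds feature 2
# Vote: 0.7 * (+1) + 0.3 * (-1) = 0.4 >= 0, so the prediction is +1.
pred = combined_predict([stump1, stump2], [0.7, 0.3], (0.9, 0.1))
```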
17 Protocol of Boosting [FS97]
Maintain a distribution on the N ±1-labeled examples.
At iteration t = 1, ..., T:
- receive a weak hypothesis h_t of high edge
- update d^{t-1} to d^t: more weight on hard examples
Output a convex combination of the weak hypotheses: Σ_{t=1}^T w_t h_t(x).
Two sets of weights: a distribution d on the examples and a distribution w on the hypotheses.
18 Data representation: examples x_n with labels y_n; the accuracy entries are u_n^t := y_n h_t(x_n), which is +1 if h_t is perfect on x_n, -1 if opposite, and in between if neutral.
19 Edge vs. margin [Br99]
Edge of a hypothesis h_t for a distribution d on the examples: Σ_{n=1}^N u_n^t d_n (the weighted accuracy of the hypothesis; d ∈ P^N).
Margin of example n for the current hypothesis weighting w: Σ_{t=1}^T u_n^t w_t (the weighted accuracy of the example; w ∈ P^T).
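In code, the two quantities are just two different weighted sums over the same accuracy matrix u_n^t = y_n h_t(x_n). A small sketch with made-up numbers (mine, not the tutorial's):

```python
def edge(u_t, d):
    """Edge of hypothesis t for example distribution d: sum_n d_n * u_n^t."""
    return sum(dn * un for dn, un in zip(d, u_t))

def margin(U, w, n):
    """Margin of example n for hypothesis weighting w: sum_t w_t * u_n^t."""
    return sum(wt * U[t][n] for t, wt in enumerate(w))

# U[t][n] = y_n * h_t(x_n) in {-1, +1}: 2 hypotheses, 3 examples.
U = [[1, 1, -1],
     [1, -1, 1]]
d = [1/3, 1/3, 1/3]   # distribution on the examples
w = [0.5, 0.5]        # distribution on the hypotheses
e0 = edge(U[0], d)    # edge of hypothesis 0: (1 + 1 - 1) / 3 = 1/3
m0 = margin(U, w, 0)  # margin of example 0: 0.5 * 1 + 0.5 * 1 = 1.0
```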
21 AdaBoost
Initialize t = 0 and d_n^0 = 1/N.
For t = 1, ..., T:
- get h_t whose edge w.r.t. the current distribution is 1 - 2ε_t
- set w_t = (1/2) ln((1 - ε_t)/ε_t)
- update the distribution: d_n^t = d_n^{t-1} exp(-w_t u_n^t) / Σ_n d_n^{t-1} exp(-w_t u_n^t)
Final hypothesis: sgn(Σ_{t=1}^T w_t h_t(x))
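The pseudocode above translates almost line for line into Python. The sketch below uses a made-up 1-D toy set and a fixed pool of decision stumps as the oracle (all names and data are mine, not the tutorial's); on separable data like this, AdaBoost's exponential training-error bound drives the combined vote to zero training errors:

```python
import math

def adaboost(X, y, stumps, T):
    """AdaBoost as on the slide: d^0 uniform; each round take the stump with
    the largest edge (smallest weighted error eps_t), set
    w_t = 0.5 * ln((1 - eps_t) / eps_t), and reweight the examples."""
    N = len(X)
    d = [1.0 / N] * N
    ensemble = []
    for _ in range(T):
        # oracle: stump with maximum edge sum_n d_n * y_n * h(x_n)
        h = max(stumps, key=lambda s: sum(dn * yn * s(xn)
                                          for dn, xn, yn in zip(d, X, y)))
        eps = sum(dn for dn, xn, yn in zip(d, X, y) if h(xn) != yn)
        w = 0.5 * math.log((1 - eps) / eps)   # eps stays in (0, 1/2) here
        ensemble.append((w, h))
        d = [dn * math.exp(-w * yn * h(xn)) for dn, xn, yn in zip(d, X, y)]
        Z = sum(d)                            # normalizer
        d = [dn / Z for dn in d]
    return ensemble

def predict(ensemble, x):
    return 1 if sum(w * h(x) for w, h in ensemble) >= 0 else -1

# Toy 1-D data: positives form the interval (0.3, 0.7). No single stump is
# perfect, but a convex combination of stumps separates the data.
X = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
y = [-1, -1, 1, 1, -1, -1]
pos = [lambda x, t=t: (1 if x > t else -1) for t in (0.3, 0.7)]
stumps = pos + [lambda x, h=h: -h(x) for h in pos] + [lambda x: -1]
ens = adaboost(X, y, stumps, T=120)
train_errors = sum(predict(ens, xn) != yn for xn, yn in zip(X, y))
```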
22 Objectives
Edge: the edges of past hypotheses should be small after the update, so minimize the maximum edge of the past hypotheses.
Margin: choose the convex combination of weak hypotheses that maximizes the minimum margin.
Which margin? SVM: 2-norm (weights on examples); Boosting: 1-norm (weights on base hypotheses).
Connection between the objectives? [figure: feature 1 vs feature 2]
23 Edge vs. margin: min max edge = max min margin
min_{d ∈ S^N} max_{q=1,...,t-1} u^q · d (edge of hypothesis q) = max_{w ∈ S^{t-1}} min_{n=1,...,N} Σ_{q=1}^{t-1} u_n^q w_q (margin of example n)
by Linear Programming duality.
24 Boosting as a zero-sum game [FS97]
Rock, Paper, Scissors game: the row player plays R, P, S with mixed strategy d; the column player plays R, P, S with mixed strategy w; U is the gain matrix. A single row is a pure strategy of the row player and d is a mixed strategy; a single column is a pure strategy of the column player and w is a mixed strategy. The row player minimizes and the column player maximizes the payoff d^T U w = Σ_{i,j} d_i U_{i,j} w_j.
25 Optimum strategy. Min-max theorem (e_j denotes a pure strategy):
min_d max_w d^T U w = min_d max_j d^T U e_j = max_w min_d d^T U w = max_w min_i e_i^T U w = value of the game (0 in the example).
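For the Rock-Paper-Scissors example, the claimed value of 0 is easy to check numerically. A small sketch (the ±1 gain matrix is the standard one; I am assuming the slide's matrix matches it):

```python
# Gain matrix: U[i][j] is the row player's loss when row plays i, column plays j.
U = [[ 0, -1,  1],   # Rock     vs R, P, S
     [ 1,  0, -1],   # Paper
     [-1,  1,  0]]   # Scissors

def payoff(d, w):
    """d^T U w: expected payoff (row player minimizes, column player maximizes)."""
    return sum(d[i] * U[i][j] * w[j] for i in range(3) for j in range(3))

uniform = [1/3, 1/3, 1/3]
# Against the uniform row strategy every pure column strategy e_j gains exactly 0,
# and symmetrically for the columns, so the value of the game is 0.
column_payoffs = [sum(uniform[i] * U[i][j] for i in range(3)) for j in range(3)]
game_value = payoff(uniform, uniform)
```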
26 Connection to Boosting? Rows are the examples; column u^q encodes weak hypothesis h_q. Row sum: margin of an example. Column sum: edge of a weak hypothesis. Value of the game: min max edge = max min margin (von Neumann's Minimax Theorem).
27 Edges/margins: in the game matrix, the rows give the margins (take the min) and the columns give the edges (take the max); the value of the game is 0. [table entries lost in transcription]
28 New column added (boosting): adding a hypothesis column w_4 increases the value of the game from 0 to 0.11. [table entries lost in transcription]
29 Row added (on-line learning): adding an example row decreases the value of the game from 0 to -0.11. [table entries lost in transcription]
30 Boosting: maximize the margin incrementally. Iteration 1 has weights w^1 and d^1, iteration 2 has w^2 and d^2, and so on. In each iteration, solve an optimization problem to update d; the column player / oracle provides a new hypothesis. Boosting is a column generation method in the d domain and coordinate descent in the w domain.
31 Outline (section divider): What is Boosting?
32 What is Boosting? Boosting = a greedy method for increasing the margin; it converges to the optimum margin w.r.t. all hypotheses. We want a small number of iterations.
33 Assumption on the next weak hypothesis: for the current weighting of the examples, the oracle returns a hypothesis of edge ≥ g. Goal: for a given ε, produce a convex combination of weak hypotheses with soft margin ≥ g - ε in O(log N / ε²) iterations.
34 Recall the min max theorem:
min_{d ∈ S^N} max_{q=1,...,t} u^q · d (edge of hypothesis q) = max_{w ∈ S^t} min_{n=1,...,N} Σ_{q=1}^t u_n^q w_q (margin of example n)
35 Visualizing the margin. [figure: feature 1 vs feature 2]
36 Min max theorem, inseparable case. Slack variables in the w domain = capping in the d domain:
min_{d ∈ S^N, d ≤ (1/ν)1} max_{q=1,...,t} u^q · d = max_{w ∈ S^t, ψ ≥ 0} min_{n=1,...,N} (Σ_{q=1}^t u_n^q w_q + ψ_n) - (1/ν) Σ_{n=1}^N ψ_n
where Σ_{q=1}^t u_n^q w_q + ψ_n is the soft margin of example n.
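The capping constraint d ≤ (1/ν)·1 can be enforced by capping overweight examples and rescaling the rest. The routine below is an illustrative sketch of such a projection (proportional rescaling of the uncapped entries; it is my own sketch, not code from the tutorial):

```python
def cap_distribution(p, c):
    """Map a distribution p onto {d : sum(d) = 1, 0 <= d_n <= c}: repeatedly
    cap entries at c and rescale the remaining mass over the uncapped entries."""
    assert c * len(p) >= 1.0, "cap too small to hold a distribution"
    d = list(p)
    capped = [False] * len(p)
    while True:
        over = [i for i in range(len(d)) if not capped[i] and d[i] > c]
        if not over:
            return d
        for i in over:
            d[i] = c
            capped[i] = True
        free = [i for i in range(len(d)) if not capped[i]]
        rest = 1.0 - c * sum(capped)      # mass left for the uncapped entries
        s = sum(d[i] for i in free)
        for i in free:
            d[i] = d[i] * rest / s

# With nu = 2 on 3 examples (c = 1/2), the dominant example gets capped at 0.5
# and the remaining mass 0.5 is split proportionally over the other two.
d = cap_distribution([0.7, 0.2, 0.1], c=0.5)
```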
37 Visualizing the soft margin: slack ψ lets an example fall inside the margin. [figure: hypothesis 1 vs hypothesis 2]
38 LPBoost. Choose the distribution that minimizes the maximum edge of the current hypotheses by solving:
P_LP^t = min_{Σ_n d_n = 1, d ≤ (1/ν)1} max_{q=1,...,t} u^q · d
All weight is put on examples with minimum soft margin. [figure: objective value P_LP vs d]
39 Outline (section divider): Entropy Regularized LPBoost
40 Entropy Regularized LPBoost
min_{Σ_n d_n = 1, d ≤ (1/ν)1} max_{q=1,...,t} u^q · d + (1/η) Δ(d, d^0)
The resulting weights have the form d_n = exp(-η · soft margin of example n) / Z, a "soft min".
This form of weights appeared first in the ν-Arc algorithm [RSS+00]. Regularization in the d domain makes the problem strongly convex, and the gradient of the dual is Lipschitz continuous in w [e.g. HL93, RW97].
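The "soft min" weighting on this slide can be computed directly: exponentiate the negated soft margins and normalize. A small sketch with made-up margin values (mine), showing that as η grows the distribution concentrates on the minimum-margin example, approaching LPBoost's hard-min behavior:

```python
import math

def entropic_weights(soft_margins, eta):
    """d_n = exp(-eta * soft_margin_n) / Z: a soft-min over the examples."""
    e = [math.exp(-eta * m) for m in soft_margins]
    Z = sum(e)
    return [x / Z for x in e]

soft_margins = [0.9, 0.5, 0.1, -0.2]                # example n = 3 is hardest
d_soft = entropic_weights(soft_margins, eta=2.0)    # smooth: every example kept
d_sharp = entropic_weights(soft_margins, eta=50.0)  # nearly all weight on n = 3
```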
41 The effect of entropy regularization: different distributions on the examples. LPBoost: lots of zeros / brittle. ERLPBoost: smoother. [figures: feature 1 vs feature 2]
42 Outline (section divider): Overview of Boosting algorithms
43 AdaBoost [FS97]
d_n^t := d_n^{t-1} exp(-w_t u_n^t) / Σ_n d_n^{t-1} exp(-w_t u_n^t), where w_t is chosen so that Σ_n d_n^{t-1} exp(-w u_n^t) is minimized, i.e. ∂/∂w Σ_n d_n^{t-1} exp(-w u_n^t) |_{w = w_t} = 0, which is equivalent to u^t · d^t = 0.
Easy to implement. Adjusts the distribution so that the edge of the last hypothesis is zero. Gets within half of the optimal hard margin [RSD07], but only in the limit.
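The claim that this update drives the edge of the last hypothesis to exactly zero is easy to verify numerically. A pure-Python sketch (the distribution and u-vector are made-up values, not from the tutorial):

```python
import math

def adaboost_reweight(d, u):
    """One AdaBoost step for u_n = y_n * h_t(x_n) in {-1, +1}: with
    w_t = 0.5 * ln((1 - eps)/eps), the reweighted distribution satisfies
    u . d = 0 (the last hypothesis's edge becomes zero)."""
    eps = sum(dn for dn, un in zip(d, u) if un < 0)   # weighted error of h_t
    w = 0.5 * math.log((1 - eps) / eps)
    new = [dn * math.exp(-w * un) for dn, un in zip(d, u)]
    Z = sum(new)                                      # normalizer
    return [dn / Z for dn in new]

d = [0.4, 0.3, 0.2, 0.1]   # current distribution on 4 examples
u = [1, -1, 1, -1]         # h_t is right on examples 0 and 2
d_new = adaboost_reweight(d, u)
new_edge = sum(dn * un for dn, un in zip(d_new, u))   # zero up to float error
```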
44 Corrective versus totally corrective: process only the last hypothesis versus all past hypotheses. Corrective: AdaBoost, LogitBoost, AdaBoost*, [SS, COLT08]. Totally corrective: LPBoost, TotalBoost, SoftBoost, ERLPBoost.
45 From AdaBoost to ERLPBoost
AdaBoost (as interpreted in [KW99, La99]):
Primal: min_d Δ(d, d^{t-1}) s.t. u^t · d = 0, d · 1 = 1
Dual: max_w -ln Σ_n d_n^{t-1} exp(-η u_n^t w) s.t. w ≥ 0
Achieves half of the optimum hard margin in the limit.
AdaBoost* [RW05]:
Primal: min_d Δ(d, d^{t-1}) s.t. u^t · d ≤ γ_t, d · 1 = 1
Dual: max_w -ln Σ_n d_n^{t-1} exp(-η u_n^t w) - γ_t ||w||_1 s.t. w ≥ 0
where the edge bound γ_t is adjusted downward by a heuristic. Good iteration bound for reaching the optimum hard margin.
46 SoftBoost [WGR07]:
Primal: min_d Δ(d, d^0) s.t. d · 1 = 1, d ≤ (1/ν)1, u^q · d ≤ γ_t for 1 ≤ q ≤ t
Dual: min_{w ≥ 0, ψ ≥ 0} ln Σ_n d_n^0 exp(-η(Σ_{q=1}^t u_n^q w_q + ψ_n)) + (1/ν)||ψ||_1 + γ_t ||w||_1
where the edge bound γ_t is adjusted downward by a heuristic. Good iteration bound for reaching the soft margin.
ERLPBoost [WGV08]:
Primal: min_{d,γ} γ + (1/η) Δ(d, d^0) s.t. d · 1 = 1, d ≤ (1/ν)1, u^q · d ≤ γ for 1 ≤ q ≤ t
Dual: min_{w,ψ} (1/η) ln Σ_n d_n^0 exp(-η(Σ_{q=1}^t u_n^q w_q + ψ_n)) + (1/ν)||ψ||_1 s.t. w ≥ 0, ||w||_1 = 1, ψ ≥ 0
where for the iteration bound η is fixed to max((2/ε) ln(N/ν), 1/2). Good iteration bound for reaching the soft margin.
47 Corrective ERLPBoost [SS08]
Primal: min_d Σ_{q=1}^t w_q (u^q · d) + (1/η) Δ(d, d^0) s.t. d · 1 = 1, d ≤ (1/ν)1
Dual: min_ψ (1/η) ln Σ_n d_n^0 exp(-η(Σ_{q=1}^t u_n^q w_q + ψ_n)) + (1/ν)||ψ||_1 s.t. ψ ≥ 0
where for the iteration bound η is fixed to max((2/ε) ln(N/ν), 1/2). Good iteration bound for reaching the soft margin.
48 Iteration bounds
Corrective: AdaBoost, LogitBoost, AdaBoost*, [SS, COLT08]. Totally corrective: LPBoost, TotalBoost, SoftBoost, ERLPBoost.
Strong oracle: returns a hypothesis with maximum edge. Weak oracle: returns a hypothesis with edge ≥ g.
In O(ln(N/ν) / ε²) iterations: within ε of the maximum soft margin for the strong oracle, or within ε of g for the weak oracle; ditto for the hard margin case.
In O(log N / g²) iterations: consistency with the weak oracle.
49-53 LPBoost may require Ω(N) iterations: a sequence of game matrices over hypotheses w_1, ..., w_5 and example weights d_1, ..., d_8 with margin and edge rows, illustrating a construction that forces LPBoost to add one hypothesis per example. [numeric entries lost in transcription; initial value of the game is -1]
54 The same construction works with no ties among the edges ("No ties!").
55 LPBoost may return a bad final hypothesis. How good is the master hypothesis returned by LPBoost compared to the best possible convex combination of hypotheses? Any linearly separable dataset can be reduced to a dataset on which LPBoost misclassifies all examples, by adding a bad example or by adding a bad hypothesis.
56 Adding a bad example. [game matrix over w_1, ..., w_5 and d_1, ..., d_9; entries lost in transcription]
57-60 Adding a bad hypothesis. [sequence of game matrices over w_1, ..., w_6 and d_1, ..., d_9; entries lost in transcription]
61 Synopsis
LPBoost is often unstable; for safety, add relative entropy regularization.
Corrective algorithms: sometimes easy to code, fast per iteration.
Totally corrective algorithms: smaller number of iterations, faster overall time when ε is small.
Weak versus strong oracle makes a big difference in practice.
62 O(log N / ε²) iteration bounds
Good: the bound is a major design tool; any reasonable Boosting algorithm should have this bound.
Bad: the bound is weak; for ε = .01 or ε = .001, ln N / ε² is large compared to N. [table entries lost in transcription]
Why are totally corrective algorithms much better in practice?
63 Lower bounds on the number of iterations. A majority of Ω(log N / g²) hypotheses is needed to achieve consistency with a weak oracle of guarantee g [Fr95]. Easy: an Ω(1/ε²) iteration bound for getting within ε of the hard margin with a strong oracle. Harder: an Ω(log N / ε²) iteration bound for the strong oracle [Ne83?].
64 Outline (section divider): Conclusion and Open Problems
65 Conclusion
Adding relative entropy regularization to LPBoost leads to a good boosting algorithm. Boosting is an instantiation of the MaxEnt and MinXEnt principles [Jaynes 57, Kullback 59]. Relative entropy regularization smoothes the one-norm regularization.
Open problems: When the hypotheses have one-sided error, O(log N / ε) iterations suffice [As00, HW03]; does ERLPBoost have an O(log N / ε) bound when the hypotheses are one-sided? Replace geometric optimizers by entropic ones. Compare ours with Freund's algorithms that don't just cap, but forget examples.
66 Acknowledgments. Rob Schapire and Yoav Freund for pioneering Boosting; Gunnar Rätsch for bringing in optimization; Karen Glocer for helping with figures and plots.
More informationConvex Repeated Games and Fenchel Duality
Convex Repeated Games and Fenchel Duality Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci. & Eng., he Hebrew University, Jerusalem 91904, Israel 2 Google Inc. 1600 Amphitheater Parkway,
More informationGame Theory. Greg Plaxton Theory in Programming Practice, Spring 2004 Department of Computer Science University of Texas at Austin
Game Theory Greg Plaxton Theory in Programming Practice, Spring 2004 Department of Computer Science University of Texas at Austin Bimatrix Games We are given two real m n matrices A = (a ij ), B = (b ij
More informationICS-E4030 Kernel Methods in Machine Learning
ICS-E4030 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 28. September, 2016 Juho Rousu 28. September, 2016 1 / 38 Convex optimization Convex optimisation This
More information10701/15781 Machine Learning, Spring 2007: Homework 2
070/578 Machine Learning, Spring 2007: Homework 2 Due: Wednesday, February 2, beginning of the class Instructions There are 4 questions on this assignment The second question involves coding Do not attach
More informationSupport Vector Machines. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Support Vector Machines CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 A Linearly Separable Problem Consider the binary classification
More informationBoosting. March 30, 2009
Boosting Peter Bühlmann buhlmann@stat.math.ethz.ch Seminar für Statistik ETH Zürich Zürich, CH-8092, Switzerland Bin Yu binyu@stat.berkeley.edu Department of Statistics University of California Berkeley,
More informationBoosting: Foundations and Algorithms. Rob Schapire
Boosting: Foundations and Algorithms Rob Schapire Example: Spam Filtering problem: filter out spam (junk email) gather large collection of examples of spam and non-spam: From: yoav@ucsd.edu Rob, can you
More informationNew Algorithms for Contextual Bandits
New Algorithms for Contextual Bandits Lev Reyzin Georgia Institute of Technology Work done at Yahoo! 1 S A. Beygelzimer, J. Langford, L. Li, L. Reyzin, R.E. Schapire Contextual Bandit Algorithms with Supervised
More informationL5 Support Vector Classification
L5 Support Vector Classification Support Vector Machine Problem definition Geometrical picture Optimization problem Optimization Problem Hard margin Convexity Dual problem Soft margin problem Alexander
More informationToday: Linear Programming (con t.)
Today: Linear Programming (con t.) COSC 581, Algorithms April 10, 2014 Many of these slides are adapted from several online sources Reading Assignments Today s class: Chapter 29.4 Reading assignment for
More informationCS229 Supplemental Lecture notes
CS229 Supplemental Lecture notes John Duchi 1 Boosting We have seen so far how to solve classification (and other) problems when we have a data representation already chosen. We now talk about a procedure,
More informationThe AdaBoost algorithm =1/n for i =1,...,n 1) At the m th iteration we find (any) classifier h(x; ˆθ m ) for which the weighted classification error m
) Set W () i The AdaBoost algorithm =1/n for i =1,...,n 1) At the m th iteration we find (any) classifier h(x; ˆθ m ) for which the weighted classification error m m =.5 1 n W (m 1) i y i h(x i ; 2 ˆθ
More informationComputational and Statistical Learning Theory
Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 8: Boosting (and Compression Schemes) Boosting the Error If we have an efficient learning algorithm that for any distribution
More informationConvex Repeated Games and Fenchel Duality
Convex Repeated Games and Fenchel Duality Shai Shalev-Shwartz 1 and Yoram Singer 1,2 1 School of Computer Sci. & Eng., he Hebrew University, Jerusalem 91904, Israel 2 Google Inc. 1600 Amphitheater Parkway,
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationBrief Introduction to Machine Learning
Brief Introduction to Machine Learning Yuh-Jye Lee Lab of Data Science and Machine Intelligence Dept. of Applied Math. at NCTU August 29, 2016 1 / 49 1 Introduction 2 Binary Classification 3 Support Vector
More informationSupport Vector Machines: Training with Stochastic Gradient Descent. Machine Learning Fall 2017
Support Vector Machines: Training with Stochastic Gradient Descent Machine Learning Fall 2017 1 Support vector machines Training by maximizing margin The SVM objective Solving the SVM optimization problem
More informationSupport Vector Machines
Support Vector Machines Hypothesis Space variable size deterministic continuous parameters Learning Algorithm linear and quadratic programming eager batch SVMs combine three important ideas Apply optimization
More informationOnline Learning and Online Convex Optimization
Online Learning and Online Convex Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Learning 1 / 49 Summary 1 My beautiful regret 2 A supposedly fun game
More informationSupport Vector Machines for Classification and Regression
CIS 520: Machine Learning Oct 04, 207 Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may
More informationSupport Vector Machines. Machine Learning Fall 2017
Support Vector Machines Machine Learning Fall 2017 1 Where are we? Learning algorithms Decision Trees Perceptron AdaBoost 2 Where are we? Learning algorithms Decision Trees Perceptron AdaBoost Produce
More informationName (NetID): (1 Point)
CS446: Machine Learning (D) Spring 2017 March 16 th, 2017 This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of this exam. This exam booklet contains
More informationJeff Howbert Introduction to Machine Learning Winter
Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable
More informationCSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18
CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H
More informationBoosting. Acknowledgment Slides are based on tutorials from Robert Schapire and Gunnar Raetsch
. Machine Learning Boosting Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut für Informatik Technische Fakultät Albert-Ludwigs-Universität Freiburg riedmiller@informatik.uni-freiburg.de
More informationAdaBoost. S. Sumitra Department of Mathematics Indian Institute of Space Science and Technology
AdaBoost S. Sumitra Department of Mathematics Indian Institute of Space Science and Technology 1 Introduction In this chapter, we are considering AdaBoost algorithm for the two class classification problem.
More informationMaximizing the Margin with Boosting
Maximizing the Margin with Boosting Gunnar Rätsch and Manfred K. Warmuth RSISE, Australian National University Canberra, ACT 000, Australia Gunnar.Raetsch@anu.edu.au University of California at Santa Cruz
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationComputational and Statistical Learning Theory
Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 12: Weak Learnability and the l 1 margin Converse to Scale-Sensitive Learning Stability Convex-Lipschitz-Bounded Problems
More informationPrecise Statements of Convergence for AdaBoost and arc-gv
Contemporary Mathematics Precise Statements of Convergence for AdaBoost and arc-gv Cynthia Rudin, Robert E Schapire, and Ingrid Daubechies We wish to dedicate this paper to Leo Breiman Abstract We present
More informationSupport Vector Machines: Maximum Margin Classifiers
Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind
More informationLecture 6. Notes on Linear Algebra. Perceptron
Lecture 6. Notes on Linear Algebra. Perceptron COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne This lecture Notes on linear algebra Vectors
More informationMachine Learning for NLP
Machine Learning for NLP Uppsala University Department of Linguistics and Philology Slides borrowed from Ryan McDonald, Google Research Machine Learning for NLP 1(50) Introduction Linear Classifiers Classifiers
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationMassachusetts Institute of Technology 6.854J/18.415J: Advanced Algorithms Friday, March 18, 2016 Ankur Moitra. Problem Set 6
Massachusetts Institute of Technology 6.854J/18.415J: Advanced Algorithms Friday, March 18, 2016 Ankur Moitra Problem Set 6 Due: Wednesday, April 6, 2016 7 pm Dropbox Outside Stata G5 Collaboration policy:
More informationFrank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c /9/9 page 331 le-tex
Frank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c15 2013/9/9 page 331 le-tex 331 15 Ensemble Learning The expression ensemble learning refers to a broad class
More informationStochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure Alberto Bietti Julien Mairal Inria Grenoble (Thoth) March 21, 2017 Alberto Bietti Stochastic MISO March 21,
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.
More informationAn Introduction to Boosting and Leveraging
An Introduction to Boosting and Leveraging Ron Meir 1 and Gunnar Rätsch 2 1 Department of Electrical Engineering, Technion, Haifa 32000, Israel rmeir@ee.technion.ac.il, http://www-ee.technion.ac.il/ rmeir
More informationMachine Learning. Lecture 6: Support Vector Machine. Feng Li.
Machine Learning Lecture 6: Support Vector Machine Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Warm Up 2 / 80 Warm Up (Contd.)
More information