Introduction to Boosting and Joint Boosting


Introduction to Boosting and Joint Boosting
Learning Systems Group, Caltech
2005/04/26, presentation in EE150

Outline
1. Introduction to Boosting: Intuition of Boosting; Adaptive Boosting (AdaBoost).
2. Joint Boosting.

Apple Recognition Problem
Is this a picture of an apple? We want to teach a class of 6-year-olds. Gather photos from the NY Apple Association and Google Images.

Our Fruit Class Begins
Teacher: How would you describe an apple? Michael?
Michael: I think apples are circular.
(Class): Apples are circular.

Our Fruit Class Continues
Teacher: Being circular is a good feature for apples. However, if you only say circular, you could make several mistakes. What else can we say about an apple? Tina?
Tina: It looks like apples are red.
(Class): Apples are somewhat circular and somewhat red.

Our Fruit Class Continues
Teacher: Yes. Many apples are red. However, you could still make mistakes based only on circular and red. Do you have any other suggestions, Joey?
Joey: Apples could also be green.
(Class): Apples are somewhat circular, somewhat red, and possibly green.

Our Fruit Class Continues
Teacher: Yes. It seems that apples might be circular, red, or green. But you may confuse them with tomatoes or peaches, right? Any more suggestions, Jessica?
Jessica: Apples have stems at the top.
(Class): Apples are somewhat circular, somewhat red, possibly green, and may have stems at the top.

Put Intuition to Practice
Intuition: combine simple rules to approximate a complex function; emphasize incorrectly classified data to focus on the valuable information.

AdaBoost Algorithm (Freund and Schapire, 1997)
Input: training data $Z = \{(x_i, y_i)\}_{i=1}^{N}$.
For $t = 1, 2, \ldots, T$:
- Learn a simple rule $h_t$ from the emphasized training data.
- Get the confidence $w_t$ of that rule.
- Emphasize the training data that do not agree with $h_t$.
Output: combined function $H(x) = \sum_{t=1}^{T} w_t h_t(x)$ with normalized $w$.

Some More Details
AdaBoost Algorithm
Input: training data $Z = \{(x_i, y_i)\}_{i=1}^{N}$.
For $t = 1, 2, \ldots, T$:
- Learn a simple rule $h_t$ from the emphasized training data. How? Choose an $h_t \in \mathcal{H}$ with minimum emphasized error; for example, $\mathcal{H}$ could be a set of decision stumps $h_{\theta,d,s}(x) = s \cdot I[(x)_d > \theta]$.
- Get the confidence $w_t$ of that rule. How? An $h_t$ with lower error should get a higher $w_t$.
- Emphasize the training data that do not agree with $h_t$.
Output: combined function $H(x) = \sum_{t=1}^{T} w_t h_t(x)$ with normalized $w$.
Let's see some demos.
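The slides leave the confidence and reweighting formulas implicit, so the following is only a minimal NumPy sketch of the algorithm above, using decision stumps $s \cdot I[(x)_d > \theta]$ as the rule class and the usual exponential-reweighting choices; the function names (fit_stump, adaboost, predict) are illustrative, not from the slides.

```python
import numpy as np

def fit_stump(X, y, weights):
    """Pick the stump (d, theta, s) with minimum weighted (emphasized) error."""
    best = None
    for d in range(X.shape[1]):
        for theta in np.unique(X[:, d]):
            for s in (+1, -1):
                pred = np.where(X[:, d] > theta, s, -s)
                err = np.sum(weights * (pred != y))
                if best is None or err < best[0]:
                    best = (err, d, theta, s)
    return best  # (weighted error, feature index d, threshold theta, sign s)

def adaboost(X, y, T):
    """Labels y must be in {-1, +1}. Returns a list of (confidence, stump)."""
    n = X.shape[0]
    weights = np.full(n, 1.0 / n)                # start with uniform emphasis
    ensemble = []
    for _ in range(T):
        err, d, theta, s = fit_stump(X, y, weights)
        err = np.clip(err, 1e-12, 1 - 1e-12)
        w_t = 0.5 * np.log((1 - err) / err)      # lower error -> higher confidence
        pred = np.where(X[:, d] > theta, s, -s)
        weights *= np.exp(-w_t * y * pred)       # emphasize examples that disagree
        weights /= weights.sum()
        ensemble.append((w_t, (d, theta, s)))
    return ensemble

def predict(ensemble, X):
    """Sign of the combined function H(x) = sum_t w_t h_t(x)."""
    H = np.zeros(X.shape[0])
    for w_t, (d, theta, s) in ensemble:
        H += w_t * np.where(X[:, d] > theta, s, -s)
    return np.sign(H)
```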

Why Does Boosting Work?
Our intuition is correct. Provably, if each $h_t$ is better than a random guess (has weighted error $< 1/2$), the combined function $H(x)$ can be driven to make no training error at all! Besides, boosting obtains a large margin $y_i H(x_i)$ on each example, so $H(x)$ separates the data as clearly as possible.
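The zero-training-error claim follows from the standard AdaBoost bound, reproduced here as a supporting step (the edge notation $\gamma_t$ is not on the slide): writing the weighted error of $h_t$ as $\varepsilon_t = 1/2 - \gamma_t$,

```latex
\frac{1}{N}\sum_{i=1}^{N} I\bigl[\operatorname{sign}(H(x_i)) \neq y_i\bigr]
  \;\le\; \prod_{t=1}^{T} 2\sqrt{\varepsilon_t(1-\varepsilon_t)}
  \;=\;   \prod_{t=1}^{T} \sqrt{1 - 4\gamma_t^2}
  \;\le\; \exp\Bigl(-2\sum_{t=1}^{T}\gamma_t^2\Bigr),
```

so as long as every rule keeps some edge $\gamma_t > 0$ over random guessing, the training error is driven to zero as $T$ grows.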

Multi-Class Boosting (Independent)
Not very different from binary boosting.
Input: training data $Z = \{(x_i, y_i)\}_{i=1}^{N}$.
For $t = 1, 2, \ldots, T$, for $c = 1, 2, \ldots, C$:
- Learn a rule $h_t(x, c)$, along with its confidence, from the emphasized training data.
- Emphasize the training data that do not agree with $h_t(x, c)$.
Output: combined function $H(x, c) = \sum_{t=1}^{T} h_t(x, c)$.
Separate each class from the rest independently.
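A rough sketch of this one-versus-rest scheme, reusing the binary adaboost() helper from the earlier sketch (so this block is not standalone); each class gets its own independently trained ensemble of T rules.

```python
import numpy as np

def independent_boosting(X, y, num_classes, T):
    """Train one separate binary ensemble per class: class c vs. the rest."""
    ensembles = {}
    for c in range(num_classes):
        y_c = np.where(y == c, 1, -1)        # relabel for the binary problem
        ensembles[c] = adaboost(X, y_c, T)   # each class spends its own T rules
    return ensembles

def class_score(ensembles, X, c):
    """H(x, c): confidence-weighted votes of the rules learned for class c."""
    H = np.zeros(X.shape[0])
    for w_t, (d, theta, s) in ensembles[c]:
        H += w_t * np.where(X[:, d] > theta, s, -s)
    return H
```

With a total budget of M rules, this spends only M/C rules per class, which is the problem discussed on the next slide.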

Problems of Independent Boosting
Number of rules needed for good performance: $O(C)$. With a budget of $M$ rules, each class can use only $M/C$ rules. For example, for fruits, many of the $M$ rules (for apple, orange, tomato, etc.) would be "it is circular": a waste of budget.
Each rule separates one class from the rest on its own, so the rules contain no mutual information between classes. For example, if we only separate apples from the other fruits, we have no idea that apples and tomatoes look similar: each class resides in its own set of budget-wasting rules.

Joint Boosting
Try to have joint rules.
Input: training data $Z = \{(x_i, y_i)\}_{i=1}^{N}$.
For $t = 1, 2, \ldots, T$:
- For each $S \subseteq \{1, 2, \ldots, C\}$, learn a rule $h_t(x, S)$, along with its confidence, using the classes in $S$ combined together.
- Pick the rule $h_t(x, S_t)$ that achieves the best overall criterion.
- Emphasize the training data that do not agree with $h_t(x, S_t)$.
Output: combined function $H(x, c) = \sum_{t:\, c \in S_t} h_t(x, S_t)$.
Separate a cluster of classes jointly from the rest.
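The slide does not pin down the fitting criterion for a shared rule, so the following is a simplified sketch of one round: each candidate cluster S is turned into a binary problem ("label in S" vs. the rest), a single weighted stump is fit for it using the illustrative best_stump helper, and the cluster with the lowest emphasized error is kept. Note the loop over all $2^C - 1$ subsets, which the next slides address.

```python
from itertools import combinations
import numpy as np

def best_stump(X, z, weights):
    """Weighted binary stump s * I[x_d > theta] for labels z in {-1, +1}."""
    best = (np.inf, None)
    for d in range(X.shape[1]):
        for theta in np.unique(X[:, d]):
            for s in (+1, -1):
                pred = np.where(X[:, d] > theta, s, -s)
                err = np.sum(weights * (pred != z))
                if err < best[0]:
                    best = (err, (d, theta, s))
    return best  # (weighted error, stump parameters)

def joint_boosting_round(X, y, classes, weights):
    """Try every nonempty subset S of classes and return the best shared rule."""
    best = (np.inf, None, None)
    for size in range(1, len(classes) + 1):
        for S in combinations(classes, size):       # 2^C - 1 candidate clusters
            z = np.where(np.isin(y, S), 1, -1)      # cluster S vs. the rest
            err, stump = best_stump(X, z, weights)
            if err < best[0]:
                best = (err, set(S), stump)
    return best  # (weighted error, chosen cluster S_t, shared stump)
```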

Pros of Joint Boosting
A rule learned from a cluster of classes is meaningful and often stable.
Number of rules needed for good performance: $O(\log C)$.
Uses the budget efficiently.

Cons of Joint Boosting
The algorithm is very slow: looping over $S \subseteq \{1, 2, \ldots, C\}$ is a loop of size $2^C$.
Remedy: replace the loop with a greedy search (sketched below). Start by adding the best single class to the cluster, then greedily keep adding one class at a time. This traces $O(C^2)$ subsets instead of $O(2^C)$.
Still slow in general, but it can be sped up when $\mathcal{H}$ is simple, for example regression stumps $a \cdot I[(x)_d > \theta] + b$.
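A sketch of the greedy replacement for the exhaustive subset loop, reusing the illustrative best_stump helper from the previous sketch: grow the cluster one class at a time, always adding the class that yields the lowest emphasized error, and remember the best cluster seen along the way. This fits at most $C + (C-1) + \cdots + 1 = O(C^2)$ candidate subsets.

```python
import numpy as np

def greedy_joint_round(X, y, classes, weights):
    """Greedy forward search over class clusters; O(C^2) stump fits."""
    remaining = set(classes)
    cluster = set()
    best_overall = (np.inf, None, None)
    while remaining:
        # Try adding each remaining class to the current cluster.
        trial_best = (np.inf, None, None)
        for c in remaining:
            S = cluster | {c}
            z = np.where(np.isin(y, list(S)), 1, -1)   # cluster S vs. the rest
            err, stump = best_stump(X, z, weights)
            if err < trial_best[0]:
                trial_best = (err, S, stump)
        cluster = trial_best[1]                        # commit the best addition
        remaining -= cluster
        if trial_best[0] < best_overall[0]:
            best_overall = trial_best                  # remember the best cluster size
    return best_overall  # (weighted error, cluster S_t, shared stump)
```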

Experiment Framework
Goal: detect 21 objects (13 indoor, 6 outdoor, 2 both) in pictures.

Experiment Framework (Cont'd)
Extract features with the following steps (a rough sketch follows the list):
- Scale the image by $\sigma$.
- Filter (by normalized correlation) with a patch $g_f$.
- Mark the region over which to average the response with a mask $w_f$.
- Take the $p$-norm of the averaged response in the region.
Patches: small parts of the known objects; 2000 are randomly generated.
Example: a feature for the stem of an apple would be a patch (a matched filter for the stem) with a mask at the top portion.
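The slide only outlines the pipeline, so the helper below is a rough NumPy/SciPy sketch of those four steps; the normalization and pooling details are assumptions, and patch_feature is an illustrative name.

```python
import numpy as np
from scipy.ndimage import zoom
from scipy.signal import correlate2d

def patch_feature(image, g_f, w_f, sigma=1.0, p=10):
    """One scalar feature: scale, correlate with patch g_f, mask, p-norm pool."""
    img = zoom(image, sigma)                      # scale the image by sigma
    g = (g_f - g_f.mean()) / (g_f.std() + 1e-8)   # simplified patch normalization
    resp = correlate2d(img, g, mode="same")       # correlation response map
    masked = np.abs(resp) * w_f                   # w_f must match img's shape
    return np.sum(masked ** p) ** (1.0 / p)       # p-norm pooling over the mask
```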

Experiment Results
Similarity between jointly combined classes (head and trash can).

Experiment Results (Cont'd)
Joint boosting saves the budget of rules.

Experiment Results (Cont'd)
Joint boosting reduces the amount of training data needed.

Experiment Results (Cont'd)
Simple rules are shared by more classes.

Application: Multiview Detection
Multiview detection usually considers each view as a class.
Independent boosting cannot afford too many classes (views).
Views often share similar rules, so joint boosting is beneficial.

Result: Multiview Detection
Fewer false alarms in detection.

Result: Multiview Detection (Cont'd)
Significantly better ROC curves.

Summary
Boosting: reweight examples and combine simple rules.
Independent boosting: separate each class from the rest independently.
Joint boosting: find the best joint cluster of classes to separate from the rest. A more complex algorithm, but more meaningful and robust classifiers.
Utility of joint boosting: when some of the classes share common rules (e.g. fruits), and in multiview object detection (e.g. views of cars).