# Introduction to Boosting and Joint Boosting

Size: px
Start display at page:

Download "Introduction to Boosting and Joint Boosting"

## Transcription

1 Introduction to Boosting and Learning Systems Group, Caltech 2005/04/26, Presentation in EE150 Boosting and

2 Outline Introduction to Boosting 1 Introduction to Boosting Intuition of Boosting Adaptive Boosting (AdaBoost) 2 Boosting and

3 Apple Recognition Problem Intuition of Boosting Adaptive Boosting (AdaBoost) Is this a picture of an apple? We want to teach a class of 6 year olds. Gather photos from NY Apple Asso. and Google Image. Boosting and

4 Our Fruit Class Begins Intuition of Boosting Adaptive Boosting (AdaBoost) Teacher: How would you describe an apple? Michael? Michael: I think apples are circular. (Class): Apples are circular. Boosting and

5 Our Fruit Class Continues Intuition of Boosting Adaptive Boosting (AdaBoost) Teacher: Being circular is a good feature for the apples. However, if you only say circular, you could make several mistakes. What else can we say for an apple? Tina? Tina: It looks like apples are red. (Class): Apples are somewhat circular and somewhat red. Boosting and

6 Our Fruit Class Continues Intuition of Boosting Adaptive Boosting (AdaBoost) Teacher: Yes. Many apples are red. However, you could still make mistakes based on circular and red. Do you have any other suggestions, Joey? Joey: Apples could also be green. (Class): Apples are somewhat circular and somewhat red and possibly green. Boosting and

7 Our Fruit Class Continues Intuition of Boosting Adaptive Boosting (AdaBoost) Teacher: Yes. It seems that apples might be circular, red, green. But you may confuse them with tomatoes or peaches, right? Any more suggestions, Jessica? Jessica: Apples have stems at the top. (Class): Apples are somewhat circular, somewhat red, possibly green, and may have stems at the top. Boosting and

8 Put Intuition to Practice Intuition of Boosting Adaptive Boosting (AdaBoost) Intuition Combine simple rules to approximate complex function. Emphasize incorrect data to focus on valuable information. AdaBoost Algorithm (Freund and Schapire 1997) Input: training data Z = (x i, y i ) N i=1. For t = 1, 2,, T, Learn a simple rule h t from emphasized training data. Get the confidence w t of such rule Emphasize the training data that do not agree with h t. Output: combined function H(x) = T t=1 w th t (x) with normalized w. Boosting and

9 Some More Details Introduction to Boosting Intuition of Boosting Adaptive Boosting (AdaBoost) AdaBoost Algorithm Input: training data Z = (x i, y i ) N i=1. For t = 1, 2,, T, Learn a simple rule h t from emphasized training data. How? Choose a h t H with minimum emphasized error. For example, H could be a set of decision stumps h θ,d,s (x) = s I[(x) d > θ]. Get the confidence w t of such rule How? An h t with lower error should get higher w t. Emphasize the training data that do not agree with h t. Output: combined function H(x) = T t=1 w th t (x) with normalized w. Let s see some demos. Boosting and

10 Why Boosting Works? Intuition of Boosting Adaptive Boosting (AdaBoost) Our intuition is correct. Provably, if each h t is better than a random guess (has error < 1/2), the combined function H(x) could make no error at all! Besides, boosting obtains large y i H(x i ) value on each data: H(x) could separate the data as clearly as possible. Boosting and

11 Multi-Class Boosting () Not very different from binary boosting. Input: training data Z = (x i, y i ) N i=1. For t = 1, 2,, T, For c = 1, 2,, C Learn a rule alone with confidence h t(x, c) from emphasized training data. Emphasize the training data that do not agree with h c,t. Output: combined function H(x, c) = T t=1 h t(x, c). Separate each class with the rest independently. Boosting and

12 Problem of Number of rules for good performance: O(C). For a budget of M rules, can only use M/C rules per class. For example, for fruits, many of the M rules (for apple, orange, tomato, etc.) would be it is circular. : waste of budget. The rules separate each class clearly: not contain mutual information between classes. For example, if we separate apples with other fruits, we have no idea that apples and tomatoes look similar. : each class resides in its own, budget-wasting rules. Boosting and

13 Introduction to Boosting Try to have joint rules. Input: training data Z = (x i, y i ) N i=1. For t = 1, 2,, T, For S {1, 2,, C} Learn a rule alone with confidence h t(x, S) using the classes in S combined together. Pick the rule h t (x, S t ) that achieves the best overall criteria. Emphasize the training data that do not agree with h t (x, S t ). Output: combined function H(x, c) = c S t h t (x, S t ). Separate a cluster of class jointly with the rest. Boosting and

14 Pros of A rule from a cluster of classes: meaningful and often stable. Number of rules for good performance: O(log C). Use the budget efficiently. Boosting and

15 Cons of The algorithm is very slow: S {1, 2,, C} is a loop of size 2 C. Replace the loop by a greedy search. Add the best single class to the cluster. Greedily combine a class to the cluster. Trace O(C 2 ) subsets instead of O(2 C ). Still slow in general, but could speed up when H is simple. For example, the regression stumps ai[(x) d > θ] + b. Boosting and

16 Experiment Framework Goal: detect 21 objects (13 indoor, 6 outdoor, 2 both) in the picture. Boosting and

17 Experiment Framework (Cont d) Extract feature with the following steps Scale the image by σ. Filter (by normalized correlation) with a patch g f. Mark the region to average response by a mask w f. Take the p-norm of average response in the region. Patches: small parts of the known objects randomly generated Example: a feature for the stem of an apple would be a patch (matched filter to stem) with mask at the top portion. Boosting and

18 Introduction to Boosting Experiment Results Similarity between combined classes (head and trash can). Boosting and

19 Experiment Results (Cont d) Save budgets for rules. Boosting and

20 Experiment Results (Cont d) Save needed data. Boosting and

21 Introduction to Boosting Experiment Results (Cont d) Simple rules are shared by more classes. Boosting and

22 Application: Multiview detection Multiview detection: usually consider each view as a class. Independent boosting: cannot allow too many classes (views). Views often share similar rules: joint boosting benefits. Boosting and

23 Result: Multiview detection Less false alarms in detection. Boosting and

24 Result: Multiview detection (Cont d) Significantly better ROC. Boosting and

25 Introduction to Boosting Boosting: reweight examples and combine rules. Independent boosting: separate each class with the rest independently. Joint boosting: find best joint cluster to separate with the rest. More complex algorithm. More meaningful and robust classifiers. Utility of joint boosting: When some of the classes share common rules: e.g. fruits. In multiview object detection: e.g. views of cars. Boosting and

### Voting (Ensemble Methods)

1 2 Voting (Ensemble Methods) Instead of learning a single classifier, learn many weak classifiers that are good at different parts of the data Output class: (Weighted) vote of each classifier Classifiers

### ECE 5424: Introduction to Machine Learning

ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple

### Boosting. CAP5610: Machine Learning Instructor: Guo-Jun Qi

Boosting CAP5610: Machine Learning Instructor: Guo-Jun Qi Weak classifiers Weak classifiers Decision stump one layer decision tree Naive Bayes A classifier without feature correlations Linear classifier

### VBM683 Machine Learning

VBM683 Machine Learning Pinar Duygulu Slides are adapted from Dhruv Batra Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data

### Learning with multiple models. Boosting.

CS 2750 Machine Learning Lecture 21 Learning with multiple models. Boosting. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Learning with multiple models: Approach 2 Approach 2: use multiple models

### CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March

### Large-Margin Thresholded Ensembles for Ordinal Regression

Large-Margin Thresholded Ensembles for Ordinal Regression Hsuan-Tien Lin (accepted by ALT 06, joint work with Ling Li) Learning Systems Group, Caltech Workshop Talk in MLSS 2006, Taipei, Taiwan, 07/25/2006

### Lecture 8. Instructor: Haipeng Luo

Lecture 8 Instructor: Haipeng Luo Boosting and AdaBoost In this lecture we discuss the connection between boosting and online learning. Boosting is not only one of the most fundamental theories in machine

### Stochastic Gradient Descent

Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular

### COMS 4721: Machine Learning for Data Science Lecture 13, 3/2/2017

COMS 4721: Machine Learning for Data Science Lecture 13, 3/2/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University BOOSTING Robert E. Schapire and Yoav

### Learning Ensembles. 293S T. Yang. UCSB, 2017.

Learning Ensembles 293S T. Yang. UCSB, 2017. Outlines Learning Assembles Random Forest Adaboost Training data: Restaurant example Examples described by attribute values (Boolean, discrete, continuous)

### Large-Margin Thresholded Ensembles for Ordinal Regression

Large-Margin Thresholded Ensembles for Ordinal Regression Hsuan-Tien Lin and Ling Li Learning Systems Group, California Institute of Technology, U.S.A. Conf. on Algorithmic Learning Theory, October 9,

### What makes good ensemble? CS789: Machine Learning and Neural Network. Introduction. More on diversity

What makes good ensemble? CS789: Machine Learning and Neural Network Ensemble methods Jakramate Bootkrajang Department of Computer Science Chiang Mai University 1. A member of the ensemble is accurate.

### Boosting: Algorithms and Applications

Boosting: Algorithms and Applications Lecture 11, ENGN 4522/6520, Statistical Pattern Recognition and Its Applications in Computer Vision ANU 2 nd Semester, 2008 Chunhua Shen, NICTA/RSISE Boosting Definition

### Hierarchical Boosting and Filter Generation

January 29, 2007 Plan Combining Classifiers Boosting Neural Network Structure of AdaBoost Image processing Hierarchical Boosting Hierarchical Structure Filters Combining Classifiers Combining Classifiers

### AdaBoost. Lecturer: Authors: Center for Machine Perception Czech Technical University, Prague

AdaBoost Lecturer: Jan Šochman Authors: Jan Šochman, Jiří Matas Center for Machine Perception Czech Technical University, Prague http://cmp.felk.cvut.cz Motivation Presentation 2/17 AdaBoost with trees

### BBM406 - Introduc0on to ML. Spring Ensemble Methods. Aykut Erdem Dept. of Computer Engineering HaceDepe University

BBM406 - Introduc0on to ML Spring 2014 Ensemble Methods Aykut Erdem Dept. of Computer Engineering HaceDepe University 2 Slides adopted from David Sontag, Mehryar Mohri, Ziv- Bar Joseph, Arvind Rao, Greg

### COMS 4771 Lecture Boosting 1 / 16

COMS 4771 Lecture 12 1. Boosting 1 / 16 Boosting What is boosting? Boosting: Using a learning algorithm that provides rough rules-of-thumb to construct a very accurate predictor. 3 / 16 What is boosting?

### A Brief Introduction to Adaboost

A Brief Introduction to Adaboost Hongbo Deng 6 Feb, 2007 Some of the slides are borrowed from Derek Hoiem & Jan ˇSochman. 1 Outline Background Adaboost Algorithm Theory/Interpretations 2 What s So Good

### Ensembles. Léon Bottou COS 424 4/8/2010

Ensembles Léon Bottou COS 424 4/8/2010 Readings T. G. Dietterich (2000) Ensemble Methods in Machine Learning. R. E. Schapire (2003): The Boosting Approach to Machine Learning. Sections 1,2,3,4,6. Léon

### Background. Adaptive Filters and Machine Learning. Bootstrap. Combining models. Boosting and Bagging. Poltayev Rassulzhan

Adaptive Filters and Machine Learning Boosting and Bagging Background Poltayev Rassulzhan rasulzhan@gmail.com Resampling Bootstrap We are using training set and different subsets in order to validate results

### Data Mining und Maschinelles Lernen

Data Mining und Maschinelles Lernen Ensemble Methods Bias-Variance Trade-off Basic Idea of Ensembles Bagging Basic Algorithm Bagging with Costs Randomization Random Forests Boosting Stacking Error-Correcting

### Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts

Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification

### Learning theory. Ensemble methods. Boosting. Boosting: history

Learning theory Probability distribution P over X {0, 1}; let (X, Y ) P. We get S := {(x i, y i )} n i=1, an iid sample from P. Ensemble methods Goal: Fix ɛ, δ (0, 1). With probability at least 1 δ (over

### Machine Learning for Signal Processing Detecting faces in images

Machine Learning for Signal Processing Detecting faces in images Class 7. 19 Sep 2013 Instructor: Bhiksha Raj 19 Sep 2013 11755/18979 1 Administrivia Project teams? Project proposals? 19 Sep 2013 11755/18979

### Performance Evaluation and Comparison

Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Cross Validation and Resampling 3 Interval Estimation

### Multiclass Boosting with Repartitioning

Multiclass Boosting with Repartitioning Ling Li Learning Systems Group, Caltech ICML 2006 Binary and Multiclass Problems Binary classification problems Y = { 1, 1} Multiclass classification problems Y

### The AdaBoost algorithm =1/n for i =1,...,n 1) At the m th iteration we find (any) classifier h(x; ˆθ m ) for which the weighted classification error m

) Set W () i The AdaBoost algorithm =1/n for i =1,...,n 1) At the m th iteration we find (any) classifier h(x; ˆθ m ) for which the weighted classification error m m =.5 1 n W (m 1) i y i h(x i ; 2 ˆθ

### CS229 Supplemental Lecture notes

CS229 Supplemental Lecture notes John Duchi 1 Boosting We have seen so far how to solve classification (and other) problems when we have a data representation already chosen. We now talk about a procedure,

### Decision Trees: Overfitting

Decision Trees: Overfitting Emily Fox University of Washington January 30, 2017 Decision tree recap Loan status: Root 22 18 poor 4 14 Credit? Income? excellent 9 0 3 years 0 4 Fair 9 4 Term? 5 years 9

### Outline: Ensemble Learning. Ensemble Learning. The Wisdom of Crowds. The Wisdom of Crowds - Really? Crowd wiser than any individual

Outline: Ensemble Learning We will describe and investigate algorithms to Ensemble Learning Lecture 10, DD2431 Machine Learning A. Maki, J. Sullivan October 2014 train weak classifiers/regressors and how

### Ensemble Methods for Machine Learning

Ensemble Methods for Machine Learning COMBINING CLASSIFIERS: ENSEMBLE APPROACHES Common Ensemble classifiers Bagging/Random Forests Bucket of models Stacking Boosting Ensemble classifiers we ve studied

### Boosting: Foundations and Algorithms. Rob Schapire

Boosting: Foundations and Algorithms Rob Schapire Example: Spam Filtering problem: filter out spam (junk email) gather large collection of examples of spam and non-spam: From: yoav@ucsd.edu Rob, can you

### 6.036 midterm review. Wednesday, March 18, 15

6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that

### i=1 = H t 1 (x) + α t h t (x)

AdaBoost AdaBoost, which stands for ``Adaptive Boosting", is an ensemble learning algorithm that uses the boosting paradigm []. We will discuss AdaBoost for binary classification. That is, we assume that

### Boosting. Acknowledgment Slides are based on tutorials from Robert Schapire and Gunnar Raetsch

. Machine Learning Boosting Prof. Dr. Martin Riedmiller AG Maschinelles Lernen und Natürlichsprachliche Systeme Institut für Informatik Technische Fakultät Albert-Ludwigs-Universität Freiburg riedmiller@informatik.uni-freiburg.de

### A Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007

Decision Trees, cont. Boosting Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 1 st, 2007 1 A Decision Stump 2 1 The final tree 3 Basic Decision Tree Building Summarized

### Statistics and learning: Big Data

Statistics and learning: Big Data Learning Decision Trees and an Introduction to Boosting Sébastien Gadat Toulouse School of Economics February 2017 S. Gadat (TSE) SAD 2013 1 / 30 Keywords Decision trees

### A Gentle Introduction to Gradient Boosting. Cheng Li College of Computer and Information Science Northeastern University

A Gentle Introduction to Gradient Boosting Cheng Li chengli@ccs.neu.edu College of Computer and Information Science Northeastern University Gradient Boosting a powerful machine learning algorithm it can

### FINAL: CS 6375 (Machine Learning) Fall 2014

FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for

### 10701/15781 Machine Learning, Spring 2007: Homework 2

070/578 Machine Learning, Spring 2007: Homework 2 Due: Wednesday, February 2, beginning of the class Instructions There are 4 questions on this assignment The second question involves coding Do not attach

### Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative

### Data Warehousing & Data Mining

13. Meta-Algorithms for Classification Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 13.

### Theory and Applications of A Repeated Game Playing Algorithm. Rob Schapire Princeton University [currently visiting Yahoo!

Theory and Applications of A Repeated Game Playing Algorithm Rob Schapire Princeton University [currently visiting Yahoo! Research] Learning Is (Often) Just a Game some learning problems: learn from training

### CS534 Machine Learning - Spring Final Exam

CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the

### Linear Classifiers. Michael Collins. January 18, 2012

Linear Classifiers Michael Collins January 18, 2012 Today s Lecture Binary classification problems Linear classifiers The perceptron algorithm Classification Problems: An Example Goal: build a system that

### Classification and Regression Trees

Classification and Regression Trees Ryan P Adams So far, we have primarily examined linear classifiers and regressors, and considered several different ways to train them When we ve found the linearity

### Classification: The rest of the story

U NIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN CS598 Machine Learning for Signal Processing Classification: The rest of the story 3 October 2017 Today s lecture Important things we haven t covered yet Fisher

### Open Problem: A (missing) boosting-type convergence result for ADABOOST.MH with factorized multi-class classifiers

JMLR: Workshop and Conference Proceedings vol 35:1 8, 014 Open Problem: A (missing) boosting-type convergence result for ADABOOST.MH with factorized multi-class classifiers Balázs Kégl LAL/LRI, University

### Does Unlabeled Data Help?

Does Unlabeled Data Help? Worst-case Analysis of the Sample Complexity of Semi-supervised Learning. Ben-David, Lu and Pal; COLT, 2008. Presentation by Ashish Rastogi Courant Machine Learning Seminar. Outline

### Roadmap. Introduction to image analysis (computer vision) Theory of edge detection. Applications

Edge Detection Roadmap Introduction to image analysis (computer vision) Its connection with psychology and neuroscience Why is image analysis difficult? Theory of edge detection Gradient operator Advanced

### Boos\$ng Can we make dumb learners smart?

Boos\$ng Can we make dumb learners smart? Aarti Singh Machine Learning 10-601 Nov 29, 2011 Slides Courtesy: Carlos Guestrin, Freund & Schapire 1 Why boost weak learners? Goal: Automa'cally categorize type

### CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#\$

### CS7267 MACHINE LEARNING

CS7267 MACHINE LEARNING ENSEMBLE LEARNING Ref: Dr. Ricardo Gutierrez-Osuna at TAMU, and Aarti Singh at CMU Mingon Kang, Ph.D. Computer Science, Kennesaw State University Definition of Ensemble Learning

### Totally Corrective Boosting Algorithms that Maximize the Margin

Totally Corrective Boosting Algorithms that Maximize the Margin Manfred K. Warmuth 1 Jun Liao 1 Gunnar Rätsch 2 1 University of California, Santa Cruz 2 Friedrich Miescher Laboratory, Tübingen, Germany

### Gaussian and Linear Discriminant Analysis; Multiclass Classification

Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015

### Robotics 2 AdaBoost for People and Place Detection

Robotics 2 AdaBoost for People and Place Detection Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Wolfram Burgard v.1.0, Kai Arras, Oct 09, including material by Luciano Spinello and Oscar Martinez Mozos

### Ensemble Methods. Charles Sutton Data Mining and Exploration Spring Friday, 27 January 12

Ensemble Methods Charles Sutton Data Mining and Exploration Spring 2012 Bias and Variance Consider a regression problem Y = f(x)+ N(0, 2 ) With an estimate regression function ˆf, e.g., ˆf(x) =w > x Suppose

### Linear and Logistic Regression. Dr. Xiaowei Huang

Linear and Logistic Regression Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Two Classical Machine Learning Algorithms Decision tree learning K-nearest neighbor Model Evaluation Metrics

### PCA FACE RECOGNITION

PCA FACE RECOGNITION The slides are from several sources through James Hays (Brown); Srinivasa Narasimhan (CMU); Silvio Savarese (U. of Michigan); Shree Nayar (Columbia) including their own slides. Goal

### From Binary to Multiclass Classification. CS 6961: Structured Prediction Spring 2018

From Binary to Multiclass Classification CS 6961: Structured Prediction Spring 2018 1 So far: Binary Classification We have seen linear models Learning algorithms Perceptron SVM Logistic Regression Prediction

### Introduction to Machine Learning

Outline Contents Introduction to Machine Learning Concept Learning Varun Chandola February 2, 2018 1 Concept Learning 1 1.1 Example Finding Malignant Tumors............. 2 1.2 Notation..............................

### Boosting with decision stumps and binary features

Boosting with decision stumps and binary features Jason Rennie jrennie@ai.mit.edu April 10, 2003 1 Introduction A special case of boosting is when features are binary and the base learner is a decision

### Data Mining Prof. Pabitra Mitra Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

Data Mining Prof. Pabitra Mitra Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur Lecture 21 K - Nearest Neighbor V In this lecture we discuss; how do we evaluate the

### Decision Trees. CSC411/2515: Machine Learning and Data Mining, Winter 2018 Luke Zettlemoyer, Carlos Guestrin, and Andrew Moore

Decision Trees Claude Monet, The Mulberry Tree Slides from Pedro Domingos, CSC411/2515: Machine Learning and Data Mining, Winter 2018 Luke Zettlemoyer, Carlos Guestrin, and Andrew Moore Michael Guerzhoy

### 2 Upper-bound of Generalization Error of AdaBoost

COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #10 Scribe: Haipeng Zheng March 5, 2008 1 Review of AdaBoost Algorithm Here is the AdaBoost Algorithm: input: (x 1,y 1 ),...,(x m,y

### Diagnostics. Gad Kimmel

Diagnostics Gad Kimmel Outline Introduction. Bootstrap method. Cross validation. ROC plot. Introduction Motivation Estimating properties of an estimator. Given data samples say the average. x 1, x 2,...,

### 2D1431 Machine Learning. Bagging & Boosting

2D1431 Machine Learning Bagging & Boosting Outline Bagging and Boosting Evaluating Hypotheses Feature Subset Selection Model Selection Question of the Day Three salesmen arrive at a hotel one night and

### Designing Information Devices and Systems II Fall 2018 Elad Alon and Miki Lustig Homework 9

EECS 16B Designing Information Devices and Systems II Fall 18 Elad Alon and Miki Lustig Homework 9 This homework is due Wednesday, October 31, 18, at 11:59pm. Self grades are due Monday, November 5, 18,

### Stephen Scott.

1 / 35 (Adapted from Ethem Alpaydin and Tom Mitchell) sscott@cse.unl.edu In Homework 1, you are (supposedly) 1 Choosing a data set 2 Extracting a test set of size > 30 3 Building a tree on the training

### The Boosting Approach to. Machine Learning. Maria-Florina Balcan 10/31/2016

The Boosting Approach to Machine Learning Maria-Florina Balcan 10/31/2016 Boosting General method for improving the accuracy of any given learning algorithm. Works by creating a series of challenge datasets

### Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1

Introduction to Machine Learning Introduction to ML - TAU 2016/7 1 Course Administration Lecturers: Amir Globerson (gamir@post.tau.ac.il) Yishay Mansour (Mansour@tau.ac.il) Teaching Assistance: Regev Schweiger

### Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers

Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers Erin Allwein, Robert Schapire and Yoram Singer Journal of Machine Learning Research, 1:113-141, 000 CSE 54: Seminar on Learning

### Metric Embedding of Task-Specific Similarity. joint work with Trevor Darrell (MIT)

Metric Embedding of Task-Specific Similarity Greg Shakhnarovich Brown University joint work with Trevor Darrell (MIT) August 9, 2006 Task-specific similarity A toy example: Task-specific similarity A toy

### Announcements Kevin Jamieson

Announcements My office hours TODAY 3:30 pm - 4:30 pm CSE 666 Poster Session - Pick one First poster session TODAY 4:30 pm - 7:30 pm CSE Atrium Second poster session December 12 4:30 pm - 7:30 pm CSE Atrium

### Name (NetID): (1 Point)

CS446: Machine Learning (D) Spring 2017 March 16 th, 2017 This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of this exam. This exam booklet contains

### Lecture 3. STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher

Lecture 3 STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher Previous lectures What is machine learning? Objectives of machine learning Supervised and

### Midterm Exam Solutions, Spring 2007

1-71 Midterm Exam Solutions, Spring 7 1. Personal info: Name: Andrew account: E-mail address:. There should be 16 numbered pages in this exam (including this cover sheet). 3. You can use any material you

### Ensembles of Classifiers.

Ensembles of Classifiers www.biostat.wisc.edu/~dpage/cs760/ 1 Goals for the lecture you should understand the following concepts ensemble bootstrap sample bagging boosting random forests error correcting

### Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Decision Trees. Tobias Scheffer

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Decision Trees Tobias Scheffer Decision Trees One of many applications: credit risk Employed longer than 3 months Positive credit

### An overview of Boosting. Yoav Freund UCSD

An overview of Boosting Yoav Freund UCSD Plan of talk Generative vs. non-generative modeling Boosting Alternating decision trees Boosting and over-fitting Applications 2 Toy Example Computer receives telephone

### Machine Learning

Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 11, 2012 Today: Computational Learning Theory Probably Approximately Coorrect (PAC) learning theorem

### MIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE

MIDTERM SOLUTIONS: FALL 2012 CS 6375 INSTRUCTOR: VIBHAV GOGATE March 28, 2012 The exam is closed book. You are allowed a double sided one page cheat sheet. Answer the questions in the spaces provided on

### Analysis of the Performance of AdaBoost.M2 for the Simulated Digit-Recognition-Example

Analysis of the Performance of AdaBoost.M2 for the Simulated Digit-Recognition-Example Günther Eibl and Karl Peter Pfeiffer Institute of Biostatistics, Innsbruck, Austria guenther.eibl@uibk.ac.at Abstract.

### Hypothesis Evaluation

Hypothesis Evaluation Machine Learning Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall 1395 1 / 31 Table of contents 1 Introduction

### PAC Generalization Bounds for Co-training

PAC Generalization Bounds for Co-training Sanjoy Dasgupta AT&T Labs Research dasgupta@research.att.com Michael L. Littman AT&T Labs Research mlittman@research.att.com David McAllester AT&T Labs Research

### Ensemble learning 11/19/13. The wisdom of the crowds. Chapter 11. Ensemble methods. Ensemble methods

The wisdom of the crowds Ensemble learning Sir Francis Galton discovered in the early 1900s that a collection of educated guesses can add up to very accurate predictions! Chapter 11 The paper in which

### Kernel Density Topic Models: Visual Topics Without Visual Words

Kernel Density Topic Models: Visual Topics Without Visual Words Konstantinos Rematas K.U. Leuven ESAT-iMinds krematas@esat.kuleuven.be Mario Fritz Max Planck Institute for Informatics mfrtiz@mpi-inf.mpg.de

### Representing Images Detecting faces in images

11-755 Machine Learning for Signal Processing Representing Images Detecting faces in images Class 5. 15 Sep 2009 Instructor: Bhiksha Raj Last Class: Representing Audio n Basic DFT n Computing a Spectrogram

### Chapter 18. Decision Trees and Ensemble Learning. Recall: Learning Decision Trees

CSE 473 Chapter 18 Decision Trees and Ensemble Learning Recall: Learning Decision Trees Example: When should I wait for a table at a restaurant? Attributes (features) relevant to Wait? decision: 1. Alternate:

### Neural Networks and Deep Learning

Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost

### Natural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley

Natural Language Processing Classification Classification I Dan Klein UC Berkeley Classification Automatically make a decision about inputs Example: document category Example: image of digit digit Example:

### Chapter 14 Combining Models

Chapter 14 Combining Models T-61.62 Special Course II: Pattern Recognition and Machine Learning Spring 27 Laboratory of Computer and Information Science TKK April 3th 27 Outline Independent Mixing Coefficients

### Least Squares Classification

Least Squares Classification Stephen Boyd EE103 Stanford University November 4, 2017 Outline Classification Least squares classification Multi-class classifiers Classification 2 Classification data fitting

### Performance Evaluation

Performance Evaluation David S. Rosenberg Bloomberg ML EDU October 26, 2017 David S. Rosenberg (Bloomberg ML EDU) October 26, 2017 1 / 36 Baseline Models David S. Rosenberg (Bloomberg ML EDU) October 26,

### AdaBoost. S. Sumitra Department of Mathematics Indian Institute of Space Science and Technology

AdaBoost S. Sumitra Department of Mathematics Indian Institute of Space Science and Technology 1 Introduction In this chapter, we are considering AdaBoost algorithm for the two class classification problem.

### Boosting & Deep Learning

Boosting & Deep Learning Ensemble Learning n So far learning methods that learn a single hypothesis, chosen form a hypothesis space that is used to make predictions n Ensemble learning à select a collection

### ECE521 Lecture7. Logistic Regression

ECE521 Lecture7 Logistic Regression Outline Review of decision theory Logistic regression A single neuron Multi-class classification 2 Outline Decision theory is conceptually easy and computationally hard

### Ensemble Methods and Random Forests

Ensemble Methods and Random Forests Vaishnavi S May 2017 1 Introduction We have seen various analysis for classification and regression in the course. One of the common methods to reduce the generalization

### Machine Learning

Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 11, 2012 Today: Computational Learning Theory Probably Approximately Coorrect (PAC) learning theorem