Robotics 2 AdaBoost for People and Place Detection

Robotics 2 AdaBoost for People and Place Detection Giorgio Grisetti, Cyrill Stachniss, Kai Arras, Wolfram Burgard v.1.0, Kai Arras, Oct 09, including material by Luciano Spinello and Oscar Martinez Mozos

Contents Motivation AdaBoost Boosting Features for People Detection Boosting Features for Place Labeling

Motivation: People Detection People tracking is a key technology for many problems and robotics-related fields: Human-Robot-Interaction (HRI) Social Robotics: social learning, learning by imitation and observation Motion planning in populated environments Human activity and intent recognition Abnormal behavior detection Crowd behavior analysis and control People detection is a key component therein.

Motivation: People Detection Where are the people?

Motivation: People Detection Where are the people? Why is it hard? Range data contain little information on people Hard in cluttered environments

Motivation: People Detection Appearance of people in range data changes drastically with: - Body pose - Distance to sensor - Partial occlusion and self-occlusion 2d data from a SICK laser scanner

Motivation: People Detection Appearance of people in range data: 3d data from a Velodyne sensor

Motivation: People Detection

Motivation: People Detection Freiburg main station

Motivation: People Detection Freiburg Main Station data set: raw data

Motivation: People Detection Freiburg Main Station data set: labeled data

Contents Motivation AdaBoost Boosting Features for People Detection Boosting Features for Place Labeling

Boosting Supervised learning technique: the user provides pairs <input data x, label y>. Learns an accurate classifier by combining moderately inaccurate rules of thumb. Inaccurate rules: weak classifiers. Accurate rule: strong classifier. Most popular algorithm: AdaBoost [Freund et al. 95], [Schapire et al. 99]. AdaBoost in Robotics: [Viola et al. 01], [Treptow et al. 04], [Martínez-Mozos et al. 05], [Rottmann et al. 05], [Monteiro et al. 06], [Arras et al. 07]

AdaBoost What AdaBoost can do for you: 1. It tells you what the best "features" are, 2. what the best thresholds are, and 3. how to combine them into a classifier. It is a non-linear classifier, robust to overfitting, and very simple to implement. Classifier design becomes science, not art.

AdaBoost A machine learning ensemble technique (ensemble methods are also called committee methods). There are other ensemble techniques such as bagging and voting. Combines an ensemble of weak classifiers (weak learners) to create a strong classifier. Prerequisite: each weak classifier is better than chance, that is, its error is < 0.5 (in a binary classification problem).

Classification Linear vs. non-linear classifiers (figure: example decision boundaries of a linear (L) and a non-linear (NL) classifier).

Classification Overfitting Overfitting occurs when a model begins to memorize the training data rather than learning the underlying relationship. It typically occurs when fitting a statistical model with too many parameters. Overfitted models explain the training data perfectly, but the model does not generalize! There are techniques to avoid overfitting, such as regularization or cross-validation.

Classification Error types

                       Predicted T' (detected)    Predicted N' (not detected)
True value T           True Positive              False Negative
True value N           False Positive             True Negative

Sensitivity (True Positive Rate) = TP / T
Specificity (True Negative Rate) = TN / N
... and many more measures
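To make the two measures concrete, here is a minimal sketch in Python/NumPy (the helper name detection_rates is illustrative, not from the lecture) that computes sensitivity and specificity for a binary detector with labels in {+1, −1}:

```python
import numpy as np

def detection_rates(y_true, y_pred):
    """Sensitivity (TPR) and specificity (TNR) for labels in {+1, -1}."""
    tp = np.sum((y_pred == +1) & (y_true == +1))   # true positives
    tn = np.sum((y_pred == -1) & (y_true == -1))   # true negatives
    t = np.sum(y_true == +1)                       # all actual positives (T)
    n = np.sum(y_true == -1)                       # all actual negatives (N)
    return tp / t, tn / n                          # TP / T, TN / N
```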

Classification Classification algorithms are supervised algorithms to predict categorical labels. This differs from regression, which is a supervised technique to predict real-valued labels. Formal problem statement: given a training set of input values x_i with labels y_i, produce a function that maps values to labels.

AdaBoost: Weak Classifier Alternatives Decision stump: Single axis-parallel partition of space Decision tree: Hierarchical partition of space Multi-layer perceptron: General non-linear function approximators Radial basis function: Non-linear expansions based on kernels

AdaBoost: Weak Classifier Decision stump: the simplest type of decision tree. Equivalent to a linear classifier defined by an affine hyperplane; the hyperplane is orthogonal to axis j, which it intersects at threshold θ. Commonly not used on its own. Formally,
h(x; j, θ) = +1 if x_j > θ, −1 otherwise
where x is a training sample and j is the dimension (feature index).
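As a minimal illustration (a Python sketch, not the lecture's code), the decision stump above can be written directly as:

```python
import numpy as np

def stump_predict(X, j, theta):
    """Decision stump h(x; j, theta) = +1 if x_j > theta, else -1.
    X is an (n x d) array of samples; j is the feature dimension."""
    return np.where(np.asarray(X)[:, j] > theta, +1, -1)

# Example: stump_predict([[0.2, 0.9], [0.8, 0.1]], j=0, theta=0.5) -> [-1, +1]
```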

AdaBoost: Algorithm
Given the training data (x_1, y_1), ..., (x_n, y_n):
1. Initialize the weights w_i = 1/n
2. For m = 1, ..., M:
   - Train a weak classifier h_m(x) on the weighted training data, minimizing Σ_i w_i I(y_i ≠ h_m(x_i))
   - Compute the error of h_m(x):  e_m = Σ_{i=1}^{n} w_i I(y_i ≠ h_m(x_i)) / Σ_{i=1}^{n} w_i
   - Compute the voting weight of h_m(x):  α_m = log((1 − e_m) / e_m)
   - Recompute the weights:  w_i ← w_i exp{ α_m I(y_i ≠ h_m(x_i)) }
3. Make predictions using the final strong classifier
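The loop above translates almost directly into code. Below is a minimal sketch, not the lecture's reference implementation: it assumes NumPy and scikit-learn and uses a depth-1 decision tree as the decision-stump weak learner (an implementation convenience, not part of the slide).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # a depth-1 tree acts as a decision stump

def adaboost_train(X, y, M=50):
    """AdaBoost training loop as on the slide; labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                    # 1. initialize weights w_i = 1/n
    classifiers, alphas = [], []
    for m in range(M):                         # 2. for m = 1, ..., M
        h = DecisionTreeClassifier(max_depth=1)
        h.fit(X, y, sample_weight=w)           # train weak classifier on weighted data
        miss = (h.predict(X) != y)             # indicator I(y_i != h_m(x_i))
        e_m = np.sum(w * miss) / np.sum(w)     # weighted error e_m
        e_m = np.clip(e_m, 1e-10, 1 - 1e-10)   # guard against e_m = 0 or 1
        if e_m >= 0.5:                         # weak classifier must beat chance
            break
        alpha_m = np.log((1.0 - e_m) / e_m)    # voting weight alpha_m
        w = w * np.exp(alpha_m * miss)         # re-weight misclassified samples
        w = w / np.sum(w)                      # normalization keeps weights well scaled
        classifiers.append(h)
        alphas.append(alpha_m)
    return classifiers, alphas

def adaboost_predict(X, classifiers, alphas):
    """3. Final strong classifier: H(x) = sgn( sum_m alpha_m * h_m(x) )."""
    votes = sum(a * h.predict(X) for h, a in zip(classifiers, alphas))
    return np.sign(votes)
```

Usage would look like: classifiers, alphas = adaboost_train(X_train, y_train); y_hat = adaboost_predict(X_test, classifiers, alphas).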

AdaBoost: Strong Classifier
Computing the voting weight α_m of a weak classifier:  α_m = log((1 − e_m) / e_m)
The voting weight drops to zero as the error approaches chance level, e_m = 0.5.
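For intuition, a quick worked example (not from the original slide): plugging errors into α_m = log((1 − e_m)/e_m) gives
e_m = 0.10 → α_m = log(0.90/0.10) = log 9 ≈ 2.20
e_m = 0.40 → α_m = log(0.60/0.40) ≈ 0.41
e_m = 0.50 → α_m = log 1 = 0
so the closer a weak classifier is to chance, the less it contributes to the vote.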

AdaBoost: Strong Classifier
Training is completed... The weak classifiers h_1(x), ..., h_M(x) and their voting weights α_1, ..., α_M are now fixed. The resulting strong classifier is
H(x_i) = sgn( Σ_{m=1}^{M} α_m h_m(x_i) )
where x_i is the data to be classified and the class result is in {+1, −1}: a weighted majority voting scheme.

AdaBoost: Weak Classifier
Train a decision stump on the weighted data. This consists in solving
(j*, θ*) = argmin_{j,θ} Σ_{i=1}^{n} w_i I(y_i ≠ h(x_i; j, θ)) / Σ_{i=1}^{n} w_i
that is, finding an optimal threshold θ* for each dimension j = 1, ..., d and then selecting the j* for which the cost is minimal.

AdaBoost: Weak Classifier
A simple training algorithm for stumps:
For j = 1, ..., d:
   Sort the samples x_i in ascending order along dimension j
   For i = 1, ..., n: compute the n cumulative sums  w_cum^j(i) = Σ_{k=1}^{i} w_k y_k
   The threshold θ_j is at the extremum of w_cum^j
   The sign of the extremum gives the direction p_j of the inequality
The global extremum over all d cumulative sums gives the threshold θ* and the dimension j*.
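A possible NumPy rendering of this procedure (a sketch under the slide's conventions; the closed-form error expressions derived from w_cum and all variable names are my own, and ties between equal feature values are ignored):

```python
import numpy as np

def train_stump(X, y, w):
    """Weighted decision-stump training via cumulative sums, as sketched on the slide.

    X: (n x d) samples, y: labels in {-1, +1}, w: sample weights.
    Returns (weighted error, dimension j*, threshold theta*, polarity p*).
    """
    n, d = X.shape
    best = None
    for j in range(d):
        order = np.argsort(X[:, j])                  # sort samples along dimension j
        xs, ys, ws = X[order, j], y[order], w[order]
        cum = np.cumsum(ws * ys)                     # w_cum^j(i) = sum_{k<=i} w_k y_k
        w_total, s_n = np.sum(ws), cum[-1]
        # Weighted error of "h(x) = +1 if x_j > theta" with theta placed after
        # sorted position i; the second line is the flipped inequality (p = -1).
        err_pos = 0.5 * (w_total + 2.0 * cum - s_n)
        err_neg = w_total - err_pos
        for errs, p in ((err_pos, +1), (err_neg, -1)):
            i = int(np.argmin(errs))                 # extremum of the cumulative sum
            if best is None or errs[i] < best[0]:
                best = (errs[i], j, xs[i], p)        # threshold theta* = xs[i]
    return best
```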

AdaBoost: Weak Classifier Training algorithm for stumps: intuition. (Figure: samples with labels y shown in red (+) and blue (−); assuming all weights w_i = 1, the cumulative sum w_cum^j(i) = Σ_{k=1}^{i} w_k y_k is plotted along the data, and its extremum marks the threshold θ* for dimension j* = 1.)

AdaBoost: Algorithm
Given the training data (x_1, y_1), ..., (x_n, y_n):
1. Initialize the weights w_i = 1/n
2. For m = 1, ..., M:
   - Train a weak classifier h_m(x) on the weighted training data, minimizing Σ_i w_i I(y_i ≠ h_m(x_i))
   - Compute the error of h_m(x):  e_m = Σ_{i=1}^{n} w_i I(y_i ≠ h_m(x_i)) / Σ_{i=1}^{n} w_i
   - Compute the voting weight of h_m(x):  α_m = log((1 − e_m) / e_m)
   - Recompute the weights:  w_i ← w_i exp{ α_m I(y_i ≠ h_m(x_i)) }
3. Make predictions using the final strong classifier

AdaBoost: Step-By-Step Training data

AdaBoost: Step-By-Step Iteration 1, train weak classifier 1 Threshold θ* = 0.37 Dimension j* = 1 Weighted error e m = 0.2 Voting weight α m = 1.39 Total error = 4

AdaBoost: Step-By-Step Iteration 1, recompute weights Threshold θ* = 0.37 Dimension j* = 1 Weighted error e m = 0.2 Voting weight α m = 1.39 Total error = 4

AdaBoost: Step-By-Step Iteration 2, train weak classifier 2 Threshold θ* = 0.47 Dimension j* = 2 Weighted error e m = 0.16 Voting weight α m = 1.69 Total error = 5

AdaBoost: Step-By-Step Iteration 2, recompute weights Threshold θ* = 0.47 Dimension j* = 2 Weighted error e m = 0.16 Voting weight α m = 1.69 Total error = 5

AdaBoost: Step-By-Step Iteration 3, train weak classifier 3 Threshold θ* = 0.14 Dimension, sign j* = 2, neg Weighted error e m = 0.25 Voting weight α m = 1.11 Total error = 1

AdaBoost: Step-By-Step Iteration 3, recompute weights Threshold θ* = 0.14 Dimension, sign j* = 2, neg Weighted error e m = 0.25 Voting weight α m = 1.11 Total error = 1

AdaBoost: Step-By-Step Iteration 4, train weak classifier 4 Threshold θ* = 0.37 Dimension j* = 1 Weighted error e m = 0.20 Voting weight α m = 1.40 Total error = 1

AdaBoost: Step-By-Step Iteration 4, recompute weights Threshold θ* = 0.37 Dimension j* = 1 Weighted error e m = 0.20 Voting weight α m = 1.40 Total error = 1

AdaBoost: Step-By-Step Iteration 5, train weak classifier 5 Threshold θ* = 0.81 Dimension j* = 1 Weighted error e m = 0.28 Voting weight α m = 0.96 Total error = 1

AdaBoost: Step-By-Step Iteration 5, recompute weights Threshold θ* = 0.81 Dimension j* = 1 Weighted error e m = 0.28 Voting weight α m = 0.96 Total error = 1

AdaBoost: Step-By-Step Iteration 6, train weak classifier 6 Threshold θ* = 0.47 Dimension j* = 2 Weighted error e m = 0.29 Voting weight α m = 0.88 Total error = 1

AdaBoost: Step-By-Step Iteration 6, recompute weights Threshold θ* = 0.47 Dimension j* = 2 Weighted error e m = 0.29 Voting weight α m = 0.88 Total error = 1

AdaBoost: Step-By-Step Iteration 7, train weak classifier 7 Threshold θ* = 0.14 Dimension, sign j* = 2, neg Weighted error e m = 0.29 Voting weight α m = 0.88 Total error = 1

AdaBoost: Step-By-Step Iteration 7, recompute weights Threshold θ* = 0.14 Dimension, sign j* = 2, neg Weighted error e m = 0.29 Voting weight α m = 0.88 Total error = 1

AdaBoost: Step-By-Step Iteration 8, train weak classifier 8 Threshold θ* = 0.93 Dimension, sign j* = 1, neg Weighted error e m = 0.25 Voting weight α m = 1.12 Total error = 0

AdaBoost: Step-By-Step Iteration 8, recompute weights Threshold θ* = 0.93 Dimension, sign j* = 1, neg Weighted error e m = 0.25 Voting weight α m = 1.12 Total error = 0

AdaBoost: Step-By-Step Final Strong Classifier Total error = 0! (rarely met in practice) The colored background dots in the figure are the test set.

AdaBoost: Wrap Up Misclassified samples receive higher weight; the higher the weight, the "more attention" a training sample gets. The algorithm generates weak classifiers by training the next learner on the mistakes of the previous one. Now we also understand the name: Adaptive Boosting, AdaBoost. Once training is done, AdaBoost is a voting method. Large impact on the ML community and beyond.

Contents Motivation AdaBoost Boosting Features for People Detection Boosting Features for Place Labeling

Problem and Approach Can we find robust features for people, legs and groups of people in 2D range data? What are the best features for people detection? Can we find people that do not move? Approach: Classifying groups of adjacent beams Computing a set of simple (scalar) features on these groups Boosting the features

Related Work People Tracking [Fod et al. 2002] [Kleinhagenbrock et al. 2002] [Schulz et al. 2003] [Scheutz et al. 2004] [Topp et al. 2005] [Cui et al. 2005] [Schulz 2006] [Mucientes et al. 2006] SLAM in dyn. env. [Montemerlo et al. 2002] [Hähnel et al. 2003] [Wang et al. 2003]... People detection done with very simple classifiers: manual feature selection, heuristic thresholds Typically: narrow local-minima blobs that move

Segmentation Divide the scan into segments "Range image segmentation"

Segmentation (figure: raw scan, segmented scan, and feature profiles). Method: jump distance condition, see [Premebida et al. 2005] for a survey. Size filter: rejection of too-small segments.
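A minimal sketch of such a jump-distance segmentation (Python/NumPy; the threshold and minimum segment size are illustrative values, not the lecture's settings):

```python
import numpy as np

def segment_scan(ranges, angles, jump_dist=0.3, min_points=3):
    """Jump-distance segmentation of a 2D range scan.

    A new segment starts whenever the Euclidean distance between consecutive
    beam endpoints exceeds jump_dist; segments with fewer than min_points
    beams are rejected (size filter).
    """
    ranges, angles = np.asarray(ranges, float), np.asarray(angles, float)
    pts = np.column_stack((ranges * np.cos(angles), ranges * np.sin(angles)))
    segments, current = [], [pts[0]]
    for p_prev, p in zip(pts[:-1], pts[1:]):
        if np.linalg.norm(p - p_prev) > jump_dist:   # jump distance condition
            if len(current) >= min_points:
                segments.append(np.array(current))
            current = []                             # start a new segment
        current.append(p)
    if len(current) >= min_points:
        segments.append(np.array(current))
    return segments
```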

Features Segment 1. Number of points 2. Standard Deviation 3. Mean avg. deviation from median 4. Jump dist. to preceding segment 5. Jump dist. to succeeding segment 6. Width

Features Segment (figure: circle with center c and radius r fitted to the segment) 7. Linearity 8. Circularity 9. Radius

Features Segment 10. Boundary Length 11. Boundary Regularity 12. Mean curvature 13. Mean angular difference 14. Mean speed

Features Resulting feature signature for each segment
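For illustration, a few of these features can be computed per segment as follows (a Python sketch covering only features 1-3 and 6; not the reference implementation from the lecture):

```python
import numpy as np

def segment_features(points):
    """A handful of the scalar features above for one segment (points: n x 2 array)."""
    points = np.asarray(points, dtype=float)
    n = len(points)                                          # 1. number of points
    centroid = points.mean(axis=0)
    std = np.sqrt(np.sum((points - centroid) ** 2) / n)      # 2. standard deviation
    median = np.median(points, axis=0)
    mad = np.mean(np.linalg.norm(points - median, axis=1))   # 3. mean avg. deviation from median
    width = np.linalg.norm(points[0] - points[-1])           # 6. width (first to last point)
    return np.array([n, std, mad, width])
```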

Training: Data Labeling Mark segments that correspond to people Either manually or automatically

Training: Data Labeling Automatic labeling: the obvious approach is to define an area of interest (e.g. 3 m × 4 m) and label everything inside it. Here: discrimination from clutter is interesting, and the features include the spatial relation between fore- and background. Thus: labeling is done by hand.

Training Resulting training set: label +1 for segments corresponding to people (foreground segments), label −1 for segments corresponding to other objects (background segments).

AdaBoost: Weak Classifiers Each binary weak classifier h_j(x) is created using a single-valued feature f_j in the form
h_j(x) = +1 if p_j f_j(x) < p_j θ_j, −1 otherwise
where θ_j is a threshold and p_j ∈ {−1, 1} indicates the direction of the inequality. The weak classifier must be better than random.
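A one-line sketch of this thresholded single-feature classifier (Python; the function name and example values are illustrative):

```python
def weak_classifier(f_value, theta, p):
    """Thresholded single-feature weak classifier:
    h_j(x) = +1 if p * f_j(x) < p * theta, else -1 (p in {-1, +1})."""
    return +1 if p * f_value < p * theta else -1

# weak_classifier(0.35, theta=0.5, p=+1) -> +1   (feature below threshold)
# weak_classifier(0.35, theta=0.5, p=-1) -> -1   (inequality direction flipped)
```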

AdaBoost: Final Strong Classifier (Diagram: training examples 1 ... N and the vocabulary of features f#1 ... f#14 are fed into boosting; the result is the strong binary classifier, a weighted majority vote Σ_t w_t h_t with output in {−1, 1}.)

Experiments Env. 1: Corridor, no clutter Env. 2: Office, very cluttered

Experiments Env. 1 & 2: Corridor and Office. Env. 1 → 2: cross-evaluation, trained in the corridor and applied in the office.

Experiments Adding motion feature (mean speed, f#14) Motion feature has no contribution Experimental setup: Robot Herbert SICK LMS 200 laser range finder, 1 deg angular resolution

Experiments Comparison with a hand-tuned classifier: jump distance θ_δ = 30 cm, width thresholds θ_w = 5 cm (min) and 50 cm (max), number of points θ_n = 4, standard deviation θ_σ = 50 cm, motion of points θ_v = 2 cm. Result: people are often not detected.

Experiments Five best features: 1: radius of the LSQ-fitted circle, a robust size measure (#9); 2: mean angular difference, a convexity measure (#13); 3/4: jump distances, a local-minima measure (#4 and #5); 5: mean average deviation from the median, a robust compactness measure (#3).

Result: Classification (figure: example classification results with true (T) and false (F) detections marked)

Summary People detection phrased as a classification problem on groups of neighboring beams. AdaBoost allows for a systematic approach to this task. One-shot/single-frame people detection with over 90% accuracy. The learned classifier is clearly superior to the hand-tuned classifier. No background knowledge such as an a priori map is needed (e.g. to perform background subtraction).

Contents Motivation AdaBoost Boosting Features for People Detection Boosting Features for Place Labeling

Place Labeling: Motivation A map is a metric and topological model of the environment

Place Labeling: Motivation Wanted: semantic information about places Room Corridor Doorway

Scenario Example Albert, where are you? I am in the corridor

Scenario Example 2 Semantic mapping Corridor Room Doorway Human-Robot Interaction of type: "Robot, get out of my room, go into the corridor!"

Problem Statement Classification of the position of the robot using a single observation: a 360° laser range scan

Observations

Observations Room

Observations Room Doorway

Observations Corridor Room Doorway

Similar Observations

Similar Observations Corridor Doorway

Classification Problem

Classification Problem? Corridor Room Doorway

Representing the Observations How do we represent the 360° laser beams for our classification task? As a list of beams? Problem: which beam is the first beam? Not invariant to rotation!

Representing the Observations A list of scalar geometrical features of the scan. The features are all invariant to rotation.

Simple Features (figure: examples of scalar features f computed from the beam lengths d_i — the number of gaps, where a gap occurs when the difference between consecutive beams exceeds a threshold θ; the minimum beam length; the average beam length; the area; and the perimeter of the scan)

Simple Features Features of the raw beams

Simple Features Features of the closed polygon P(z) made up by the beams
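A small sketch of such rotation-invariant scan features (Python/NumPy; the gap threshold is an illustrative value and only a handful of the features are shown, not the lecture's exact feature set):

```python
import numpy as np

def scan_features(ranges, angles, gap_threshold=0.5):
    """A few rotation-invariant features of a 360-degree scan."""
    d = np.asarray(ranges, dtype=float)
    # features of the raw beams
    n_gaps = np.sum(np.abs(np.diff(d)) > gap_threshold)   # number of gaps
    d_min, d_mean = d.min(), d.mean()                      # minimum / average beam length
    # features of the closed polygon formed by the beam endpoints
    x, y = d * np.cos(angles), d * np.sin(angles)
    x2, y2 = np.roll(x, -1), np.roll(y, -1)
    area = 0.5 * np.abs(np.sum(x * y2 - x2 * y))           # shoelace formula
    perimeter = np.sum(np.hypot(x2 - x, y2 - y))           # sum of polygon edge lengths
    return np.array([n_gaps, d_min, d_mean, area, perimeter])
```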

Multiple Classes Corridor, Room, Doorway (classes 1, 2, 3)

Multiple Classes Sequence of binary classifiers in a decision list: the corridor classifier is applied first; if H(x) = 1 the observation is labeled Corridor, otherwise the room classifier is applied; if H(x) = 1 it is labeled Room, otherwise Doorway. Order matters as the accuracy of the binary classifiers differs; order them according to error rate. Generalizes to sequential AdaBoost for K classes.
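A sketch of this decision list (Python; it assumes binary classifier objects exposing a predict() method returning +1/−1, which is an assumption about the interface, not the lecture's code):

```python
def decision_list_classify(x, binary_classifiers, labels, default_label):
    """Sequential multi-class classification with binary classifiers.

    binary_classifiers and labels are ordered by increasing error rate, e.g.
    [corridor_clf, room_clf] with labels ["corridor", "room"] and
    default_label = "doorway".
    """
    for clf, label in zip(binary_classifiers, labels):
        if clf.predict(x) == +1:   # first classifier that fires decides the class
            return label
    return default_label           # no classifier fired: fall through to the last class
```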

Experiments Building 079, Uni. Freiburg (classes: Corridor, Room, Doorway). Training (top): 16045 examples. Test (bottom): 18726 examples, classification rate: 93.94%.

Experiments Building 101, Uni. Freiburg (classes: Corridor, Room, Doorway, Hallway). Training (left): 13906 examples. Test (right): 10445 examples, classification rate: 89.52%.

Application to New Environment Training map Intel Research Lab in Seattle

Application to New Environment Training map Intel Research Lab in Seattle Corridor Room Doorway

Training Learn a strong classifier from a set of previously labeled observations. (Diagram: features are computed from the labeled Room, Corridor, and Doorway observations and fed to AdaBoost, which produces the multi-class classifier.)