Boosting: Algorithms and Applications


Boosting: Algorithms and Applications. Lecture 11, ENGN 4522/6520, Statistical Pattern Recognition and Its Applications in Computer Vision, ANU, 2nd Semester 2008. Chunhua Shen, NICTA/RSISE

Boosting Definition of boosting: boosting refers to the general problem of producing a very accurate prediction rule by combining rough and moderately inaccurate rules of thumb. Boosting procedure: given a set of labeled training examples, on each round the booster devises a distribution (importance weighting) over the example set and requests a weak hypothesis/classifier/learner with low weighted error. Upon convergence, the booster combines the weak hypotheses into a single prediction rule.
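A minimal sketch of this round-by-round protocol in Python, using the AdaBoost-style reweighting as the concrete instance (the `weak_learner` helper is hypothetical: given data and a distribution over examples, it returns a classifier h with outputs in {-1, +1}):

```python
import numpy as np

def boost(X, y, weak_learner, n_rounds):
    """Generic booster: y has labels in {-1, +1}; assumes 0 < err < 1/2."""
    n = len(y)
    dist = np.full(n, 1.0 / n)           # start from the uniform distribution
    hypotheses, alphas = [], []
    for _ in range(n_rounds):
        h = weak_learner(X, y, dist)     # booster requests a weak hypothesis
        err = dist[h(X) != y].sum()      # weighted error under current dist
        alpha = 0.5 * np.log((1 - err) / err)
        dist *= np.exp(-alpha * y * h(X))   # re-weight the examples
        dist /= dist.sum()
        hypotheses.append(h)
        alphas.append(alpha)
    # combined rule: weighted majority vote of the weak hypotheses
    return lambda X: np.sign(sum(a * h(X) for a, h in zip(alphas, hypotheses)))
```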

Boosting (Freund & Schapire, 1997)

Boosting: 1st iteration

Boosting: Update Distribution

Boosting as Entropy Projection Minimizing relative entropy to last distribution s.t. linear constraints

Boosting: 2nd Hypothesis

Boosting: 3rd Hypothesis

Boosting: 4th Hypothesis

All hypotheses

AdaBoost

Properties of AdaBoost AdaBoost adjusts adaptively to the errors of the weak hypotheses returned by the weak learner. Unlike conventional boosting algorithms, the prior error bound of the weak learner need not be known ahead of time. The update rule reduces the probability assigned to the examples on which the hypothesis makes a good prediction and increases the probability of the examples on which the prediction is poor.
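As a concrete illustration of this update rule (the numbers are illustrative, not from the slides): suppose a weak hypothesis has weighted error $\varepsilon_t = 0.25$. Then

$\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t} = \frac{1}{2}\ln 3 \approx 0.549,$

so correctly classified examples are rescaled by $e^{-\alpha_t} \approx 0.577$ and misclassified ones by $e^{+\alpha_t} \approx 1.732$. After normalization the misclassified examples carry exactly half of the total weight, which forces the next weak hypothesis to do better than chance on them.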

Multi-class Extensions The previous discussion is restricted to binary classification problems; the training data could carry any number of labels, giving a multi-class problem. The multi-class case (AdaBoost.M1) requires the accuracy of each weak hypothesis to be greater than 1/2. This condition is stronger in the multi-class case than in the binary case: with k classes, random guessing achieves accuracy only 1/k, so demanding better than 1/2 is a substantive requirement.

Detecting Pedestrians Using Patterns of Motion and Appearance. Paul Viola, Michael J. Jones, Daniel Snow. IEEE ICCV 2003

The System A pedestrian detection system that uses image intensity information and motion information, with the detectors trained by AdaBoost. It was the first approach to combine both appearance and motion information in a single detector. Advantages: high efficiency; high detection rate and low false positive rate.

Rectangle Filters Measure the difference between region averages at various scales, orientations and aspect ratios. However, the information from any single filter is limited and needs to be boosted to perform accurate classification.

Motion Information Information about the direction of motion can be extracted from the differences between shifted versions of the second image in time and the first image. The motion filters (direction, shear, magnitude) operate on 5 images:

$\Delta = \mathrm{abs}(I_t - I_{t+1})$
$U = \mathrm{abs}(I_t - I_{t+1}{\uparrow})$
$L = \mathrm{abs}(I_t - I_{t+1}{\leftarrow})$
$R = \mathrm{abs}(I_t - I_{t+1}{\rightarrow})$
$D = \mathrm{abs}(I_t - I_{t+1}{\downarrow})$

where the arrows denote shifting the image by one pixel in the given direction.
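A short NumPy sketch of these five difference images (the `shift` helper and its wrap-around border handling are simplifications I am assuming, not part of the paper):

```python
import numpy as np

def shift(img, dy, dx):
    """Shift an image by (dy, dx) pixels; np.roll wraps at the borders,
    a simplification of proper border padding."""
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

def motion_images(I_t, I_t1):
    """The five images that the motion filters operate on."""
    delta = np.abs(I_t - I_t1)            # raw frame difference
    U = np.abs(I_t - shift(I_t1, -1, 0))  # second frame shifted up
    L = np.abs(I_t - shift(I_t1, 0, -1))  # shifted left
    R = np.abs(I_t - shift(I_t1, 0, 1))   # shifted right
    D = np.abs(I_t - shift(I_t1, 1, 0))   # shifted down
    return delta, U, L, R, D
```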

An Example

Appearance Filter The appearance filters are rectangular filters that operate on the first input image: $f = \phi(I_t)$

Integral Image The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:

$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$

where $ii(x, y)$ is the integral image and $i(x, y)$ is the original image. It can be computed in a single pass via the cumulative row sum $s(x, y)$:

$s(x, y) = s(x, y-1) + i(x, y)$
$ii(x, y) = ii(x-1, y) + s(x, y)$
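A minimal NumPy sketch of the integral image and the resulting constant-time rectangle sum (four lookups per rectangle):

```python
import numpy as np

def integral_image(i):
    """ii(x, y): sum of i over all pixels above and to the left, inclusive."""
    return i.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of the original image over rows top..bottom and columns
    left..right (inclusive), using four integral-image lookups."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total
```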

Training Filters The rectangle filters can have any size, aspect ratio or position as long as they fit in the detection window; therefore, there are quite a number of possible motion and appearance filters, from which a learning algorithm selects to build classifiers.

Training Process The training process uses AdaBoost to select a subset of features F that minimizes the weighted error and to construct the classifier. In each round, the learning algorithm chooses the best filter from the pool of motion and appearance filters, and also picks the optimal threshold t for that feature as well as its linear weight. The output of AdaBoost is a linear combination of the selected features. (A sketch of the threshold selection follows.)
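A hedged sketch of the weak-learner step for a single scalar feature: exhaustively pick the threshold and polarity with the lowest weighted error, which is what AdaBoost evaluates for every candidate filter (the function name is mine):

```python
import numpy as np

def best_stump(f, y, w):
    """f: feature values, y: labels in {-1, +1}, w: example weights.
    Returns (error, threshold, polarity) of the best decision stump."""
    best = (np.inf, 0.0, 1)
    for t in np.unique(f):
        for p in (1, -1):
            pred = np.where(f > t, p, -p)   # stump prediction
            err = w[pred != y].sum()        # weighted error
            if err < best[0]:
                best = (err, t, p)
    return best
```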

Training Process A cascade architecture is used to raise the efficiency of the system. The true and false positives that pass the current stage are used in the next stage of the cascade. The goal is to drive the false positive rate down much faster than the detection rate.

Overview of the Cascaded Structure Each stage is a strong classifier: a weighted vote of weak classifiers compared against a threshold. With weak classifier weights 0.9, 0.7, 0.5 and 0.3 and a threshold of 1.0: in Classifier 1, weak classifiers 1, 2 and 4 fire, so the score is 0.9 + 0.7 + 0.3 = 1.9 > 1.0 and the window passes; in Classifier 2, only weak classifiers 3 and 4 fire, so the score is 0.5 + 0.3 = 0.8 < 1.0 and the window is rejected. (A code sketch of this evaluation follows.)
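A minimal sketch of this cascaded evaluation (the data layout is assumed, not from the paper):

```python
def cascade_classify(window, stages):
    """stages: list of (weak_classifiers, weights, threshold) triples.
    Each weak classifier maps a window to True/False."""
    for weaks, weights, threshold in stages:
        score = sum(a for h, a in zip(weaks, weights) if h(window))
        if score < threshold:
            return False   # e.g. 0.5 + 0.3 = 0.8 < 1.0: reject immediately
        # e.g. 0.9 + 0.7 + 0.3 = 1.9 > 1.0: pass on to the next stage
    return True            # survived every stage: report a detection
```

Because most windows are rejected by the first cheap stages, the average cost per window stays low.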

Experiments Each classifier in the cascade is trained using the original positive examples and the same number of negatives: false positives from the previous stage, or plain negative examples at the first stage. The resulting classifier from the previous stage filters the input to the current stage, which builds a new classifier with a lower false positive rate. The detection threshold is set using a validation set of image pairs.

Training samples A small sample of positive training examples: a pair of image patterns comprises a single training example.

Training the cascade A large number of motion and appearance filters is used for training the dynamic pedestrian detector; a smaller number of appearance filters suffices for training the static pedestrian detector.

Training results The first five filters learned for the dynamic pedestrian detector; the six images used in the motion and appearance representation are shown for each filter. Also shown: the first five filters learned for the static pedestrian detector.

Testing Detection results of the dynamic detector

Testing Detection results of the static detector

Pedestrian Detection Using Boosting and Covariance Features Sakrapee Paisitkriangkrai, Chunhua Shen, and Jian Zhang, IEEE T-CSVT

Covariance Features The image is divided into small overlapping regions, and each pixel in a region is converted to an eight-dimensional feature vector

$F(x, y) = \left[\, x,\; y,\; |I_x|,\; |I_y|,\; \sqrt{I_x^2 + I_y^2},\; |I_{xx}|,\; |I_{yy}|,\; \tan^{-1}\frac{|I_y|}{|I_x|} \,\right]^{\top}$

The covariance matrix is calculated from

$\mathrm{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \frac{1}{n-1} \sum_k (x_k - \mu_X)(y_k - \mu_Y)$

To improve the calculation time, a technique employing integral images is applied: in other words, we compute integral images of each feature channel and of their pairwise products.
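A NumPy sketch of the descriptor under these definitions (using np.gradient as the derivative approximation is my assumption, not the paper's implementation):

```python
import numpy as np

def region_covariance(I, top, left, bottom, right):
    """Stacked upper triangle of the 8x8 covariance of per-pixel features."""
    Iy, Ix = np.gradient(I)          # first derivatives
    Iyy = np.gradient(Iy)[0]         # second derivative in y
    Ixx = np.gradient(Ix)[1]         # second derivative in x
    ys, xs = np.mgrid[0:I.shape[0], 0:I.shape[1]]
    feats = np.stack([xs, ys, np.abs(Ix), np.abs(Iy),
                      np.sqrt(Ix**2 + Iy**2), np.abs(Ixx), np.abs(Iyy),
                      np.arctan2(np.abs(Iy), np.abs(Ix))])  # 8 x H x W
    region = feats[:, top:bottom + 1, left:right + 1].reshape(8, -1)
    C = np.cov(region)               # 8 x 8 covariance matrix
    return C[np.triu_indices(8)]     # upper triangle: a 36-D vector
```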

Experimental Results Feature comparison. [ROC plot: detection rate (0.8 to 1.0) versus false positive rate (0 to 0.01) for COV and HoG features, each with an RBF SVM (g = 0.01), a quadratic SVM, and a linear SVM.]

Remarks Although covariance features with a non-linear SVM outperform many state-of-the-art techniques, the approach has the following disadvantages: the block size used in the SVM is fixed (7x7 pixels), so it cannot capture human body parts with other rectangular shapes, e.g. limbs or torso; the parameter tuning process for the SVM is rather tedious; and the computation time of a non-linear SVM is high. This motivates building a new, simpler pedestrian detector using covariance features, AdaBoost with weighted Fisher linear discriminant analysis (WLDA) based weak classifiers, and a cascaded structure.

Linear Discriminant Analysis (LDA) Motivation: project data onto a line ($R^n \to R^1$) such that the patterns become well separated (in a least-squares sense). Two-dimensional example: the chosen direction gives the best separation between the two classes.
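A minimal Fisher LDA sketch under these definitions (unweighted for brevity; the weighted variant used later folds the boosting weights into the means and scatter):

```python
import numpy as np

def fisher_direction(X_pos, X_neg):
    """Direction w = Sw^{-1} (mu1 - mu2) maximizing class separation."""
    mu1, mu2 = X_pos.mean(axis=0), X_neg.mean(axis=0)
    Sw = (np.cov(X_pos, rowvar=False) * (len(X_pos) - 1)
          + np.cov(X_neg, rowvar=False) * (len(X_neg) - 1))
    return np.linalg.solve(Sw, mu1 - mu2)

# Projection to one dimension: score = X @ w, then threshold the score.
```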

Covariance features with LDA Combine covariance features with LDA and compare them against Haar-like features. The descriptor is the upper triangle of the covariance matrix

$\begin{bmatrix} \mathrm{var}[x] & \mathrm{cov}[x,y] & \cdots & \mathrm{cov}[x, I_{yy}] \\ & \mathrm{var}[y] & \cdots & \mathrm{cov}[y, I_{yy}] \\ & & \ddots & \vdots \\ & & & \mathrm{var}[I_{yy}] \end{bmatrix}$

Observations: it is possible to achieve a 5% test error rate using either 25 covariance features or 100 Haar-like features.

Components Combine multi-dimensional covariance features with weighted LDA. As in the cascade example above, each strong classifier is a weighted vote of weak classifiers (e.g. with weights 0.9, 0.7, 0.5 and 0.3: if weak classifiers 1, 2 and 4 fire, 0.9 + 0.7 + 0.3 = 1.9 > 1.0, the threshold). Train the new features within the AdaBoost framework for faster speed and higher accuracy. Apply multiple-layer boosting with heterogeneous features in a cascaded structure.

Architecture Architecture of the pedestrian detection system using boosted covariance features (see the sketch after this list):
1. Start from the training dataset and a complete set of rectangular filters (weak classifiers).
2. Calculate the region covariance matrix and stack the upper triangle of the matrix into a vector ($R^n$).
3. Apply the weighted Fisher linear discriminant ($R^n \to R^1$).
4. AdaBoost selects the best weak learner with respect to the weighted error.
5. Update the sample weights.
6. Test the predefined objective (hit rate: 99.5%, false positives: 50%). If it fails, return to step 2; if it passes, output the strong classifier.
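A hedged sketch of the per-stage loop implied by this flowchart (`add_weak_learner` and `evaluate` are hypothetical helpers standing in for one AdaBoost round and for measuring the stage's hit and false positive rates):

```python
def train_stage(pos, neg, add_weak_learner, evaluate,
                hit_target=0.995, fp_target=0.5):
    """Grow one cascade stage until the predefined objective is met."""
    stage = []
    while True:
        stage.append(add_weak_learner(pos, neg, stage))  # one boosting round
        hit_rate, fp_rate = evaluate(stage, pos, neg)    # hypothetical helper
        if hit_rate >= hit_target and fp_rate <= fp_target:
            return stage   # objective met: 99.5% hit rate, 50% false positives
```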

Observations Covariance features: the combined covariance features represent distinct parts of the human body. The 1st covariance feature represents the human legs (two parallel vertical bars); the 2nd covariance feature captures information about the head and the upper body. Compare with Haar features: 1. the 1st Haar feature represents the head/shoulder contour; 2. the 2nd Haar feature represents the left leg.

Experimental Results The proposed boosted covariance detector achieves roughly ten times the detection speed of the conventional covariance detector (Tuzel et al. 2007). On a 360 x 288 pixel image, our system processes around 4 frames per second. This is the first real-time covariance-feature-based pedestrian detector.

Experimental Results

Face Detection Applications Summary of the Viola & Jones face detector: use the integral image for efficient feature extraction; use AdaBoost for feature selection; apply a cascaded classifier for efficient elimination of non-faces. Pros: a fast and robust face detector; the system can run in real time. Cons: the training stage is time consuming (1-2 weeks, depending on the number of training samples and the number of features used); it requires a large number of face training samples. Discussion: the performance of face detection depends crucially on the features used to represent the objects; good features not only yield better generalization but also require a smaller training database.

Face Detection Applications Proposed work: similar to the previous experiment, we apply covariance features to face detection. The differences between our work and the Viola & Jones framework: we use covariance features, and we adopt the weighted FDA as the weak classifiers. To show the better classification capability, we trained a boosted classifier on the banana dataset with a multidimensional decision stump and with FDA as weak classifiers. [Plots: train and test error versus number of weak classifiers, for the multidimensional stump (200 to 1000 classifiers) and for Fisher discriminant analysis (50 to 200 classifiers).]

Observations / Experimental results ROC curves show that covariance features significantly outperform Haar-like wavelet features when the training database is small. As the number of training samples grows, the performance difference between the two techniques decreases. [ROC curves for our algorithm on the MIT+CMU test set: correct detection rate versus number of false positives, trained with 250 and with 500 faces, for COV features and Haar features.]

Experimental Results Some detection results of our face detector trained using 250 frontal faces, on the MIT + CMU test images.

Summary Boosting; AdaBoost; AdaBoost for pedestrian detection using Haar features and dynamic temporal information; AdaBoost for pedestrian detection using new covariance features; face detection using new covariance features.

Questions?