
FOLDED HIERARCHIES FOR EFFICIENT DETECTION
Donald Geman, joint work with Francois Fleuret

INTRODUCTION: DETECTION
[figure]

INTRODUCTION: DETECTION (CONT.)
[figure]

INTRODUCTION: HIERARCHY OF CLASSIFIERS
[figure]

Advantages:
- Highly efficient scene parsing;
- Organized focusing on hard examples.

Disadvantages:
- Many classifiers to learn;
- Inefficient learning (unless training examples can be synthesized).

INTRODUCTION: RELATED WORK

Part-based models:
- Constellation (M. Weber, M. Welling, P. Perona),
- Composite (D. J. Crandall, D. P. Huttenlocher),
- Implicit Shape (E. Seemann, M. Fritz, B. Schiele),
- Patchwork of Parts (Y. Amit, A. Trouvé),
- Compositional (S. Geman).

Wholistic:
- Convolutional Networks (Y. LeCun),
- Boosted Cascades (P. Viola, M. Jones),
- Bag-of-Features (A. Zisserman, F.-F. Li).

INTRODUCTION: POSE SPACE

Let Y be the space of poses and Y_1, ..., Y_K a partition. Let I be an image and Y = (Y_1, ..., Y_K), where, for each k, Y_k is a Boolean variable stating whether there is an object in I with pose in Y_k.

INTRODUCTION: POSE SPACE (CONT.)

For faces, typically y = (u_c, v_c, θ, s). Cats are more complex; a coarse description is y = (u_h, v_h, s_h, u_b, v_b).

TRAINING A DETECTOR: FRAGMENTATION

Given a training set {(I^(t), Y^(t))}_t, we are interested in building, for each k, a classifier

    f_k : I → {0, 1}

for predicting Y_k. Without additional knowledge about the relationship between k, Y_k and I, we would train f_k with {(I^(t), Y_k^(t))}_t. Hence, each f_k is trained with a fragment (≈ 1/K) of the positive population.
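To make the fragmentation problem concrete, here is a minimal runnable sketch (synthetic data; scikit-learn's LogisticRegression stands in for whatever classifier is actually used; none of this is the talk's code). With K pose cells and one detector per cell, each f_k is fit on only those positives whose pose falls in its own cell, i.e. roughly 1/K of the positive population:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
K, n, d = 8, 800, 16                     # pose cells, scenes, feature dim (all assumed)

# Synthetic training set: one feature vector per (scene, pose cell) and a
# Boolean Y[t, k] saying whether an object with pose in cell k is present.
X = rng.normal(size=(n, K, d))
Y = rng.random((n, K)) < 0.05            # sparse positives
X[Y] += 1.0                              # shift positives so they are learnable

detectors = []
for k in range(K):                       # fragmentation: each f_k sees ~1/K of the positives
    print(f"cell {k}: {int(Y[:, k].sum())} positives out of {int(Y.sum())}")
    detectors.append(LogisticRegression(max_iter=1000).fit(X[:, k, :], Y[:, k]))
```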

TRAINING A DETECTOR: AGGREGATION

To avoid fragmentation, samples are often normalized in pose. Let ξ : I → R^N denote an N-dimensional vector of features, and let ψ : {1, ..., K} × I → I be a transformation such that the conditional distribution of ξ(ψ(k, I)) given Y_k = 1 does not depend on k.

TRAINING A DETECTOR: AGGREGATION (CONT.)

Then train a single classifier g with the training set

    {(ψ(k, I^(t)), Y_k^(t))}_{t,k}

and define f_k(I) = g(ψ(k, I)). Here, samples are aggregated for training: all positive samples are used to build each f_k.

TRAINING A DETECTOR: AGGREGATION (CONT.)

However:
- Evaluating ψ is computationally intensive for any non-trivial transformation;
- The mapping ψ does not exist for a complex pose.

Hence, in practice, fragmentation is still the norm to deal with many deformations. But how does one deal with complex poses?
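For a pure-translation pose, a normalization ψ does exist and aggregation is easy to illustrate. The toy sketch below (an assumed 1-D setup, not the talk's code) takes ψ(k, I) to be the crop of a fixed-size window at offset k, so every positive, whatever its pose, contributes to the single classifier g:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
L, W = 64, 8                              # signal length and window width (assumed)
K = L - W                                 # number of pose cells = possible offsets

def psi(k, I):
    """Pose normalisation: crop the length-W window starting at offset k."""
    return I[k:k + W]

signals, poses, labels = [], [], []
for _ in range(500):
    I = rng.normal(size=L)
    k = int(rng.integers(K))
    y = int(rng.random() < 0.5)
    if y:
        I[k:k + W] += 2.0                 # plant an "object" at pose k
    signals.append(I); poses.append(k); labels.append(y)

# Aggregation: train one g on pose-normalised samples, then f_k(I) = g(psi(k, I)).
Xn = np.array([psi(k, I) for I, k in zip(signals, poses)])
g = LogisticRegression(max_iter=1000).fit(Xn, labels)

def f_k(I, k):
    return int(g.predict(psi(k, I)[None])[0])
```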

POSE-INDEXED FEATURES

We directly index the features with the pose, replacing the composition

    {1, ..., K} × I --ψ--> I --ξ--> R^N

by a single mapping

    X : {1, ..., K} × I → R^N.

DEFINITION

Hence, we define a family of pose-indexed features as a mapping X : {1, ..., K} × I → R^N such that they are stationary in the following sense: for every k ∈ {1, ..., K}, the probability distribution

    P(X(k, I) = x | Y_k = 1),  x ∈ R^N,

does not depend on k.

TRAINING

We then define

    f_k(I) = g(X(k, I)),  k ∈ {1, ..., K},

where one single classifier g : R^N → {0, 1} is trained with {(X(k, I^(t)), Y_k^(t))}_{t,k}. Notice that stationarity is the main condition required to ensure that the training samples are identically distributed.

TOY EXAMPLE, 1D SIGNAL

    Y = { (θ_1, θ_2) ∈ {1, ..., N}^2 : 1 < θ_1 < θ_2 < N }

    P(I | Y = (θ_1, θ_2)) = ∏_{n < θ_1} φ_0(I_n) · ∏_{θ_1 ≤ n ≤ θ_2} φ_1(I_n) · ∏_{θ_2 < n} φ_0(I_n)

[figure: two sample signals, n = 0, ..., 30]

The pose-indexed feature

    X((θ_1, θ_2), I) = (I(θ_1 - 1), I(θ_1), I(θ_2), I(θ_2 + 1))

is stationary, since

    P(X = (x_1, x_2, x_3, x_4) | Y = (θ_1, θ_2)) = φ_0(x_1) φ_1(x_2) φ_1(x_3) φ_0(x_4).
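The 1D toy example is small enough to run end to end. The sketch below assumes φ_0 = N(0, 1) and φ_1 = N(1, 1), which the slides do not specify, and uses logistic regression as the single classifier g trained on pose-indexed features gathered over all pose hypotheses, in the spirit of f_k(I) = g(X(k, I)):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
N = 30                                        # signal length, as in the toy example

def sample_signal():
    t1, t2 = sorted(rng.choice(np.arange(2, N - 1), size=2, replace=False))
    I = rng.normal(size=N)                    # phi_0 = N(0,1) outside [t1, t2] (assumed)
    I[t1:t2 + 1] += 1.0                       # phi_1 = N(1,1) inside (assumed)
    return I, (int(t1), int(t2))

def X(pose, I):
    """Pose-indexed feature: the signal just outside and just inside the boundaries."""
    t1, t2 = pose
    return np.array([I[t1 - 1], I[t1], I[t2], I[t2 + 1]])

feats, labels = [], []
for _ in range(2000):
    I, pose = sample_signal()
    feats.append(X(pose, I)); labels.append(1)             # true pose -> positive
    wrong = tuple(int(v) for v in sorted(rng.choice(np.arange(2, N - 1), size=2, replace=False)))
    if wrong != pose:
        feats.append(X(wrong, I)); labels.append(0)        # wrong pose hypothesis -> negative

g = LogisticRegression(max_iter=1000).fit(np.array(feats), labels)

def f(pose, I):                               # f_k(I) = g(X(k, I)) for pose cell k = pose
    return int(g.predict(X(pose, I)[None])[0])
```

Because X is stationary, the positive feature vectors are identically distributed whatever the pose, which is exactly why one classifier g can serve every pose cell.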

EDGE DETECTORS

Let e_{φ,σ}(I, u, v) be the presence of an edge in image I at location (u, v) with orientation φ ∈ {0, ..., 7} at scale σ ∈ {1, 2, 4}.

EDGE DETECTORS (CONT.)
[figure: edge maps at σ = 1]

EDGE DETECTORS (CONT.)
[figure: edge maps at σ = 2]

EDGE DETECTORS (CONT.)
[figure: edge maps at σ = 4]
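The talk does not spell out the edge detector; one plausible stand-in (an assumption, not the authors' implementation) quantizes the gradient direction into 8 bins after smoothing the image at each scale σ:

```python
import numpy as np
from scipy import ndimage

def edge_maps(img, scales=(1, 2, 4), n_orient=8, thresh=0.1):
    """Return boolean maps e[(phi, sigma)][u, v] marking strong edges whose
    gradient direction falls in orientation bin phi, at smoothing scale sigma."""
    maps = {}
    for sigma in scales:
        smoothed = ndimage.gaussian_filter(img.astype(float), sigma=sigma)
        gy, gx = np.gradient(smoothed)
        mag = np.hypot(gx, gy)
        ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)        # direction in [0, 2*pi)
        bins = np.minimum((ang / (2 * np.pi) * n_orient).astype(int), n_orient - 1)
        for phi in range(n_orient):
            maps[(phi, sigma)] = (mag > thresh) & (bins == phi)
    return maps

e = edge_maps(np.random.default_rng(4).random((48, 48)))   # toy usage on a random image
```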

BASE FEATURES

We use three types of stationary features:
- Proportion of an edge (φ, σ) in a window W;
- L1-distance between the histograms of orientations at scale σ in two windows W and W';
- L1-distance between the histograms of gray-levels in two windows W and W'.

The windows W and W' are indexed either with respect to the head or with respect to the head-belly.

[figure: head registration]
[figure: head-belly registration]

CLASSIFIER

We build classifiers with AdaBoost and an asymmetric weighting by sampling. If the weighted error rate is

    L(h) = Σ_{t,k} ω_{t,k} 1{h(k, I^(t)) ≠ Y_k^(t)},

we use all the positive samples, and sub-sample negative ones with

    µ(k, t) ∝ ω_{t,k} 1{Y_k^(t) = 0}.
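One boosting round of this asymmetric weighting by sampling can be sketched directly from the formula above (everything beyond that formula, including the fixed negative budget, is an assumption): all positives are kept, and negatives are drawn with probability µ(k, t) ∝ ω_{t,k} 1{Y_k^(t) = 0}:

```python
import numpy as np

def resample_for_round(omega, y, n_neg, rng):
    """omega: current boosting weights, one per (t, k) pair (flattened);
    y: labels Y_k^(t) in {0, 1}; n_neg: negative budget for this round.
    Returns the indices on which the weak learner h is fit."""
    pos = np.flatnonzero(y == 1)                       # keep every positive sample
    neg = np.flatnonzero(y == 0)
    mu = omega[neg] / omega[neg].sum()                 # mu(k, t) proportional to omega on negatives
    drawn = rng.choice(neg, size=n_neg, replace=True, p=mu)
    return np.concatenate([pos, drawn])

rng = np.random.default_rng(5)
omega = rng.random(10_000)
y = (rng.random(10_000) < 0.02).astype(int)
subset = resample_for_round(omega, y, n_neg=500, rng=rng)
```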

FOLDED HIERARCHIES: SUMMARY

Total of 2327 scenes containing 1683 cats, 85% for training.

Experiments (not shown) demonstrate:
- Fragmentation applied to complex poses is disastrous in practice;
- Naive, brute-force exploration of the pose space is computationally intractable.

Alternatively:
- Stationary features largely avoid fragmentation;
- Hierarchical search concentrates computation on ambiguous regions.

We call this alternative a folded hierarchy of classifiers.

FOLDED HIERARCHIES: SUMMARY (CONT.)
[figure: folded hierarchy with shared classifiers g_1, g_2]

SCENE PARSING: STRATEGY
[figure] (a code sketch of this coarse-to-fine flow follows the error rates below)

SCENE PARSING: ERROR CRITERION
[figure]

SCENE PARSING: ERROR RATES

    TP      H+B             HB
    85%     -               11.29 (3.61)
    80%     17.23 (3.87)    5.07 (1.08)
    70%     11.40 (2.89)    1.88 (0.32)
    60%     7.80 (1.98)     0.95 (0.22)
    50%     5.41 (1.62)     0.53 (0.13)

Average number of FAs per 640x480 image.
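The coarse-to-fine control flow behind the strategy and the folded hierarchy can be sketched as follows (the two-level structure, the names g1/g2 and all helpers are illustrative assumptions, not the talk's parser): the cheap classifier g1 is evaluated on every coarse pose cell, and the finer g2 only inside the cells that g1 does not reject, which concentrates computation on ambiguous regions:

```python
import numpy as np

def parse_scene(image, coarse_cells, refine, g1, g2, X1, X2):
    """Two-level coarse-to-fine parsing: g1 on coarse cells, g2 on surviving sub-cells.
    X1/X2 are pose-indexed feature maps; refine(cell) lists a cell's sub-cells."""
    detections = []
    for cell in coarse_cells:
        if not g1(X1(cell, image)):           # most of the pose space is rejected here
            continue
        for sub in refine(cell):              # only ambiguous cells are refined
            if g2(X2(sub, image)):
                detections.append(sub)
    return detections

# Toy usage with dummy classifiers, just to show the control flow.
cells = [(r, c) for r in range(4) for c in range(4)]
crop = lambda cell, im: im[cell[0]*16:(cell[0]+1)*16, cell[1]*16:(cell[1]+1)*16].ravel()
hits = parse_scene(
    np.zeros((64, 64)), cells,
    refine=lambda cell: [cell + (s,) for s in (1, 2)],
    g1=lambda x: x.mean() > -1.0,             # placeholder classifiers (assumed)
    g2=lambda x: x.mean() > -1.0,
    X1=crop, X2=crop,
)
```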

SCENE PARSING: ERROR RATES (CONT.)
[plot: true positive rate (0 to 1) versus average number of FAs per 640x480 image, log scale from 0.001 to 100, for the H+B and HB detectors]

SCENE PARSING: RESULTS (PICKED AT RANDOM)
[figure]

SCENE PARSING: RESULTS (PICKED AT RANDOM, CONT.)
[figure]

SCENE PARSING: RESULTS (SELECTED FALSE ALARMS)
[figure]

SCENE PARSING: RESULTS (SELECTED FALSE ALARMS, CONT.)
[figure]

CONCLUSION

A folded hierarchy of classifiers is highly efficient:
- For (offline) learning;
- For (online) scene processing.

It combines the strengths of:
- Template matching;
- Powerful, wholistic machine learning;
- Hierarchical search.

However, this is achieved at the expense of:
- Rich annotation of the training samples;
- Designing stationary features.

ACKNOWLEDGMENT