MRC: The Maximum Rejection Classifier for Pattern Detection. With Michael Elad, Renato Keshet


The Problem. Pattern Detection: given a pattern that is subject to a particular type of variation, detect occurrences of this pattern in an image. Detection should be: accurate (a small number of mis-detections/false alarms), and as fast as possible.

Example: face detection in images

Face Examples

Variations in Faces. Faces may vary in their appearance: scale, location, illumination conditions, pose, identity, and facial expression.

Typical Approach. An input image is fed to a Face Finder, which returns the detected face positions. The Face Finder composes a pyramid with a 1:f resolution ratio, extracts blocks from each location in each resolution layer, and applies a classifier to every block.

This type of algorithm is capable of finding faces in various scales, in various locations, in various illumination conditions, in various poses, of various identities, and in various facial expressions: the brute-force search covers scale and location, while the trained classifier must handle the remaining variations.

Complexity. Searching for faces in a 1000×1000 image, the classifier is applied on the order of 1e6 times. The algorithm's complexity is therefore dominated by the classifier.
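To make the brute-force scan concrete, here is a minimal sketch (not from the original slides) of the pyramid-plus-sliding-window loop; the block size, the pyramid factor f, and the classify() routine are illustrative placeholders.

import numpy as np

def scan_image(image, classify, block=15, f=1.2):
    # Slide a block-by-block window over every layer of a resolution pyramid.
    # `classify` maps a flattened block to +1 (target) or -1 (clutter).
    # block=15 and f=1.2 are illustrative values, not necessarily the original settings.
    detections = []
    layer, scale = image.astype(float), 1.0
    while min(layer.shape) >= block:
        for i in range(layer.shape[0] - block + 1):
            for j in range(layer.shape[1] - block + 1):
                z = layer[i:i + block, j:j + block].ravel()
                if classify(z) == +1:
                    detections.append((int(i * scale), int(j * scale), scale))
        # Next (coarser) pyramid layer: downscale by 1/f (nearest-neighbour, for simplicity).
        scale *= f
        h, w = int(layer.shape[0] / f), int(layer.shape[1] / f)
        rows = (np.arange(h) * f).astype(int)
        cols = (np.arange(w) * f).astype(int)
        layer = layer[np.ix_(rows, cols)]
    return detections

With a 1000×1000 image this inner loop already visits roughly a million window positions in the finest layer alone, which is why the per-block classifier must be extremely cheap.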

Pattern Detection as a Classification Problem Pattern detection requires a separation between two classes: a. The Target class. b. The Clutter class. Given an input pattern Z, we would like to classify it as Target or Clutter.

Definition: A classifier is a non-linear parametric function C(Z,θ) of the form C(Z,θ): R^n → {+1, −1}. A simple example: for blocks of 4 pixels [z_1, z_2, z_3, z_4] we can define C(Z,θ) as: C(Z,θ) = sign(θ_0 + θ_1 z_1 + θ_2 z_2 + θ_3 z_3 + θ_4 z_4)
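A tiny illustration of this 4-pixel example (the weights below are arbitrary, chosen only to show the mechanics):

import numpy as np

def linear_classifier(z, theta0, theta):
    # C(Z, theta) = sign(theta0 + theta^T z), returning +1 or -1.
    return 1 if theta0 + np.dot(theta, z) >= 0 else -1

# Arbitrary weights for a block of 4 pixels [z1, z2, z3, z4]:
theta0, theta = -0.5, np.array([0.1, 0.2, 0.3, 0.4])
print(linear_classifier(np.array([1.0, 0.5, 0.2, 0.8]), theta0, theta))   # prints 1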

Geometric View. C(Z,θ) draws a separating manifold between the two classes: the manifold is the set C(Z,θ) = 0 in R^n, with the decision +1 on one side and −1 on the other.

Supervised Learning: In order to obtain a precise classifier we must find a good choice of the parameters θ in C(Z,θ): R^n → {+1, −1}. Use examples with known labels to find a good set of parameters.

Example-Based Classifiers. Given two training sets, targets {x_k}_{k=1}^{N} and clutters {y_k}_{k=1}^{L}, we want to find a set of parameters θ such that C(x_k, θ) = +1 and C(y_k, θ) = −1 for all k. We "hope" that the generalization is correct.

Linear Classifiers. The linear classifiers are the simplest ones. The decision is based on the projection of the input signal Z onto a kernel θ: C(Z,θ) = sign{Zᵀθ − θ_0}. The parameters {θ, θ_0} define a separating hyperplane.

[Figure: the separating hyperplane Zᵀθ = θ_0, with Zᵀθ < θ_0 on one side and Zᵀθ > θ_0 on the other; θ is the normal direction and θ_0 the offset.]

Designing Optimal Linear Classifiers. [Figure: the projected target and clutter distributions Xᵀθ and Yᵀθ overlap around the threshold θ_0; the overlapping tails correspond to false alarms and mis-detections, which are to be minimized.]

The Fisher Linear Classifier. Choose the projection kernel θ that maximizes the Mahalanobis distance between the projected classes Xᵀθ and Yᵀθ:

d_M(i, j) = (µ_i − µ_j)² / (σ_i² + σ_j²)

i.e., maximize the between-class distance in the numerator while minimizing the within-class variances in the denominator.
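For reference (this is not the MRC itself), a minimal sketch of the two-class Fisher direction, using the standard closed form θ ∝ (S_x + S_y)⁻¹(M_x − M_y):

import numpy as np

def fisher_kernel(X, Y):
    # Two-class Fisher direction: theta ∝ (S_x + S_y)^{-1} (M_x - M_y).
    # X, Y: sample matrices of shape (num_samples, n), one example per row.
    Mx, My = X.mean(axis=0), Y.mean(axis=0)
    Sx = np.cov(X, rowvar=False)
    Sy = np.cov(Y, rowvar=False)
    theta = np.linalg.solve(Sx + Sy, Mx - My)   # assumes S_x + S_y is nonsingular
    return theta / np.linalg.norm(theta)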

The Support Vector Machine (SVM). Choose the projection kernel θ that maximizes the margin d_min between the classes (Vapnik–Chervonenkis). [Figure: the training sets {x_k}_{k=1}^{N} and {y_k}_{k=1}^{L} separated by a hyperplane with margin d_min.]

The Support Vector Machine - Continued. The support vectors are those examples realizing the minimal distance. The decision function is composed of a linear combination of these vectors. The optimal projection emerges as the solution of a QP (quadratic programming) problem. The generalization error is bounded, and the SVM achieves the tightest such bound.

The Limitations of Linear Classifiers. The above classifiers are suitable for linearly separable classes (or nearly so). In other cases: generalize to non-linear classifiers, or map into a higher-dimensional feature space. Complicated!!

Face Detection - Previous Works. Rowley & Kanade (98), Juell & Marsh (96): neural-network approach - a non-linear classifier. Sung & Poggio (98): clustering the faces/non-faces into sub-groups, and an RBF classifier - non-linear. Osuna, Freund & Girosi (97): Support Vector Machine - classification in a high-dimensional feature space. Keren & Gotsman (98): Anti-Faces method - finding a linear kernel that is orthogonal to the faces and smooth.

Our Approach - Two Steps Back Observations: A typical configuration in Pattern Detection is that the Target class is surrounded by the Clutter class. P{Target}<<P{Clutter} Conclusions: A pdf separation is not appropriate. Clutter labeling should be performed fast.

Distance Definition: Define the distance of a point x from a pdf p_y(y) as the mean squared distance to it, normalized by its variance:

D(x, p_y) = ∫ (x − y)² p_y(y) dy / σ_y² = [ (x − µ_y)² + σ_y² ] / σ_y²

Consequently, we define the distance of p_x(x) from p_y(y):

D(p_x, p_y) = ∫ [ (x − µ_y)² + σ_y² ] p_x(x) dx / σ_y² = [ (µ_x − µ_y)² + σ_x² + σ_y² ] / σ_y²

D(p_x, p_y) = [ (µ_x − µ_y)² + σ_x² + σ_y² ] / σ_y². Alternatively, we can define the proximity of p_x to p_y as the reciprocal: Pro(p_x, p_y) = σ_y² / [ (µ_x − µ_y)² + σ_x² + σ_y² ]. Note that the distance is asymmetric: for a narrow p_x and a wide p_y, D(p_x, p_y) < D(p_y, p_x).
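A small numeric sketch of these moment-based quantities as reconstructed above (the numbers are arbitrary and serve only to illustrate the asymmetry):

def dist(mu_x, var_x, mu_y, var_y):
    # D(p_x, p_y) = ((mu_x - mu_y)^2 + var_x + var_y) / var_y
    return ((mu_x - mu_y) ** 2 + var_x + var_y) / var_y

def proximity(mu_x, var_x, mu_y, var_y):
    # Pro(p_x, p_y) = 1 / D(p_x, p_y)
    return 1.0 / dist(mu_x, var_x, mu_y, var_y)

# A narrow pdf p_x and a wide pdf p_y:
print(dist(0.0, 1.0, 2.0, 25.0))   # D(p_x, p_y) = (4 + 1 + 25) / 25 = 1.2
print(dist(2.0, 25.0, 0.0, 1.0))   # D(p_y, p_x) = (4 + 25 + 1) / 1  = 30.0, hence the asymmetry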

Optimal Classifier for Pattern Detection. We would like to find a projection kernel θ which minimizes the overlap between p_x = p(Xᵀθ) and p_y = p(Yᵀθ):

E(θ) = P(X) · σ_y² / [ (µ_x − µ_y)² + σ_x² + σ_y² ] + P(Y) · σ_x² / [ (µ_x − µ_y)² + σ_x² + σ_y² ]

If P(X) = P(Y) we obtain the Fisher Linear Classifier. In Pattern Detection P(X) << P(Y), hence we get:

θ = argmin_θ σ_x² / [ (µ_x − µ_y)² + σ_x² + σ_y² ]

θ = argmin_θ σ_x² / [ (µ_x − µ_y)² + σ_x² + σ_y² ]. This penalty term can be decreased in two ways: maximize the between-class distance (µ_x − µ_y)², or minimize σ_x² while maximizing σ_y². The second alternative is the more common situation in Pattern Detection. [Figure: a narrow projected target pdf (variance to be minimized) inside a wide projected clutter pdf (variance to be maximized).]

Maximal Rejection. The optimal θ assures that most clutter points Z are distant from the target class. Rejection: choose two decision levels [d_1, d_2] such that the number of rejected clutter points is maximized while all targets are found. [Figure: projection onto θ_1; the points falling outside [d_1, d_2] are rejected.]

Successive Rejection. Following the first rejection, as many clutter points as possible have been classified (rejected), while the targets remain unclassified. In order to reject further clutter, we apply the maximal rejection technique to the remaining samples of the two classes. [Figure: projection of the surviving samples onto θ_2, with a new pair of decision levels [d_1, d_2] rejecting additional points.]

Formal Derivation. In practice we have only samples from P_X and P_Y. The class means are estimated:

M_x = (1/N) Σ_k x_k,   M_y = (1/L) Σ_k y_k

The class covariances are estimated:

S_x = (1/N) Σ_k (x_k − M_x)(x_k − M_x)ᵀ,   S_y = (1/L) Σ_k (y_k − M_y)(y_k − M_y)ᵀ

The means and variances after projection onto θ are:

µ_x = M_xᵀθ and µ_y = M_yᵀθ,   σ_x² = θᵀS_xθ and σ_y² = θᵀS_yθ

Formal Derivation - Cont. The optimal θ minimizes the following term:

E(θ) = σ_x² / [ (µ_x − µ_y)² + σ_x² + σ_y² ]
     = θᵀ S_x θ / θᵀ [ (M_x − M_y)(M_x − M_y)ᵀ + S_x + S_y ] θ
     = θᵀ A θ / θᵀ B θ

This term can be rewritten as a generalized eigenvalue problem: Aθ = λBθ. The solution is the eigenvector corresponding to the smallest eigenvalue.
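A minimal sketch of this training step, following the reconstruction above and using scipy's generalized symmetric eigensolver (the function and variable names are mine, not the paper's):

import numpy as np
from scipy.linalg import eigh

def mrc_kernel(X, Y):
    # One maximal-rejection projection kernel.
    # X: target samples, shape (N, n); Y: clutter samples, shape (L, n).
    # Minimizes theta^T A theta / theta^T B theta with
    # A = S_x and B = (M_x - M_y)(M_x - M_y)^T + S_x + S_y.
    Mx, My = X.mean(axis=0), Y.mean(axis=0)
    Sx = np.cov(X, rowvar=False, bias=True)     # (1/N)-normalized, as in the slide
    Sy = np.cov(Y, rowvar=False, bias=True)
    d = Mx - My
    A = Sx
    B = np.outer(d, d) + Sx + Sy                # assumed positive definite here
    w, V = eigh(A, B)                           # generalized problem A v = lambda B v
    theta = V[:, 0]                             # eigenvalues ascend: take the smallest
    return theta / np.linalg.norm(theta)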

Proposed Algorithm. There are two phases to the algorithm: Training: compute the projection kernels and their thresholds. This process is performed ONCE and offline. Testing: given an image, find targets using the kernels and thresholds found in training.
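A sketch of the training phase as described above, building on the hypothetical mrc_kernel() from the previous sketch; taking the thresholds as the min/max of the projected targets and stopping once all training clutter is rejected are simplifying assumptions of mine:

def train_mrc(X, Y, num_kernels=15):
    # Train a sequence of rejection stages (theta_j, d1_j, d2_j).
    # X: target samples (kept whole, since every target must pass every stage);
    # Y: clutter samples (shrinks as successive stages reject more of it).
    stages = []
    for _ in range(num_kernels):
        theta = mrc_kernel(X, Y)
        px = X @ theta
        d1, d2 = px.min(), px.max()       # acceptance interval covering all targets
        py = Y @ theta
        Y = Y[(py >= d1) & (py <= d2)]    # keep only the clutter that was NOT rejected
        stages.append((theta, d1, d2))
        if len(Y) == 0:                   # all training clutter rejected
            break
    return stages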

Testing Stage. Given the trained kernels and thresholds {θ_j, d_1^j, d_2^j}, j = 1, …, J: project the input block onto the next kernel θ_j; if the value falls outside [d_1^j, d_2^j], label the block Clutter and stop; if it falls inside, set j = j + 1 and continue; if no more kernels remain, label the block Target.
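The same flow as a per-block sketch (using the hypothetical stages list produced by train_mrc() above):

def classify_block(z, stages):
    # Return +1 (Target) if z survives every rejection stage, else -1 (Clutter).
    for theta, d1, d2 in stages:
        v = z @ theta                 # project onto the next kernel
        if not (d1 <= v <= d2):       # value outside [d1, d2] -> reject as clutter
            return -1
    return +1                         # survived all J kernels -> target

Since most clutter blocks are rejected by the first kernel or two, the average cost per block stays close to a single projection.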

2D example. [Figure: a synthetic 2D example of successive rejection; the number of surviving false alarms drops with each additional kernel, roughly 875, 178, 136, 66, 37 and 35 after kernels 1 through 6.]

Limitations. The discriminated zone is a convex polytope (the intersection of the slabs d_1^j ≤ Zᵀθ_j ≤ d_2^j). Thus, if the target set is non-convex, zero-false-alarm discrimination is impossible!! Even if the target set is convex, convergence to zero false alarms is not guaranteed.

Face Detection Results. Kernels for finding faces (15×15) and eyes (7×15). Searching for eyes and faces sequentially - very efficient! Face DB: 04 images of 40 people (ORL-DB). Each image is also rotated ±5° and vertically flipped. This produced 14 face images. Non-Face DB: 54 images - all the possible positions in all resolution layers, and vertically flipped - about 40E6 non-face images.

Results Out of 44 faces, 10 faces are undetected, and 1 false alarm (the undetected faces are circled - they are either rotated or shadowed)

Results All faces detected with no false alarms

Results All faces detected with 1 false alarm (looking closer, this false alarm can be considered as a face)

Complexity. For a set of 15 kernels (with appropriate decision levels), the first kernel typically removes about 90% of the pixels from further consideration. Each subsequent kernel typically gives a rejection factor of about 50%. Thus, the algorithm requires slightly more than one convolution of the image with a kernel (per resolution layer). The algorithm gives very good results (probability of detection and false-alarm rate) for the tests we did on frontal faces in images.
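A rough back-of-the-envelope check of that claim (my own arithmetic, using the 90% and 50% figures quoted above): the expected number of kernel projections per pixel is 1 + 0.1·(1 + 0.5 + 0.5² + …) ≈ 1 + 0.1·2 = 1.2, i.e., slightly more than one convolution per resolution layer.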

Relation to Anti-Faces (Keren & Gotsman 98). The Anti-Face projection kernels lie in the null space of the face set and are smooth. This can be seen as a case where the pdf of the Clutter class is defined parametrically:

P(Z) ∝ exp( −Zᵀ(DᵀD)Z )

where D is a derivative operator. In this case S_y = (DᵀD)⁻¹, which in the Fourier basis is approximately diag(1, 1/2, 1/3, …). If M_x ≈ M_y, minimizing the following term tends to give a θ that is in Null(S_x) and smooth:

E(θ) = θᵀ S_x θ / θᵀ [ (M_x − M_y)(M_x − M_y)ᵀ + S_x + S_y ] θ

Conclusions. MRC: projection onto pre-trained kernels, and thresholding. The process is a rejection-based classification. Appropriate for pattern detection where pdf separation is impossible. Exploits the fact that P(clutter) >> P(target). Gives very fast clutter labeling at the expense of slow target labeling. Can also deal with classes that are not linearly separable (as long as they are convexly separable). Simple to apply (linear), with promising results for face detection in images. A generalization of the Fisher Linear Classifier. END