2D Image Processing Face Detection and Recognition


2D Image Processing: Face Detection and Recognition. Prof. Didier Stricker, Kaiserslautern University, http://ags.cs.uni-kl.de/ and DFKI (Deutsches Forschungszentrum für Künstliche Intelligenz), http://av.dfki.de

Long-range and high-speed eye tracking. Video: IR long-range, high-speed eye tracking. Video: eye tracking from a mobile phone camera.

Sliding window detection

Consumer application: Apple iphoto http://www.apple.com/ilife/iphoto/

Face Detection

Sliding Windows. 1. Hypothesize: try all possible rectangle locations and sizes. 2. Test: classify whether the rectangle contains a face (and only the face). Note: there are 1000s of times more false windows than true ones. For computational efficiency, we should spend as little time as possible on the negative windows.
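The hypothesize step can be sketched as a generator over window positions and sizes. This is only an illustration: the 24-pixel base size comes from the detector described later in these slides, while the scale step of 1.25, the stride of 4 pixels, and the function name are assumptions made for the example.

```python
def sliding_windows(img_w, img_h, base=24, scale_step=1.25, stride=4):
    """Yield (x, y, size) for square windows at every position and scale."""
    size = base
    while size <= min(img_w, img_h):
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield (x, y, size)
        # Grow the window; max() guarantees progress even for tiny sizes.
        size = max(size + 1, int(round(size * scale_step)))

# Candidate windows for a 384x288 image (the image size quoted for the final detector).
windows = list(sliding_windows(384, 288))
print(len(windows), "candidate windows")
```

Almost all of these windows are negatives, which is why the rest of the lecture focuses on rejecting them cheaply.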

Classification (discriminative): separate faces from background in some feature space.

The Viola/Jones Face Detector. A seminal approach to real-time object detection. Training is slow, but detection is very fast. Key ideas: integral images for fast feature evaluation; boosting for feature selection; an attentional cascade for fast rejection of non-face windows. P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001. P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.

Image Features. Rectangle filters: Value = $\sum$ (pixels in white area) $-$ $\sum$ (pixels in black area)

Fast computation with integral images. The integral image stores at each pixel (x, y) the sum of the pixel values above and to the left of (x, y), inclusive. It can be computed quickly in one pass through the image.

Computing the integral image. Cumulative row sum: $s(x, y) = s(x-1, y) + i(x, y)$. Integral image: $ii(x, y) = ii(x, y-1) + s(x, y)$. MATLAB: ii = cumsum(cumsum(double(i)), 2);

Computing sum within a rectangle. Let A, B, C, D be the values of the integral image at the corners of a rectangle, with A at the bottom-right, D at the top-left, and B and C at the other two corners. What is the sum of pixel values within the rectangle? $\text{sum} = A - B - C + D$. Only 3 additions are required for any size of rectangle!
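A minimal sketch of the two steps above, in plain Python. One assumption differs from the slide's notation: the integral image here is padded with a leading row and column of zeros, so the four corner lookups need no boundary checks. The function names are illustrative.

```python
def integral_image(img):
    """Zero-padded integral image: ii[y][x] sums img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # cumulative row sum s(x, y)
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum  # ii(x, y-1) + s(x, y)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left corner (x, y), width w, height h."""
    A = ii[y + h][x + w]  # bottom-right
    B = ii[y][x + w]      # top-right
    C = ii[y + h][x]      # bottom-left
    D = ii[y][x]          # top-left
    return A - B - C + D

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

The cost of `rect_sum` is four lookups and three additions or subtractions, whatever the rectangle size.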

Scaling. The integral image lets us evaluate a rectangle of any size in constant time, so no image scaling is necessary. Scale the rectangular features instead!
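To illustrate scaling the feature instead of the image, the sketch below evaluates the same two-rectangle feature at two sizes on one integral image. The left-white, right-black layout, the zero-padded integral image, and all names are assumptions chosen for the example.

```python
def integral_image(img):
    """Zero-padded integral image: ii[y][x] sums img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Pixel sum of a w-by-h rectangle whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """White (left half) sum minus black (right half) sum; w should be even."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# 8x8 test image: bright left half, dark right half.
img = [[1 if x < 4 else 0 for x in range(8)] for _ in range(8)]
ii = integral_image(img)

small = two_rect_feature(ii, 2, 2, 4, 4)  # 4x4 feature centred on the edge
large = two_rect_feature(ii, 0, 0, 8, 8)  # same feature scaled to 8x8
print(small, large)  # 8 32
```

Both calls perform the same number of lookups; only the corner coordinates change with the scale.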

Feature selection For a 24x24 detection region, the number of possible rectangle features is ~160,000! At test time, it is impractical to evaluate the entire feature set. Can we create a good classifier using just a small subset of all possible features?

AdaBoost (Adaptive Boosting). A learning algorithm that builds a strong classifier from a lot of weaker ones.

Boosting. Boosting is a classification scheme that works by combining weak learners into a more accurate ensemble classifier. A weak learner need only do better than chance. Training consists of multiple boosting rounds. During each boosting round, we select a weak learner that does well on examples that were hard for the previous weak learners. "Hardness" is captured by weights attached to training examples. Y. Freund and R. Schapire, A short introduction to boosting, Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, September, 1999.

AdaBoost Concept. Weak classifiers, each slightly better than random: $h_1(x) \in \{-1, +1\}$, $h_2(x) \in \{-1, +1\}$, $\dots$, $h_T(x) \in \{-1, +1\}$. Strong classifier: $H_T(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$.

Weaker Classifiers. Weak classifiers, each slightly better than random: $h_1(x), h_2(x), \dots, h_T(x) \in \{-1, +1\}$. Strong classifier: $H_T(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$. Each weak classifier learns by considering one simple feature. The T most beneficial features for classification should be selected. How to: define features? select beneficial features? train weak classifiers? manage (weight) training samples? associate a weight to each weak classifier?

The Strong Classifiers. Weak classifiers, each slightly better than random: $h_1(x), h_2(x), \dots, h_T(x) \in \{-1, +1\}$. Strong classifier: $H_T(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$. How good will the strong one be?

Boosting. It is a sequential procedure: $x_{t=1}, x_{t=2}, \dots, x_t$. Each data point $x_t$ has a class label $y_t \in \{+1, -1\}$ and a weight $w_t = 1$.

Toy example. Weak learners from the family of lines. Each data point has a class label $y_t \in \{+1, -1\}$ and a weight $w_t = 1$. A line h with p(error) = 0.5 is at chance.

Toy example. Each data point has a class label $y_t \in \{+1, -1\}$ and a weight $w_t = 1$. This one seems to be the best. This is a "weak classifier": it performs slightly better than chance.

Toy example. Each data point has a class label $y_t \in \{+1, -1\}$. We update the weights: $w_t \leftarrow w_t \exp\{-y_t H_t\}$. We set a new problem for which the previous weak classifier performs at chance again.


Toy example. The strong (non-linear) classifier is built as the combination of all the weak (linear) classifiers $f_1, f_2, f_3, f_4$.

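AdaBoost Algorithm. Given: m examples $(x_1, y_1), \dots, (x_m, y_m)$ where $x_i \in X$, $y_i \in Y = \{-1, +1\}$. Initialize $w_1(i) = 1/m$. For t = 1 to T:
1. Train learner $h_t$ with minimum error $\varepsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i]$. The goodness of $h_t$ is calculated over $D_t$ (the distribution over examples given by the weights $w_t$) and the bad guesses.
2. Compute the hypothesis weight $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \varepsilon_t}{\varepsilon_t}\right)$. The weight adapts: the bigger $\varepsilon_t$ becomes, the smaller $\alpha_t$ becomes.
3. For each example i = 1 to m: $w_{t+1}(i) = \frac{w_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \neq y_i \end{cases}$. Boost an example if it was incorrectly predicted. $Z_t$ is a normalization factor.
Output: $H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$, a linear combination of models.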
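The three steps can be sketched in plain Python as follows. The weak learners here are one-dimensional threshold stumps and the ten-point data set is a toy example; both are assumptions made for illustration, not the rectangle features used in the detector.

```python
import math

def stump_predict(x, threshold, parity):
    """Weak classifier in {-1, +1}; parity flips the direction of the inequality."""
    return 1 if parity * x > parity * threshold else -1

def train_adaboost(xs, ys, rounds):
    m = len(xs)
    w = [1.0 / m] * m  # w_1(i) = 1/m
    candidates = [(t, p) for t in sorted(set(xs)) for p in (+1, -1)]
    model = []  # list of (alpha_t, threshold, parity)
    for _ in range(rounds):
        # 1. Pick the weak learner with minimum weighted error eps_t.
        best = None
        for t, p in candidates:
            err = sum(wi for wi, x, y in zip(w, xs, ys)
                      if stump_predict(x, t, p) != y)
            if best is None or err < best[0]:
                best = (err, t, p)
        eps, t, p = best
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # guard against log(0)
        # 2. Hypothesis weight alpha_t = 1/2 ln((1 - eps_t) / eps_t).
        alpha = 0.5 * math.log((1 - eps) / eps)
        model.append((alpha, t, p))
        # 3. Reweight: exp(-alpha) if correct, exp(+alpha) if wrong, then normalize by Z_t.
        w = [wi * math.exp(-alpha * y * stump_predict(x, t, p))
             for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model

def strong_predict(model, x):
    """H(x) = sign(sum_t alpha_t h_t(x))."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in model)
    return 1 if score >= 0 else -1

# Toy data that no single stump can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = train_adaboost(xs, ys, rounds=3)
predictions = [strong_predict(model, x) for x in xs]
print(predictions == ys)  # True
```

On this data the best single stump misclassifies three of the ten points, while the weighted vote of three stumps classifies all of them correctly.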

Weak Learners for Face Detection. $h_t(x) = 1$ if $p_t f_t(x) > p_t \theta_t$, and $0$ otherwise, where $x$ is the window, $f_t(x)$ is the value of the rectangle feature, $p_t$ is the parity (indicating the direction of the inequality sign), and $\theta_t$ is the threshold.

Boosting. The training set contains face and non-face examples, initially with equal weight. For each round of boosting: evaluate each rectangle filter on each example; select the best threshold for each filter; select the best filter/threshold combination; reweight the examples. Computational complexity of learning: O(MNK) for M rounds, N examples, K features.
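The "select best threshold for each filter" step might look like the brute-force sketch below, which uses the weak learner form $h(x) = 1$ if $p f(x) > p\theta$ and labels in {0, 1}. The feature values and weights are made-up numbers, and a real implementation would typically sort the feature values once rather than rescan all examples for every candidate threshold.

```python
def weak_classify(f_value, parity, theta):
    """h(x) = 1 if p * f(x) > p * theta, else 0."""
    return 1 if parity * f_value > parity * theta else 0

def best_threshold(f_values, labels, weights):
    """Return (weighted_error, theta, parity) with minimum weighted error for one filter."""
    best = None
    for theta in sorted(set(f_values)):
        for parity in (+1, -1):
            err = sum(w for f, y, w in zip(f_values, labels, weights)
                      if weak_classify(f, parity, theta) != y)
            if best is None or err < best[0]:
                best = (err, theta, parity)
    return best

# Hypothetical responses of one rectangle filter on four training windows.
f_values = [0.2, 0.9, 0.4, 1.5]
labels = [0, 1, 0, 1]  # 1 = face, 0 = non-face
weights = [0.25, 0.25, 0.25, 0.25]
print(best_threshold(f_values, labels, weights))  # (0, 0.4, 1)
```

Running this for each of the K filters and keeping the overall minimum gives the "best filter/threshold combination" for the round.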

Features Selected by Boosting. First two features selected by boosting: this feature combination can yield a 100% detection rate and a 50% false positive rate.

ROC Curve for 200-Feature Classifier. A 200-feature classifier can yield a 95% detection rate and a false positive rate of 1 in 14,084. Not good enough! To be practical for real applications, the false positive rate must be closer to 1 in 1,000,000.

Attentional Cascade. We start with simple classifiers which reject many of the negative sub-windows while detecting almost all positive sub-windows. A positive response from the first classifier triggers the evaluation of a second (more complex) classifier, and so on. A negative outcome at any point leads to the immediate rejection of the sub-window. [Figure: IMAGE SUB-WINDOW -> Classifier 1 -> Classifier 2 -> Classifier 3 -> FACE; a false (F) outcome at any classifier -> NON-FACE.]
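The early-rejection logic of the cascade can be sketched as a loop over stages. The stage tests below are placeholder predicates on made-up window statistics, not trained classifiers; they only show the control flow.

```python
def cascade_classify(window, stages):
    """Run stages in order; return (is_face, number_of_stages_evaluated)."""
    for evaluated, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, evaluated  # immediate rejection, later stages are skipped
    return True, len(stages)

# Hypothetical stages, ordered from cheap to expensive.
stages = [
    lambda w: w["mean"] > 50,      # stage 1: very cheap test
    lambda w: w["contrast"] > 10,  # stage 2: slightly more selective test
]

print(cascade_classify({"mean": 10, "contrast": 99}, stages))   # (False, 1)
print(cascade_classify({"mean": 100, "contrast": 20}, stages))  # (True, 2)
```

Because most sub-windows fail an early stage, the average cost per window stays close to the cost of the first stage.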

Attentional Cascade. Chain classifiers that are progressively more complex and have lower false positive rates. [Figure: ROC curve (% Detection vs. % False Pos) and cascade diagram: IMAGE SUB-WINDOW -> Classifier 1 -> Classifier 2 -> Classifier 3 -> FACE, with F -> NON-FACE at each stage.]

Cascaded Classifier. [Figure: IMAGE SUB-WINDOW -> 1 Feature (50%) -> 5 Features (20%) -> 20 Features (2%) -> FACE; F -> NON-FACE at each stage.] A 1-feature classifier achieves a 100% detection rate and about a 50% false positive rate. A 5-feature classifier achieves a 100% detection rate and a 40% false positive rate (20% cumulative) using data from the previous stage. A 20-feature classifier achieves a 100% detection rate with a 10% false positive rate (2% cumulative).
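The cumulative figures follow from multiplying the per-stage rates, since a window must pass every earlier stage to reach the next one. A short check of the arithmetic, using the rates quoted above (the function name is illustrative):

```python
def cumulative_rates(stage_rates):
    """Multiply per-stage rates to get the overall rate after each stage."""
    out, running = [], 1.0
    for rate in stage_rates:
        running *= rate
        out.append(running)
    return out

false_positive = cumulative_rates([0.5, 0.4, 0.1])  # about 0.5, 0.2, 0.02
detection = cumulative_rates([1.0, 1.0, 1.0])       # stays at 1.0

for stage, (fp, det) in enumerate(zip(false_positive, detection), start=1):
    print(f"after stage {stage}: detection {det:.0%}, false positives {fp:.0%}")
```

The same multiplication shows why each stage must keep its detection rate very close to 100%: any loss compounds across all stages.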

Training the Cascade. Set target detection and false positive rates for each stage. Keep adding features to the current stage until its target rates have been met. Need to lower the AdaBoost threshold to maximize detection (as opposed to minimizing total classification error). Test on a validation set. If the overall false positive rate is not low enough, then add another stage. Use false positives from the current stage as the negative training examples for the next stage.

The Implemented System. Training data: 5000 faces (all frontal, rescaled to 24x24 pixels); 300 million non-faces (9500 non-face images). Faces are normalized: scale, translation. Many variations: across individuals, illumination, pose.

Structure of the Detector Cascade. Combining successively more complex classifiers in cascade: 38 stages included a total of 6060 features. [Figure: all sub-windows enter stage 1 and pass through stages 1 to 38; T continues to the next stage and finally to Face, F rejects the sub-window.]

Structure of the Detector Cascade. First stage: 2 features, reject 50% of non-faces, detect 100% of faces. Second stage: 10 features, reject 80% of non-faces, detect 100% of faces. Later stages: 25 features, 50 features, and so on, by algorithm. [Figure: all sub-windows pass through stages 1 to 38; T continues to the next stage and finally to Face, F rejects the sub-window.]

Speed of the Final Detector. On a 700 MHz Pentium III processor, the face detector can process a 384x288 pixel image in about 0.067 seconds (15 Hz). This is 15 times faster than the previous detector of comparable accuracy (Rowley et al., 1998). An average of 8 features is evaluated per window on the test set.

Output of Face Detector on Test Images

Other Detection Tasks: facial feature localization; profile detection; male vs. female.

Viola Jones face detector. AdaBoost for face detection: rectangle features; integral images for fast computation; boosting for feature selection; attentional cascade for fast rejection of negative windows.

Face detection and recognition. [Figure: detection locates the face; recognition labels it "Sally".]

Face recognition: overview. Typical scenario: few examples per face; identify or verify a test example. What's hard: changes in expression, lighting, age, occlusion, viewpoint. Basic approaches (all nearest neighbor): 1. Project into a new subspace (or kernel space) (e.g., Eigenfaces = PCA). 2. Measure face features. 3. Make a 3D face model, compare shape + appearance (e.g., AAM).

Typical face recognition scenarios. Verification: a person is claiming a particular identity; verify whether that is true (e.g., security). Closed-world identification: assign a face to one person from among a known set. General identification: assign a face to a known person or to "unknown".

What makes face recognition hard? Expression

What makes face recognition hard? Lighting

What makes face recognition hard? Occlusion

What makes face recognition hard? Viewpoint

Simple idea for face recognition. 1. Treat the face image as a vector of intensities $x$. 2. Recognize the face by nearest neighbor in the database $y_1, \dots, y_n$: $k = \arg\min_k \|y_k - x\|$.
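As a minimal sketch of this idea, the function below returns the index $k$ of the database vector closest to the query in Euclidean distance. The four-pixel "images" are made-up numbers used only to exercise the code.

```python
import math

def nearest_neighbor(x, database):
    """Return the index k that minimizes ||y_k - x|| over the database vectors y_k."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return min(range(len(database)), key=lambda k: dist(database[k], x))

# Hypothetical flattened 2x2 intensity images.
database = [
    [0, 0, 0, 0],
    [10, 10, 10, 10],
    [200, 200, 200, 200],
]
query = [12, 9, 11, 10]
print(nearest_neighbor(query, database))  # 1
```

For real images each vector has thousands of entries, which is the cost that the subspace methods on the following slides are meant to reduce.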

The space of all face images. When viewed as vectors of pixel values, face images are extremely high-dimensional: a 100x100 image has 10,000 dimensions, which is slow and takes lots of storage. But very few 10,000-dimensional vectors are valid face images. We want to effectively model the subspace of face images.

The space of all face images Eigenface idea: construct a low-dimensional linear subspace that best explains the variation in the set of face images

Principal Component Analysis (PCA). Given: N data points $x_1, \dots, x_N$ in $\mathbb{R}^d$. We want to find a new set of features that are linear combinations of the original ones: $u(x_i) = u^T(x_i - \mu)$ ($\mu$: mean of data points). Choose the unit vector $u$ in $\mathbb{R}^d$ that captures the most data variance. Forsyth & Ponce, Sec. 22.3.1, 22.3.2

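Principal Component Analysis. Direction that maximizes the variance of the projected data: maximize $\frac{1}{N}\sum_{i=1}^{N} \left(u^T(x_i - \mu)\right)^2 = u^T \Sigma u$ subject to $\|u\| = 1$, where $u^T(x_i - \mu)$ is the projection of data point $x_i$ and $\Sigma = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^T$ is the covariance matrix of the data. The direction that maximizes the variance is the eigenvector associated with the largest eigenvalue of $\Sigma$ (can be derived using the Rayleigh quotient or a Lagrange multiplier).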
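For reference, the Lagrange multiplier argument mentioned above can be written out in a few lines. This is the standard derivation, with $\lambda$ introduced here as the multiplier for the unit-norm constraint.

```latex
\begin{align*}
L(u, \lambda) &= u^T \Sigma u - \lambda \left(u^T u - 1\right) \\
\nabla_u L = 2 \Sigma u - 2 \lambda u = 0 \quad &\Longrightarrow \quad \Sigma u = \lambda u \\
u^T \Sigma u &= \lambda \, u^T u = \lambda
\end{align*}
```

Every stationary point is therefore an eigenvector of $\Sigma$, and the variance it captures equals its eigenvalue, so the maximum is attained at the eigenvector with the largest eigenvalue.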

Principal component analysis. The direction that captures the maximum variance of the data is the eigenvector corresponding to the largest eigenvalue of the data covariance matrix. Furthermore, the top k orthogonal directions that capture the most variance of the data are the k eigenvectors corresponding to the k largest eigenvalues.

Eigenfaces: Key idea. Assume that most face images lie on a low-dimensional subspace determined by the first k (k < d) directions of maximum variance. Use PCA to determine the vectors or "eigenfaces" $u_1, \dots, u_k$ that span that subspace. Represent all face images in the dataset as linear combinations of eigenfaces. M. Turk and A. Pentland, Face Recognition using Eigenfaces, CVPR 1991

Eigenfaces example. Training images $x_1, \dots, x_N$

Eigenfaces example. Top eigenvectors: $u_1, \dots, u_k$. Mean: $\mu$

Eigenfaces example. Face $x$ in "face space" coordinates: $(w_1, \dots, w_k) = (u_1^T(x - \mu), \dots, u_k^T(x - \mu))$. Reconstruction: $\hat{x} = \mu + w_1 u_1 + w_2 u_2 + w_3 u_3 + w_4 u_4 + \dots$

Recognition with eigenfaces. Process labeled training images: find the mean $\mu$ and covariance matrix $\Sigma$; find the k principal components (eigenvectors of $\Sigma$) $u_1, \dots, u_k$; project each training image $x_i$ onto the subspace spanned by the principal components: $(w_{i1}, \dots, w_{ik}) = (u_1^T(x_i - \mu), \dots, u_k^T(x_i - \mu))$. Given a novel image $x$: project onto the subspace: $(w_1, \dots, w_k) = (u_1^T(x - \mu), \dots, u_k^T(x - \mu))$; optional: check the reconstruction error $\|x - \hat{x}\|$ to determine whether the image is really a face; classify as the closest training face in the k-dimensional subspace. M. Turk and A. Pentland, Face Recognition using Eigenfaces, CVPR 1991
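The pipeline above can be sketched end to end in plain Python. Two things are assumptions made for the example: the eigenvectors are found by power iteration with deflation (a simple stand-in for a proper eigensolver or SVD, and only reliable when k does not exceed the rank of the covariance matrix), and the four-pixel "faces" and their labels are invented toy data. The optional reconstruction-error check is left out.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def fit_eigenfaces(train, k, iters=200):
    """Return (mu, components): the mean and the top-k eigenvectors of the covariance."""
    n, d = len(train), len(train[0])
    mu = [sum(col) / n for col in zip(*train)]
    centered = [[xi - mi for xi, mi in zip(x, mu)] for x in train]
    sigma = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
             for i in range(d)]
    components = []
    for _ in range(k):
        # Arbitrary non-symmetric start vector, normalized to unit length.
        u = [float(i + 1) for i in range(d)]
        norm = math.sqrt(dot(u, u))
        u = [ui / norm for ui in u]
        for _step in range(iters):  # power iteration
            v = [dot(row, u) for row in sigma]
            norm = math.sqrt(dot(v, v))
            if norm == 0.0:
                break
            u = [vi / norm for vi in v]
        lam = dot(u, [dot(row, u) for row in sigma])  # eigenvalue estimate
        components.append(u)
        # Deflation: remove the found direction so the next pass finds the next one.
        for i in range(d):
            for j in range(d):
                sigma[i][j] -= lam * u[i] * u[j]
    return mu, components

def project(x, mu, components):
    """Face-space coordinates (w_1, ..., w_k) = (u_1^T (x - mu), ..., u_k^T (x - mu))."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return [dot(u, diff) for u in components]

def classify(x, mu, components, train_coords, labels):
    """Label of the closest training face in the k-dimensional subspace."""
    w = project(x, mu, components)
    def sq_dist(a):
        return sum((ai - wi) ** 2 for ai, wi in zip(a, w))
    best = min(range(len(train_coords)), key=lambda i: sq_dist(train_coords[i]))
    return labels[best]

# Toy data: two "people" with two flattened 2x2 images each.
train = [[10, 9, 1, 0], [9, 10, 0, 1], [1, 0, 10, 9], [0, 1, 9, 10]]
labels = ["A", "A", "B", "B"]
mu, components = fit_eigenfaces(train, k=2)
train_coords = [project(x, mu, components) for x in train]
label = classify([8, 9, 2, 1], mu, components, train_coords, labels)
print(label)  # A
```

Only the k coordinates per training image need to be stored and compared at test time, rather than the full pixel vectors.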

Limitations Global appearance method: not robust to misalignment, background variation

Limitations PCA assumes that the data has a Gaussian distribution (mean µ, covariance matrix Σ) The shape of this dataset is not well described by its principal components

Limitations The direction of maximum variance is not always good for classification

Face verification using deep networks. Facebook's DeepFace approach: elaborate 2D-3D alignment followed by a deep network achieves near-human accuracy (for cropped faces) on face verification. Y. Taigman, M. Yang, M. Ranzato, L. Wolf, DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR 2014.

Face verification using deep networks. Alignment pipeline. (a) The detected face, with 6 initial fiducial points. (b) The induced 2D-aligned crop. (c) 67 fiducial points on the 2D-aligned crop with their corresponding Delaunay triangulation; we added triangles on the contour to avoid discontinuities. (d) The reference 3D shape transformed to the 2D-aligned crop image-plane. (e) Triangle visibility w.r.t. the fitted 3D-2D camera; darker triangles are less visible. (f) The 67 fiducial points induced by the 3D model that are used to direct the piece-wise affine warping. (g) The final frontalized crop. (h) A new view generated by the 3D model (not used in this paper).

Face verification using deep networks. Outline of the DeepFace architecture. A front-end of a single convolution-pooling-convolution filtering on the rectified input, followed by three locally-connected layers and two fully-connected layers. Colors illustrate feature maps produced at each layer. The net includes more than 120 million parameters, where more than 95% come from the local and fully connected layers.

Face verification using deep networks. This method reaches an accuracy of 97.35% on the Labeled Faces in the Wild (LFW) dataset, reducing the error of the current state of the art by more than 27%, closely approaching human-level performance.

Thank you!