COS 429: COMPUTER VISION Face Recognition

COS 429: COMPUTER VISION Face Recognition. Intro to recognition; PCA and Eigenfaces; LDA and Fisherfaces; Face detection: Viola & Jones; (Optional) generic object models for faces: the Constellation Model. Reading: Turk & Pentland,???

Face Recognition Digital photography Surveillance Album organization Person tracking/id. Emotions and expressions Security/warfare Tele-conferencing Etc.

What's recognition? Identification or Discrimination vs. Categorization or Classification

What's recognition? Categorization without localization: "Yes, there are faces" (but not where). Identification without localization: "Yes, there is John Lennon" (but not where).

What's recognition? Detection or Localization: finding where John Lennon (or any face) is in the image.

Today's agenda. No localization: 1. PCA & Eigenfaces, 2. LDA & Fisherfaces. Detection or Localization: 3. AdaBoost, 4. Constellation model. Spanning identification/discrimination and categorization/classification.

Eigenfaces and Fisherfaces. Introduction; Techniques: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA); Experiments.

The Space of Faces. An image is a point in a high-dimensional space: an N x M image is a point in R^(NM). We can define vectors in this space as we did in the 2D case. [Thanks to Chuck Dyer, Steve Seitz, Nishino]

Key Idea. The set of possible face images χ = { x̂ ∈ R^L } is highly correlated. So, compress them to a low-dimensional subspace that captures the key appearance characteristics of the visual DOFs. EIGENFACES [Turk and Pentland]: USE PCA!

Two simple but useful techniques. For example, a generative graphical model: P(identity, image) = P(identity | image) P(image). Preprocessing model (can be performed by PCA).

Principal Component Analysis (PCA). PCA is used to determine the most representative features among data points. It computes the p-dimensional subspace such that the projection of the data points onto the subspace has the largest variance among all p-dimensional subspaces.
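This definition can be sketched directly from the eigendecomposition of the scatter matrix (a minimal NumPy sketch; the toy data, the function name `pca`, and the 2-D example are made up for illustration):

```python
import numpy as np

def pca(X, p):
    """Compute the p-dimensional PCA subspace of the rows of X: the
    projection onto it has the largest variance among all p-dim subspaces."""
    mu = X.mean(axis=0)
    S = (X - mu).T @ (X - mu) / len(X)   # total scatter (covariance) matrix
    vals, vecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :p]             # top-p eigenvectors as columns
    return mu, W

# Toy data: large variance along the first axis, small along the second.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) * np.array([3.0, 0.5])
mu, W = pca(X, 1)
Y = (X - mu) @ W                         # 1-D projection of the data
```

The recovered direction should align with the first axis, since that is where the variance lies.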

Illustration of PCA: [figure: six data points in the (x1, x2) plane shown under two candidate 1-D projections; an arbitrary projection vs. the PCA projection, which preserves the most variance]

Illustration of PCA: [figure: data in the (x1, x2) plane with the 1st principal component along the direction of greatest variance and the 2nd principal component orthogonal to it]

Eigenface for Face Recognition. PCA has been used for face image representation/compression, face recognition, and many other tasks. Compare two faces by projecting the images into the subspace and measuring the Euclidean distance between them.

Mathematical Formulation. Find an orthonormal transformation W from the n-dimensional image space to an m-dimensional feature space (m < n). Total scatter matrix: S_T = (1/N) Σ_i (x_i − u)(x_i − u)^T. W_opt corresponds to the m leading eigenvectors of S_T.

Eigenfaces. PCA extracts the eigenvectors of the scatter matrix, giving a set of vectors v_1, v_2, v_3, ... Each of these vectors is a direction in face space. What do they look like?

Projecting onto the Eigenfaces. The eigenfaces v_1, ..., v_K span the space of faces. A face x is converted to eigenface coordinates by projection: ω_i = v_i^T (x − u), i = 1, ..., K.

Algorithm. Training: 1. Align training images x_1, x_2, ..., x_N (note that each image is flattened into a long vector). 2. Compute the average face u = (1/N) Σ_i x_i. 3. Compute the difference images φ_i = x_i − u.

Algorithm. 4. Compute the covariance matrix (total scatter matrix) S_T = (1/N) Σ_i φ_i φ_i^T = (1/N) B B^T, where B = [φ_1, φ_2, ..., φ_N]. 5. Compute the eigenvectors of the covariance matrix; they form the columns of W. Testing: 1. Project into eigenface space: ω = W^T (x − u), where the columns of W are the eigenfaces. 2. Compare projections.
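The training and testing steps above can be sketched in a few lines (a minimal sketch on made-up data; the image size, K, and variable names are illustrative, and the eigenvectors are obtained via an SVD of the difference images rather than by forming S_T explicitly):

```python
import numpy as np

# Hypothetical data: N = 20 flattened "face" images of 32x32 = 1024 pixels each.
rng = np.random.default_rng(1)
X = rng.random((20, 1024))

# Training
u = X.mean(axis=0)                  # 2. average face
Phi = X - u                         # 3. difference images (rows are the phi_i)
# 4.-5. The eigenvectors of B B^T are the right singular vectors of Phi,
# so an SVD avoids forming the 1024 x 1024 scatter matrix.
_, _, Vt = np.linalg.svd(Phi, full_matrices=False)
K = 10
W = Vt[:K].T                        # columns are the top-K eigenfaces

# Testing: project two faces and compare them in eigenface coordinates.
omega_a = W.T @ (X[0] - u)
omega_b = W.T @ (X[1] - u)
dist = np.linalg.norm(omega_a - omega_b)  # Euclidean distance in the subspace
```

The SVD route is the standard trick when the number of images is much smaller than the number of pixels.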

Illustration of Eigenfaces: the visualization of the eigenvectors. These are the first 4 eigenvectors from a training set of 400 images (ORL Face Database). They look like faces, hence the name eigenfaces.

Eigenfaces look somewhat like generic faces.

Eigenvalues

Reconstruction and Errors (P = 4, P = 200, P = 400). Selecting only the top P eigenfaces reduces the dimensionality. Fewer eigenfaces result in more information loss, and hence less discrimination between faces.
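The effect of P can be checked numerically (a sketch on made-up data; the sizes are arbitrary, and with all available components the reconstruction becomes exact):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((50, 256))            # hypothetical flattened images
u = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - u, full_matrices=False)

def reconstruct(x, P):
    """Project x onto the top-P eigenfaces, then map back to image space."""
    W = Vt[:P].T
    return u + W @ (W.T @ (x - u))

# Reconstruction error shrinks as more eigenfaces are kept.
errors = [float(np.linalg.norm(X[0] - reconstruct(X[0], P))) for P in (4, 16, 49)]
```

With 50 centered images the data has rank 49, so keeping all 49 components reproduces the image exactly.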

Summary for PCA and Eigenfaces. Non-iterative, globally optimal solution. The PCA projection is optimal for reconstruction from a low-dimensional basis, but may NOT be optimal for discrimination.

Linear Discriminant Analysis (LDA), also called Fisher's Linear Discriminant (FLD). Eigenfaces attempt to maximise the total scatter of the training images in face space, while Fisherfaces attempt to maximise the between-class scatter while minimising the within-class scatter.

Illustration of the Projection, using two classes as an example: [figure: the same two-class data in the (x1, x2) plane under a poor projection, where the classes overlap, and a good projection, where they separate]

Comparing with PCA

Variables. N sample images: {x_1, ..., x_N}. c classes: {χ_1, ..., χ_c}. Average of each class: μ_i = (1/N_i) Σ_{x_k ∈ χ_i} x_k. Total average: μ = (1/N) Σ_{k=1}^{N} x_k.

Scatters. Scatter of class i: S_i = Σ_{x_k ∈ χ_i} (x_k − μ_i)(x_k − μ_i)^T. Within-class scatter: S_W = Σ_{i=1}^{c} S_i. Between-class scatter: S_B = Σ_{i=1}^{c} N_i (μ_i − μ)(μ_i − μ)^T. Total scatter: S_T = S_W + S_B.

Illustration: [figure: two classes in the (x1, x2) plane, with S_W = S_1 + S_2 and S_B indicating the scatter between the class means]

Mathematical Formulation (1). After projection: y_k = W^T x_k. Between-class scatter (of the y's): S̃_B = W^T S_B W. Within-class scatter (of the y's): S̃_W = W^T S_W W.

Mathematical Formulation (2). The desired projection: W_opt = arg max_W |S̃_B| / |S̃_W| = arg max_W |W^T S_B W| / |W^T S_W W|. How is it found? As generalized eigenvectors: S_B w_i = λ_i S_W w_i, i = 1, ..., m. The data dimension is much larger than the number of samples (n >> N), so the matrix S_W is singular: rank(S_W) ≤ N − c.

Fisherface (PCA + FLD). First project with PCA to an (N − c)-dimensional space: z_k = W_pca^T x_k, with W_pca = arg max_W |W^T S_T W|. Then project with FLD to a (c − 1)-dimensional space: y_k = W_fld^T z_k, with W_fld = arg max_W |W^T W_pca^T S_B W_pca W| / |W^T W_pca^T S_W W_pca W|.

Illustration of Fisherface

Results: Eigenface vs. Fisherface (1). Input: 160 images of 16 people. Train: 159 images; Test: 1 image (leave-one-out). Variation in facial expression, eyewear, and lighting: with/without glasses, 3 lighting conditions, 5 expressions.

Eigenface vs. Fisherface (2)

Discussion. Removing the first three principal components results in better performance under variable lighting conditions. The Fisherface method had error rates lower than the Eigenface method for the small datasets tested.

Today's agenda. No localization: 1. PCA & Eigenfaces, 2. LDA & Fisherfaces. Detection or Localization: 3. AdaBoost, 4. Constellation model. Spanning identification/discrimination and categorization/classification.

Robust Face Detection Using AdaBoost. Brief intro on (Ada)Boosting; Viola & Jones, 2001; Weak detectors: Haar wavelets; Integral image; Cascade; Experiments & Results. Reference: P. Viola and M. Jones (2001), Robust Real-time Object Detection, IJCV.

Discriminative methods. Object detection and recognition is formulated as a classification problem. The image is partitioned into a set of overlapping windows, and a decision is made at each window about whether or not it contains the target object. [figure: "Where are the screens?" example, with background and computer-screen image patches separated by a decision boundary in some feature space]
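The overlapping-window partition can be sketched as a simple generator (the window size and stride below are arbitrary illustrative choices):

```python
def sliding_windows(width, height, win, stride):
    """Yield (x, y, w, h) for every window position over a width x height image."""
    for y in range(0, height - win + 1, stride):
        for x in range(0, width - win + 1, stride):
            yield (x, y, win, win)

# A classifier would then be run on each window: target object or background?
windows = list(sliding_windows(64, 48, win=24, stride=8))
```

In practice the same enumeration is repeated at several image scales.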

A simple object detector with Boosting. Download the code and dataset (Matlab code, gentle boosting, an object detector using a part-based model, a toolbox for manipulating the dataset, and a dataset with cars and computer monitors): http://people.csail.mit.edu/torralba/iccv2005/

Why boosting? A simple algorithm for learning robust classifiers (Freund & Schapire, 1995; Friedman, Hastie & Tibshirani, 1998). Provides an efficient algorithm for sparse visual feature selection (Tieu & Viola, 2000; Viola & Jones, 2003). Easy to implement; does not require external optimization tools.

Boosting defines a classifier using an additive model: H(x) = α_1 h_1(x) + α_2 h_2(x) + ..., where H is the strong classifier, x is the feature vector, the α_t are weights, and the h_t are weak classifiers.

Boosting. To use this additive model we need to define a family of weak classifiers from which each h_t(x) is chosen.

Boosting is a sequential procedure over rounds t = 1, 2, .... Each data point has a class label, y_t = +1 or −1, and a weight, initially w_t = 1.

Toy example: weak learners from the family of lines. Each data point has a class label y_t = ±1 and a weight w_t = 1. A line h with p(error) = 0.5 is at chance.

Toy example: this line seems to be the best. It is a weak classifier: it performs slightly better than chance.

Toy example: we update the weights, w_t ← w_t exp{−y_t H_t}, and obtain a new problem for which the previous weak classifier performs at chance again.

Toy example: after selecting weak classifiers f_1, f_2, f_3, f_4, the strong (non-linear) classifier is built as the combination of all the weak (linear) classifiers.
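The whole toy procedure can be sketched with decision stumps standing in for the lines (a hypothetical 1-D dataset with one contrarian point so that no single stump is perfect; the exponential reweighting matches the w_t ← w_t exp{−y_t H_t} update above):

```python
import numpy as np

# Hypothetical 1-D toy data: negatives on the left, positives on the right,
# except one point (x = 0.6) that breaks any single threshold.
x = np.array([0.1, 0.2, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9])
y = np.array([-1, -1, -1, -1, -1, 1, -1, 1, 1, 1])
w = np.ones(len(x)) / len(x)             # uniform initial weights

stumps = []                              # chosen weak classifiers (theta, p, alpha)
for t in range(8):
    # Weak learner: exhaustively pick the threshold/polarity with the lowest
    # weighted error; it only needs to beat chance.
    best = None
    for theta in x:
        for p in (1, -1):
            h = np.where(p * x > p * theta, 1, -1)
            err = w[h != y].sum()
            if best is None or err < best[0]:
                best = (err, theta, p, h)
    err, theta, p, h = best
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
    stumps.append((theta, p, alpha))
    # Reweight so the chosen classifier performs at chance on the new weights.
    w = w * np.exp(-alpha * y * h)
    w = w / w.sum()

# The strong (non-linear) classifier combines all the weak (linear) ones.
H = np.sign(sum(a * np.where(p * x > p * theta, 1, -1)
                for theta, p, a in stumps))
train_acc = float((H == y).mean())
```

After a few rounds the weighted combination carves out the region around the contrarian point, which no single stump could do.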

Real-time Face Detection. Integral image: a new image representation to compute the features very quickly. AdaBoost: selecting a small number of important features. Cascade: a method for combining classifiers that focuses attention on promising regions of the image. Implemented on a 700 MHz Intel Pentium III, face detection proceeds at 15 frames/s, working only with a single grey-scale image.

Features: three kinds of rectangle features. The sum of the pixels which lie within the white rectangles is subtracted from the sum of the pixels in the grey rectangles.

Integral Image: ii(x, y) = Σ_{x′ ≤ x, y′ ≤ y} i(x′, y′). The sum within rectangle D is obtained with four lookups: D = ii(4) − ii(2) − ii(3) + ii(1).
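A sketch of the integral image and the four-lookup rectangle sum (array indices play the role of the corner points 1-4; the test image is arbitrary):

```python
import numpy as np

def integral_image(img):
    """ii(x, y) = sum of i(x', y') over all x' <= x, y' <= y."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] via D = ii(4) - ii(2) - ii(3) + ii(1)."""
    total = ii[r1, c1]                   # corner 4
    if r0 > 0:
        total -= ii[r0 - 1, c1]          # corner 2
    if c0 > 0:
        total -= ii[r1, c0 - 1]          # corner 3
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]      # corner 1 (subtracted twice above)
    return total

img = np.arange(16.0).reshape(4, 4)
ii = integral_image(img)
```

Any rectangle feature then costs a constant number of lookups, independent of its size.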

Learning Classification Function (1). Select a small number of important features. 1. Given training examples (x_1, y_1), ..., (x_n, y_n), with y_i = 1 for faces and y_i = 0 for non-faces. 2. Initialize the weights: w_{1,i} = 1/(2l) for y_i = 1 and 1/(2m) for y_i = 0, where l = # of faces and m = # of non-faces.

Learning Classification Function (2). 3. For t = 1, ..., T: a. Normalize the weights: w_{t,i} ← w_{t,i} / Σ_j w_{t,j}. b. For each feature j, train a classifier h_j(x) = 1 if f_j(x) > θ_j and 0 otherwise, with weighted error ε_j = Σ_i w_i |h_j(x_i) − y_i|. c. Choose the classifier h_t with the lowest error ε_t. d. Update the weights: w_{t+1,i} = w_{t,i} β_t^{1−e_i}, where e_i = 0 if example i is classified correctly and 1 otherwise, and β_t = ε_t / (1 − ε_t).

Learning Classification Function (3). 4. The final strong classifier is h(x) = 1 if Σ_{t=1}^{T} α_t h_t(x) ≥ (1/2) Σ_{t=1}^{T} α_t, and 0 otherwise, where α_t = log(1/β_t). The final hypothesis is a weighted linear combination of the T hypotheses, where the weights are inversely proportional to the training errors.
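Steps 2-4 can be condensed into a small sketch of this 0/1-label variant (the feature values and labels are random noise just to exercise the loop, and the median threshold is a crude stand-in for per-feature threshold search):

```python
import numpy as np

rng = np.random.default_rng(4)
n, J, T = 40, 25, 5
F = rng.random((J, n))                         # hypothetical feature values f_j(x_i)
y = (rng.random(n) < 0.5).astype(float)        # labels: 1 = face, 0 = non-face
l, m = int(y.sum()), int((1 - y).sum())
w = np.where(y == 1, 1 / (2 * l), 1 / (2 * m)) # step 2: initial weights

chosen = []                                    # selected (j, theta, beta) per round
for t in range(T):
    w = w / w.sum()                            # 3a. normalize
    best = None
    for j in range(J):                         # 3b. weighted error of each stump
        theta = np.median(F[j])                # crude threshold choice
        h = (F[j] > theta).astype(float)
        eps = np.abs(h - y) @ w
        if best is None or eps < best[0]:
            best = (eps, j, theta, h)
    eps, j, theta, h = best                    # 3c. lowest-error classifier
    beta = eps / (1 - eps)
    w = w * beta ** (1 - np.abs(h - y))        # 3d. shrink weights of correct examples
    chosen.append((j, theta, beta))

# 4. final strong classifier: weighted vote against half the total weight
alpha = np.array([np.log(1 / b) for _, _, b in chosen])
H = np.array([(F[j] > th).astype(float) for j, th, _ in chosen])
pred = (alpha @ H >= 0.5 * alpha.sum()).astype(float)
```

Each round doubles as feature selection: the chosen j identifies the single most discriminative rectangle feature under the current weights.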

Learning Results. 200 features give a detection rate of 95% with a false-positive rate of 1/14,804, but require 0.7 seconds to scan a 384×288 pixel image. [figure: the first and second features selected by AdaBoost]

The Attentional Cascade: reject many of the negative sub-windows early. A two-feature strong classifier can detect 100% of faces with a 40% false-positive rate.

A Cascaded Detector: [figure: a chain of stage classifiers, each with its own false-positive rate f and detection rate d; sub-windows passing a stage (T) go on to the next, those failing a stage (F) are rejected, and only survivors of all stages are reported as targets]
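The chain in the figure can be sketched as follows (the stage functions here are toy stand-ins, not trained classifiers; note that the stage rates multiply, so the overall detection rate is Π d_i and the overall false-positive rate is Π f_i):

```python
def cascade(window, stages):
    """Return True iff every stage accepts the window. `stages` is a list of
    cheap-to-expensive classifiers; most negative windows are rejected by the
    first stages, which is where the speedup comes from."""
    for stage in stages:
        if not stage(window):
            return False        # F branch: reject immediately
    return True                 # passed all stages: report as target

# Hypothetical stand-in stages: a window is an (intensity, edge_score) pair.
stages = [
    lambda w: w[0] > 0.2,        # stage 1: very cheap test, high detection rate
    lambda w: w[1] > 0.5,        # stage 2: slightly stronger
    lambda w: w[0] + w[1] > 1.0  # stage 3: most selective
]
```

Because the average sub-window is rejected in the first stage or two, the expected per-window cost stays tiny even though the full chain is expensive.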

Detector Cascade Discussion: the cascaded classifier is almost 10 times faster.

Experimental Results (1). Training dataset: a face training set of 4916 hand-labeled faces, scaled and aligned to a base resolution of 24×24. Structure of the detector cascade: 32 layers with 4297 features in total; the first layers use 2, 20, 50, 100, and 200 features and reject about 60% of non-faces while correctly detecting close to 100% of faces. Training time for the entire 32-layer detector was on the order of weeks on a single 466 MHz AlphaStation XP900.

Face Image Databases. Databases for face recognition are best utilized as training sets: each image consists of an individual on a uniform, uncluttered background. Test sets for face detection: MIT, CMU (frontal, profile), Kodak.

Training dataset: 4916 images

Experimental Results. Test dataset: MIT+CMU frontal face test set, 130 images with 507 labeled frontal faces. Detection rates (%) for a given number of false detections:

False detections   10    31    50    65    78    95    110   167   422
AdaBoost           78.3  85.2  88.8  89.8  90.1  90.8  91.1  91.8  93.7
Neural net         83.2  86.0  -     -     -     89.2  -     90.1  89.9

The accuracies are not significantly different, but the cascade classifier is almost 10 times faster. MIT test set: 23 images with 149 faces. Sung & Poggio: detection rate 79.9% with 5 false positives; AdaBoost: detection rate 77.8% with 5 false positives.

Today's agenda. No localization: 1. PCA & Eigenfaces, 2. LDA & Fisherfaces. Detection or Localization: 3. AdaBoost, 4. Constellation model. Spanning identification/discrimination and categorization/classification.

Parts and Structure literature: Fischler & Elschlager 1973; Yuille 91; Brunelli & Poggio 93; Lades, v.d. Malsburg et al. 93; Cootes, Lanitis, Taylor et al. 95; Amit & Geman 95, 99; Perona et al. 95, 96, 98, 00, 03; Huttenlocher et al. 00; Agarwal & Roth 02; etc.

Deformations: [figure: examples A-D of deformations of the part configuration]

Background clutter

Frontal faces

Face images

3D Object recognition Multiple mixture components

3D Orientation Tuning: [figure: orientation-tuning curve, % correct (50-100) vs. angle in degrees (0-100), from frontal to profile views]