ECE 661: Homework 10 Fall 2014

This homework consists of the following two parts: (1) face recognition with PCA and LDA for dimensionality reduction and the nearest-neighbor rule for classification; and (2) object detection with a cascaded AdaBoost classifier.

Part 1: Face Recognition Using a PCA/LDA Face Space and NN Classification

1. Introduction

For the images that we are dealing with, the dimensionality tends to be very high. Let a face image I(x, y) be a two-dimensional N by N array of intensity values. An image may also be considered as a vector of dimension $N^2$, so that a typical image of size 128 by 128 becomes a vector of dimension 16,384 ($2^{14}$), or equivalently, a point in a 16,384-dimensional space. An ensemble of images then maps to a collection of points in this huge space. Images of faces, being similar in overall configuration, are not randomly distributed in this huge image space and can therefore be described by a relatively low-dimensional subspace. The main idea of the PCA/LDA analysis is to find the vectors that best account for the distribution of face images within the entire image space. These vectors define the subspace of face images, which we call the face space (see Figure 1). Each vector is of length $N^2$, describes an N by N image, and is a linear combination of the original face images. Because these vectors are the eigenvectors of the covariance matrix of the original face images, and because they are face-like in appearance, they are referred to as eigenfaces for PCA and Fisherfaces for LDA.

Figure 1. Illustration of the reduced-dimensional face space. (a) The face space and three projected images on it; here u1 and u2 are the eigenfaces. (b) The projected faces from the training database.
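
The following MATLAB sketch illustrates the vectorization step described above: each training image is stacked into one column vector and the mean face is the column-wise average. The folder name and file pattern are hypothetical.

```matlab
% Minimal sketch: vectorize a set of face images and compute the mean face.
% Assumes all images in the (hypothetical) folder 'train/' have the same size.
files = dir(fullfile('train', '*.png'));
N = numel(files);
img = im2double(imread(fullfile('train', files(1).name)));
d = numel(img);                       % N^2 pixels (times 3 if RGB is kept)
X = zeros(d, N);
for i = 1:N
    img = im2double(imread(fullfile('train', files(i).name)));
    X(:, i) = img(:);                 % stack all pixels into one column vector
end
m = mean(X, 2);                       % mean face, d-by-1
```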

The goal of this part is to classify an unknown face image, given a database of labeled face images, using two different approaches, Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), to form a data subspace of reduced dimensionality.

2. Method of Solution

2.1 PCA (Eigenface) Recognition

In PCA, one uses the eigenvectors corresponding to the p largest eigenvalues of the covariance matrix to span the subspace. One first computes the covariance matrix

$$C = \frac{1}{N} \sum_{i=1}^{N} (x_i - m)(x_i - m)^T$$

where m is the mean vector of all training images $x_i$ and N is the number of training samples. The next step is to perform an eigen-decomposition to obtain the eigenvectors and eigenvalues of the covariance matrix; to construct a p-dimensional subspace, we use the eigenvectors corresponding to the p largest eigenvalues. However, since the dimension of the covariance matrix is very high, we need an algebraic trick to compute its eigenvalues and eigenvectors. Let

$$X = [\,x_1 - m \;\; x_2 - m \;\cdots\; x_N - m\,], \qquad C = \frac{1}{N} X X^T.$$

What we need to solve is $X X^T u = \lambda u$; however, $X X^T$ is huge. Instead we solve $X^T X v = \lambda' v$, where $X^T X$ is only N by N. Then $\lambda = \lambda'$ and $u = X v$, and the vectors u corresponding to the p largest eigenvalues are used as the basis that spans the subspace.

Figure 2 shows the mean face and the top 8 eigenfaces obtained with PCA and nearest-neighbor classification. Note that instead of converting all images to gray-level images, we keep the RGB color values, which provide more feature information than gray images alone.
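
A minimal MATLAB sketch of the small-matrix trick described above, assuming X and m are the data matrix and mean face from the earlier sketch and p is the chosen subspace dimension:

```matlab
% Eigen-decomposition via the small N-by-N matrix X'*X instead of the huge
% d-by-d covariance matrix (1/N)*X*X'. Uses implicit expansion (R2016b+).
Xc = X - m;                              % mean-subtracted data, d-by-N
[V, D] = eig(Xc' * Xc);                  % small N-by-N eigenproblem
[lam, idx] = sort(diag(D), 'descend');   % eigenvalues, largest first
U = Xc * V(:, idx(1:p));                 % map back to image space: u = X*v
U = U ./ sqrt(sum(U.^2, 1));             % normalize the p eigenfaces (columns of U)
lam = lam(1:p);                          % eigenvalues kept for the MAHCOS whitening later
Ytrain = U' * Xc;                        % p-dimensional coefficients of the training faces
```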

Figure 2. Mean face and top 8 PCA eigenfaces

2.2 LDA (Fisherface) Recognition

For LDA, the main idea is to find the p-dimensional subspace that maximizes the separation between the classes while minimizing the scatter within each class. The between-class scatter matrix $S_B$ and the within-class scatter matrix $S_W$ are defined as

$$S_B = \frac{1}{N_C} \sum_{j=1}^{N_C} (m_j - m)(m_j - m)^T$$

where $m_j$ is the mean of class j, m is the mean of all training data, and $N_C$ is the number of classes, and

$$S_W = \frac{1}{N_C} \sum_{j=1}^{N_C} \left( \frac{1}{N_j} \sum_{i=1}^{N_j} (x_{ij} - m_j)(x_{ij} - m_j)^T \right)$$

where $N_j$ is the number of elements of class j and $x_{ij}$ is the i-th element of class j. To find the p-dimensional subspace in which the projected images have maximal between-class variance and minimal within-class variance, we maximize the Fisher discriminant function

$$J(w) = \frac{w^T S_B w}{w^T S_W w}$$

which is equivalent to solving the generalized eigenvalue problem

$$S_B w = \lambda S_W w.$$

I first tried the method the professor described in class, which finds the eigenvectors u of $S_B$ (i.e., $S_B u = \lambda u$) and then computes $w = S_W^{-1} u$. However, $S_W$ is huge and computing its inverse is practically impossible, so we instead use Yu and Yang's algorithm to solve the eigen-decomposition problem directly:

i) Using the same trick as in PCA, obtain the eigenvectors and eigenvalues of $S_B$: $S_B = Y D_B Y^T$.
ii) Discard the smallest (near-zero) eigenvalues and the corresponding eigenvectors from $Y$ and $D_B$.
iii) Construct $Z = Y D_B^{-1/2}$.
iv) Define the matrix $S_{BW} = Z^T S_W Z$.
v) Using the same trick, obtain the eigenvectors and eigenvalues of $S_{BW}$: $S_{BW} = U D_W U^T$.
vi) Sort the eigenvectors in increasing order of their eigenvalues.
vii) Define the matrix $W = Z U$.
viii) The basis vectors are the first p columns of $W$.

Figure 3 shows the mean face and the top 8 Fisherfaces obtained with LDA and nearest-neighbor classification. As in the PCA case, we keep the RGB color values instead of converting all images to gray level, which gives more feature information than gray images alone.
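
A rough MATLAB sketch of steps i)-viii) above, under the assumption that M is the d-by-$N_C$ matrix whose columns are $(m_j - m)/\sqrt{N_C}$ (so $S_B = M M^T$) and Xw is a d-by-N matrix of within-class-centered samples scaled so that $S_W = X_w X_w^T$; variable names are hypothetical.

```matlab
% Direct LDA (Yu & Yang) without ever forming the d-by-d scatter matrices.
[Vb, Db] = eig(M' * M);                       % small Nc-by-Nc problem for S_B
[db, idx] = sort(diag(Db), 'descend');
keep = idx(db > 1e-10);                       % step ii): drop (near-)zero eigenvalues
Y = M * Vb(:, keep);                          % eigenvectors of S_B (unnormalized)
Y = Y ./ sqrt(sum(Y.^2, 1));                  % normalize columns
Z = Y * diag(1 ./ sqrt(db(db > 1e-10)));      % step iii): Z = Y * D_B^(-1/2)
Sbw = (Z' * Xw) * (Xw' * Z);                  % step iv): Z' * S_W * Z (small matrix)
[U, Dw] = eig(Sbw);                           % step v)
[~, idw] = sort(diag(Dw), 'ascend');          % step vi): small within-class scatter first
W = Z * U(:, idw(1:p));                       % steps vii)-viii): first p columns of Z*U
```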

Figure 3. Mean face and top 8 LDA Fisherfaces

2.3 Nearest-Neighbor Classification

After obtaining the p basis vectors of the subspace, all training and testing images are projected onto the subspace. A similarity score based on the Mahalanobis cosine distance (MAHCOS) is computed between an input face image and each of the training images, and the input is assigned the label of its nearest neighbor. The classification accuracy is defined as

$$\text{accuracy} = \frac{\#\text{ of correctly classified testing samples}}{\#\text{ of testing samples}}$$
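
A minimal sketch of this step, taking the Mahalanobis cosine similarity to be the cosine similarity of the eigenvalue-whitened coefficients. Ytrain and lam come from the PCA sketch above; Ytest, trainLabels, and testLabels (the projected test coefficients and the label vectors) are assumptions.

```matlab
% MAHCOS nearest-neighbor classification (sketch).
Wtr = Ytrain ./ sqrt(lam);               % whiten each dimension (Mahalanobis scaling)
Wte = Ytest  ./ sqrt(lam);
Wtr = Wtr ./ sqrt(sum(Wtr.^2, 1));       % unit-normalize columns for cosine similarity
Wte = Wte ./ sqrt(sum(Wte.^2, 1));
S = Wte' * Wtr;                          % Ntest-by-Ntrain similarity matrix (cf. Figure 4)
[~, nnIdx] = max(S, [], 2);              % most similar training face for each test face
predLabels = trainLabels(nnIdx);
accuracy = mean(predLabels(:) == testLabels(:));
```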

Figure 4. An example of the similarity matrix

Figure 5. ROC and CMC curves for PCA+MAHCOS

Figure 6. ROC and CMC curves for LDA+MAHCOS

Figure 7. PCA/LDA accuracy versus subspace dimension using gray images only (curves: accuracyPCA and accuracyLDA, dimensions 5 to 30)

Table 1. Performance metrics

        Rank-one recognition rate    Equal error rate    Minimal half total error rate    Verification rate at 1% FAR
PCA     3.33%                        48.54%              48.04%                           63.33%
LDA     100%                         0%                  0%                               100%

2.4 Discussion

(1) In this homework, we use the R, G, B channels instead of gray images only. The merit is that more useful information is kept, which helps improve recognition accuracy. Comparing my results with those of the previous homework, which used gray images only (see Figures 5, 6, and 7), classification is more accurate when this additional information is used; the cost is that, with three times as many features, the running time is higher. This could be addressed in the future with parallel computing, e.g. MPI or OpenMP.

(2) For feature matching, the Mahalanobis cosine distance, rather than the Euclidean distance, was adopted for nearest-neighbor classification. The Mahalanobis distance is a more intrinsic measure of similarity, especially in the projected subspace, where the Euclidean distance does not represent the real differences between two data sets.

(3) PCA vs. LDA: Comparing the ROC and CMC curves (see Figures 5 and 6 and Table 1), LDA performs much better than PCA. The likely reason is that PCA does not separate the classes as well as LDA does.

2.5 Source Codes (Matlab)

See the attached zip file.

Part 2: Object Detection with Cascaded AdaBoost Classification

1. Introduction

The goal of this part is to design a car detector using the Viola-Jones approach. Each sliding window that scans through the image is given a two-class label, i.e., car or non-car. The merit of this approach is that it cascades multiple strong AdaBoost classifiers, each of which is built from several weak classifiers.

2. Method of Solution

Figure 8 illustrates the general outline of cascaded AdaBoost classification. It consists of four main modules, which are explained step by step in the following.

Figure 8. General process for cascaded AdaBoost classification. (Block diagram: (1) raw training images, integral image, and Haar-like features; (2) weak classifiers obtained by iterative weighting and threshold control, trained until the false-positive rate is below 0.5; (3) combination into one strong classifier; (4) a cascade of stages AdaBoost 1, AdaBoost 2, ..., AdaBoost i applied to each test window, with stages added until no false-positive samples remain; a window is reported as a car only if it passes every stage.)

2.1 Computing the integral image (summed area table) for Haar-like features

In this project, we use Haar-like features of two types, vertical and horizontal. To compute feature values efficiently, the integral image (summed area table) is employed (see the equation below and Figure 9):

$$S(i, j) = \sum_{x \le i} \sum_{y \le j} I(x, y)$$

where I(x, y) is the grayscale pixel value at position (x, y).

Figure 9. Illustration of how the integral image works

The Haar-like features can then be computed efficiently from a few values of the summed area table; for a two-rectangle feature with corner points $(x_1, y_1), \dots, (x_6, y_6)$:

$$f = -S(x_1, y_1) + S(x_2, y_2) + 2S(x_3, y_3) - 2S(x_4, y_4) - S(x_5, y_5) + S(x_6, y_6)$$

Figure 10. Haar-like edge features

2.2 Constructing weak classifiers by iterative weighting and threshold control

For each feature, the images are sorted in ascending order of their feature values, and the following error is computed for each candidate threshold:

$$e = \min\big(S^+ + (T^- - S^-),\; S^- + (T^+ - S^+)\big)$$

where $S^+$ and $S^-$ are the sums of the weights of the positive and negative samples below the current threshold, and $T^+$ and $T^-$ are the total weights of all positive and negative samples, respectively. If the first term is the smaller one, the decision rule is: if the feature value is less than the threshold, the sample is classified as positive, and we say the polarity is 1.
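
As an illustration of the two building blocks above, here is a small MATLAB sketch (hypothetical file name and rectangle coordinates) that builds the summed area table and evaluates one horizontal two-rectangle edge feature from six table lookups:

```matlab
% Integral image: S(i,j) = sum of I(x,y) over x <= i, y <= j.
I = im2double(imread('example_window.png'));   % hypothetical candidate window
if size(I, 3) == 3, I = rgb2gray(I); end       % Part 2 works on grayscale values
S = cumsum(cumsum(I, 1), 2);

% Sum of the rectangle spanning rows ra..rb and columns ca..cb (ra, ca > 1 here;
% in practice the table is padded with a leading row/column of zeros).
rectSum = @(ra, rb, ca, cb) S(rb, cb) - S(ra-1, cb) - S(rb, ca-1) + S(ra-1, ca-1);

% Horizontal edge feature: right half minus left half of a rectangle split at column c2.
r1 = 2; r2 = 10; c1 = 2; c2 = 6; c3 = 10;
f = rectSum(r1, r2, c2+1, c3) - rectSum(r1, r2, c1, c2);
% Expanding the two calls gives a six-term expression like the one in the text,
% with coefficients of magnitude 1 on the outer corners and 2 on the shared corners.
```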

Figure 11. Weighted linear combination of T hypotheses and threshold control

The algorithm shown in Figure 11 can be described as follows. T hypotheses are constructed, each using a single feature. The final hypothesis is a weighted linear combination of the T hypotheses, where the weights are inversely proportional to the training errors.

Given example images $(x_1, y_1), \dots, (x_n, y_n)$, where $y_i = 0, 1$ for negative and positive examples respectively, and n is the size of the training set:

Initialize the weights $w_{1,i} = \frac{1}{2m}$ for $y_i = 0$ and $w_{1,i} = \frac{1}{2l}$ for $y_i = 1$, where m and l are the numbers of negative and positive examples respectively.

For $t = 1, \dots, T$:

1) Normalize the weights: $w_{t,i} \leftarrow \dfrac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}}$

2) Select the best weak classifier with respect to the weighted error: $\varepsilon_t = \min_{f, p, \theta} \sum_i w_{t,i}\, |h(x_i, f, p, \theta) - y_i|$

3) Define $h_t(x) = h(x, f_t, p_t, \theta_t)$, where $f_t$, $p_t$, and $\theta_t$ are the minimizers of $\varepsilon_t$.

4) Update the weights: $w_{t+1,i} = w_{t,i}\, \beta_t^{1 - e_i}$, where $e_i = 0$ if example $x_i$ is classified correctly, $e_i = 1$ otherwise, and $\beta_t = \dfrac{\varepsilon_t}{1 - \varepsilon_t}$.
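
A compact MATLAB sketch of this boosting loop, under the assumption that featVals is an Nfeat-by-n matrix of precomputed feature values, y is a 1-by-n vector of 0/1 labels, and bestStump is a hypothetical helper implementing the threshold/polarity search of Section 2.2 (it returns the minimizing feature, threshold, polarity, the resulting 0/1 predictions, and the weighted error):

```matlab
% AdaBoost stage training (sketch). alpha_t = log(1/beta_t), the standard
% Viola-Jones choice, is used as the hypothesis weight in Section 2.3.
m = sum(y == 0);  l = sum(y == 1);
w = (y == 0) / (2*m) + (y == 1) / (2*l);       % initial weights
alpha = zeros(1, T);  stumps = cell(1, T);
for t = 1:T
    w = w / sum(w);                            % 1) normalize the weights
    [feat, theta, pol, pred, err] = bestStump(featVals, y, w);   % 2)-3) best weak classifier
    beta = err / (1 - err);
    e = (pred ~= y);                           % e_i = 1 iff sample i is misclassified
    w = w .* beta .^ (1 - e);                  % 4) update the weights
    alpha(t) = log(1 / beta);
    stumps{t} = struct('feat', feat, 'theta', theta, 'pol', pol);
end
```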

2.3 Combining weak classifiers into a single strong AdaBoost classifier

The decision of whether a sample is positive or negative is determined by:

$$C(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \text{threshold} \\ 0 & \text{otherwise} \end{cases}$$

Note that the threshold is chosen as the minimum of $\sum_{t=1}^{T} \alpha_t h_t(x)$ over the positive training data, so that all positive training samples are classified as positive. The procedure of adding the best features continues until the false-positive rate of the stage drops below 0.5 (see Figure 12).

Figure 12. Output of the iterative feature-selection process (iteration stops once the false-positive rate falls below 0.5)

2.4 Cascading AdaBoost classifiers for car detection

The purpose of cascading classifiers is to lower the false-positive rate. All samples classified as positive by one stage are fed to the next classifier. By adding more stages, fewer negative samples are classified as positive. We stop adding stages when all negative training samples are correctly classified (see Figure 8, module 4).
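
A sketch of how one trained stage and the full cascade would be applied to a candidate window x (its feature-value vector), reusing the hypothetical stumps/alpha structures from the training sketch; evalStump is an assumed helper that applies a single threshold/polarity test, and both functions would live in a file applyCascade.m:

```matlab
function isCar = applyCascade(stages, x)
    % A window is reported as a car only if every AdaBoost stage accepts it.
    isCar = true;
    for s = 1:numel(stages)
        if ~applyStage(stages{s}, x)
            isCar = false;                 % rejected by this stage; stop early
            return;
        end
    end
end

function isPositive = applyStage(stage, x)
    % Weighted vote of the stage's weak classifiers against the stage threshold.
    score = 0;
    for t = 1:numel(stage.stumps)
        score = score + stage.alpha(t) * evalStump(stage.stumps{t}, x);
    end
    isPositive = (score >= stage.threshold);
end
```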

3. Testing Procedure

From the training process we acquired all the parameters of the cascaded classifiers. The query sample is fed to the first classifier; if it is classified as negative, we are done; if it is classified as positive, the sample is fed into the next classifier.

Table 2. Log of the training process

Stage    Negative sample # for next iteration    Weak classifier feature #    False positive rate
1        862                                     9                            0.4903
2        316                                     14                           0.3666
3        141                                     7                            0.4462
4        65                                      9                            0.4610
5        26                                      10                           0.4000
6        6                                       5                            0.2308
7        1                                       3                            0.3333
8        0                                       1                            0

Figure 13. Cumulative false-positive rate during training (false-positive rate versus stage number)

Table 3. Log of the testing process

Stage    False positive #    False negative #    Cumulative false positive rate    Cumulative false negative rate
1        209                 4                   0.4750                            0.0225
2        87                  11                  0.1977                            0.0843
3        83                  1                   0.1886                            0.0899
4        74                  4                   0.1682                            0.1124
5        52                  20                  0.1182                            0.2247
6        35                  12                  0.0796                            0.2921
7        35                  0                   0.0796                            0.2921
8        209                 0                   0.0796                            0.2921

Figure 14. Cumulative false-positive and false-negative rates on the test set (rate versus stage number)

4. Source Codes (Matlab)

See the attached zip file; the code was adapted from the previous code of students Sirui Hu and Chyuan-Tyng Wu.