
A Novel PCA-Based Bayes Classifier and Face Analysis

Zhong Jin 1,2, Franck Davoine 3, Zhen Lou 2, and Jingyu Yang 2

1 Centre de Visió per Computador, Universitat Autònoma de Barcelona, Barcelona, Spain
zhong.jin@cvc.uab.es
2 Department of Computer Science, Nanjing University of Science and Technology, Nanjing, People's Republic of China
jyyang@mail.njust.edu.cn
3 HEUDIASYC - CNRS Mixed Research Unit, Compiègne University of Technology, 60205 Compiègne cedex, France
franck.davoine@hds.utc.fr

D. Zhang and A.K. Jain (Eds.): ICB 2006, LNCS 3832, pp. 144-150, 2005.
© Springer-Verlag Berlin Heidelberg 2005

Abstract. The classical Bayes classifier plays an important role in the field of pattern recognition. Usually, it is not easy to use a Bayes classifier for pattern recognition problems in high dimensional spaces. This paper proposes a novel PCA-based Bayes classifier for such problems. Experiments on face analysis have been performed on the CMU facial expression image database. They show that the PCA-based Bayes classifier performs much better than the minimum distance classifier, and that it also provides a better understanding of the data.

1 Introduction

In recent years, many approaches have been brought to bear on pattern recognition problems in high dimensional spaces. Such high-dimensional problems occur frequently in many applications, including face recognition, facial expression analysis, handwritten numeral recognition, information retrieval, and content-based image retrieval. The main approach applies an intermediate dimension reduction method, such as principal component analysis (PCA), to extract important components for linear discriminant analysis (LDA) [1, 2]. PCA is a classical, effective and efficient data representation technique. It involves a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components.

The classical Bayes classifier plays an important role in statistical pattern recognition. Usually, it is not easy to use a Bayes classifier for pattern recognition problems in high dimensional spaces. The difficulty lies in the singularity of the covariance matrices, since pattern recognition problems in high dimensional spaces are usually so-called undersampled problems.

In this paper, we seek a PCA-based Bayes classifier by combining the PCA technique with Bayesian decision theory. The paper is organized as follows. Section 2 gives an introduction to Bayesian decision theory. A PCA-based Bayes classifier is proposed in Section 3. Experiments on face analysis are performed in Section 4. Finally, conclusions are given in Section 5.

2 Bayesian Decision Theory

Bayesian decision theory is fundamental in statistical pattern recognition.

2.1 Minimum-Error-Rate Rule

Let {ω_1, ..., ω_c} be the finite set of c states of nature ("categories"). Let the feature vector x be a d-dimensional vector-valued random variable, and let p(x|ω_j) be the state-conditional probability density function for x, that is, the probability density function for x conditioned on ω_j being the true state of nature. Let P(ω_j) denote the prior probability that nature is in state ω_j. The goal is to decide the true state of nature. It is natural to seek a decision rule that minimizes the probability of error, that is, the error rate. The Bayes decision rule that minimizes the average probability of error calls for making the decision that maximizes the posterior probability P(ω_i|x). It can formally be written as

    x \in \omega_i \quad \text{with} \quad i = \arg\max_j P(\omega_j \mid x). \qquad (1)

The structure of a Bayes classifier is determined by the conditional densities p(x|ω_j) as well as by the prior probabilities P(ω_j). Under the assumption of equal prior probabilities P(ω_j) (j = 1, ..., c) for all c classes, the minimum-error-rate rule of Eq. (1) can be applied using the state-conditional probability density functions p(x|ω_j) alone:

    x \in \omega_i \quad \text{with} \quad i = \arg\max_j p(x \mid \omega_j). \qquad (2)

Of the various density functions that have been investigated, none has received more attention than the multivariate normal, or Gaussian, density. In this paper, it is assumed that p(x|ω_j) is a multivariate normal density in d dimensions:

    p(x \mid \omega_j) = \frac{1}{(2\pi)^{d/2} |\Sigma_j|^{1/2}} \exp\left[-\frac{1}{2}(x - \mu_j)^t \Sigma_j^{-1}(x - \mu_j)\right], \qquad (3)

where µ_j is the d-component mean vector and Σ_j is the d × d covariance matrix.
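As an illustration only (not code from the paper), a minimal Python/NumPy sketch of the minimum-error-rate rule of Eqs. (1)-(3) with equal priors; the function names, the use of sample estimates of µ_j and Σ_j, and the assumption that every Σ_j is non-singular are my own.

```python
import numpy as np


def fit_gaussians(X, y):
    """Estimate a (mean, covariance) pair per class from training data X (n x d)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return params


def log_density(x, mean, cov):
    """Log of the multivariate normal density of Eq. (3)."""
    d = mean.size
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)      # (x - mu)^t Sigma^{-1} (x - mu)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def bayes_classify(x, params):
    """Eq. (2): pick the class whose conditional density is largest."""
    return max(params, key=lambda c: log_density(x, *params[c]))
```

The log density is used instead of Eq. (3) directly only to avoid numerical underflow in high dimensions; the decision of Eq. (2) is unchanged.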

2.2 Minimum Distance Classifier

The simplest case occurs when the features are statistically independent and each feature has the same variance σ². In this case, the covariance matrix is diagonal, being merely σ² times the identity matrix I, that is,

    \Sigma_j = \sigma^2 I \quad (j = 1, \ldots, c). \qquad (4)

The minimum-error-rate rule of Eqs. (2), (3) and (4) then reduces to

    x \in \omega_i \quad \text{with} \quad i = \arg\min_j \|x - \mu_j\|^2, \qquad (5)

where ||·|| denotes the Euclidean norm. This is the commonly used minimum distance classifier.

2.3 Limitation of the Bayes Classifier

In a high dimensional space, some classes may lie on or near a low dimensional manifold. In other words, for some classes the covariance matrix Σ_j may be singular in a high dimensional space. This limitation exists even in 2-dimensional spaces. A two-class problem is shown in Fig. 1: one class degenerates into a 1-dimensional line, so the Bayes classifier cannot be used directly to perform classification. The minimum distance classifier can still be used to perform classification. However, it does not give a correct understanding of the data, since the constraint of Eq. (4) is not satisfied in this two-class problem.

Fig. 1. A two-class problem
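To make the limitation concrete before developing the PCA-based construction, a small sketch with made-up 2-D data in the spirit of Fig. 1 (the data, the rank check and the function name are illustrative assumptions, not taken from the paper): the degenerate class has a singular covariance matrix, so Eq. (3) is unusable for it, while the minimum distance rule of Eq. (5) still produces a decision.

```python
import numpy as np


def min_distance_classify(x, means):
    """Eq. (5): assign x to the class with the nearest mean."""
    dists = [np.sum((x - m) ** 2) for m in means]
    return int(np.argmin(dists))


# Hypothetical 2-D example: class 0 lies on a line, class 1 is a normal cloud.
rng = np.random.default_rng(0)
X0 = np.column_stack([np.linspace(-1, 1, 50), np.zeros(50)])   # on the x-axis
X1 = rng.normal(loc=[0.0, 2.0], scale=0.3, size=(50, 2))

cov0 = np.cov(X0, rowvar=False)
print(np.linalg.matrix_rank(cov0))   # 1: Sigma_0 is singular, Eq. (3) cannot be used
print(min_distance_classify(np.array([0.2, 0.1]),
                            [X0.mean(axis=0), X1.mean(axis=0)]))  # still decides: 0
```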

3 PCA-Based Bayes Classifier

One solution to the above limitation of the Bayes classifier is to describe the Gaussian density of Eq. (3) using principal component analysis (PCA). We propose a novel PCA-based Bayes classifier in this section.

3.1 PCA Model

Let Ψ_j = (ψ_j1, ..., ψ_jd) be the matrix whose columns are the unit-norm eigenvectors of the covariance matrix Σ_j of Eq. (3). Let Λ_j = diag(λ_j1, ..., λ_jd) be the diagonal matrix of the eigenvalues of Σ_j, where λ_ji is the eigenvalue corresponding to the eigenvector ψ_ji (i = 1, ..., d). We have

    \Psi_j^t \Sigma_j \Psi_j = \Lambda_j. \qquad (6)

If the covariance matrix Σ_j is non-singular, all the corresponding eigenvalues are positive. Otherwise, some eigenvalues may be zero. In general, assume that the λ_ji (i = 1, ..., d) are ranked in order from larger to smaller:

    \lambda_{j1} \geq \cdots \geq \lambda_{j d_j} > \lambda_{j(d_j+1)} = \cdots = \lambda_{jd} = 0, \qquad (7)

where d_j is the number of non-zero eigenvalues of the covariance matrix Σ_j.

Recently, a perturbation approach has been proposed [3]. However, for practical application problems, the dimension d may be too high to obtain all d eigenvectors.
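A minimal Python/NumPy sketch (not from the paper) of the per-class PCA model of Eqs. (6)-(7): eigen-decompose a sample estimate of Σ_j, rank the eigenvalues from larger to smaller, and keep only the eigenvectors whose eigenvalues are numerically non-zero. The function name and the tolerance `tol` are illustrative assumptions; the later sketches reuse this helper.

```python
import numpy as np


def class_pca(Xc, tol=1e-10):
    """Eigen-decomposition of one class covariance matrix, Eqs. (6)-(7).

    Returns the class mean, the eigenvectors with non-zero eigenvalues
    (as columns), and the corresponding eigenvalues, sorted in decreasing order.
    """
    mu = Xc.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False)
    lam, Psi = np.linalg.eigh(Sigma)       # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]          # rank from larger to smaller, Eq. (7)
    lam, Psi = lam[order], Psi[:, order]
    d_j = int(np.sum(lam > tol))           # number of (numerically) non-zero eigenvalues
    return mu, Psi[:, :d_j], lam[:d_j]
```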

3.2 Novel Perturbation Approach

Assume that all the eigenvectors corresponding to non-zero eigenvalues are available. Let

    z = (\psi_{11}, \ldots, \psi_{1d_1}, \ldots, \psi_{c1}, \ldots, \psi_{c d_c})^t x. \qquad (8)

This is a linear transformation from the original d-dimensional x space to a new d̃-dimensional z space, where

    \tilde{d} = \sum_{j=1}^{c} d_j. \qquad (9)

Suppose

    \tilde{d} < d. \qquad (10)

The new z space can thus be regarded as a compact version of the original x space. Instead of the Bayes classifier of Eq. (2) in x space, a Bayes classifier can be introduced in the z space:

    x \in \omega_i \quad \text{with} \quad i = \arg\max_j p(z \mid \omega_j). \qquad (11)

Obviously, p(z|ω_1) is formally a Gaussian distribution, since the transformation of Eq. (8) is linear. In the rest of this section, we propose a novel perturbation approach to determine p(z|ω_1).

Conditional Distribution p(z|ω_1). We know that (ψ_11, ..., ψ_1d_1) are the eigenvectors corresponding to the non-zero eigenvalues of the covariance matrix Σ_1. In general, the d̃ − d_1 eigenvectors (ψ_21, ..., ψ_2d_2, ..., ψ_c1, ..., ψ_cd_c) are not eigenvectors corresponding to zero eigenvalues of the covariance matrix Σ_1.

Firstly, let

    (\xi_1, \ldots, \xi_{\tilde{d}}) \leftarrow (\psi_{11}, \ldots, \psi_{1d_1}, \psi_{21}, \ldots, \psi_{2d_2}, \ldots, \psi_{c1}, \ldots, \psi_{c d_c}). \qquad (12)

Then, perform the Gram-Schmidt orthogonalization for each j (j = 2, ..., d̃) as follows:

    \xi_j \leftarrow \xi_j - \sum_{i=1}^{j-1} (\xi_j^t \xi_i)\, \xi_i, \qquad (13)

    \xi_j \leftarrow \xi_j / \|\xi_j\|. \qquad (14)

Given that (ψ_11, ..., ψ_1d_1, ψ_21, ..., ψ_2d_2, ..., ψ_c1, ..., ψ_cd_c) has rank d̃, that is, these eigenvectors are linearly independent, the Gram-Schmidt orthogonalization of Eqs. (12)-(14) is a linear transformation

    (\xi_1, \ldots, \xi_{\tilde{d}}) = A\, (\psi_{11}, \ldots, \psi_{1d_1}, \ldots, \psi_{c1}, \ldots, \psi_{c d_c}), \qquad (15)

where A is a non-singular upper triangular d̃ × d̃ matrix.

Theorem 1. Let

    y = (\xi_1, \ldots, \xi_{\tilde{d}})^t x. \qquad (16)

The covariance matrix of p(y|ω_1) is the diagonal matrix

    \mathrm{diag}(\lambda_{11}, \ldots, \lambda_{1d_1}, 0, \ldots, 0). \qquad (17)

The proof of Theorem 1 is omitted here.

Denote the diagonal elements of the covariance matrix in Eq. (17) by λ̃_i (i = 1, ..., d̃). By replacing the zero diagonal elements of the covariance matrix in Eq. (17) with a perturbation factor ε, that is,

    \tilde{\lambda}_{d_1+1} = \cdots = \tilde{\lambda}_{\tilde{d}} = \varepsilon, \qquad (18)

we can determine p(y|ω_1) as follows:

    p(y \mid \omega_1) = \prod_{i=1}^{\tilde{d}} \frac{1}{(2\pi \tilde{\lambda}_i)^{1/2}} \exp\left[-\frac{(y_i - \tilde{\mu}_i)^2}{2\tilde{\lambda}_i}\right], \qquad (19)

where

    \tilde{\mu}_i = \xi_i^t \mu_1 \quad (i = 1, \ldots, \tilde{d}). \qquad (20)

From Eqs. (8), (15) and (16), we have

    z = A^{-1} y. \qquad (21)

A novel perturbation approach to determine p(z|ω_1) can then be proposed:

    p(z \mid \omega_1) = p(y \mid \omega_1)\, |A^{-1}|, \qquad (22)

where |A^{-1}| is the determinant of the inverse matrix of A.
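As one concrete reading of this construction (a sketch under my own assumptions, not the authors' code), the Gram-Schmidt step of Eqs. (12)-(15) can be carried out with a QR factorization, and the density of Eqs. (18)-(22) evaluated in log form for numerical stability. Here `eps` stands for the perturbation factor ε, `class_pca` is the helper from the previous sketch, and the class index `j` is kept general so that the same code also serves the reordering of Eq. (23) below.

```python
import numpy as np


def fit_perturbed_density(pcs, j):
    """Collect the ingredients of p(z | omega_j), Eqs. (12)-(17).

    pcs maps each class label to (mu, Psi, lam) as returned by class_pca;
    class j's eigenvectors are placed first, as in Eq. (12) (or Eq. (23)).
    Assumes the total number of kept eigenvectors is at most d, Eq. (10).
    """
    mu, Psi, lam = pcs[j]
    others = [pcs[o][1] for o in pcs if o != j]
    V = np.hstack([Psi] + others)           # all kept eigenvectors, class j's first
    Q, R = np.linalg.qr(V)                  # Gram-Schmidt of Eqs. (13)-(15)
    return mu, Q, R, lam


def log_p_z(x, model, eps=1e-4):
    """log p(z | omega_j): diagonal Gaussian of Eq. (19) with the |A^{-1}| term of Eq. (22)."""
    mu, Q, R, lam = model
    var = np.full(Q.shape[1], eps)          # perturbed zero eigenvalues, Eq. (18)
    var[:lam.size] = lam                    # non-zero eigenvalues of the class
    yv = Q.T @ (x - mu)                     # y of Eq. (16), centred as in Eq. (20)
    log_py = -0.5 * np.sum(np.log(2 * np.pi * var) + yv ** 2 / var)   # log of Eq. (19)
    return log_py - np.sum(np.log(np.abs(np.diag(R))))                # change of variables, Eq. (22)
```

The change-of-variables term is the same for every class, so it does not affect the decision of Eq. (11); it is kept only to stay close to Eq. (22).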

Conditional Distribution p(z|ω_j). We are now ready to give an algorithm to determine the conditional distributions p(z|ω_j) (j = 2, ..., c).

Step 1. Initialize (ξ_1, ..., ξ_d̃) by first assigning the d_j eigenvectors of the covariance matrix Σ_j, and then assigning all the other d̃ − d_j eigenvectors of the covariance matrices Σ_i (i ≠ j), that is,

    (\xi_1, \ldots, \xi_{\tilde{d}}) \leftarrow (\psi_{j1}, \ldots, \psi_{j d_j}, \psi_{11}, \ldots, \psi_{1d_1}, \ldots, \psi_{c1}, \ldots, \psi_{c d_c}). \qquad (23)

Step 2. Perform the Gram-Schmidt orthogonalization according to Eqs. (13) and (14). This yields the matrix A of Eq. (15).

Step 3. Substitute (λ_j1, ..., λ_jd_j) for (λ_11, ..., λ_1d_1) in Eq. (17), d_j for d_1 in Eq. (18), and µ_j for µ_1 in Eq. (20). We then obtain the conditional distribution p(y|ω_j) by performing the transformation of Eq. (16) and substituting ω_j for ω_1 in Eq. (19).

Step 4. Obtain the conditional distribution p(z|ω_j) by substituting ω_j for ω_1 in Eq. (22).

4 Experiments

In this section, experiments on face analysis are performed on the CMU facial expression image database to test the effectiveness of the proposed PCA-based Bayes classifier.

From the CMU-Pittsburgh AU-Coded Facial Expression Database [4], 312 facial expression mask images were obtained by using a spatially adaptive triangulation technique based on local Gabor filters [5]. Six facial expressions are considered: anger, disgust, fear, joy, unhappiness, and surprise. For each expression there are 52 images with a resolution of 55 × 59; the first 26 images show moderate expressions, while the last 26 show intensive expressions.

In the experiments, for each expression, the first k (k = 5, 10, 15, 20, 25) images are used for training and all the other images for testing. Experiments have been performed with the proposed PCA-based Bayes classifier and with the minimum distance classifier, respectively. The classification rates for the different values of k are listed in Table 1. From Table 1, we can see that the proposed PCA-based Bayes classifier performs clearly better than the minimum distance classifier. Moreover, as the number of training samples k increases, the classification rate of the proposed classifier increases much faster than that of the minimum distance classifier, which indicates that the proposed classifier uses the training samples much more efficiently.

Table 1. Classification rates on the CMU facial expression image database

  Images    k    Minimum distance   PCA-based Bayes
  55 × 59    5   25.53%             27.30%
            10   29.76%             63.89%
            15   50.45%             73.87%
            20   59.38%             88.02%
            25   64.02%             95.68%
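Putting the pieces together, a hypothetical end-to-end sketch of the classifier (Steps 1-4 plus the rule of Eq. (11)) and of the first-k-per-class protocol of this section, reusing `class_pca`, `fit_perturbed_density` and `log_p_z` from the earlier sketches; the function names, the data layout (`X` of shape n × d with label vector `y`) and the accuracy metric are my own assumptions.

```python
import numpy as np


def fit_pca_bayes(X, y):
    """Steps 1-4: one perturbed Gaussian model in z space per class."""
    pcs = {c: class_pca(X[y == c]) for c in np.unique(y)}
    return {c: fit_perturbed_density(pcs, c) for c in pcs}


def pca_bayes_classify(x, models, eps=1e-4):
    """Eq. (11): assign x to the class with the largest p(z | omega_j)."""
    return max(models, key=lambda c: log_p_z(x, models[c], eps))


def evaluate(X, y, k):
    """Train on the first k images per class, test on the rest (Section 4)."""
    train, test = [], []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        train.extend(idx[:k])
        test.extend(idx[k:])
    models = fit_pca_bayes(X[train], y[train])
    pred = np.array([pca_bayes_classify(x, models) for x in X[test]])
    return float(np.mean(pred == y[test]))
```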

5 Conclusions

In this paper, we have proposed a novel PCA-based Bayes classifier for high dimensional spaces. Experiments on face analysis have been performed on the CMU facial expression image database. They show that the proposed classifier performs much better than the minimum distance classifier. With the proposed classifier, we can not only improve the classification rate, but also obtain a better understanding of the data.

Acknowledgements

This work was supported by a Ramón y Cajal research fellowship from the Ministry of Science and Technology, Spain, and by the National Natural Science Foundation of China under Grant No. 60473039.

References

1. K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1990.
2. Zhong Jin, Jingyu Yang, Zhongshan Hu, and Zhen Lou. Face recognition based on the uncorrelated discriminant transformation. Pattern Recognition, 34(7):1405-1416, 2001.
3. Z. Jin, F. Davoine, and Z. Lou. An effective EM algorithm for PCA mixture model. In Structural, Syntactic, and Statistical Pattern Recognition, volume 3138 of Lecture Notes in Computer Science, pages 626-634, Lisbon, Portugal, August 18-20, 2004. Springer.
4. Takeo Kanade, Jeffrey F. Cohn, and Yingli Tian. Comprehensive database for facial expression analysis. In Proceedings of the Fourth International Conference on Face and Gesture Recognition, pages 46-53, Grenoble, France, 2000.
5. S. Dubuisson, F. Davoine, and M. Masson. A solution for facial expression representation and recognition. Signal Processing: Image Communication, 17(9):657-673, 2002.